Reading doc and pdf content with win32com in Python and saving it to a file
The win32com package is used for the processing.
Reading doc files
# coding=utf-8
import os
import fnmatch
from win32com import client as wc


def word2txt(filePath, savePath=''):
    """Convert a .doc/.docx file to .txt through Word; savePath defaults to the source directory."""
    dirs, filename = os.path.split(filePath)
    print(dirs, '\n', filename)
    if fnmatch.fnmatch(filename, '*.docx'):
        new_name = filename[:-5] + '.txt'
    elif fnmatch.fnmatch(filename, '*.doc'):
        new_name = filename[:-4] + '.txt'
    else:
        print('Unsupported format: only doc and docx are supported')
        return
    if savePath == '':
        savePath = dirs
    word2txtPath = os.path.join(savePath, new_name)
    print(word2txtPath)
    wordappp = wc.Dispatch('Word.Application')
    try:
        mytxt = wordappp.Documents.Open(filePath)
        mytxt.SaveAs(word2txtPath, 4)  # 4 = wdFormatDOSText: save the extracted content as plain text
        mytxt.Close()
    finally:
        wordappp.Quit()  # do not leave a background Word process running


if __name__ == '__main__':
    filePath = os.path.abspath(r'./專業課.docx')
    word2txt(filePath)

Reading pdf files
# coding=utf-8
import os
import fnmatch
from win32com import client as wc


def pdf2txt(filePath, savePath=''):
    """Convert a .pdf file to .txt through Word; savePath defaults to the source directory."""
    dirs, filename = os.path.split(filePath)
    print(dirs, '\n', filename)
    if fnmatch.fnmatch(filename, '*.pdf') or fnmatch.fnmatch(filename, '*.PDF'):
        new_name = filename[:-4] + '.txt'
    else:
        print('Unsupported format: only pdf is supported')
        return
    if savePath == '':
        savePath = dirs
    pdf2txtPath = os.path.join(savePath, new_name)
    print(pdf2txtPath)
    # The installed Word version must itself be able to open PDF files.
    wordappp = wc.Dispatch('Word.Application')
    try:
        mytxt = wordappp.Documents.Open(filePath)
        mytxt.SaveAs(pdf2txtPath, 4)  # 4 = wdFormatDOSText: save the extracted text as plain text
        mytxt.Close()
    finally:
        wordappp.Quit()  # do not leave a background Word process running


if __name__ == '__main__':
    filePath = os.path.abspath(r'./論文.pdf')
    pdf2txt(filePath)
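
Both functions above derive the output path in the same way: split the input path, swap the extension for .txt, and fall back to the source directory when no savePath is given. As a rough sketch only, that shared step could be pulled into small helpers. The names is_supported, txt_path_for, and SUPPORTED_PATTERNS are hypothetical and do not appear in the original code; the sketch uses only the standard library, so it can be tried without Word installed.

```python
import fnmatch
import os

# Hypothetical names for illustration; not part of the original article.
SUPPORTED_PATTERNS = ('*.doc', '*.docx', '*.pdf')


def is_supported(filename):
    """Return True if the file name matches a supported pattern, ignoring case."""
    lowered = filename.lower()
    return any(fnmatch.fnmatchcase(lowered, pattern) for pattern in SUPPORTED_PATTERNS)


def txt_path_for(filePath, savePath=''):
    """Build the .txt output path; savePath defaults to the source directory."""
    dirs, filename = os.path.split(filePath)
    stem, _ext = os.path.splitext(filename)
    return os.path.join(savePath or dirs, stem + '.txt')
```

The value returned by txt_path_for could then be passed as the first argument of SaveAs in either function, with is_supported replacing the per-extension fnmatch checks.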