Batch-converting Word files to txt documents
Collected and organized by 生活随笔.
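Task: a parent folder contains several subfolders, and each subfolder contains several Word files and nothing else. Read every Word file under the parent folder and write its content to a txt file under a specified path.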
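Steps:
1. Read the names of all subfolders under the parent folder.
2. Convert every .docx file in each subfolder to a separate .txt file.
3. Optional extra: transcode the files. (Word's doc.SaveAs() always writes ANSI-encoded files, so you may want to convert them to UTF-8. This step can be skipped.)
4. Merge all txt files in the same directory into one file.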
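Step 3 is about how the text is stored on disk. Assuming a Simplified Chinese Windows system, where the "ANSI" code page is 936 (GBK), the same characters produce different bytes in the two encodings, and GBK bytes cannot simply be read back as UTF-8. A small illustration using only Python's built-in codecs:

```python
text = "中文"

gbk_bytes = text.encode("gbk")     # bytes an ANSI (code page 936) file would contain
utf8_bytes = text.encode("utf-8")  # bytes a UTF-8 file would contain

print(gbk_bytes)   # b'\xd6\xd0\xce\xc4'
print(utf8_bytes)  # b'\xe4\xb8\xad\xe6\x96\x87'

# Decoding GBK bytes as UTF-8 fails, so the source encoding must be known
try:
    gbk_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("decode failed:", err.reason)
```

Both byte strings decode back to the same text, but only with the matching codec.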
直接上代碼
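# Step 1: list the subfolder names under the parent folder
import os

file_dir = r"D:\新建文件夹\父文件夹"

name = []
# os.walk() visits the parent folder and every folder below it
for root, dirs, files in os.walk(file_dir):
    for d in dirs:
        name.append(d)
print(name)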
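os.walk() keeps descending, so if a subfolder had folders of its own, their names would be collected too and later joined onto the parent path, giving paths that do not exist. A minimal alternative, assuming only the direct children of the parent folder are wanted, uses os.scandir(); the function name immediate_subfolders is hypothetical:

```python
import os

def immediate_subfolders(parent):
    """Return the sorted names of the folders directly inside parent."""
    with os.scandir(parent) as entries:
        # is_dir() filters out files; nested folders are never visited
        return sorted(e.name for e in entries if e.is_dir())
```

Usage: print(immediate_subfolders(r"D:\新建文件夹\父文件夹")). Sorting also makes the processing order reproducible, which os.walk() does not guarantee.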
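# Step 2: convert every Word file in each subfolder to .txt
# (requires Windows, Microsoft Word and the pywin32 package)
from win32com import client as wc
import os


def get_ziwenjianjai():
    """Return the names of the subfolders under the parent folder."""
    file_dir = r"D:\新建文件夹\父文件夹"
    name = []
    for root, dirs, files in os.walk(file_dir):
        for d in dirs:
            name.append(d)
    # Note: [1:] drops the first folder name found; remove the slice to process every subfolder
    return name[1:]


all_FileNum = 0


def Translate(level, path):
    """Convert every Word file in one subfolder to txt (level is unused)."""
    global all_FileNum
    word = wc.Dispatch('Word.Application')  # get a Word instance once per folder
    for f in os.listdir(path):
        # skip Word lock files (~$...) and hidden files
        if f[0] == '~' or f[0] == '.':
            continue
        new = os.path.join(path, f)
        print(new)
        doc = word.Documents.Open(new)
        # f[:-5] strips the ".docx" extension; all outputs go to the same folder
        savepath = os.path.join('E:\\保存处', f[:-5] + '.txt')
        doc.SaveAs(savepath, 4)  # 4 = wdFormatDOSText, a plain-text format
        doc.Close()
        all_FileNum = all_FileNum + 1


if __name__ == '__main__':
    zimulu = get_ziwenjianjai()
    for name in zimulu:
        mypath = 'D:\\新建文件夹\\父文件夹\\' + name
        Translate(1, mypath)
    print('Total files = ', all_FileNum)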
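Translate() writes every output into one folder, named only after the Word file, so equally named files in two subfolders overwrite each other, and f[:-5] assumes every file ends in ".docx". A sketch of a pre-pass that collects (source, target) pairs first; the helper name plan_conversions and the "<subfolder>_<name>.txt" naming scheme are assumptions, and the conversion itself would still go through Word as above:

```python
from pathlib import Path

def plan_conversions(parent, out_dir):
    """Pair each .docx in parent's subfolders with a target .txt path.

    Hypothetical helper: skips Word lock files ("~$..."), hidden files and
    anything that is not a .docx, and prefixes the subfolder name so that
    equally named files from different subfolders do not overwrite each other.
    """
    parent, out_dir = Path(parent), Path(out_dir)
    plan = []
    for sub in sorted(p for p in parent.iterdir() if p.is_dir()):
        for doc in sorted(sub.iterdir()):
            if not doc.is_file() or doc.suffix.lower() != ".docx":
                continue
            if doc.name.startswith(("~", ".")):
                continue
            plan.append((doc, out_dir / f"{sub.name}_{doc.stem}.txt"))
    return plan
```

Each pair could then be passed to the Word calls, e.g. doc = word.Documents.Open(str(src)) followed by doc.SaveAs(str(dst), 4).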
# Step 3 (optional): transcode the txt files to UTF-8 (requires: pip install chardet)
import os
import chardet


def list_folders_files(path):
    """
    Return the names of the folders and files in a directory.
    :param path: directory that contains the folders and files
    :return: (list_folders, list_files)
        list_folders: folder names
        list_files: file names
    """
    list_folders = []
    list_files = []
    for file in os.listdir(path):
        file_path = os.path.join(path, file)
        if os.path.isdir(file_path):
            list_folders.append(file)
        else:
            list_files.append(file)
    return (list_folders, list_files)


def convert(file, in_enc="GBK", out_enc="UTF-8"):
    """
    Convert a file in place from one encoding to another (GBK -> UTF-8 by default).
    :param file: file path
    :param in_enc: input encoding
    :param out_enc: output encoding
    """
    if in_enc is None:
        # chardet could not detect the encoding; leave the file unchanged
        print("skip [ " + os.path.basename(file) + " ]: unknown encoding")
        return
    in_enc = in_enc.upper()
    out_enc = out_enc.upper()
    try:
        print("convert [ " + os.path.basename(file) + " ].....From " + in_enc + " --> " + out_enc)
        # errors='ignore' silently drops bytes that are invalid in in_enc;
        # newline='' keeps the original line endings
        with open(file, 'r', encoding=in_enc, errors='ignore', newline='') as f:
            new_content = f.read()
        with open(file, 'w', encoding=out_enc, newline='') as f:
            f.write(new_content)
    except IOError as err:
        print("I/O error: {0}".format(err))


if __name__ == "__main__":
    path = r'D:\新建文件夹\txt整合'
    (list_folders, list_files) = list_folders_files(path)
    print("Path: " + path)
    for fileName in list_files:
        filePath = os.path.join(path, fileName)
        with open(filePath, "rb") as f:
            data = f.read()
        codeType = chardet.detect(data)['encoding']  # may be None
        convert(filePath, codeType, 'UTF-8')
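If installing chardet is not an option, a stdlib-only variant is to try a short list of candidate encodings in strict mode. This is a swapped-in heuristic, not chardet's detection; the function name to_utf8 and the candidate list are assumptions, and it only makes sense when the inputs are known to be either UTF-8 or GBK-family text (GB18030 is a superset of GBK):

```python
from pathlib import Path

def to_utf8(path, candidates=("utf-8", "gb18030")):
    """Rewrite path as UTF-8 and return the encoding it was decoded with.

    Each candidate is tried strictly, so no characters are silently dropped.
    Returns None and leaves the file untouched if no candidate fits.
    """
    data = Path(path).read_bytes()
    for enc in candidates:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue
        if enc != "utf-8":
            # write bytes directly so line endings are kept exactly as they were
            Path(path).write_bytes(text.encode("utf-8"))
        return enc
    return None
```

UTF-8 is tried first because GBK text rarely happens to be valid UTF-8, while the reverse check would accept far more byte sequences.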
# Step 4: merge all txt files under one folder into a single file
import os
import time

time1 = time.time()


def MergeTxt(filepath, outfile):
    parent = "D:\\新建文件夹\\保存处"
    # 'a+' appends, so rerunning the script adds to an existing merged file
    with open(parent + outfile, 'a+', encoding='utf-8') as k:
        for dirpath, dirnames, filenames in os.walk(filepath):
            for filename in filenames:
                if os.path.splitext(filename)[1] == '.txt':
                    txtPath = os.path.join(dirpath, filename)
                    # 'ANSI' is a Windows-only alias of the mbcs codec;
                    # use 'utf-8' here if the files were transcoded in step 3
                    with open(txtPath, encoding='ANSI') as f:
                        for line in f:
                            # drop leading whitespace and every tab and space
                            line = line.lstrip().replace('\t', '').replace(' ', '')
                            # keep only lines longer than 15 characters
                            if len(line) > 15:
                                k.write(line)
    print("finished")
    print("Output file path:", parent + outfile)


if __name__ == '__main__':
    filepath = "D:\\新建文件夹\\待合并文件夹名"
    outfile = "\\合并raw.txt"
    MergeTxt(filepath, outfile)
    time2 = time.time()
    print('Total time: ' + str(time2 - time1) + 's')
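The filter in MergeTxt() strips leading whitespace, deletes every tab and space, and keeps only lines longer than 15 characters, with the trailing newline counting toward that length. A sketch that moves the filter into its own function so it can be tested, and walks the files in sorted order so the merged output is reproducible; the names clean_line and merge_txt, and reading the inputs as UTF-8 by default (i.e. after step 3), are assumptions:

```python
import os

def clean_line(line, min_len=15):
    """Apply MergeTxt's filter: return the cleaned line, or None to drop it."""
    line = line.lstrip().replace("\t", "").replace(" ", "")
    # as in the original, the trailing newline counts toward the length
    return line if len(line) > min_len else None

def merge_txt(src_dir, out_path, encoding="utf-8"):
    """Append the filtered lines of every .txt file under src_dir to out_path.

    out_path should live outside src_dir, otherwise it would be merged too.
    """
    with open(out_path, "a", encoding="utf-8") as out:
        for dirpath, dirnames, filenames in os.walk(src_dir):
            dirnames.sort()  # visit subfolders in a stable order
            for name in sorted(filenames):
                if os.path.splitext(name)[1] != ".txt":
                    continue
                path = os.path.join(dirpath, name)
                with open(path, encoding=encoding) as f:
                    for line in f:
                        kept = clean_line(line)
                        if kept is not None:
                            # keep a file's last line from running into the next file
                            out.write(kept if kept.endswith("\n") else kept + "\n")
```

For files that were not transcoded, pass the ANSI code page instead, e.g. encoding="gbk" on a Simplified Chinese system. Note that deleting every space suits Chinese text but would glue English words together.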
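Shortcomings:
1. Word's SaveAs() and f.write() handle tables differently. For documents that contain tables, SaveAs() extracts a table onto one line, which stays close to the original; but after the text is read with f.readlines() and written back with f.write(), the content of every table cell ends up on its own line, which is awkward. This is still unsolved.
2. If a Word document contains images or complex formatting, Word often shows pop-up dialogs, which have to be clicked manually before the program can continue.
3. Subfolders that contain files of other types were not fully considered; I have not tested whether the script still runs in that case.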
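A different technique, not the author's, that avoids both the Word pop-ups and the cell-per-line problem: a .docx file is a ZIP archive whose main text lives in word/document.xml, so plain text can be extracted with zipfile and xml.etree without starting Word at all. In this rough sketch (docx_to_text and paragraph_text are hypothetical names) each paragraph becomes one line and each table row becomes one line with its cells separated by tabs. It ignores headers, footers, footnotes and nested tables, text inside shapes may not come out cleanly, and it only handles .docx, not legacy .doc files:

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def paragraph_text(p):
    """Join the text of all runs in a paragraph; a w:tab inside a run becomes a tab."""
    parts = []
    for run in p.iter(W + "r"):
        for child in run:
            if child.tag == W + "t":
                parts.append(child.text or "")
            elif child.tag == W + "tab":
                parts.append("\t")
    return "".join(parts)

def docx_to_text(path):
    """Return the body text of a .docx file, one paragraph or table row per line."""
    with zipfile.ZipFile(path) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    lines = []
    for block in root.find(W + "body"):
        if block.tag == W + "p":
            lines.append(paragraph_text(block))
        elif block.tag == W + "tbl":
            for row in block.findall(W + "tr"):  # direct rows only
                cells = [
                    " ".join(paragraph_text(p) for p in cell.findall(W + "p"))
                    for cell in row.findall(W + "tc")
                ]
                lines.append("\t".join(cells))
    return "\n".join(lines)
```

Writing the result with open(txt_path, "w", encoding="utf-8") would also make the transcoding step unnecessary, since the output is UTF-8 from the start.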