Splitting categories sensibly in Python: how do you split phrases with Python?
【20200904】
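You can first split on "/", mark the special characters, and store the result in a temporary variable such as a tuple, a list, or a dict. Then iterate over that variable and split on the parentheses, using a special marker to flag the content inside them (all that matters is being able to tell parenthesized content from non-parenthesized content), and store that in a second variable. Finally, iterate over the second variable to generate the phrase patterns.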
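The three steps above can be sketched roughly as follows. This is only a minimal illustration, not the code posted the next day: the names `parse_slots` and `expand` are made up for this sketch, and it assumes that parentheses mark an optional group and are never nested.

```python
import re


def parse_slots(pattern):
    """Turn a pattern into a list of slots; each slot lists its alternatives."""
    slots = []
    # Separate parenthesized groups from the plain text between them.
    for piece in re.split(r"(\(.+?\))", pattern):
        if piece.startswith("(") and piece.endswith(")"):
            # Optional group: it may be omitted ("") or take any of its options.
            slots.append([""] + piece[1:-1].split("/"))
        else:
            # Plain text: every space-separated word is its own slot.
            for word in piece.split():
                slots.append(word.split("/"))
    return slots


def expand(pattern):
    """Combine the slots from left to right into complete phrases."""
    phrases = [""]
    for slot in parse_slots(pattern):
        # strip() removes the stray space left by an empty alternative.
        phrases = [(p + " " + alt).strip() for alt in slot for p in phrases]
    return phrases


print(expand("dance on (the) air"))
# ['dance on air', 'dance on the air']
```

Storing each slot as a list of alternatives is what makes the last step simple: a fixed word is just a slot with one alternative, and an optional group is a slot whose first alternative is empty.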
【20200905】
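Sorry, I haven't been in great shape lately and have also been fairly busy. I wrote this up roughly today, and it should more or less work. There is still the question of the order in which the phrases are generated; I'll look at that when I have time. If it needs changing, the place to change is probably the bottom_fuc function and the logic around the call to it. Another approach would be to build a list for each phrase and simply join it at the end, but I think that would take more memory, so I didn't use it. The code is pasted below.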
import logging
import re

list_all = []


# Combine two lists: append every string in inner_list to every string
# that has already been built in org_str_list.
# inner_list: strings to append
# org_str_list: strings joined so far
def bottom_fuc(inner_list, org_str_list):
    inner_new_str_list = []
    for s in inner_list:
        for s1 in org_str_list:
            inner_new_str_list.append(str(s1) + " " + str(s))
    return inner_new_str_list


# Main loop: one pattern per input line
with open("./phasesplit") as f:
    for line_true in f:
        # line: the text before the semicolon; semi_str: the text after it
        line = line_true.rstrip("\r\n")
        semi_str = ""
        # Only the first semicolon is treated as the separator
        if ";" in line:
            line, semi_str = line.split(";", 1)
            semi_str = semi_str.strip()
            line = line.strip()
        # Split into parenthesized groups and the plain text between them
        list_for_loop = re.split(r"(\(.+?\))", line)
        list_for_loop_new = []
        for lp in list_for_loop:
            is_group = lp.startswith("(") and lp.endswith(")")
            # Plain text that contains spaces: split it into words
            if not is_group and " " in lp:
                list_for_loop_new.extend(lp.split(" "))
            else:
                list_for_loop_new.append(lp)
        list_str = []
        # Split each piece into its alternatives
        for str_tmp in list_for_loop_new:
            # Parenthesized group: drop the parentheses and prepend an empty
            # alternative so that the group can also be left out
            if str_tmp.startswith("(") and str_tmp.endswith(")"):
                str_tmp = " /" + str_tmp.strip("(").strip(")")
            # Split on "/"
            if "/" in str_tmp:
                list_str.append(str_tmp.split("/"))
            else:
                list_str.append(str_tmp)
        new_str_list = []
        # Assemble the pieces into complete phrases
        for l_str in list_str:
            if isinstance(l_str, str):
                if not new_str_list:
                    new_str_list.append(l_str)
                else:
                    for ind in range(len(new_str_list)):
                        new_str_list[ind] = new_str_list[ind] + " " + l_str
            elif isinstance(l_str, list):
                if not new_str_list:
                    new_str_list.append("")
                new_str_list = bottom_fuc(l_str, new_str_list)
            else:
                logging.error("Unexpected type: %s %r", type(l_str), l_str)
                raise SystemExit(1)
        # Normalize whitespace and re-attach the part after the semicolon
        for ind, ns in enumerate(new_str_list):
            ns = re.sub(" {2,}", " ", ns.strip())
            if semi_str:
                ns = ns + ";" + semi_str
            new_str_list[ind] = ns
        # Put the original pattern in front of its expansions
        if semi_str:
            new_str_list.insert(0, line + ";" + semi_str)
        else:
            new_str_list.insert(0, line)
        # Add to the overall list
        list_all.append(new_str_list)

list_size = len(list_all)

# Write the result file
with open("result.txt", "w") as nf:
    nf.write("#############################################\n")
    nf.write("#section:{}\n".format(list_size))
    nf.write("#############################################\n")
    for la in list_all:
        for nl in la:
            nf.write(nl + "\n")
        nf.write("\n")
        nf.write("#############################################\n")
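For comparison, the "one list per phrase, joined at the end" alternative mentioned before the code could look something like the sketch below. It is not the author's code: the function names and the hard-coded `slots` value are assumptions for illustration, and it uses `itertools.product` as a generator so that only one combination is held at a time. It also makes the ordering question concrete: `product` varies the last slot fastest, whereas `bottom_fuc` varies the earliest slot fastest, so the second function reverses the slots to reproduce the order seen in result.txt.

```python
from itertools import product


def join_variants(slots):
    """Lazily yield one phrase per combination; the last slot varies fastest."""
    for combo in product(*slots):
        # Skip empty alternatives (an omitted optional group) before joining.
        yield " ".join(word for word in combo if word)


def join_variants_first_fastest(slots):
    """Same phrases, but the first slot varies fastest, like bottom_fuc."""
    for combo in product(*reversed(slots)):
        yield " ".join(word for word in reversed(combo) if word)


# Hypothetical pre-parsed form of "quarrel (with sb) about/for/over"
slots = [["quarrel"], ["", "with sb"], ["about", "for", "over"]]

print(list(join_variants(slots)))
# ['quarrel about', 'quarrel for', 'quarrel over',
#  'quarrel with sb about', 'quarrel with sb for', 'quarrel with sb over']

print(list(join_variants_first_fastest(slots)))
# ['quarrel about', 'quarrel with sb about', 'quarrel for',
#  'quarrel with sb for', 'quarrel over', 'quarrel with sb over']
```

Whether this really costs more memory than the incremental approach depends on how the results are consumed; if each phrase is written to the file as soon as it is yielded, the full list never has to exist.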
Input file (phasesplit)
quarrel (with sb) about/for/over ; 2313
dabble at/in/with
(sb/sth) damn and blast (sb/sth)
dance on/upon a rope/nothing
dance on (the) air
dead/flat/stark calm
do/go/make the/one's round
do (sb/sth) grace
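Only the first input line carries a trailing "; 2313" part, and the script does not check how many semicolons a line contains. One possible way to make that step tolerant of missing or repeated semicolons is `str.partition`, which always returns three parts. The helper name `split_suffix` is hypothetical, and treating everything after the first semicolon as the suffix is an assumption of this sketch.

```python
def split_suffix(raw_line):
    """Split 'pattern ; suffix' into (pattern, suffix); suffix is '' if absent."""
    # partition() cuts at the first ";" only and never raises.
    pattern, _, suffix = raw_line.partition(";")
    return pattern.strip(), suffix.strip()


print(split_suffix("quarrel (with sb) about/for/over ; 2313\n"))
# ('quarrel (with sb) about/for/over', '2313')
print(split_suffix("dabble at/in/with\n"))
# ('dabble at/in/with', '')
```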
Output file (result.txt)
#############################################
#section:8
#############################################
quarrel (with sb) about/for/over;2313
quarrel about;2313
quarrel with sb about;2313
quarrel for;2313
quarrel with sb for;2313
quarrel over;2313
quarrel with sb over;2313
#############################################
dabble at/in/with
dabble at
dabble in
dabble with
#############################################
(sb/sth) damn and blast (sb/sth)
damn and blast
sb damn and blast
sth damn and blast
damn and blast sb
sb damn and blast sb
sth damn and blast sb
damn and blast sth
sb damn and blast sth
sth damn and blast sth
#############################################
dance on/upon a rope/nothing
dance on a rope
dance upon a rope
dance on a nothing
dance upon a nothing
#############################################
dance on (the) air
dance on air
dance on the air
#############################################
dead/flat/stark calm
dead calm
flat calm
stark calm
#############################################
do/go/make the/one's round
do the round
go the round
make the round
do one's round
go one's round
make one's round
#############################################
do (sb/sth) grace
do grace
do sb grace
do sth grace
#############################################
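A quick way to sanity-check output like this is to count the lines: each pattern should expand to the product of the number of alternatives in each position, with a parenthesized group counting one extra for "left out". The helper below is a small sketch written for this check only (the name `expected_count` is made up), under the same assumption that parentheses mark optional, non-nested groups.

```python
import re


def expected_count(pattern):
    """Number of phrases a pattern should expand to."""
    count = 1
    for piece in re.split(r"(\(.+?\))", pattern):
        if piece.startswith("(") and piece.endswith(")"):
            # Optional group: its options plus the case where it is omitted.
            count *= len(piece[1:-1].split("/")) + 1
        else:
            for word in piece.split():
                count *= len(word.split("/"))
    return count


print(expected_count("quarrel (with sb) about/for/over"))   # 6
print(expected_count("(sb/sth) damn and blast (sb/sth)"))   # 9
print(expected_count("do/go/make the/one's round"))         # 6
```

These numbers match the sections above: 2 x 3 = 6, 3 x 3 = 9, and 3 x 2 = 6 expansions, each listed after the original pattern.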