Stop-word filtering
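import re
import numpy as np

stop_word_path = "InferenceSystem/src/I5_algorithm/NLP數據集合/停詞庫/stop_word_for_chinese.txt"

def del_element(strings, symbles):
    # Map every symbol to the empty string, keyed by its regex-escaped form.
    srcrep = {i: '' for i in symbles}
    rep = dict((re.escape(k), v) for k, v in srcrep.items())
    # Build one alternation pattern that matches any of the symbols.
    pattern = re.compile("|".join(rep.keys()))
    return pattern.sub(lambda m: rep[re.escape(m.group(0))], strings)

def filter_stop_word(strings, stop_word=np.loadtxt(stop_word_path, dtype=str)):
    # The default stop-word list is read from disk once, when the function is defined.
    return del_element(strings, stop_word)

src = '資源來源網絡,侵刪 很好的資料,趕快學起來'
filter_stop_word(src)        # remove every entry found in the stop-word file
filter_stop_word(src, ', ')  # a string is iterated per character: removes commas and spaces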
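As a variation on the same idea, here is a minimal sketch that takes the stop words as an explicit argument instead of reading them from a file at definition time, so it runs without the stop-word file. The names build_stop_pattern and remove_stop_words are hypothetical and not part of the original code, and the stop-word lists below are made up for illustration. It also sorts the words longest first, on the assumption that a longer stop word should be preferred over a shorter one it starts with.

```python
import re

def build_stop_pattern(stop_words):
    """Compile one regex that matches any of the given stop words."""
    # Drop empty entries and duplicates; sort longest first so a longer
    # stop word wins over a shorter one that is its prefix.
    words = sorted({w for w in stop_words if w}, key=len, reverse=True)
    if not words:
        return None
    return re.compile("|".join(re.escape(w) for w in words))

def remove_stop_words(text, stop_words):
    """Return text with every occurrence of a stop word deleted."""
    pattern = build_stop_pattern(stop_words)
    return text if pattern is None else pattern.sub("", text)

src = '資源來源網絡,侵刪 很好的資料,趕快學起來'
# Illustrative stop-word lists, not taken from the stop-word file.
print(remove_stop_words(src, [',', ' ']))
print(remove_stop_words(src, ['很好的', '的', '趕快']))
```

Because the replacement is always the empty string, the lookup dictionary used in del_element is not needed here; pattern.sub("", text) has the same effect.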
Summary