Inverted index in Python
1. Simple version
from collections import defaultdict


class inverted_index:
    def __init__(self, docs):
        # Map each term to the set of document indices that contain it
        self.doc = defaultdict(set)
        for index, doc in enumerate(docs):
            for term in doc.split():
                self.doc[term].add(index)

    def search(self, term):
        # .get() avoids adding an empty entry to the defaultdict for unknown terms
        return self.doc.get(term, set())


if __name__ == "__main__":
    docs = ["new home sales top forecasts june june june",
            "home sales rise in july june",
            "increase in home sales in july",
            "july new home sales new rise"]
    i = inverted_index(docs)
    print(i.search('sales'))
    # Result: {0, 1, 2, 3}
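The class above looks up one term at a time. A natural next step is an AND query over several terms, which intersects their posting sets. The sketch below is one way to do that on the same four documents; the function names `build_index` and `search_all` are illustrative and do not come from the original post.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the set of document indices that contain it."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for term in doc.split():
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return the ids of documents containing every term in `query` (AND search)."""
    terms = query.split()
    if not terms:
        return set()
    # .get() avoids inserting empty entries for unknown terms into the defaultdict
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


docs = [
    "new home sales top forecasts june june june",
    "home sales rise in july june",
    "increase in home sales in july",
    "july new home sales new rise",
]
index = build_index(docs)
print(search_all(index, "july rise"))    # {1, 3}
print(search_all(index, "june new"))     # {0}
print(search_all(index, "home august"))  # set()
```

Because a term that never occurs yields an empty posting set, any query containing it returns an empty result, and the index itself is left unchanged.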
2. NLTK word-list version: build and store the inverted index, then search
# [2] Build an inverted index over the word list shipped with nltk
"""
Result 1 (storing indices): run_time = 0.012042999267578125
Result 2 (storing words):   run_time = 0.02428269386291504
Conclusion: storing indices is roughly twice as fast as storing words
"""
import time
from collections import defaultdict

from nltk.corpus import words  # needs a one-time nltk.download('words')

# A set records each row once even when a character repeats within a word,
# which is why defaultdict(list) is not suitable here.
inverted_index = defaultdict(set)
word_list = words.words()

# Result 1: storing the index of the word is faster than storing the word itself
for i, word in enumerate(word_list):
    for char in word.lower():
        inverted_index[char].add(i)

# Result 2
# for i, word in enumerate(word_list):
#     for char in word.lower():
#         inverted_index[char].add(word)

# Find the rows (word indices) that contain every query character
# by intersecting their posting sets with set.intersection()
start = time.time()
result = set.intersection(*(inverted_index[char] for char in "aej"))
end = time.time()
print('run_time = ', end - start)
print('result = ', result)
print('result_item = ', [word_list[i] for i in result])


# Pure-Python illustration of a multi-set intersection (not called above)
def intersection(*args):
    left = args[0]
    # Perform len(args)-1 pairwise intersections
    for right in args[1:]:
        # Tests take O(N) time, so minimize N by choosing the smaller set
        if len(left) > len(right):
            left, right = right, left
        # Do the pairwise intersection
        result = set()
        for element in left:
            if element in right:
                result.add(element)
        left = result  # Use as the start for the next intersection
    return left
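The hand-written `intersection` function is defined but never exercised in the listing. The sketch below checks it against the built-in `set.intersection` on a character-level index. To stay runnable without downloading the nltk corpus, it uses a small made-up word list as a stand-in for `words.words()`; that list is an assumption for illustration only.

```python
from collections import defaultdict


def intersection(*args):
    """Pairwise intersection that always iterates over the smaller set."""
    left = args[0]
    for right in args[1:]:
        if len(left) > len(right):
            left, right = right, left
        left = {element for element in left if element in right}
    return left


# Hypothetical stand-in for nltk's words.words()
word_list = ["jade", "jean", "apple", "major", "eject", "Jamie", "banana"]

# Character-level inverted index: char -> set of word indices
char_index = defaultdict(set)
for i, word in enumerate(word_list):
    for char in word.lower():
        char_index[char].add(i)

postings = [char_index[char] for char in "aej"]
by_hand = intersection(*postings)
built_in = set.intersection(*postings)

print(by_hand == built_in)                     # True
print(sorted(word_list[i] for i in by_hand))   # ['Jamie', 'jade', 'jean']
```

Both versions return the same indices. The hand-written one mainly shows the idea of iterating over the smaller operand in each pairwise step.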