Getting to Know the Faiss Library
The Faiss Search Library
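Faiss (Facebook AI Similarity Search) is an open-source library from the Facebook AI team for clustering and similarity search. It provides efficient similarity search and clustering for dense vectors, supports search over billion-scale vector collections, and is currently one of the more mature approximate nearest neighbor search libraries.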
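To make "similarity search over dense vectors" concrete, here is a minimal brute-force sketch in plain Python. It is not how Faiss is implemented, only an illustration of the exact search that a flat L2 index performs conceptually. The helper names `squared_l2` and `brute_force_search` and the toy data are made up for this example. The distances are squared Euclidean distances, which is also what Faiss reports for L2 indexes.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(database, queries, k):
    """For each query, return the distances and indices of its k nearest database vectors."""
    all_distances, all_indices = [], []
    for q in queries:
        # Score every database vector, then keep the k smallest distances.
        scored = ((squared_l2(q, v), i) for i, v in enumerate(database))
        nearest = heapq.nsmallest(k, scored)
        all_distances.append([d for d, _ in nearest])
        all_indices.append([i for _, i in nearest])
    return all_distances, all_indices


# Toy data (hypothetical): four 2-D database vectors and two queries.
database = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
queries = [[0.9, 0.1], [0.0, 1.8]]

D, I = brute_force_search(database, queries, k=2)
print(I)  # one row of neighbor indices per query
print(D)  # matching squared L2 distances
```

Every query is compared against every database vector, so the cost grows linearly with the database size. That is the cost approximate indexes try to avoid.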
Reference introductions
[Usage 1], [Recommended], [Usage 3]
In the CosPlace project, the relevant code in test.py is as follows:
    import logging
    import time

    import faiss

    # Compute R@1, R@5, R@10, R@20
    RECALL_VALUES = [1, 5, 10, 20]

    # ...

    queries_descriptors = all_descriptors[eval_ds.database_num:]
    database_descriptors = all_descriptors[:eval_ds.database_num]

    # IndexFlatL2: use an exact (brute-force) kNN search to find predictions
    tic = time.time()
    faiss_index = faiss.IndexFlatL2(args.fc_output_dim)
    faiss_index.add(database_descriptors)
    print('Index built in {} sec'.format(time.time() - tic))
    # database_descriptors is still needed by the IVF indexes below, so it is not deleted here
    del all_descriptors

    logging.debug("Calculating recalls")
    tic = time.time()  # restart the timer so that only the search is measured
    _, predictions = faiss_index.search(queries_descriptors, max(RECALL_VALUES))
    print('Searched in {} sec'.format(time.time() - tic))
    print(predictions.shape)
    print(predictions[:5])

    # IndexIVFFlat
    nlist = 100  # number of cells
    tic = time.time()
    quantizer = faiss.IndexFlatL2(args.fc_output_dim)  # the other index; the argument is the vector dimension d
    index = faiss.IndexIVFFlat(quantizer, args.fc_output_dim, nlist, faiss.METRIC_L2)  # METRIC_L2 is passed explicitly
    # assert not index.is_trained
    index.train(database_descriptors)
    # assert index.is_trained
    index.add(database_descriptors)  # add may be a bit slower as well
    print('Index built in {} sec'.format(time.time() - tic))
    index.nprobe = 10  # number of cells (out of nlist) visited per search; default nprobe is 1, try a few more
    tic = time.time()
    D, I = index.search(queries_descriptors, max(RECALL_VALUES))  # actual search
    print('Searched in {} sec'.format(time.time() - tic))
    # print("D.shape: ", D.shape)
    # print("D[:5]", D[:5])
    print("I.shape: ", I.shape)
    print("I[:5]", I[:5])  # neighbors of the first 5 queries

    # IndexIVFPQ
    nlist = 100
    m = 64  # number of sub-quantizers; the vector dimension must be divisible by m
    tic = time.time()
    quantizer = faiss.IndexFlatL2(args.fc_output_dim)  # this remains the same
    # To scale to very large datasets, Faiss provides variants that compress the stored
    # vectors with lossy compression based on product quantization. The price is some loss
    # of precision: even a vector's distance to itself is no longer 0.
    index = faiss.IndexIVFPQ(quantizer, args.fc_output_dim, nlist, m, 8)  # 8 specifies that each sub-vector is encoded as 8 bits
    index.train(database_descriptors)
    index.add(database_descriptors)
    print('Index built in {} sec'.format(time.time() - tic))
    # D, I = index.search(xb[:5], k)  # sanity check
    # print(I)
    # print(D)
    index.nprobe = 10  # make comparable with experiment above
    tic = time.time()
    _, I = index.search(queries_descriptors, max(RECALL_VALUES))  # search
    print('Searched in {} sec'.format(time.time() - tic))
    # print(I[:5])

The code above implements three index types: IndexFlatL2, IndexIVFFlat, and IndexIVFPQ. It was tested on a dataset with 1,700 database images and 10,000 query images, retrieving the top 20 results for each query. The final test results were:
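Summary: in theory IndexIVFPQ should be more efficient, but on a small database the brute-force search of IndexFlatL2 is actually faster. IndexFlatL2 relies only on Euclidean distance computation, whereas IndexIVFFlat and IndexIVFPQ both require a training step.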
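The role of the training step, `nlist`, and `nprobe` can be illustrated with a toy inverted-file index in plain Python. This is only a sketch of the general idea (k-means cells plus per-cell lists), not Faiss's actual implementation. The class `ToyIVF`, its methods, and the random data are hypothetical names invented for this example.

```python
import heapq
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyIVF:
    """Toy inverted-file index: k-means centroids plus one vector list per cell."""

    def __init__(self, nlist, nprobe=1):
        self.nlist = nlist    # number of cells
        self.nprobe = nprobe  # number of cells visited per search
        self.centroids = []
        self.cells = []

    def _nearest_cells(self, v, n):
        """Indices of the n centroids closest to v."""
        scored = ((squared_l2(v, c), i) for i, c in enumerate(self.centroids))
        return [i for _, i in heapq.nsmallest(n, scored)]

    def train(self, vectors, iterations=10, seed=0):
        """Learn nlist centroids with a few rounds of plain k-means."""
        rng = random.Random(seed)
        self.centroids = [list(v) for v in rng.sample(vectors, self.nlist)]
        for _ in range(iterations):
            groups = [[] for _ in range(self.nlist)]
            for v in vectors:
                groups[self._nearest_cells(v, 1)[0]].append(v)
            for c, members in enumerate(groups):
                if members:  # keep the old centroid if a cell ends up empty
                    self.centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        self.cells = [[] for _ in range(self.nlist)]

    def add(self, vectors):
        """Store each vector, with a running id, in the cell of its nearest centroid."""
        start = sum(len(cell) for cell in self.cells)
        for offset, v in enumerate(vectors):
            self.cells[self._nearest_cells(v, 1)[0]].append((start + offset, v))

    def search(self, query, k):
        """Scan only the nprobe cells closest to the query and return up to k ids."""
        candidates = []
        for c in self._nearest_cells(query, self.nprobe):
            candidates.extend((squared_l2(query, v), i) for i, v in self.cells[c])
        return [i for _, i in heapq.nsmallest(k, candidates)]


# Hypothetical random data: 500 database vectors of dimension 8 and one query.
rng = random.Random(42)
database = [[rng.random() for _ in range(8)] for _ in range(500)]
query = [rng.random() for _ in range(8)]

index = ToyIVF(nlist=10)
index.train(database)  # centroids must exist before vectors can be assigned to cells
index.add(database)

# Exact answer from a full scan, for comparison.
exact = [i for _, i in heapq.nsmallest(5, ((squared_l2(query, v), i) for i, v in enumerate(database)))]

index.nprobe = 1            # fastest, but may miss true neighbors in other cells
approx = index.search(query, 5)
index.nprobe = index.nlist  # visiting every cell degenerates to an exact scan
full = index.search(query, 5)
print(exact, approx, full)
```

In this sketch, each search first compares the query with all centroids and only then scans the selected cells, and the index has to run k-means before any vector can be added. On a small database that overhead can be comparable to simply scanning everything, which is consistent with the observation above that IndexFlatL2 was faster here.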