Java suggest: Lucene's suggest module (implementing search suggestions)
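1. First, add the dependency
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-suggest</artifactId>
    <version>7.2.1</version>
</dependency>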
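2. To offer suggestions, we need to build a dedicated suggest index for the data that feeds them (rather than reusing the existing data index). To build that index, we need to know where its data comes from. We define the data source with a class that implements InputIterator. First, let's look at the InputIterator interface itself: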
public interface InputIterator extends BytesRefIterator {
    InputIterator EMPTY = new InputIterator.InputIteratorWrapper(BytesRefIterator.EMPTY);

    long weight();

    BytesRef payload();

    boolean hasPayloads();

    Set<BytesRef> contexts();

    boolean hasContexts();

    public static class InputIteratorWrapper implements InputIterator {
        private final BytesRefIterator wrapped;

        public InputIteratorWrapper(BytesRefIterator wrapped) {
            this.wrapped = wrapped;
        }

        public long weight() {
            return 1L;
        }

        public BytesRef next() throws IOException {
            return this.wrapped.next();
        }

        public BytesRef payload() {
            return null;
        }

        public boolean hasPayloads() {
            return false;
        }

        public Set<BytesRef> contexts() {
            return null;
        }

        public boolean hasContexts() {
            return false;
        }
    }
}
weight(): returns the weight of the current term; the higher the weight, the higher the suggestion ranks.
payload(): the binary representation of the metadata attached to each suggestion. To carry an object, convert the object (or one of its properties) to a BytesRef; the suggester returns the payload with each result of lookup.
hasPayloads(): whether the iterator provides payloads.
contexts(): returns the contexts of the current term, which are used to filter suggestions; returns null if the entries have no contexts.
hasContexts(): whether the iterator provides contexts.
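Lucene suggest provides several default InputIterator implementations:
BufferedInputIterator: buffers the entries of a binary input and iterates over them;
DocumentInputIterator: iterates over stored fields of the documents in an index;
FileIterator: reads a file one line at a time, with fields separated by \t (at most two \t per line);
HighFrequencyIterator: iterates over the terms of a field in an index, skipping terms whose frequency falls below a configured threshold;
InputIteratorWrapper: wraps a BytesRefIterator; the entries carry no payload and all have weight 1;
SortedInputIterator: iterates over binary input sorted with the specified comparator.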
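The tab-separated layout read by FileIterator can be sketched with the JDK alone. This is not Lucene's own parsing code: the class name SuggestLine, the string-typed payload and the fallback weight of 1 for lines without a weight column are assumptions made for illustration.

```java
import java.util.Optional;

/** Hypothetical holder for one parsed line; not part of Lucene. */
final class SuggestLine {
    final String term;
    final long weight;
    final Optional<String> payload;

    private SuggestLine(String term, long weight, Optional<String> payload) {
        this.term = term;
        this.weight = weight;
        this.payload = payload;
    }

    /** Parses "term", "term\tweight" or "term\tweight\tpayload". */
    static SuggestLine parse(String line) {
        // A limit of -1 keeps trailing empty fields so extra tabs are detected.
        String[] parts = line.split("\t", -1);
        if (parts.length > 3) {
            throw new IllegalArgumentException("at most two tabs allowed: " + line);
        }
        // Assumption: a missing weight column falls back to 1.
        long weight = parts.length > 1 ? Long.parseLong(parts[1].trim()) : 1L;
        Optional<String> payload =
                parts.length > 2 ? Optional.of(parts[2]) : Optional.empty();
        return new SuggestLine(parts[0], weight, payload);
    }

    public static void main(String[] args) {
        SuggestLine line = SuggestLine.parse("lucene in action\t120\tbook-42");
        System.out.println(line.term + " | " + line.weight + " | " + line.payload.orElse("-"));
    }
}
```

Rejecting a third tab up front keeps a malformed line from silently leaking into the payload.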
3. With the data source defined, the next step is to build the suggest index
RAMDirectory indexDir = new RAMDirectory();
StandardAnalyzer analyzer = new StandardAnalyzer();
AnalyzingInfixSuggester suggester = new AnalyzingInfixSuggester(indexDir, analyzer);
// Build the index; the concrete InputIterator implementation determines the data source and how entries are indexed
// (inputIterator stands for an instance of your own InputIterator implementation)
suggester.build(inputIterator);
4. Once the index is built, you can query it: given a partial input string, Lucene suggest returns matching suggestions according to how the index was built.
private static void lookup(AnalyzingInfixSuggester suggester, String name,
                           String region) throws IOException {
    HashSet<BytesRef> contexts = new HashSet<BytesRef>();
    // Use the contexts to filter the suggestions
    contexts.add(new BytesRef(region.getBytes("UTF8")));
    // The third argument (num) sets how many results are returned, the fourth whether
    // every term of the query must match, and the fifth whether to highlight matches
    List<Lookup.LookupResult> results = suggester.lookup(name, contexts, 2, true, false);
    System.out.println("-- \"" + name + "\" (" + region + "):");
    for (Lookup.LookupResult result : results) {
        // result.key holds the suggestion matched against the user's input
        System.out.println(result.key);
    }
}
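To make the roles of the lookup arguments concrete, here is a plain-Java model of the behaviour described above: filter by context, match the input against the tokens of each key, rank by weight, and cut off after num results. It is a simplified sketch, not how AnalyzingInfixSuggester is implemented; the class LookupModel, its Entry type, the whitespace tokenization and the sample data are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Plain-Java model of the lookup parameters; NOT Lucene's implementation. */
final class LookupModel {

    /** Hypothetical in-memory suggestion entry. */
    static final class Entry {
        final String key;
        final long weight;
        final Set<String> contexts;

        Entry(String key, long weight, String... contexts) {
            this.key = key;
            this.weight = weight;
            this.contexts = new HashSet<>(Arrays.asList(contexts));
        }
    }

    /** Made-up sample data. */
    static List<Entry> sampleEntries() {
        return Arrays.asList(
                new Entry("Electric Guitar", 10, "US", "CA"),
                new Entry("Electric Train", 100, "US"),
                new Entry("Acoustic Guitar", 80, "US", "ZA"),
                new Entry("Guarana Soda", 130, "ZA", "IE"));
    }

    private static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    /** Earlier query tokens must equal a key token; the last one only needs to be a prefix. */
    private static boolean matches(String key, String query, boolean allTermsRequired) {
        List<String> keyTokens = tokens(key);
        List<String> queryTokens = tokens(query);
        int hits = 0;
        for (int i = 0; i < queryTokens.size(); i++) {
            String q = queryTokens.get(i);
            boolean last = i == queryTokens.size() - 1;
            if (keyTokens.stream().anyMatch(t -> last ? t.startsWith(q) : t.equals(q))) {
                hits++;
            }
        }
        return allTermsRequired ? hits == queryTokens.size() : hits > 0;
    }

    static List<String> lookup(List<Entry> entries, String query, Set<String> contexts,
                               int num, boolean allTermsRequired) {
        return entries.stream()
                // Keep entries that share at least one requested context.
                .filter(e -> contexts == null || contexts.isEmpty()
                        || e.contexts.stream().anyMatch(contexts::contains))
                .filter(e -> matches(e.key, query, allTermsRequired))
                // Higher weight ranks first.
                .sorted(Comparator.comparingLong((Entry e) -> e.weight).reversed())
                .limit(num)
                .map(e -> e.key)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> us = new HashSet<>(Arrays.asList("US"));
        System.out.println(lookup(sampleEntries(), "gu", us, 2, true));
        System.out.println(lookup(sampleEntries(), "electric gu", us, 2, false));
    }
}
```

In this model, turning allTermsRequired off widens the match set, so a heavily weighted entry that matches only one term can outrank an entry that matches every term.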
5. Below is a complete example covering both suggest index creation and lookup
Entity class
package com.cfh.study.lucence_test6;

import java.io.Serializable;

/**
 * @Author: cfh
 * @Date: 2018/9/17 10:18
 * @Description: POJO used to test the suggest feature
 */
public class Product implements Serializable {
    /** Product name */
    private String name;
    /** Product image */
    private String image;
    /** Regions where the product is sold */
    private String[] regions;
    /** Number of units sold */
    private int numberSold;

    public Product() {
    }

    public Product(String name, String image, String[] regions, int numberSold) {
        this.name = name;
        this.image = image;
        this.regions = regions;
        this.numberSold = numberSold;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getImage() {
        return image;
    }

    public void setImage(String image) {
        this.image = image;
    }

    public String[] getRegions() {
        return regions;
    }

    public void setRegions(String[] regions) {
        this.regions = regions;
    }

    public int getNumberSold() {
        return numberSold;
    }

    public void setNumberSold(int numberSold) {
        this.numberSold = numberSold;
    }
}
Define the data source. Here the source is an iterator over a collection of Product objects that is passed in; depending on your situation, you can swap it for a file, a database, and so on.
package com.cfh.study.lucence_test6;

import org.apache.lucene.search.suggest.InputIterator;
import org.apache.lucene.util.BytesRef;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UnsupportedEncodingException;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

/**
 * @Author: cfh
 * @Date: 2018/9/17 10:21
 * @Description: This class is the core: it determines how the index is built, and therefore
 * which suggestions are finally returned and in what order
 */
public class ProductIterator implements InputIterator {
    private Iterator<Product> productIterator;
    private Product currentProduct;

    ProductIterator(Iterator<Product> productIterator) {
        this.productIterator = productIterator;
    }

    /**
     * Whether contexts are enabled
     * @return true, because every product supplies its regions as contexts
     */
    public boolean hasContexts() {
        return true;
    }

    /** Whether payloads are provided */
    public boolean hasPayloads() {
        return true;
    }

    public Comparator<BytesRef> getComparator() {
        return null;
    }

    /**
     * The value returned by next() is the text that may come back to us as a suggestion
     * (LookupResult.key). Here we use the product name.
     */
    public BytesRef next() {
        if (productIterator.hasNext()) {
            currentProduct = productIterator.next();
            try {
                // Use the name of the current Product as the key
                return new BytesRef(currentProduct.getName().getBytes("UTF8"));
            } catch (UnsupportedEncodingException e) {
                throw new RuntimeException("Couldn't convert to UTF-8", e);
            }
        } else {
            return null;
        }
    }

    /**
     * Serializes the Product object into the payload.
     * [This is only a demonstration and not a good practice: the whole object is normally
     * not stored in the payload, because it bloats the index and wastes disk space]
     */
    public BytesRef payload() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(currentProduct);
            out.close();
            return new BytesRef(bos.toByteArray());
        } catch (IOException e) {
            throw new RuntimeException("Well that's unfortunate.", e);
        }
    }

    /**
     * Stores the product's sales regions as contexts. A context can hold any custom data
     * and is generally used for filtering.
     * A TermQuery is created for every element of the Set. You only supply the Set; Lucene's
     * lower-level API creates the TermQuery objects, but you should understand what happens underneath.
     */
    public Set<BytesRef> contexts() {
        try {
            Set<BytesRef> regions = new HashSet<BytesRef>();
            for (String region : currentProduct.getRegions()) {
                regions.add(new BytesRef(region.getBytes("UTF8")));
            }
            return regions;
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException("Couldn't convert to UTF-8", e);
        }
    }

    /**
     * Returns the weight, which affects the ordering of the results.
     * Here the number of units sold is the weight, so it becomes the weight of each
     * suggestion in the returned list. How you compute the weight is up to you.
     */
    public long weight() {
        return currentProduct.getNumberSold();
    }
}
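Since serializing the whole object into the payload bloats the index, one alternative is to store only an identifier and load the full object from another store at query time. The sketch below encodes a numeric ID into eight bytes and decodes it again using only the JDK. The class PayloadCodec and the idea of a numeric product ID are assumptions (the Product class above has no ID field); in payload() the encoded array would be wrapped in a BytesRef.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper that turns a numeric ID into a compact payload. */
final class PayloadCodec {

    /** Encodes the ID as 8 big-endian bytes. */
    static byte[] encodeId(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }

    /** Decodes an ID from a slice of a possibly larger array. */
    static long decodeId(byte[] bytes, int offset, int length) {
        if (length != Long.BYTES) {
            throw new IllegalArgumentException("expected " + Long.BYTES + " bytes, got " + length);
        }
        return ByteBuffer.wrap(bytes, offset, length).getLong();
    }

    public static void main(String[] args) {
        byte[] payload = encodeId(42L);
        // With Lucene on the classpath this array would be wrapped as new BytesRef(payload).
        System.out.println(payload.length + " bytes, id=" + decodeId(payload, 0, payload.length));
    }
}
```

The payload then stays at a fixed eight bytes per suggestion regardless of how large the product object grows.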
Finally, test the suggester. As you can see, we suggest on the product name and filter the suggestions by the product's regions.
private static void lookup(AnalyzingInfixSuggester suggester, String name,
                           String region) throws IOException {
    HashSet<BytesRef> contexts = new HashSet<BytesRef>();
    // Restrict the suggestions to the given region, then match on the product name
    contexts.add(new BytesRef(region.getBytes("UTF8")));
    // The third argument (num) sets how many results are returned, the fourth whether
    // every term of the query must match, and the fifth whether to highlight matches
    List<Lookup.LookupResult> results = suggester.lookup(name, contexts, 2, true, false);
    System.out.println("-- \"" + name + "\" (" + region + "):");
    for (Lookup.LookupResult result : results) {
        // result.key holds the suggestion matched against the user's input
        System.out.println(result.key);
        // Deserialize the Product from the payload (in production you would normally
        // not store this much in the payload, to keep memory usage down)
        BytesRef bytesRef = result.payload;
        // Read only the valid slice of the backing array
        ObjectInputStream is = new ObjectInputStream(
                new ByteArrayInputStream(bytesRef.bytes, bytesRef.offset, bytesRef.length));
        Product product = null;
        try {
            product = (Product) is.readObject();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            continue; // no product to print for this result
        }
        System.out.println("product-Name:" + product.getName());
        System.out.println("product-regions:" + java.util.Arrays.toString(product.getRegions()));
        System.out.println("product-image:" + product.getImage());
        System.out.println("product-numberSold:" + product.getNumberSold());
    }
    System.out.println();
}
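A BytesRef exposes a backing array together with an offset and a length, and only that slice is valid data. The JDK-only sketch below shows a Java serialization round trip that reads from such a slice rather than from the start of the array. The class PayloadRoundTrip and the padded buffer are assumptions for illustration; a plain byte array stands in for BytesRef.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Hypothetical helper showing a serialization round trip over an array slice. */
final class PayloadRoundTrip {

    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Reads an object from bytes[offset, offset + length) only. */
    static Object deserialize(byte[] bytes, int offset, int length) {
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes, offset, length))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class of payload not on classpath", e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = serialize("Electric Guitar");
        // Simulate a payload that sits in the middle of a larger shared buffer.
        byte[] shared = new byte[payload.length + 16];
        int offset = 7;
        System.arraycopy(payload, 0, shared, offset, payload.length);
        System.out.println(deserialize(shared, offset, payload.length));
    }
}
```

Reading the shared buffer from index 0 instead would hit the padding bytes first and fail on the stream header, which is why the offset and length are passed along.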
You can also refer to the original author's GitHub: study-lucene
Original post: https://blog.csdn.net/m0_37556444/article/details/82734959