Lucene 4.3.1 Spell Checking: SpellChecker
org.apache.lucene.search.spell
Class SpellChecker
java.lang.Object
    org.apache.lucene.search.spell.SpellChecker
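The Lucene spell-checker class.
Usage example:

    SpellChecker spellchecker = new SpellChecker(spellIndexDirectory);
    // Lucene 4.x: indexDictionary also takes an IndexWriterConfig and a fullMerge flag
    // (analyzer: any Analyzer instance, e.g. the IKAnalyzer used in the complete example)
    // To index a field of a user index:
    spellchecker.indexDictionary(new LuceneDictionary(my_lucene_reader, a_field),
            new IndexWriterConfig(Version.LUCENE_43, analyzer), true);
    // To index a file containing words:
    spellchecker.indexDictionary(new PlainTextDictionary(new File("myfile.txt")),
            new IndexWriterConfig(Version.LUCENE_43, analyzer), true);
    String[] suggestions = spellchecker.suggestSimilar("misspelt", 5);

SpellChecker has three constructors. Each takes the Directory that holds the spell-check index, and the resulting SpellChecker object is then used for all further operations.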
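Behind indexDictionary(...), as far as we can tell from the Lucene 4.x source, SpellChecker does not match whole words: it splits every dictionary word into character n-grams (1 to 2 characters for words of up to 4 characters, 2 to 3 for 5 characters, 3 to 4 for longer words), and suggestSimilar later queries those grams to fetch candidates. Under that assumption 开源中国 would be indexed as 开, 源, 中, 国, 开源, 源中, 中国. The standalone sketch below reproduces only the gram-splitting step; the class NGramSketch and the exact gram lengths are our assumptions, not Lucene API, and Lucene additionally stores each word's first and last gram in separate fields, which the sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper mimicking how SpellChecker splits a dictionary word into
// character n-grams before indexing it. Not part of the Lucene API.
final class NGramSketch {

    // Assumed shortest gram length for a word of len characters
    static int minGram(int len) {
        if (len > 5) return 3;
        if (len == 5) return 2;
        return 1;
    }

    // Assumed longest gram length for a word of len characters
    static int maxGram(int len) {
        if (len > 5) return 4;
        if (len == 5) return 3;
        return 2;
    }

    // All substrings of length n, left to right (empty if the word is shorter than n)
    static List<String> grams(String word, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            out.add(word.substring(i, i + n));
        }
        return out;
    }

    // Every gram indexed for a word, shortest gram length first
    static List<String> allGrams(String word) {
        int len = word.length();
        List<String> out = new ArrayList<>();
        for (int n = minGram(len); n <= maxGram(len); n++) {
            out.addAll(grams(word, n));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [luc, uce, cen, ene, luce, ucen, cene]
        System.out.println(allGrams("lucene"));
    }
}
```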
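PlainTextDictionary implements the Dictionary interface and provides three constructors, taking a File, an InputStream or a Reader respectively.
The example above builds a PlainTextDictionary from a text file. The file must contain one word per line, for example:

    word1
    word2
    word3

Other Dictionary implementations: FileDictionary, HighFrequencyDictionary, LuceneDictionary.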
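Because PlainTextDictionary also accepts a Reader, the word list does not have to live in a file. To make the one-word-per-line format concrete, here is a standard-library-only sketch (the class PlainTextWords is hypothetical, not Lucene API) that reads such a list from any Reader, including an in-memory StringReader. Trimming whitespace and skipping blank lines are our own choices; we have not verified that PlainTextDictionary does the same.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for the one-word-per-line dictionary format.
// It illustrates the file format only; it is not PlainTextDictionary itself.
final class PlainTextWords {

    // Reads one word per line from any Reader (file, stream or in-memory string)
    static List<String> read(Reader source) {
        List<String> words = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim(); // assumption: surrounding whitespace is not part of the word
                if (!word.isEmpty()) {     // assumption: blank lines are ignored
                    words.add(word);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        // Prints [word1, word2, word3]
        System.out.println(read(new StringReader("word1\nword2\nword3\n")));
    }
}
```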
SpellChecker methods:
String[] suggestSimilar(String word, int numSug)
Parameters:
word - the word to check
numSug - the maximum number of suggestions to return
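To get a feel for what suggestSimilar(word, numSug) returns, here is a simplified, standard-library-only model of its ranking step. As we read the Lucene 4.x source, SpellChecker scores each candidate with its StringDistance (LevensteinDistance by default, which yields 1 - editDistance / longer length), drops candidates below an accuracy of 0.5, never suggests the input word itself, and keeps the best numSug. The sketch scans the whole word list instead of using the n-gram index and breaks score ties alphabetically; the class SuggestSketch and both simplifications are our assumptions, not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model of how SpellChecker.suggestSimilar ranks candidates.
// Not Lucene code: it scans every dictionary word instead of querying the n-gram index.
final class SuggestSketch {

    // Assumed to mirror SpellChecker's default accuracy (0.5)
    static final float DEFAULT_ACCURACY = 0.5f;

    // Classic Levenshtein edit distance: insert, delete and substitute all cost 1
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i; // turning the first i chars of a into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // 1 - distance / longer length: 1 means identical, 0 means nothing in common
    static float similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0f : 1.0f - (float) editDistance(a, b) / maxLen;
    }

    // Up to numSug dictionary words whose similarity to word is at least accuracy
    static String[] suggest(String word, List<String> dictionary, int numSug, float accuracy) {
        List<String> hits = new ArrayList<>();
        for (String candidate : dictionary) {
            if (candidate.equals(word)) {
                continue; // never suggest the input word itself
            }
            if (similarity(word, candidate) >= accuracy) {
                hits.add(candidate);
            }
        }
        // Most similar first; ties broken alphabetically (our simplification)
        hits.sort((x, y) -> {
            int bySimilarity = Float.compare(similarity(word, y), similarity(word, x));
            return bySimilarity != 0 ? bySimilarity : x.compareTo(y);
        });
        return hits.subList(0, Math.min(numSug, hits.size())).toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("lucene", "lucent", "spelling", "checker");
        System.out.println(Arrays.toString(suggest("lucine", dictionary, 5, DEFAULT_ACCURACY)));
    }
}
```

Running main prints [lucene, lucent]: lucene differs from lucine in one of six characters (similarity about 0.83) and lucent in two (about 0.67), while spelling and checker fall below 0.5 and are dropped.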
Other overloads of suggestSimilar(...) accept additional parameters, such as a minimum accuracy; see the official documentation for details.
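The accuracy parameter of those overloads is a minimum similarity score. Assuming the default LevensteinDistance (our reading of the Lucene 4.x source; other StringDistance implementations such as JaroWinklerDistance or NGramDistance score differently), a candidate s for the input w is kept only if sim(w, s) reaches the accuracy, where lev is the edit distance and |w| the length in characters. For the misspelling 开园中国 (intended: 开源中国), one substituted character out of four gives:

```latex
\begin{aligned}
\mathrm{sim}(w, s) &= 1 - \frac{\mathrm{lev}(w, s)}{\max(|w|, |s|)} \\
\mathrm{sim}(\text{开园中国}, \text{开源中国}) &= 1 - \frac{1}{\max(4, 4)} = 0.75
\end{aligned}
```

Under this model, 0.75 clears the default accuracy of 0.5, so 开源中国 is suggested, whereas an accuracy of 0.8 would reject it.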
Complete code example:
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.spell.PlainTextDictionary;
    import org.apache.lucene.search.spell.SpellChecker;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;
    import org.wltea.analyzer.lucene.IKAnalyzer;

    import java.io.File;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    public class SpellCheckerTest {
        // Dictionary file for the spell checker: one word per line
        private static String filepath = "C:\\Users\\Mr_Tank_\\Desktop\\BaseTest\\dictionaryfile.txt";

        // Index of the test documents. Created once, so every createIndex() call adds to the
        // same index (creating a new RAMDirectory per call would keep only the last document).
        private final Directory directory = new RAMDirectory();
        private IndexReader indexReader;
        private IndexSearcher indexSearcher;

        private IndexWriterConfig getConfig() {
            // IKAnalyzer(true): IK Chinese analyzer in smart-segmentation mode
            return new IndexWriterConfig(Version.LUCENE_43, new IKAnalyzer(true));
        }

        /**
         * Create index for test: adds one document with the given content.
         *
         * @param content text stored and indexed in the "content" field
         */
        public void createIndex(String content) {
            try {
                IndexWriter indexWriter = new IndexWriter(directory, getConfig());
                Document document = new Document();
                document.add(new TextField("content", content, Field.Store.YES));
                indexWriter.addDocument(document);
                indexWriter.commit();
                indexWriter.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        /** Searches the "content" field and returns up to 1000 hits, or null on error. */
        public ScoreDoc[] gethits(String content) {
            try {
                indexReader = DirectoryReader.open(directory);
                indexSearcher = new IndexSearcher(indexReader);
                QueryParser parser = new QueryParser(Version.LUCENE_43, "content", new IKAnalyzer(true));
                Query query = parser.parse(content);
                TopDocs td = indexSearcher.search(query, 1000);
                return td.scoreDocs;
            } catch (Exception e) {
                e.printStackTrace();
                return null;
            }
        }

        /**
         * @param scoreDocs hits returned by gethits()
         * @return the matching documents; an empty list when there are no hits
         * @throws IOException if a stored document cannot be loaded
         */
        public List<Document> getDocumentList(ScoreDoc[] scoreDocs) throws IOException {
            List<Document> documentList = new ArrayList<Document>();
            for (ScoreDoc scoreDoc : scoreDocs) {
                documentList.add(indexSearcher.doc(scoreDoc.doc));
            }
            return documentList;
        }

        /** Builds a spell-check index from the dictionary file and returns up to numSug suggestions. */
        public String[] search(String word, int numSug) {
            SpellChecker spellchecker = null;
            try {
                // Separate directory for the spell index, so the document index in "directory" is kept
                spellchecker = new SpellChecker(new RAMDirectory());
                spellchecker.indexDictionary(new PlainTextDictionary(new File(filepath)), getConfig(), true);
                return spellchecker.suggestSimilar(word, numSug);
            } catch (IOException e) {
                e.printStackTrace();
                return null;
            } finally {
                if (spellchecker != null) {
                    try {
                        spellchecker.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            }
        }

        public static void main(String[] args) throws IOException {
            SpellCheckerTest spellCheckerTest = new SpellCheckerTest();
            // "OSChina - find the open-source project you want, share and discuss"
            spellCheckerTest.createIndex("开源中国-找到您想要的开源项目,分享和交流");
            // "CSDN - the world's largest Chinese IT community"
            spellCheckerTest.createIndex("CSDN-全球最大中文IT社区");
            // Misspelling of "开源中国" (OSChina): 园 typed instead of 源
            String word = "开园中国";

            /* Optional: search the document index for the word
            ScoreDoc[] scoreDocs = spellCheckerTest.gethits(word);
            List<Document> documentList = spellCheckerTest.getDocumentList(scoreDocs);
            for (Document d : documentList) {
                System.out.println("Search result: " + d.get("content"));
            }
            */

            String[] suggest = spellCheckerTest.search(word, 5);
            if (suggest != null && suggest.length >= 1) {
                for (String s : suggest) {
                    System.out.println("Did you mean: " + s);
                }
            } else {
                System.out.println("Spelling is correct");
            }
        }
    }

dictionaryfile.txt (one word per line):

    中华人民共和国
    开源中国
    开源社区
    Lucene
    拼写检查
    Lucene4.3.1

Reposted from: https://my.oschina.net/tanweijie/blog/194046