Elasticsearch 6.5.0: Installing the IK Analyzer
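Credit where it is due: the ecosystem around ES is excellent. The IK analyzer, for example, keeps pace with every ES release. ES itself moves very fast, though: a new minor release appears roughly every two weeks. 6.5.2 is already out, and at this rate 7.0 will probably arrive before I have had much time to play with this version.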
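Download
Analyzer: GitHub
Click Releases and download the version that matches your ES version; IK releases map one-to-one to ES releases.
Install
Installation is very easy, which is much appreciated.
Step 1: Under the plugins directory of the elasticsearch-6.5.0 home directory, create a folder named ik.
Step 2: Extract the archive downloaded from GitHub into that folder.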
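The two steps above can be sanity-checked before starting ES. The sketch below is not part of the original post: it assumes the archive was extracted so that plugin-descriptor.properties sits directly in plugins/ik, and it relies on the elasticsearch.version entry that Elasticsearch plugin descriptors carry. The class name IkPluginCheck and the default path are illustrative only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper: checks that the IK plugin in plugins/ik targets the installed ES version.
final class IkPluginCheck {

    /** Loads plugins/ik/plugin-descriptor.properties under the given ES home directory. */
    static Properties loadDescriptor(Path esHome) {
        // Assumption: the zip contents were extracted directly into plugins/ik.
        Path descriptor = esHome.resolve("plugins").resolve("ik")
                .resolve("plugin-descriptor.properties");
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(descriptor)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read " + descriptor, e);
        }
        return props;
    }

    /** True when the descriptor names exactly the given ES version. */
    static boolean matchesEsVersion(Properties descriptor, String esVersion) {
        return esVersion.equals(descriptor.getProperty("elasticsearch.version"));
    }

    public static void main(String[] args) {
        // The default path is the one used in this post; pass another as the first argument.
        Path esHome = Paths.get(args.length > 0 ? args[0] : "E:\\elasticsearch-6.5.0");
        Properties descriptor = loadDescriptor(esHome);
        System.out.println("Plugin built for ES "
                + descriptor.getProperty("elasticsearch.version")
                + ", matches 6.5.0: " + matchesEsVersion(descriptor, "6.5.0"));
    }
}
```

If the versions differ, download the IK release that matches the ES version rather than editing the descriptor.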
Start
Go to the ES home directory:
[E:\elasticsearch-6.5.0]$ .\bin\elasticsearch.bat
Prepare the data
Dependencies:
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>2.11.1</version>
</dependency>
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-to-slf4j</artifactId>
    <version>2.11.1</version>
</dependency>
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>1.7.25</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-client</artifactId>
    <version>6.5.0</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>6.5.0</version>
</dependency>
Connection:
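package com.demo.dao;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

/**
 * The Java High Level REST Client works on top of the Java Low Level REST Client.
 * Its main goal is to expose API-specific methods that accept request objects as
 * arguments and return response objects.
 * Each API can be called synchronously or asynchronously. Synchronous methods return
 * a response object, while asynchronous methods (whose names end with the async suffix)
 * require a listener argument, which is notified (on the thread pool managed by the
 * low-level client) once a response or an error is received.
 * The High Level REST Client depends on the Elasticsearch core project. It accepts the
 * same request arguments as the TransportClient and returns the same response objects.
 * The High Level REST Client requires Java 1.8.
 * The client version is the same as the Elasticsearch version it was developed for.
 * The 6.0 client can communicate with any 6.x node; the 6.1 client can communicate
 * with 6.1, 6.2 and any later 6.x version.
 */
public class RestClientFactory {

    private RestClientFactory() {}

    private static class Inner {
        private static final RestClientFactory instance = new RestClientFactory();
    }

    public static RestClientFactory getInstance() {
        return Inner.instance;
    }

    // Builds a new client on every call; the caller is responsible for closing it.
    public RestHighLevelClient getClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        //new HttpHost("localhost", 9201, "http"),
                        new HttpHost("localhost", 9200, "http")
                )
        );
        return client;
    }
}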
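The compatibility rule in the class comment above (same major version, node minor version equal to or later than the client's) can be written down as a small check. This is only a sketch of that rule as I read it; the class and method names are made up for illustration and are not part of the Elasticsearch client.

```java
// Hypothetical helper that encodes the client/node compatibility rule quoted above.
final class ClientCompatibility {

    /** True when the node has the same major version and an equal or later minor version. */
    static boolean isGuaranteedCompatible(String clientVersion, String nodeVersion) {
        int[] client = majorMinor(clientVersion);
        int[] node = majorMinor(nodeVersion);
        return client[0] == node[0] && node[1] >= client[1];
    }

    /** Parses "major.minor[.patch]" into {major, minor}. */
    private static int[] majorMinor(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected major.minor[.patch]: " + version);
        }
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    public static void main(String[] args) {
        // The 6.5.0 client from this post against a 6.5.2 node and against a 6.1.0 node.
        System.out.println(isGuaranteedCompatible("6.5.0", "6.5.2"));
        System.out.println(isGuaranteedCompatible("6.5.0", "6.1.0"));
    }
}
```

A false result does not necessarily mean the calls will fail; it only means the combination falls outside what the rule promises.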
Code:
// Imports needed by the methods below
import java.io.IOException;
import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.action.admin.indices.create.CreateIndexResponse;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.get.MultiGetItemResponse;
import org.elasticsearch.action.get.MultiGetRequest;
import org.elasticsearch.action.get.MultiGetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;

/**
 * Create the index.
 * @return the client, still open; the caller closes it
 * @throws IOException if the request fails
 */
public static RestHighLevelClient createIndexForIk() throws IOException {
    RestHighLevelClient client = RestClientFactory.getInstance().getClient();
    CreateIndexRequest request = new CreateIndexRequest("test_ik_index");
    request.settings(Settings.builder()
                    .put("index.number_of_shards", 1)
                    .put("index.number_of_replicas", 1))
            // Define the mapping: one field per analyzer
            .mapping("social",
                    "content1", "type=text",
                    "content2", "type=text,analyzer=ik_smart",
                    "content3", "type=text,analyzer=ik_max_word")
            // Timeout for index creation
            .timeout(TimeValue.timeValueMinutes(2))
            // Timeout for connecting to the master node
            .masterNodeTimeout(TimeValue.timeValueMinutes(1));
    CreateIndexResponse indexResponse = client.indices().create(request, RequestOptions.DEFAULT);
    boolean acknowledged = indexResponse.isAcknowledged();
    boolean shardsAcknowledged = indexResponse.isShardsAcknowledged();
    System.out.println(acknowledged + "," + shardsAcknowledged);
    return client;
}

/**
 * Prepare the data: index three documents, then read them back.
 * @return the client, still open; the caller closes it
 * @throws IOException if a request fails
 */
public static RestHighLevelClient bulkAddForIk() throws IOException {
    RestHighLevelClient client = RestClientFactory.getInstance().getClient();
    BulkRequest request = new BulkRequest();
    request.add(new IndexRequest("test_ik_index", "social", "1").source(XContentType.JSON,
            "content1", "富強、民主、文明、和諧,自由、平等、公正、法治,愛國、敬業、誠信、友善",
            "content2", "“富強、民主、文明、和諧”,是我國社會主義現代化國家的建設目標,也是從價值目標層面對社會主義核心價值觀基本理念的凝練,在社會主義核心價值觀中居于最高層次,對其他層次的價值理念具有統領作用",
            "content3", "富強、民主、文明、和諧,自由、平等、公正、法治,愛國、敬業、誠信、友善"));
    request.add(new IndexRequest("test_ik_index", "social", "2").source(XContentType.JSON,
            "content1", "以熱愛祖國為榮,以危害祖國為恥",
            "content2", "1978年12月,黨的十一屆三中全會重新恢復和確立了實事求是的思想路線,堅持把馬克思主義與改革開放和我國社會主義建設偉大實踐相結合,科學繼承了***思想,創立了鄧小平理論、“三個代表”重要思想、科學發展觀等馬克思主義中國化最新成果,馬克思主義在意識形態領域的指導地位不斷鞏固",
            "content3", "“自由、平等、公正、法治”,是對美好社會的生動表述,也是從社會層面對社會主義核心價值觀基本理念的凝練"));
    request.add(new IndexRequest("test_ik_index", "social", "3").source(XContentType.JSON,
            "content1", "以服務人民為榮,以背離人民為恥",
            "content2", "新中國的建立,確立了以社會主義基本政治制度、基本經濟制度的確立和以馬克思主義為指導思想的社會主義意識形態,為社會主義核心價值體系建設奠定了政治前提、物質基礎和文化條件",
            "content3", "“愛國、敬業、誠信、友善”,是公民基本道德規范,是從個人行為層面對社會主義核心價值觀基本理念的凝練"));
    BulkResponse bulk = client.bulk(request, RequestOptions.DEFAULT);
    System.out.println("Status:" + bulk.status().name() + ",hasFailures:" + bulk.hasFailures());

    // Read the three documents back to confirm they were stored
    MultiGetRequest multiGetRequest = new MultiGetRequest()
            .add(new MultiGetRequest.Item("test_ik_index", "social", "1"))
            .add(new MultiGetRequest.Item("test_ik_index", "social", "2"))
            .add(new MultiGetRequest.Item("test_ik_index", "social", "3"));
    MultiGetResponse response = client.mget(multiGetRequest, RequestOptions.DEFAULT);
    MultiGetItemResponse[] itemResponses = response.getResponses();
    for (MultiGetItemResponse r : itemResponses) {
        System.out.println(r.getResponse().getSourceAsString());
    }
    return client;
}
Run
public static void main(String[] args) throws IOException {
    createIndexForIk().close();
    bulkAddForIk().close();
}
The index has three fields: content1 uses the default analyzer, content2 uses ik_smart, and content3 uses ik_max_word.
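The mapping(...) call in createIndexForIk uses a compact "key=value,key=value" shorthand per field. As a rough illustration of what that shorthand expresses, the sketch below expands the same three definitions into an explicit field-to-options map. It only mimics the idea in plain Java; the class MappingShorthand is hypothetical and is not how the Elasticsearch client is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: expands "field", "k=v,k=v" pairs into a properties map.
final class MappingShorthand {

    /** Takes alternating field names and option strings and returns field -> options. */
    static Map<String, Map<String, String>> expand(String... fieldDefs) {
        if (fieldDefs.length % 2 != 0) {
            throw new IllegalArgumentException("Expected alternating field names and definitions");
        }
        Map<String, Map<String, String>> properties = new LinkedHashMap<>();
        for (int i = 0; i < fieldDefs.length; i += 2) {
            Map<String, String> options = new LinkedHashMap<>();
            for (String pair : fieldDefs[i + 1].split(",")) {
                String[] kv = pair.split("=", 2);
                if (kv.length != 2) {
                    throw new IllegalArgumentException("Expected key=value but got: " + pair);
                }
                options.put(kv[0].trim(), kv[1].trim());
            }
            properties.put(fieldDefs[i], options);
        }
        return properties;
    }

    public static void main(String[] args) {
        // The same field definitions as in createIndexForIk.
        System.out.println(expand(
                "content1", "type=text",
                "content2", "type=text,analyzer=ik_smart",
                "content3", "type=text,analyzer=ik_max_word"));
    }
}
```

The point to notice is that content1 carries no analyzer option at all, which is why it falls back to the default analyzer.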
Test (in the Kibana console)
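Some background helps when reading the first result. Assuming content1 is handled by Elasticsearch's standard analyzer (the default when no analyzer is set), each Han ideograph becomes its own token, so a two-character query can match a document that shares only one of the characters. The sketch below imitates just that per-character behaviour in plain Java; it is a simplification rather than the real analyzer, and the strings are written as Unicode escapes to keep the source ASCII-only.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified model of per-character tokenization for Han text (not the real analyzer).
final class PerCharacterMatch {

    /** Emits one token per ideograph and drops everything else (punctuation, spaces, Latin). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
                .filter(Character::isIdeographic)
                .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    /** Returns the query tokens that also occur in the document. */
    static Set<String> sharedTokens(String query, String document) {
        Set<String> shared = new LinkedHashSet<>(tokenize(query));
        shared.retainAll(tokenize(document));
        return shared;
    }

    public static void main(String[] args) {
        String query = "\u4e2d\u570b";     // the two-character query used in the first search
        String document = "\u7956\u570b";  // a two-character word that shares only the second character
        // One shared single-character token is enough for a match query to hit.
        System.out.println(sharedTokens(query, document).size());
    }
}
```

With a word-level analyzer such as ik_smart, the query and the document word would be different tokens and would not match on the shared character alone.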
First query: (the default analyzer does not segment the text into words; in the highlight field only the single character 國 is marked)
GET /test_ik_index/_search
{
  "query" : {
    "match": { "content1": "中國" }
  },
  "highlight" : {
    "pre_tags" : ["<tag1>"],
    "post_tags" : ["</tag1>"],
    "fields" : {
      "content1": {}
    }
  }
}
-------------------------------
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.68320733,
    "hits" : [
      {
        "_index" : "test_ik_index",
        "_type" : "social",
        "_id" : "2",
        "_score" : 0.68320733,
        "_source" : {
          "content1" : "以熱愛祖國為榮,以危害祖國為恥",
          "content2" : "1978年12月,黨的十一屆三中全會重新恢復和確立了實事求是的思想路線,堅持把馬克思主義與改革開放和我國社會主義建設偉大實踐相結合,科學繼承了***思想,創立了鄧小平理論、“三個代表”重要思想、科學發展觀等馬克思主義中國化最新成果,馬克思主義在意識形態領域的指導地位不斷鞏固",
          "content3" : "“自由、平等、公正、法治”,是對美好社會的生動表述,也是從社會層面對社會主義核心價值觀基本理念的凝練"
        },
        "highlight" : {
          "content1" : [
            "以熱愛祖<tag1>國</tag1>為榮,以危害祖<tag1>國</tag1>為恥"
          ]
        }
      },
      {
        "_index" : "test_ik_index",
        "_type" : "social",
        "_id" : "1",
        "_score" : 0.40610588,
        "_source" : {
          "content1" : "富強、民主、文明、和諧,自由、平等、公正、法治,愛國、敬業、誠信、友善",
          "content2" : "“富強、民主、文明、和諧”,是我國社會主義現代化國家的建設目標,也是從價值目標層面對社會主義核心價值觀基本理念的凝練,在社會主義核心價值觀中居于最高層次,對其他層次的價值理念具有統領作用",
          "content3" : "富強、民主、文明、和諧,自由、平等、公正、法治,愛國、敬業、誠信、友善"
        },
        "highlight" : {
          "content1" : [
            "富強、民主、文明、和諧,自由、平等、公正、法治,愛<tag1>國</tag1>、敬業、誠信、友善"
          ]
        }
      }
    ]
  }
}
Second query: (OK)
GET /test_ik_index/_search
{
  "query" : {
    "match": { "content2": "馬克思主義" }
  },
  "highlight" : {
    "pre_tags" : ["<tag1>"],
    "post_tags" : ["</tag1>"],
    "fields" : {
      "content2": {}
    }
  }
}
-------------------------------
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.71390307,
    "hits" : [
      {
        "_index" : "test_ik_index",
        "_type" : "social",
        "_id" : "2",
        "_score" : 0.71390307,
        "_source" : {
          "content1" : "以熱愛祖國為榮,以危害祖國為恥",
          "content2" : "1978年12月,黨的十一屆三中全會重新恢復和確立了實事求是的思想路線,堅持把馬克思主義與改革開放和我國社會主義建設偉大實踐相結合,科學繼承了***思想,創立了鄧小平理論、“三個代表”重要思想、科學發展觀等馬克思主義中國化最新成果,馬克思主義在意識形態領域的指導地位不斷鞏固",
          "content3" : "“自由、平等、公正、法治”,是對美好社會的生動表述,也是從社會層面對社會主義核心價值觀基本理念的凝練"
        },
        "highlight" : {
          "content2" : [
            "1978年12月,黨的十一屆三中全會重新恢復和確立了實事求是的思想路線,堅持把<tag1>馬克思主義</tag1>與改革開放和我國社會主義建設偉大實踐相結合,科學繼承了***思想,創立了鄧小平理論、“三個代表”重要思想、科學發展觀等<tag1>馬克思主義</tag1>中國化最新成果",
            ",<tag1>馬克思主義</tag1>在意識形態領域的指導地位不斷鞏固"
          ]
        }
      },
      {
        "_index" : "test_ik_index",
        "_type" : "social",
        "_id" : "3",
        "_score" : 0.50678647,
        "_source" : {
          "content1" : "以服務人民為榮,以背離人民為恥",
          "content2" : "新中國的建立,確立了以社會主義基本政治制度、基本經濟制度的確立和以馬克思主義為指導思想的社會主義意識形態,為社會主義核心價值體系建設奠定了政治前提、物質基礎和文化條件",
          "content3" : "“愛國、敬業、誠信、友善”,是公民基本道德規范,是從個人行為層面對社會主義核心價值觀基本理念的凝練"
        },
        "highlight" : {
          "content2" : [
            "新中國的建立,確立了以社會主義基本政治制度、基本經濟制度的確立和以<tag1>馬克思主義</tag1>為指導思想的社會主義意識形態,為社會主義核心價值體系建設奠定了政治前提、物質基礎和文化條件"
          ]
        }
      }
    ]
  }
}
Third query: (OK)
GET /test_ik_index/_search
{
  "query" : {
    "match": { "content3": "富強" }
  },
  "highlight" : {
    "pre_tags" : ["<tag1>"],
    "post_tags" : ["</tag1>"],
    "fields" : {
      "content3" : {}
    }
  }
}
-----------------------------------
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.2146692,
    "hits" : [
      {
        "_index" : "test_ik_index",
        "_type" : "social",
        "_id" : "1",
        "_score" : 1.2146692,
        "_source" : {
          "content1" : "富強、民主、文明、和諧,自由、平等、公正、法治,愛國、敬業、誠信、友善",
          "content2" : "“富強、民主、文明、和諧”,是我國社會主義現代化國家的建設目標,也是從價值目標層面對社會主義核心價值觀基本理念的凝練,在社會主義核心價值觀中居于最高層次,對其他層次的價值理念具有統領作用",
          "content3" : "富強、民主、文明、和諧,自由、平等、公正、法治,愛國、敬業、誠信、友善"
        },
        "highlight" : {
          "content3" : [
            "<tag1>富強</tag1>、民主、文明、和諧,自由、平等、公正、法治,愛國、敬業、誠信、友善"
          ]
        }
      }
    ]
  }
}
You can also verify the analyzer on its own:
GET test_ik_index/_analyze
{
  "analyzer": "ik_max_word",
  "text": "中央高度重視培育和踐行社會主義核心價值觀"
}
-----------------------
{
  "tokens" : [
    { "token" : "中央", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 0 },
    { "token" : "高度重視", "start_offset" : 2, "end_offset" : 6, "type" : "CN_WORD", "position" : 1 },
    { "token" : "高度", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 2 },
    { "token" : "重視", "start_offset" : 4, "end_offset" : 6, "type" : "CN_WORD", "position" : 3 },
    { "token" : "培育", "start_offset" : 6, "end_offset" : 8, "type" : "CN_WORD", "position" : 4 },
    { "token" : "和", "start_offset" : 8, "end_offset" : 9, "type" : "CN_CHAR", "position" : 5 },
    { "token" : "踐行", "start_offset" : 9, "end_offset" : 11, "type" : "CN_WORD", "position" : 6 },
    { "token" : "行社", "start_offset" : 10, "end_offset" : 12, "type" : "CN_WORD", "position" : 7 },
    { "token" : "社會主義", "start_offset" : 11, "end_offset" : 15, "type" : "CN_WORD", "position" : 8 },
    { "token" : "社會", "start_offset" : 11, "end_offset" : 13, "type" : "CN_WORD", "position" : 9 },
    { "token" : "主義", "start_offset" : 13, "end_offset" : 15, "type" : "CN_WORD", "position" : 10 },
    { "token" : "核心", "start_offset" : 15, "end_offset" : 17, "type" : "CN_WORD", "position" : 11 },
    { "token" : "價值觀", "start_offset" : 17, "end_offset" : 20, "type" : "CN_WORD", "position" : 12 },
    { "token" : "價值", "start_offset" : 17, "end_offset" : 19, "type" : "CN_WORD", "position" : 13 },
    { "token" : "觀", "start_offset" : 19, "end_offset" : 20, "type" : "CN_CHAR", "position" : 14 }
  ]
}
The same check can be done through the Java API:
// Imports needed by the method below
import java.io.IOException;
import java.util.List;
import org.elasticsearch.action.admin.indices.analyze.AnalyzeRequest;
import org.elasticsearch.action.admin.indices.analyze.AnalyzeResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;

public static RestHighLevelClient analyze() throws IOException {
    RestHighLevelClient client = RestClientFactory.getInstance().getClient();
    AnalyzeRequest request = new AnalyzeRequest();
    // Two texts are analyzed in one request
    request.text("高通指控蘋果侵犯其兩項專利", "高通再次將蘋果告上法庭,起訴蘋果拖欠專利費 70 億美元");
    request.analyzer("ik_smart");
    AnalyzeResponse response = client.indices().analyze(request, RequestOptions.DEFAULT);
    List<AnalyzeResponse.AnalyzeToken> tokens = response.getTokens();
    for (AnalyzeResponse.AnalyzeToken t : tokens) {
        int endOffset = t.getEndOffset();
        int position = t.getPosition();
        int positionLength = t.getPositionLength();
        int startOffset = t.getStartOffset();
        String term = t.getTerm();
        String type = t.getType();
        System.out.println("Start:" + startOffset + ",End:" + endOffset
                + ",Position:" + position + ",Length:" + positionLength
                + ",Term:" + term + ",Type:" + type);
    }
    return client;
}
Result:
Start:0,End:1,Position:0,Length:1,Term:高,Type:CN_CHAR
Start:1,End:2,Position:1,Length:1,Term:通,Type:CN_CHAR
Start:2,End:4,Position:2,Length:1,Term:指控,Type:CN_WORD
Start:4,End:6,Position:3,Length:1,Term:蘋果,Type:CN_WORD
Start:6,End:8,Position:4,Length:1,Term:侵犯,Type:CN_WORD
Start:8,End:9,Position:5,Length:1,Term:其,Type:CN_CHAR
Start:9,End:11,Position:6,Length:1,Term:兩項,Type:CN_WORD
Start:11,End:13,Position:7,Length:1,Term:專利,Type:CN_WORD
Start:14,End:15,Position:8,Length:1,Term:高,Type:CN_CHAR
Start:15,End:16,Position:9,Length:1,Term:通,Type:CN_CHAR
Start:16,End:18,Position:10,Length:1,Term:再次,Type:CN_WORD
Start:18,End:19,Position:11,Length:1,Term:將,Type:CN_CHAR
Start:19,End:21,Position:12,Length:1,Term:蘋果,Type:CN_WORD
Start:21,End:22,Position:13,Length:1,Term:告,Type:CN_CHAR
Start:22,End:23,Position:14,Length:1,Term:上,Type:CN_CHAR
Start:23,End:25,Position:15,Length:1,Term:法庭,Type:CN_WORD
Start:26,End:28,Position:16,Length:1,Term:起訴,Type:CN_WORD
Start:28,End:30,Position:17,Length:1,Term:蘋果,Type:CN_WORD
Start:30,End:32,Position:18,Length:1,Term:拖欠,Type:CN_WORD
Start:32,End:35,Position:19,Length:1,Term:專利費,Type:CN_WORD
Start:36,End:38,Position:20,Length:1,Term:70,Type:ARABIC
Start:39,End:40,Position:21,Length:1,Term:億,Type:TYPE_CNUM
Start:40,End:42,Position:22,Length:1,Term:美元,Type:CN_WORD
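One way to see the difference between the two IK analyzers is to look at the offsets in the two outputs above: ik_max_word emits tokens whose spans overlap (for example 高度重視, 高度 and 重視), while the ik_smart output contains none. The sketch below counts overlapping spans using the start and end offsets copied from those outputs. The two outputs come from different sentences, so treat this as indicative only; the class name TokenOverlap is made up for this example.

```java
// Hypothetical helper: counts tokens whose span overlaps an earlier token.
final class TokenOverlap {

    /** Counts spans {start, end} that begin before the furthest end offset seen so far. */
    static int countOverlapping(int[][] spans) {
        int overlapping = 0;
        int furthestEnd = 0;
        for (int[] span : spans) {
            if (span[0] < furthestEnd) {
                overlapping++;
            }
            furthestEnd = Math.max(furthestEnd, span[1]);
        }
        return overlapping;
    }

    public static void main(String[] args) {
        // Offsets copied from the ik_max_word _analyze response above.
        int[][] maxWord = {
            {0, 2}, {2, 6}, {2, 4}, {4, 6}, {6, 8}, {8, 9}, {9, 11}, {10, 12},
            {11, 15}, {11, 13}, {13, 15}, {15, 17}, {17, 20}, {17, 19}, {19, 20}
        };
        // Offsets copied from the ik_smart Java API output above.
        int[][] smart = {
            {0, 1}, {1, 2}, {2, 4}, {4, 6}, {6, 8}, {8, 9}, {9, 11}, {11, 13},
            {14, 15}, {15, 16}, {16, 18}, {18, 19}, {19, 21}, {21, 22}, {22, 23}, {23, 25},
            {26, 28}, {28, 30}, {30, 32}, {32, 35}, {36, 38}, {39, 40}, {40, 42}
        };
        System.out.println("ik_max_word overlapping tokens: " + countOverlapping(maxWord));
        System.out.println("ik_smart overlapping tokens: " + countOverlapping(smart));
    }
}
```

The ik_smart offsets also show that the second text starts at 14 although the first ends at 13, which suggests a one-character gap is inserted between the two input texts.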
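That completes the analyzer installation.
A small aside: the platform would not publish the post with the original wording, hence the *** in the sample text.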
Reposted from: https://www.cnblogs.com/LUA123/p/10098064.html