Getting Started with Elasticsearch: Installing the IK Analyzer from Scratch
Background
I needed to run aggregations in ES for some statistical analysis, but the field being aggregated contains Chinese text, and ES's default analyzer handles Chinese poorly: it splits complete Chinese words into individual characters before aggregating, which is clearly not what I wanted. Here is an example:
POST http://192.168.80.133:9200/my_index_name/my_type_name/_search
{
  "size": 0,
  "query": {
    "range": {
      "time": {
        "gte": 1513778040000,
        "lte": 1513848720000
      }
    }
  },
  "aggs": {
    "keywords": {
      "terms": {"field": "keywords"},
      "aggs": {
        "emotions": {
          "terms": {"field": "emotion"}
        }
      }
    }
  }
}

Output:
{
  "took": 22,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {"total": 32, "max_score": 0.0, "hits": []},
  "aggregations": {
    "keywords": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "力",          # the complete word has been split into individual characters
          "doc_count": 2,
          "emotions": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {"key": -1, "doc_count": 1},
              {"key": 0, "doc_count": 1}
            ]
          }
        },
        {
          "key": "動",
          "doc_count": 2,
          "emotions": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {"key": -1, "doc_count": 1},
              {"key": 0, "doc_count": 1}
            ]
          }
        }
      ]
    }
  }
}

Since ES's default analyzer handles Chinese so poorly, is there an analyzer that does support Chinese? And if so, how do you use it?
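You can also see the character-by-character splitting directly with the _analyze API. The request below is my own illustration (it is not part of the original post) and assumes the ES 2.x request-body form, where the analyzer and text are passed in the body:

POST http://192.168.80.133:9200/_analyze
{
  "analyzer": "standard",
  "text": "動力"
}

The standard analyzer emits each Han character as a separate token, so the response lists "動" and "力" individually, which is exactly why the aggregation buckets above contain single characters.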
For the first question, Google gave me the answer: there is already an open-source analyzer that supports Chinese, IK Analysis for Elasticsearch; see https://github.com/medcl/elasticsearch-analysis-ik.
In the spirit of not reinventing the wheel, I decided to simply take it and try it out. So how do you use the IK analyzer? It is actually an ES plugin: install it and configure ES accordingly.
Installing the IK Analyzer
My ES version is 2.4.1, so the IK version to download is 1.10.1 (note: you must download the IK version that matches your ES version, otherwise it will not work).
1. Download and build IK
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v1.10.1/elasticsearch-analysis-ik-1.10.1.zip
unzip elasticsearch-analysis-ik-1.10.1.zip
cd elasticsearch-analysis-ik-1.10.1
mvn clean package

The build produces the plugin package elasticsearch-analysis-ik-1.10.1/target/releases/elasticsearch-analysis-ik-1.10.1.zip.
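Building the plugin requires a JDK and Maven on the build machine. A quick prerequisite check (my own addition, not from the original post):

java -version
mvn -v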
2. Install the IK plugin into ES
Copy the packaged IK plugin elasticsearch-analysis-ik-1.10.1.zip into the ES plugins directory and unzip it there.
unzip elasticsearch-analysis-ik-1.10.1.zip
rm -rf elasticsearch-analysis-ik-1.10.1.zip   # be sure to delete the zip after extracting it, otherwise ES reports an error at startup

Then restart ES.
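To confirm that the plugin was picked up after the restart, you can ask the running node which plugins it has loaded. This verification step is my own addition and assumes the ES 2.x tooling:

GET http://192.168.80.133:9200/_nodes/plugins

# or, from the ES home directory on the server
bin/plugin list

The analysis-ik plugin should appear in the output; if it does not, the plugin files were not placed where ES expects them.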
Using the IK Analyzer
Once the IK analyzer is installed, it can be used in ES.
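Before touching any index, a quick sanity check (again my own addition, not in the original post) is to run the _analyze API with the ik_smart analyzer; in ES 2.x the analyzer and text can be passed in the request body:

POST http://192.168.80.133:9200/_analyze
{
  "analyzer": "ik_smart",
  "text": "動力外觀油耗"
}

If the plugin is loaded, the response contains word-level tokens (the exact segmentation depends on the IK dictionary); if it is not, ES returns an error saying the analyzer ik_smart cannot be found.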
Step 1: Create the index
PUT http://192.168.80.133:9200/my_index_name

Step 2: Add a mapping for the doc fields that will be used later
在這里我在ES中存儲的doc格式如下:
需要在keywords字段上進行聚合分析,所以給keywords字段添加mapping設置:
POST http://192.168.80.133:9200/my_index_name/my_type_name/_mapping
{
  "properties": {
    "keywords": {          # use the ik analyzer on the keywords field
      "type": "string",
      "store": "no",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart",
      "boost": 8
    }
  }
}

Note: there was a small hiccup while setting the mapping. Following the IK project's documentation, I first set the type of keywords to text, which produced an error:
POST http://192.168.80.133:9200/my_index_name/my_type_name/_mapping
{
  "properties": {
    "keywords": {
      "type": "text",      # the text type is not supported in 2.4.1
      "store": "no",
      "analyzer": "ik_smart",
      "search_analyzer": "ik_smart",
      "boost": 8
    }
  }
}

The error:
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "No handler for type [text] declared on field [keywords]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "No handler for type [text] declared on field [keywords]"
  },
  "status": 400
}

This happens because my ES version is relatively old (2.4.1), while the text type was only introduced in ES 5.0, so it is not available here. On ES 2.4.1 the string type must be used instead.
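To double-check that the mapping was actually applied, you can read it back; this verification step is my own addition:

GET http://192.168.80.133:9200/my_index_name/_mapping

The response should show keywords with type string and analyzer ik_smart. Keep in mind that an analyzer can only be set when a field is first mapped; changing it on an existing field requires reindexing.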
Step 3: Index a doc
POST http://192.168.80.133:9200/my_index_name/my_type_name/
{
  "nagtive_kw": ["動力", "外觀", "油耗"],
  "is_all": false,
  "emotion": 0,
  "focuce": false,
  "keywords": ["動力", "外觀", "油耗"],   // the aggregation runs on the keywords field
  "source": "汽車之家",
  "time": -1,
  "machine_emotion": 0,
  "title": "從動次打次吃大餐",
  "spider": "qczj_index",
  "content": {},
  "url": "http://xxx",
  "brand": "寶馬",
  "series": "寶馬1系",
  "model": "2017款"
}

Step 4: Aggregation analysis
POST http://192.168.80.133:9200/my_index_name/my_type_name/_search
{
  "size": 0,
  "query": {
    "range": {
      "time": {
        "gte": 1513778040000,
        "lte": 1513848720000
      }
    }
  },
  "aggs": {
    "keywords": {
      "terms": {"field": "keywords"},
      "aggs": {
        "emotions": {
          "terms": {"field": "emotion"}
        }
      }
    }
  }
}

Output:
{
  "took": 22,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {"total": 32, "max_score": 0.0, "hits": []},
  "aggregations": {
    "keywords": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "動力",        # the complete word is no longer split into individual characters
          "doc_count": 2,
          "emotions": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {"key": -1, "doc_count": 1},
              {"key": 0, "doc_count": 1}
            ]
          }
        }
      ]
    }
  }
}
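As a closing note of my own (not from the original post): IK ships two analyzers. ik_smart, used above, produces the coarsest segmentation, while ik_max_word produces the finest-grained, overlapping segmentation. You can compare them with the same _analyze call, swapping the analyzer name; the exact tokens depend on the IK dictionary:

POST http://192.168.80.133:9200/_analyze
{
  "analyzer": "ik_max_word",
  "text": "動力外觀油耗"
}

For keyword-style aggregations, ik_smart is usually the better fit because each word appears only once, whereas ik_max_word tends to help full-text search recall.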
References
http://www.cnblogs.com/xing901022/p/5910139.html (How to install Chinese analyzers in Elasticsearch: IK + pinyin)
https://elasticsearch.cn/question/47 (A question about aggregations)
https://github.com/medcl/elasticsearch-analysis-ik/issues/276 (No handler for type [text] declared on field [content] #276)
http://blog.csdn.net/guo_jia_liang/article/details/52980716 (Elasticsearch 2.4 study notes, part 3: plugin installation in detail)
Reposted from: https://www.cnblogs.com/nuccch/p/8207261.html