Elasticsearch in Plain Language 06 - Search In Depth: Manually Controlling the Precision of Full-Text Search Results
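Contents
- Overview
- Data
- Small examples
- Searching for blogs whose title contains java or elasticsearch
- Searching for blogs whose title contains java and elasticsearch
- Searching for blogs that contain at least 3 of the 4 keywords java, elasticsearch, spark, hadoop
- Combining multiple search conditions with bool to search title
- How the relevance score is calculated when bool combines multiple conditions
- Searching for java, hadoop, spark, elasticsearch with at least 3 of the keywords present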
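Overview
This is part six of my notes from studying ES with the instructor 中华石杉.
Course URL: https://www.roncoo.com/view/55
What if we want finer-grained control over how full-text search behaves? Here we look at several ways to manually control the precision of full-text search results.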
match query documentation:
Version 6.4:
https://www.elastic.co/guide/en/elasticsearch/reference/6.4/query-dsl-match-query.html
Version 7.0:
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/query-dsl-match-query.html
Data
To illustrate this part, we add a title field to the post data:
POST /forum/article/_bulk
{"update":{"_id":"1"}}
{"doc":{"title":"this is java and elasticsearch blog"}}
{"update":{"_id":"2"}}
{"doc":{"title":"this is java blog"}}
{"update":{"_id":"3"}}
{"doc":{"title":"this is elasticsearch blog"}}
{"update":{"_id":"4"}}
{"doc":{"title":"this is java, elasticsearch, hadoop blog"}}
{"update":{"_id":"5"}}
{"doc":{"title":"this is spark blog"}}
Afterwards, look at one of the documents to check its title field, and at the mapping.
Small examples
Searching for blogs whose title contains java or elasticsearch
The key word here is: or
The match query is of type boolean. It means that the text provided is analyzed and the analysis process constructs a boolean query from the provided text
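This differs from the term query we used earlier: instead of searching for an exact value, it performs full-text search.
The match query is the query responsible for full-text search. Of course, if the field being searched is not_analyzed, or is of type keyword, a match query behaves like a term query.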
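The title field is analyzed when it is indexed. Take the title "this is java and elasticsearch blog" as an example:
it is split into the terms this, is, java, and, elasticsearch, blog, which are stored in the inverted index.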
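The tokenization described above can be approximated in a few lines of Python. This is only a rough stand-in for the analyzer: it lowercases the text and splits on anything that is not a letter or digit, which is enough for the plain ASCII titles in this article. The function name `analyze` is a hypothetical helper, not an Elasticsearch API.

```python
import re


def analyze(text):
    """Approximate the analysis step: lowercase, then split on non-alphanumerics.

    This is an illustrative simplification, not the real analyzer.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


print(analyze("this is java and elasticsearch blog"))
# ['this', 'is', 'java', 'and', 'elasticsearch', 'blog']
print(analyze("this is java, elasticsearch, hadoop blog"))
# ['this', 'is', 'java', 'elasticsearch', 'hadoop', 'blog']
```

Note that the commas in the second title disappear: only the terms themselves end up in the inverted index.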
So how do we search for blogs whose title contains java or elasticsearch?
First look at how java elasticsearch is analyzed:
GET /forum/_analyze
{
  "field": "title",
  "text": "java elasticsearch"
}
The text is split into the two terms java and elasticsearch, so a plain match query is all we need:
GET /forum/_search
{
  "query": {
    "match": {
      "title": "java elasticsearch"
    }
  }
}
This returns 4 documents, which is the or behaviour we wanted:
{
  "took": 5,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
  "hits": {
    "total": 4,
    "max_score": 0.8092568,
    "hits": [
      {
        "_index": "forum", "_type": "article", "_id": "4", "_score": 0.8092568,
        "_source": {"articleID": "QQPX-R-3956-#aD8", "userID": 2, "hidden": true, "postDate": "2017-01-02", "tag": ["java", "elasticsearch"], "tag_cnt": 2, "view_cnt": 80, "title": "this is java, elasticsearch, hadoop blog"}
      },
      {
        "_index": "forum", "_type": "article", "_id": "1", "_score": 0.5753642,
        "_source": {"articleID": "XHDK-A-1293-#fJ3", "userID": 1, "hidden": false, "postDate": "2017-01-01", "tag": ["java", "hadoop"], "tag_cnt": 2, "view_cnt": 30, "title": "this is java and elasticsearch blog"}
      },
      {
        "_index": "forum", "_type": "article", "_id": "3", "_score": 0.2876821,
        "_source": {"articleID": "JODL-X-1937-#pV7", "userID": 2, "hidden": false, "postDate": "2017-01-01", "tag": ["hadoop"], "tag_cnt": 1, "view_cnt": 100, "title": "this is elasticsearch blog"}
      },
      {
        "_index": "forum", "_type": "article", "_id": "2", "_score": 0.19856805,
        "_source": {"articleID": "KDKE-B-9947-#kL5", "userID": 1, "hidden": false, "postDate": "2017-01-02", "tag": ["java"], "tag_cnt": 1, "view_cnt": 50, "title": "this is java blog"}
      }
    ]
  }
}
Searching for blogs whose title contains java and elasticsearch
The key word here is: and
The operator flag can be set to or or and to control the boolean clauses (defaults to or).
If you want every search term to match, set the operator to and. This gives you what a plain match query, with its default or, cannot:
GET /forum/_search
{
  "query": {
    "match": {
      "title": {
        "query": "java elasticsearch",
        "operator": "and"
      }
    }
  }
}
This returns 2 documents, as expected.
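One way to picture the operator flag is to write out the boolean query that a multi-term match is conceptually equivalent to: should clauses for or, must clauses for and. The sketch below builds that equivalent query body as a plain Python dict. The helpers `analyze` and `match_to_bool` are hypothetical names used for illustration, and the tokenizer is a simplification of the real analyzer.

```python
import re


def analyze(text):
    """Simplified analysis: lowercase and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_to_bool(field, text, operator="or"):
    """Build a bool query of term clauses mirroring a match query.

    operator="or"  -> every term becomes a should clause
    operator="and" -> every term becomes a must clause
    """
    clauses = [{"term": {field: token}} for token in analyze(text)]
    occur = "must" if operator == "and" else "should"
    return {"bool": {occur: clauses}}


print(match_to_bool("title", "java elasticsearch"))
print(match_to_bool("title", "java elasticsearch", operator="and"))
```

With the default or, a document needs only one of the term clauses; with and, it needs all of them, which is why the result set shrinks from 4 documents to 2.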
Searching for blogs that contain at least 3 of the 4 keywords java, elasticsearch, spark, hadoop
Here we specify how many of the given keywords must match, at minimum, for a document to be returned.
The minimum number of optional should clauses to match can be set using the minimum_should_match parameter.
minimum_should_match documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/7.0/query-dsl-minimum-should-match.html
As a percentage:
GET /forum/_search
{
  "query": {
    "match": {
      "title": {
        "query": "java elasticsearch spark hadoop",
        "minimum_should_match": "75%"
      }
    }
  }
}
As a number:
GET /forum/_search
{
  "query": {
    "match": {
      "title": {
        "query": "java elasticsearch spark hadoop",
        "minimum_should_match": 3
      }
    }
  }
}
Both return a single document, the only one that contains at least 3 of the keywords.
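To see why 75% and 3 give the same result, here is a small in-memory model of the five titles from the Data section. It resolves a percentage into a clause count (positive percentages are rounded down) and counts how many query terms each title contains. The names `TITLES`, `analyze`, `required_matches` and `match` are hypothetical helpers, the tokenizer is a simplification, and negative values of minimum_should_match are not handled.

```python
import re

# Titles from the bulk request in the Data section, keyed by document id.
TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}


def analyze(text):
    """Simplified analysis: lowercase and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def required_matches(minimum_should_match, clause_count):
    """Resolve a non-negative integer or an "N%" string into a clause count."""
    if isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
        # Percentages are rounded down: 75% of 4 clauses -> 3
        return int(minimum_should_match[:-1]) * clause_count // 100
    return int(minimum_should_match)


def match(query, minimum_should_match=1):
    """Return ids of titles containing at least the required number of query terms."""
    terms = analyze(query)
    needed = required_matches(minimum_should_match, len(terms))
    hits = []
    for doc_id, title in TITLES.items():
        tokens = set(analyze(title))
        if sum(term in tokens for term in terms) >= needed:
            hits.append(doc_id)
    return hits


print(match("java elasticsearch spark hadoop", "75%"))  # ['4']
print(match("java elasticsearch spark hadoop", 3))      # ['4']
```

Only document 4 contains three of the four keywords (java, elasticsearch, hadoop), so it is the only hit in both cases.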
Combining multiple search conditions with bool to search title
GET /forum/article/_search
{
  "query": {
    "bool": {
      "must": {"match": {"title": "java"}},
      "must_not": {"match": {"title": "spark"}},
      "should": [
        {"match": {"title": "hadoop"}},
        {"match": {"title": "elasticsearch"}}
      ]
    }
  }
}
A match query analyzes the search keywords first and then looks up the resulting terms.
A term query looks up the keyword directly, without analyzing it. In general, use match for full-text (inexact) search and term for exact lookups.
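The difference between the two lookups can be sketched with a tiny model of the inverted index for one title. The functions `match_hits` and `term_hits` are hypothetical helpers, and `analyze` is a simplified stand-in for the real analyzer; the point is only that match analyzes the query text while term uses it verbatim.

```python
import re


def analyze(text):
    """Simplified analysis: lowercase and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Terms stored in the (modelled) inverted index for one document.
indexed_terms = set(analyze("this is java blog"))


def match_hits(query_text):
    """match: analyze the query text, then look up each resulting term."""
    return any(term in indexed_terms for term in analyze(query_text))


def term_hits(value):
    """term: look up the value exactly as given, with no analysis."""
    return value in indexed_terms


print(match_hits("Java"))       # True: "Java" is lowercased to "java" first
print(term_hits("Java"))        # False: no indexed term is spelled "Java"
print(term_hits("java"))        # True
print(term_hits("java blog"))   # False: the index holds single terms only
```

This is why a term query against an analyzed field works only when the value already looks like an indexed term, as with the lowercase single words used in this article.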
The same search can also be written with term for exact lookups:
GET /forum/_search
{
  "query": {
    "bool": {
      "must": {"term": {"title": "java"}},
      "must_not": {"term": {"title": "spark"}},
      "should": [
        {"term": {"title": "hadoop"}},
        {"term": {"title": "elasticsearch"}}
      ]
    }
  }
}
How the relevance score is calculated when bool combines multiple conditions
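The scores of the matching must and should clauses are added together and divided by the total number of must and should clauses. That is the formula given in older Elasticsearch documentation; since Lucene 7, which Elasticsearch 6.x and later are built on, the coordination factor has been removed and the score is simply the sum of the matching clauses' scores. Either way, the ranking for the search above comes out the same:
- First: contains java and both should keywords, hadoop and elasticsearch
- Second: contains java and one should keyword, elasticsearch
- Third: contains java but none of the should keywords
should clauses can influence the relevance score.
must guarantees that a document contains the keyword, and the must condition is also used to calculate the document's relevance score for the search.
Once must is satisfied, the should conditions are optional, but the more of them a document matches, the higher its relevance score.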
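The ranking described above can be reproduced with a toy scoring model. As a loudly stated assumption, every matching clause contributes a score of 1.0 here; real scores come from the similarity function and differ per term and per shard, so only the ordering is meaningful. The names `TITLES`, `analyze`, `bool_score` and `rank` are hypothetical helpers.

```python
import re

# Titles from the bulk request in the Data section, keyed by document id.
TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}


def analyze(text):
    """Simplified analysis: lowercase and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bool_score(title, must, must_not, should):
    """Return the toy score of a title, or None if it is filtered out.

    Assumption: each matching must or should clause is worth 1.0.
    """
    tokens = set(analyze(title))
    if any(term in tokens for term in must_not):
        return None
    if not all(term in tokens for term in must):
        return None
    return float(len(must) + sum(term in tokens for term in should))


def rank(must, must_not, should):
    """Score every title and return (doc_id, score) pairs, best first."""
    scored = []
    for doc_id, title in TITLES.items():
        score = bool_score(title, must, must_not, should)
        if score is not None:
            scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(rank(["java"], ["spark"], ["hadoop", "elasticsearch"]))
# [('4', 3.0), ('1', 2.0), ('2', 1.0)]
```

Documents 3 and 5 drop out because they lack java, and the remaining three are ordered by how many should clauses they satisfy.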
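Searching for java, hadoop, spark, elasticsearch with at least 3 of the keywords present
By default, a document does not have to match any should clause. In the search above, for example, this is java blog matches none of the should conditions.
There is one exception: if the bool query has no must clause, at least one should clause has to match.
In the search below, should holds 4 conditions. By default, matching any one of them is enough for a document to be returned, but minimum_should_match lets you control exactly how many of the 4 must match.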
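The default described above, and the effect of overriding it, can be modelled over the five titles from the Data section. This is a sketch of the rule as stated in this article (no must clause means at least one should clause is required), not of Elasticsearch internals. The names `TITLES`, `analyze` and `bool_should_hits` are hypothetical helpers.

```python
import re

# Titles from the bulk request in the Data section, keyed by document id.
TITLES = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}


def analyze(text):
    """Simplified analysis: lowercase and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bool_should_hits(should, must=(), minimum_should_match=None):
    """Return ids of titles accepted by a bool query with must/should terms."""
    if minimum_should_match is None:
        # Default from the text: 0 when a must clause exists, otherwise 1.
        minimum_should_match = 0 if must else 1
    hits = []
    for doc_id, title in TITLES.items():
        tokens = set(analyze(title))
        if not all(term in tokens for term in must):
            continue
        if sum(term in tokens for term in should) >= minimum_should_match:
            hits.append(doc_id)
    return hits


keywords = ["java", "elasticsearch", "hadoop", "spark"]
print(bool_should_hits(keywords))                          # ['1', '2', '3', '4', '5']
print(bool_should_hits(keywords, minimum_should_match=3))  # ['4']
print(bool_should_hits(["hadoop", "elasticsearch"], must=["java"]))  # ['1', '2', '4']
```

The last call shows the other half of the rule: with a must clause present, document 2 is still returned even though it matches no should clause.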
GET /forum/article/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {"title": "java"}},
        {"match": {"title": "elasticsearch"}},
        {"match": {"title": "hadoop"}},
        {"match": {"title": "spark"}}
      ],
      "minimum_should_match": 3
    }
  }
}
To summarize:
- 1. For full-text search over multiple values there are two approaches: a match query, or should clauses in a bool query
- 2. To control the precision of the results, use the and operator or minimum_should_match