Elasticsearch in Plain Language 18: A Deep Dive into Search, Proximity Matching with the slop Parameter and How It Works
Table of Contents
- Overview
- Official documentation
- What slop means
- Example
- Example 1
- Example 2
- Example 3
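Overview
Continuing to study ES with instructor 中華石杉 (Zhonghua Shishan); this is part 18.
Course URL: https://www.roncoo.com/view/55
This post follows the previous one, "Elasticsearch in Plain Language 17: match_phrase query, phrase search".
Official documentation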
https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase.html
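What slop means
The official documentation says:
A phrase query matches terms up to a configurable slop (which defaults to 0) in any order. Transposed terms have a slop of 2.
So what is slop?
It is the number of moves that the terms of the query string (the search text) need in order to line up with a document.
- A phrase match with slop is a proximity match.
- When slop is specified, the search terms are allowed to move in an attempt to match a doc.
- The search terms may sit some distance apart in the doc, but the closer together they are, the higher the doc ranks. That is what proximity match means.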
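One way to make "number of moves" precise is the position-offset model below. It is a sketch of how Lucene-style sloppy phrase matching is commonly described, not a quote from the documentation, and the symbols are introduced here for illustration: the i-th query term has query position q_i = i, and p_i is the document position it is matched to.

```latex
d = \max_i \,(p_i - q_i) \;-\; \min_i \,(p_i - q_i),
\qquad \text{the phrase matches} \iff d \le \text{slop}
```

For the query "a b" against a doc "b a", the offsets are 1 - 0 = 1 for a and 0 - 1 = -1 for b, so d = 2, which agrees with the documentation's statement that transposed terms have a slop of 2.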
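Example
Work out how many moves a query string needs before it matches a document, then set slop accordingly.
Suppose there is a doc:
hello world, java is very good, spark is also very good.
Searching for java spark with a plain match_phrase query finds nothing, because match_phrase treats java spark as a single unit: with the default slop of 0 the two terms must be adjacent and in that order.
If we specify slop, the terms of java spark are allowed to move in an attempt to match the doc.
The slop needed here is 3: moving spark 3 positions makes the phrase java spark line up with the doc.
More precisely, slop is not the exact number of moves the query terms make. It is the maximum number of moves they are allowed to make while trying to match a doc.
With slop set to 3, the query works:
    GET /forum/article/_search
    {
      "query": {
        "match_phrase": {
          "title": {
            "query": "java spark",
            "slop": 3
          }
        }
      }
    }
This matches the doc above, and the doc is returned in the results.
If slop is set to 2, however, spark may move at most 2 positions. That is not enough to match, so the doc is not returned.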
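The move counting in this example can be checked with a small Python sketch. It assumes a simplified position-offset model of sloppy phrase matching (each query term's document position minus its query position, then the spread of those offsets) and a rough tokenizer standing in for the standard analyzer. The names `tokenize` and `min_slop` are hypothetical helpers, not part of any Elasticsearch API.

```python
import re
from itertools import product


def tokenize(text):
    # Rough stand-in for the standard analyzer (an assumption):
    # lowercase, then split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def min_slop(query, doc):
    """Smallest slop at which the phrase matches under the offset model,
    or None if some query term does not occur in the doc."""
    q_terms = tokenize(query)
    d_terms = tokenize(doc)
    # For each query term, every document position where it occurs.
    candidates = [[p for p, t in enumerate(d_terms) if t == term]
                  for term in q_terms]
    if any(not c for c in candidates):
        return None
    best = None
    for combo in product(*candidates):
        if len(set(combo)) != len(combo):
            continue  # two query terms cannot share one document position
        # Offset = document position minus query position.
        offsets = [p - q for q, p in enumerate(combo)]
        spread = max(offsets) - min(offsets)
        if best is None or spread < best:
            best = spread
    return best


DOC = "hello world, java is very good, spark is also very good."

needed = min_slop("java spark", DOC)
print(needed)        # 3
print(needed <= 3)   # True: slop 3 matches
print(needed <= 2)   # False: slop 2 does not
print(min_slop("spark java", "java spark"))  # 2, transposed terms
```

Under this model, java sits at position 2 and spark at position 6 in the doc, giving offsets 2 and 5 and a spread of 3, the same value the text arrives at by counting moves.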
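Example 1
Let's verify this against our test data.
    GET /forum/article/_search
    {
      "query": {
        "match_phrase": {
          "content": {
            "query": "spark data",
            "slop": 3
          }
        }
      }
    }
Analyzing the slop:
data has to move 3 positions before spark data matches, so a slop of 3 is enough. Any value larger than 3 also finds the doc. In other words, 3 is the smallest slop that matches here, since slop is the maximum number of moves allowed.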
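Example 2
What if we search for data spark instead? Will it still match? The answer is yes, it can.
As the documentation quoted earlier notes, terms may match in any order, but reversing them costs extra moves, so the reversed query may need a larger slop than spark data.
Let's analyze it: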
Example 3
When searching with slop, the closer together the search terms are in a doc, the higher its relevance score.
    GET /forum/article/_search
    {
      "query": {
        "match_phrase": {
          "title": {
            "query": "java blog",
            "slop": 5
          }
        }
      }
    }
Response:
    {
      "took": 2,
      "timed_out": false,
      "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
      "hits": {
        "total": 3,
        "max_score": 0.81487787,
        "hits": [
          {
            "_index": "forum",
            "_type": "article",
            "_id": "2",
            "_score": 0.81487787,
            "_source": {
              "articleID": "KDKE-B-9947-#kL5",
              "userID": 1,
              "hidden": false,
              "postDate": "2017-01-02",
              "tag": ["java"],
              "tag_cnt": 1,
              "view_cnt": 50,
              "title": "this is java blog",
              "content": "i think java is the best programming language",
              "sub_title": "learned a lot of course",
              "author_first_name": "Smith",
              "author_last_name": "Williams",
              "new_author_last_name": "Williams",
              "new_author_first_name": "Smith"
            }
          },
          {
            "_index": "forum",
            "_type": "article",
            "_id": "1",
            "_score": 0.31424814,
            "_source": {
              "articleID": "XHDK-A-1293-#fJ3",
              "userID": 1,
              "hidden": false,
              "postDate": "2017-01-01",
              "tag": ["java", "hadoop"],
              "tag_cnt": 2,
              "view_cnt": 30,
              "title": "this is java and elasticsearch blog",
              "content": "i like to write best elasticsearch article",
              "sub_title": "learning more courses",
              "author_first_name": "Peter",
              "author_last_name": "Smith",
              "new_author_last_name": "Smith",
              "new_author_first_name": "Peter"
            }
          },
          {
            "_index": "forum",
            "_type": "article",
            "_id": "4",
            "_score": 0.31424814,
            "_source": {
              "articleID": "QQPX-R-3956-#aD8",
              "userID": 2,
              "hidden": true,
              "postDate": "2017-01-02",
              "tag": ["java", "elasticsearch"],
              "tag_cnt": 2,
              "view_cnt": 80,
              "title": "this is java, elasticsearch, hadoop blog",
              "content": "elasticsearch and hadoop are all very good solution, i am a beginner",
              "sub_title": "both of them are good",
              "author_first_name": "Robbin",
              "author_last_name": "Li",
              "new_author_last_name": "Li",
              "new_author_first_name": "Robbin"
            }
          }
        ]
      }
    }
We can see:
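Highest score: doc 2, "this is java blog" (0.81487787), where java and blog are adjacent.
Next: doc 1, "this is java and elasticsearch blog" (0.31424814), where two words separate java and blog.
Last: doc 4, "this is java, elasticsearch, hadoop blog" (0.31424814), which also has two words between the terms and ties with doc 1.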
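The ordering of these three hits can be reproduced in miniature. The sketch below assumes the same simplified offset model of phrase distance and weights each match by 1 / (1 + distance), which is how Lucene's sloppy phrase scoring is usually described. That weight is only one input to the final BM25 score (field length also matters), so the numbers illustrate the ordering and do not reproduce the `_score` values. `tokenize` and `phrase_distance` are hypothetical helper names.

```python
import re


def tokenize(text):
    # Rough stand-in for the standard analyzer (an assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_distance(query, text):
    """Spread of (document position - query position) over the query terms.
    Assumes every query term occurs exactly once in the text."""
    q_terms = tokenize(query)
    d_terms = tokenize(text)
    offsets = [d_terms.index(term) - i for i, term in enumerate(q_terms)]
    return max(offsets) - min(offsets)


# Titles of the three hits, keyed by _id, taken from the response.
titles = {
    "2": "this is java blog",
    "1": "this is java and elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
}

for doc_id, title in titles.items():
    distance = phrase_distance("java blog", title)
    weight = 1.0 / (1 + distance)  # closer terms give a larger weight
    print(doc_id, distance, round(weight, 3))
# 2 0 1.0
# 1 2 0.333
# 4 2 0.333
```

Docs 1 and 4 get the same distance, which is consistent with their identical scores in the response.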