
Elasticsearch: Getting Started with Mappings

Published: 2025/3/20 | Programming Q&A | 19 | 豆豆


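10.1. What is a mapping

Concept: a mapping is the data structure and related configuration that is created, automatically or manually, for the documents (_doc) in an index.

Insert a few documents and let ES create an index for us automatically: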

PUT /website/_doc/1
{
  "post_date": "2019-01-01",
  "title": "my first article",
  "content": "this is my first article in this website",
  "author_id": 11400
}

PUT /website/_doc/2
{
  "post_date": "2019-01-02",
  "title": "my second article",
  "content": "this is my second article in this website",
  "author_id": 11400
}

PUT /website/_doc/3
{
  "post_date": "2019-01-03",
  "title": "my third article",
  "content": "this is my third article in this website",
  "author_id": 11400
}

Compare this with the equivalent table definition in a relational database:

create table website (
  post_date date,
  title varchar(50),
  content varchar(100),
  author_id int(11)
);

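Dynamic mapping: ES automatically creates the index and its mapping for us. The mapping records the data type of each field, plus settings such as how the field is analyzed (tokenized).

Key point: as covered later, you can also create the index and its mapping manually before inserting any data.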

GET /website/_mapping

{
  "website" : {
    "mappings" : {
      "properties" : {
        "author_id" : { "type" : "long" },
        "content" : {
          "type" : "text",
          "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } }
        },
        "post_date" : { "type" : "date" },
        "title" : {
          "type" : "text",
          "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } }
        }
      }
    }
  }
}

Now try a few searches:

GET /website/_search?q=2019                   0 results
GET /website/_search?q=2019-01-01             1 result
GET /website/_search?q=post_date:2019-01-01   1 result
GET /website/_search?q=post_date:2019         0 results

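Why are the results inconsistent? When ES builds the mapping automatically, it assigns different data types to different fields, and each data type is analyzed and searched differently. That is why a query without a field name (which searched the _all catch-all field in older ES versions; since ES 7.0 it searches all eligible fields instead) behaves completely differently from a query on the post_date field.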

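10.2. Exact value vs. full text

10.2.1 Exact value

If 2019-01-01 is stored as an exact value, you must search for exactly 2019-01-01 to find it.

Searching for just 01 will not find it. This is like an equality condition in SQL: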

select * from book where name= 'java'

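10.2.2 Full text

Full-text search asks questions such as: if you search for "筆記電腦" (a shortened form of "notebook computer"), should an entry for "筆記本電腦" (notebook computer) show up? It is closer to a LIKE query in SQL: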

select * from book where name like '%java%'

Full-text matching has to handle cases such as:

(1) Abbreviation vs. full name: cn vs. china

(2) Word forms: like, liked, likes

(3) Case: Tom vs. tom

(4) Synonyms: like vs. love

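2019-01-01 is split into 2019, 01 and 01, so searching for 2019 or 01 finds it.

Searching for cn also finds china.

Searching for like also finds likes.

Searching for tom also finds Tom.

Searching for love also finds like, because they are synonyms.

So full-text search does not simply match one complete value: the value is split into words (tokenization) before matching, and matches can also succeed through abbreviations, tenses, case and synonyms. Going deeper leads into NLP (natural language processing).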
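The difference between the two matching modes can be sketched in a few lines of Python. This is a toy model, not ES code: `exact_match`, `full_text_terms` and `full_text_match` are hypothetical helpers, and the toy analyzer only splits on non-alphanumeric characters and lowercases, so it covers the case example (Tom vs. tom) but not abbreviations, word forms or synonyms.

```python
import re


def exact_match(stored: str, query: str) -> bool:
    # Exact value: the whole stored value is one term, so only an identical query matches.
    return stored == query


def full_text_terms(text: str) -> set:
    # Toy analyzer: split on non-alphanumeric characters and lowercase each piece.
    return {t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t}


def full_text_match(stored: str, query: str) -> bool:
    # Full text: a document matches if it shares at least one term with the query.
    return bool(full_text_terms(stored) & full_text_terms(query))


print(exact_match("2019-01-01", "01"))      # False
print(full_text_match("2019-01-01", "01"))  # True
print(full_text_match("Tom", "tom"))        # True
```

Because this toy has no stemming, `full_text_match("likes", "like")` is still False; handling that is the job of the normalization described in the next section.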

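10.3. How the inverted index behind full-text search works

doc1: I really liked my small dogs, and I think my mom also liked them.

doc2: He never liked any dogs, so I hope that my mom will not expect me to liked him.

Tokenize both documents and build an initial inverted index (* = the term appears in the document, - = it does not):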

term     doc1  doc2
I        *     *
really   *     -
liked    *     *
my       *     *
small    *     -
dogs     *     *
and      *     -
think    *     -
mom      *     *
also     *     -
them     *     -
He       -     *
never    -     *
any      -     *
so       -     *
hope     -     *
that     -     *
will     -     *
not      -     *
expect   -     *
me       -     *
to       -     *
him      -     *

This shows the simplest possible way an inverted index is built.
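The table above can be reproduced with a minimal Python sketch, assuming a toy tokenizer that splits on non-letters and keeps the original case (real ES analyzers do more). `build_inverted_index` and `search` are hypothetical names; the search at the end shows why the query in the next step finds nothing.

```python
import re
from collections import defaultdict

docs = {
    "doc1": "I really liked my small dogs, and I think my mom also liked them.",
    "doc2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}


def tokenize(text):
    # Split on anything that is not a letter; case is kept, as in the table above.
    return [t for t in re.split(r"[^A-Za-z]+", text) if t]


def build_inverted_index(docs):
    index = defaultdict(set)  # term -> ids of the documents containing it
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    # OR semantics: collect every document that contains any query term.
    hits = set()
    for term in tokenize(query):
        hits |= index.get(term, set())
    return hits


index = build_inverted_index(docs)
print(sorted(index["dogs"]))                    # ['doc1', 'doc2']
print(search(index, "mother like little dog"))  # set()
```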

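Search

Searching for mother like little dog returns no results at all. The query is split into the terms mother, like, little and dog, and none of them appears in the inverted index above, which only contains mom, liked, small and dogs.

This is not what we want. To a human, mom and mother mean the same thing, so we want the terms to be normalized.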

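Rebuilding the inverted index

Normalization: while building the inverted index, ES processes each token produced by tokenization, to raise the chance that a later search will find related documents.

This includes converting tenses, singular/plural forms, synonyms and letter case, for example:

mom -> mother
liked -> like
small -> little
dogs -> dog

Rebuild the inverted index with normalization applied; searching again for mother liked little dog now finds results. The rebuilt index: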

term     doc1  doc2  normalization
I        *     *
really   *     -
like     *     *     liked -> like
my       *     *
little   *     -     small -> little
dog      *     *     dogs -> dog
and      *     -
think    *     -
mother   *     *     mom -> mother
also     *     -
them     *     -
He       -     *
never    -     *
any      -     *
so       -     *
hope     -     *
that     -     *
will     -     *
not      -     *
expect   -     *
me       -     *
to       -     *
him      -     *

Searching again

Search: mother liked little dog

The query string is tokenized and normalized the same way:

mother
liked -> like
little
dog

Both doc1 and doc2 are now found.
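The normalized index and the query-time analysis can be sketched the same way. The `NORMALIZE` dictionary is a hand-written stand-in for real stemming and synonym token filters, holding only the four rules listed above; the point the sketch makes is that documents and query go through the same hypothetical `analyze` function.

```python
import re
from collections import defaultdict

# Stand-in for stemming/synonym token filters: only the four rules from the text.
NORMALIZE = {"mom": "mother", "liked": "like", "small": "little", "dogs": "dog"}

docs = {
    "doc1": "I really liked my small dogs, and I think my mom also liked them.",
    "doc2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}


def analyze(text):
    # Tokenize on non-letters, then map each token through the normalization rules.
    tokens = [t for t in re.split(r"[^A-Za-z]+", text) if t]
    return [NORMALIZE.get(t, t) for t in tokens]


def build_index(docs):
    index = defaultdict(set)  # normalized term -> ids of documents containing it
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    # The query string goes through the same analyze() as the documents.
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits


index = build_index(docs)
print(sorted(search(index, "mother liked little dog")))  # ['doc1', 'doc2']
```

If the query skipped `analyze` and looked up "liked" directly, it would miss, because the index only holds the normalized term "like".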

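10.4. Analyzers

10.4.1 What is an analyzer

Purpose: split text into words and normalize them (to improve recall).

Given a sentence, the analyzer splits it into individual words and normalizes each one (tense conversion, singular/plural conversion, and so on).

Recall: how many of the relevant results a search manages to find; improving recall means a search returns more of the matching documents.

An analyzer has three parts:

1. Character filter: preprocesses the text before it is tokenized. The most common examples are stripping HTML tags (<span>hello</span> --> hello) and converting & to and (I&you --> I and you).

2. Tokenizer: splits the text into tokens, e.g. hello you and me --> hello, you, and, me.

3. Token filter: lowercasing, stop-word removal, synonyms and so on, e.g. dogs --> dog, liked --> like, Tom --> tom, a/the/an --> removed, mother --> mom, small --> little.

Stop words: words that carry little meaning and are dropped, e.g. the Chinese particles 了, 的, 呢.

The analyzer matters a great deal: it runs the text through all of these steps, and only the processed result is used to build the inverted index.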
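The three stages can be chained as plain functions. This is a hypothetical toy pipeline, not the ES implementation: the HTML stripping is a simple regex, the tokenizer splits on non-alphanumeric characters, and `STOP_WORDS`/`SYNONYMS` contain only the examples from the list above.

```python
import re

STOP_WORDS = {"a", "an", "the"}
SYNONYMS = {"dogs": "dog", "liked": "like", "mother": "mom", "small": "little"}


def char_filter(text):
    # Character filters: strip HTML tags, then turn "&" into " and ".
    text = re.sub(r"<[^>]+>", "", text)
    return text.replace("&", " and ")


def tokenizer(text):
    # Tokenizer: split the text into runs of letters and digits.
    return re.findall(r"[A-Za-z0-9]+", text)


def token_filters(tokens):
    # Token filters: lowercase, drop stop words, apply the synonym/normalization map.
    out = []
    for token in tokens:
        token = token.lower()
        if token in STOP_WORDS:
            continue
        out.append(SYNONYMS.get(token, token))
    return out


def analyze(text):
    # The order is fixed: character filters, then tokenizer, then token filters.
    return token_filters(tokenizer(char_filter(text)))


print(analyze("<span>Tom</span> liked the small dogs"))  # ['tom', 'like', 'little', 'dog']
```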

10.4.2 Built-in analyzers

Example sentence: Set the shape to semi-transparent by calling set_trans(5)

standard analyzer (the default): set, the, shape, to, semi, transparent, by, calling, set_trans, 5

simple analyzer: set, the, shape, to, semi, transparent, by, calling, set, trans

whitespace analyzer: Set, the, shape, to, semi-transparent, by, calling, set_trans(5)

language analyzer (language-specific, e.g. the english analyzer): set, shape, semi, transpar, call, set_tran, 5
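The simple and whitespace results above can be reproduced with regular expressions. `simple_analyze` and `whitespace_analyze` are hypothetical helpers; `standard_like_analyze` is only a rough approximation of the standard analyzer (which follows Unicode text segmentation rules) that happens to agree on this sentence. The english analyzer's stop words and stemming are not reproduced.

```python
import re

SENTENCE = "Set the shape to semi-transparent by calling set_trans(5)"


def simple_analyze(text):
    # simple: split on anything that is not a letter (ASCII-only here), then lowercase.
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def whitespace_analyze(text):
    # whitespace: split on whitespace only, no lowercasing.
    return text.split()


def standard_like_analyze(text):
    # Rough approximation of standard: runs of letters, digits and underscore, lowercased.
    return [t.lower() for t in re.findall(r"\w+", text)]


print(simple_analyze(SENTENCE))
print(whitespace_analyze(SENTENCE))
print(standard_like_analyze(SENTENCE))
```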

Official documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/7.4/analysis-analyzers.html

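10.5. How the query string is analyzed per field

10.5.1 Analyzing the query string

A query string must be analyzed with the same analyzer that was used when the index was built.

A query string is treated differently depending on whether the field holds exact values or full text, for example:

date: exact value

text: full text

10.5.2 Testing an analyzer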

GET /_analyze
{
  "analyzer": "standard",
  "text": "Text to analyze 80"
}

Response:

{
  "tokens" : [
    { "token" : "text", "start_offset" : 0, "end_offset" : 4, "type" : "<ALPHANUM>", "position" : 0 },
    { "token" : "to", "start_offset" : 5, "end_offset" : 7, "type" : "<ALPHANUM>", "position" : 1 },
    { "token" : "analyze", "start_offset" : 8, "end_offset" : 15, "type" : "<ALPHANUM>", "position" : 2 },
    { "token" : "80", "start_offset" : 16, "end_offset" : 18, "type" : "<NUM>", "position" : 3 }
  ]
}

token: the term that is actually stored in the index

position: the position of the term in the original text, counted in tokens

start_offset/end_offset: the character offsets of the term in the original string
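The offsets and positions in the response can be reproduced for this input with a rough approximation of the standard analyzer. `analyze_with_offsets` is a hypothetical helper, and the `<NUM>`/`<ALPHANUM>` type is assigned by a simple digit check, an assumption that holds for this example only.

```python
import re


def analyze_with_offsets(text):
    # Rough standard-like tokenization that also records offsets and positions.
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        word = match.group()
        tokens.append({
            "token": word.lower(),
            "start_offset": match.start(),  # index of the first character
            "end_offset": match.end(),      # index just past the last character
            "type": "<NUM>" if word.isdigit() else "<ALPHANUM>",
            "position": position,           # token number, starting at 0
        })
    return tokens


for t in analyze_with_offsets("Text to analyze 80"):
    print(t)
```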

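10.6. Mapping recap

(1) If you insert data into ES directly, ES automatically creates the index together with its mapping (dynamic mapping).

(2) The mapping automatically defines the data type of every field.

(3) Different data types (for example text and date) are handled either as exact values or as full text.

(4) An exact value is added to the inverted index as a single term made of the whole value. Full text goes through various processing steps, tokenization and normalization (tense, synonym and case conversion), before it is added to the inverted index.

(5) Whether a field holds exact values or full text also determines how a search on it behaves, consistent with how its inverted index was built: an exact value search matches the whole value, while a full-text query string is itself tokenized and normalized before it is looked up in the inverted index.

(6) You can let ES dynamic mapping create the mapping, including the data types, automatically; or you can create the index and mapping manually in advance and configure each field yourself: data type, indexing behavior, analyzer, and so on.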

10.7. Core data types and dynamic mapping

10.7.1 Core data types

string: text and keyword

byte, short, integer, long, float, double

boolean

date

See: https://www.elastic.co/guide/en/elasticsearch/reference/7.3/mapping-types.html

The figure below shows the core field types in ES 7.3:

10.7.2 Dynamic mapping inference rules

true or false --> boolean

123 --> long

123.45 --> float

2019-01-01 --> date

"hello world" --> text, with a keyword sub-field
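The inference rules can be sketched as a small function. This is a simplified model, not ES's implementation: `infer_mapping` is hypothetical, date detection is reduced to a `yyyy-MM-dd` regex (ES's default date detection formats are more general), and strings get the text-plus-keyword mapping seen in the GET /website/_mapping response. Floating-point numbers are mapped to float, which is what the ES 7 reference documentation lists for dynamic mapping.

```python
import re

# Simplified date detection: yyyy-MM-dd, optionally followed by a "T..." time part.
DATE_LIKE = re.compile(r"^\d{4}-\d{2}-\d{2}(T.*)?$")


def infer_mapping(value):
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        if DATE_LIKE.match(value):
            return {"type": "date"}
        # Plain strings: full-text field plus an exact-value "keyword" sub-field.
        return {"type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    if isinstance(value, dict):
        # Inner objects get their own "properties".
        return {"properties": {k: infer_mapping(v) for k, v in value.items()}}
    raise TypeError(f"no rule for {type(value).__name__}")


doc = {"post_date": "2019-01-01", "title": "my first article", "author_id": 11400}
print({field: infer_mapping(v) for field, v in doc.items()})
```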

10.7.3 Viewing a mapping

GET /index/_mapping/

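10.8 Managing mappings manually

10.8.1 Viewing the mappings of all indices

GET /_mapping

10.8.2 Creating a mapping

Right after creating an index, you should create its mapping manually: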

PUT book/_mapping
{
  "properties": {
    "name": { "type": "text" },
    "description": { "type": "text", "analyzer": "english", "search_analyzer": "english" },
    "pic": { "type": "text", "index": false },
    "studymodel": { "type": "text" }
  }
}

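text type

1) analyzer

The analyzer property specifies the analyzer.

The mapping above sets analyzer to english, so english is used both at index time and at search time. To use a different analyzer at search time only, set the search_analyzer property.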

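2) index

The index property specifies whether the field is indexed.

The default is index=true, meaning the field is indexed; only indexed fields can be searched.

Some content does not need to be indexed, though. For example, a product image URL is only used to display the image, never to search for it, so index can be set to false.

Delete the index, recreate the mapping with index set to false for pic, and then try searching on pic: no data is found.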

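3) store

Whether to store the field separately, outside _source. After a document is indexed, ES keeps a copy of the original document in the _source field, so there is normally no need to set store to true.

Insert a document: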

PUT /book/_doc/1
{
  "name": "Bootstrap開發框架",
  "description": "Bootstrap是由Twitter推出的一個前臺頁面開發框架，在行業之中使用較為廣泛。此開發框架包含了大量的CSS、JS程序代碼，可以幫助開發者（尤其是不擅長頁面開發的程序人員）輕松的實現一個不受瀏覽器限制的精美界面效果。",
  "pic": "group1/M00/00/01/wKhlQFqO4MmAOP53AAAcwDwm6SU490.jpg",
  "studymodel": "201002"
}

GET /book/_search?q=name:開發

GET /book/_search?q=description:開發

GET /book/_search?q=pic:group1/M00/00/01/wKhlQFqO4MmAOP53AAAcwDwm6SU490.jpg

GET /book/_search?q=studymodel:201002

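The tests show that name and description both support full-text search, while pic cannot be used as a query condition.

keyword type

Since ES 5.0, the keyword type has replaced the old string type mapped with "index": "not_analyzed". Unlike the text fields above, which are configured with an analyzer in the mapping, a keyword field holds a value that is normally searched as a whole, so it is not tokenized when it is indexed. Typical examples are postal codes, mobile phone numbers and ID card numbers. keyword fields are commonly used for filtering, sorting and aggregations.

date type

A date field does not need an analyzer.

Date fields are commonly used for sorting.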

format

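Use format to set the accepted date formats.

Example:

The setting below lets the timestamp field accept three formats: date and time (yyyy-MM-dd HH:mm:ss), date only (yyyy-MM-dd), and milliseconds since the epoch (epoch_millis).

{
  "properties": {
    "timestamp": {
      "type": "date",
      "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
    }
  }
}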
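How a multi-format date field resolves a value can be sketched in Python. The `parse_es_date` helper is hypothetical; it tries the formats in the listed order (the Python `strptime` patterns are hand translations of the Java-style patterns), treats values without a time zone as UTC, and returns milliseconds since the epoch, which is how ES stores date values internally.

```python
from datetime import datetime, timezone

# Python equivalents of "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd", tried in order.
PATTERNS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_es_date(value):
    """Return milliseconds since the epoch (UTC) for a value in one of the three formats."""
    if isinstance(value, int):  # epoch_millis given as a JSON number
        return value
    for pattern in PATTERNS:
        try:
            dt = datetime.strptime(value, pattern).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # this pattern does not fit; try the next one
        return int(dt.timestamp()) * 1000
    if value.isdigit():  # epoch_millis given as a string
        return int(value)
    raise ValueError(f"value {value!r} matches none of the formats")


print(parse_es_date("2018-07-04 18:28:58"))
print(parse_es_date("2018-07-04"))
```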

Insert a document:

POST book/_doc/3
{
  "name": "spring開發基礎",
  "description": "spring 在java領域非常流行，java程序員都在用。",
  "studymodel": "201001",
  "pic": "group1/M00/00/01/wKhlQFqO4MmAOP53AAAcwDwm6SU490.jpg",
  "timestamp": "2018-07-04 18:28:58"
}

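Numeric types

The numeric types supported by ES include long, integer, short, byte, double, float, half_float and scaled_float.

1. Choose the type with the smallest range that fits your data, to improve search efficiency.

2. For floating-point numbers, prefer a scaling factor. For example, for a price field in yuan, setting the scaling factor to 100 makes ES store the price in fen (hundredths of a yuan). The mapping is: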

"price": {"type": "scaled_float","scaling_factor": 100},

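Because the scaling factor is 100, a price of 23.45 is stored in ES as 23.45 × 100 = 2345.

If the input price is 23.456, ES multiplies it by 100 and rounds to the nearest whole number, storing 2346.

The advantage of a scaling factor is that integers compress better than floating-point numbers, which saves disk space.

If a scaling factor is not appropriate, choose the type with the smallest sufficient range from the table below: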
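The scaled_float arithmetic above, as a sketch. The helper names are hypothetical; rounding is done half-up with `math.floor(x + 0.5)` to mimic Java's `Math.round`, which we assume ES uses (Python's built-in `round` rounds halves to even, so it is avoided here).

```python
import math


def to_scaled_long(value: float, scaling_factor: float = 100) -> int:
    # Multiply by the scaling factor and round half-up to the nearest whole number.
    return int(math.floor(value * scaling_factor + 0.5))


def from_scaled_long(stored: int, scaling_factor: float = 100) -> float:
    # The value read back is the stored whole number divided by the scaling factor.
    return stored / scaling_factor


print(to_scaled_long(23.45))   # 2345
print(to_scaled_long(23.456))  # 2346
print(from_scaled_long(2346))  # 23.46
```

The round trip shows the trade-off: 23.456 comes back as 23.46, so the scaling factor fixes the precision that is kept.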

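Update the existing mapping to add the price field, then insert a document:

PUT book/_mapping
{
  "properties": {
    "price": { "type": "scaled_float", "scaling_factor": 100 }
  }
}

PUT /book/_doc/3
{
  "name": "spring開發基礎",
  "description": "spring 在java領域非常流行，java程序員都在用。",
  "studymodel": "201001",
  "pic": "group1/M00/00/01/wKhlQFqO4MmAOP53AAAcwDwm6SU490.jpg",
  "timestamp": "2018-07-04 18:28:58",
  "price": 38.6
}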

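10.8.3 Modifying a mapping

You can only define a mapping manually when the index is created, or add mappings for new fields later; you cannot update the mapping of an existing field.

The reason is that existing data has already been analyzed and stored according to the current mapping. If the mapping were changed, what would happen to that existing data?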
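The rule can be modeled as a merge that accepts new fields but rejects a type change. `merge_properties` and `MapperConflict` are hypothetical, and only the error text is modeled on the ES response shown below; real ES merging checks far more than the type.

```python
class MapperConflict(ValueError):
    """Raised when an update tries to change the type of an existing field."""


def merge_properties(current: dict, update: dict) -> dict:
    # Returns a new properties dict; the existing mapping is never modified in place.
    merged = dict(current)
    for field, new_def in update.items():
        old_def = current.get(field)
        if old_def is None:
            merged[field] = new_def  # adding a new field is allowed
        elif old_def.get("type") != new_def.get("type"):
            # Changing a type would invalidate data already indexed with the old one.
            raise MapperConflict(
                f"mapper [{field}] of different type, "
                f"current_type [{old_def.get('type')}], merged_type [{new_def.get('type')}]"
            )
        # A same-type redefinition is simply ignored in this sketch.
    return merged


book = {"name": {"type": "text"}, "studymodel": {"type": "text"}}
book = merge_properties(book, {"new_field": {"type": "text", "index": False}})
print(sorted(book))  # ['name', 'new_field', 'studymodel']
```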

Adding a mapping for a new field:

PUT /book/_mapping/
{
  "properties" : {
    "new_field" : { "type" : "text", "index" : false }
  }
}

Trying to modify the mapping of an existing field returns an error:

PUT /book/_mapping/
{
  "properties" : {
    "studymodel" : { "type" : "keyword" }
  }
}

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "mapper [studymodel] of different type, current_type [text], merged_type [keyword]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "mapper [studymodel] of different type, current_type [text], merged_type [keyword]"
  },
  "status": 400
}

10.8.4 Deleting a mapping

A mapping is deleted by deleting its index, e.g. DELETE /book.

10.9 Complex data types

10.9.1 Multi-value fields

{ "tags": [ "tag1", "tag2" ]}

It is indexed the same way as a single string; all values in the array must have the same data type.

10.9.2 Empty fields

null, [], [null]

10.9.3 Object fields

PUT /company/_doc/1
{
  "address": {
    "country": "china",
    "province": "guangdong",
    "city": "guangzhou"
  },
  "name": "jack",
  "age": 27,
  "join_date": "2019-01-01"
}

address: a field of type object

View the mapping:

GET /company/_mapping

{
  "company" : {
    "mappings" : {
      "properties" : {
        "address" : {
          "properties" : {
            "city" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
            "country" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
            "province" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } }
          }
        },
        "age" : { "type" : "long" },
        "join_date" : { "type" : "date" },
        "name" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } }
      }
    }
  }
}

An object document such as

{
  "address": {
    "country": "china",
    "province": "guangdong",
    "city": "guangzhou"
  },
  "name": "jack",
  "age": 27,
  "join_date": "2017-01-01"
}

is stored internally in a flattened form:

{
  "name":             [jack],
  "age":              [27],
  "join_date":        [2017-01-01],
  "address.country":  [china],
  "address.province": [guangdong],
  "address.city":     [guangzhou]
}

An array of objects:

{
  "authors": [
    { "age": 26, "name": "Jack White" },
    { "age": 55, "name": "Tom Jones" },
    { "age": 39, "name": "Kitty Smith" }
  ]
}

is stored as:

{
  "authors.age":  [26, 55, 39],
  "authors.name": [jack, white, tom, jones, kitty, smith]
}
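The flattening of objects and arrays of objects can be sketched as a recursive function. `flatten` is hypothetical and keeps raw values rather than analyzed terms (ES stores the analyzed tokens, as in the listing above). It shows the key consequence: after flattening, nothing records that Jack White is the author aged 26.

```python
def flatten(doc, prefix=""):
    """Flatten nested objects into dotted field names with lists of values."""
    out = {}
    for key, value in doc.items():
        path = prefix + key
        # Treat a single value like a one-element array.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Recurse into inner objects and merge their values under "path.sub".
                for sub_path, sub_values in flatten(item, path + ".").items():
                    out.setdefault(sub_path, []).extend(sub_values)
            else:
                out.setdefault(path, []).append(item)
    return out


authors = {"authors": [
    {"age": 26, "name": "Jack White"},
    {"age": 55, "name": "Tom Jones"},
    {"age": 39, "name": "Kitty Smith"},
]}
print(flatten(authors))
# {'authors.age': [26, 55, 39], 'authors.name': ['Jack White', 'Tom Jones', 'Kitty Smith']}
```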
