02. Elasticsearch bucket aggregation queries
Table of contents
- 1. Overview of bucket aggregation query types
- 2. Data preparation
- 3. Usage examples
  - 1. Terms Aggregation
    - 1. A plain terms agg
    - 2. Nesting a metric agg as a sub-agg
    - 3. Nesting a terms agg as a sub-agg
  - 2. Range Aggregation
  - 3. Date Histogram Aggregation
  - 4. Date Range Aggregation
  - 5. Filter Aggregation
  - 6. Filters Aggregation
  - 7. Histogram Aggregation
  - 8. Missing Aggregation: counts documents in which a field is missing
  - 9. Nested aggs: aggregations over nested documents, usually with a sub-aggregation doing the statistics
  - 10. Children agg: queries over join-type data
  - 11. Parent agg: queries over join-type data
  - 12. Composite Aggregation: combines terms over several dimensions, like nested terms aggs but with flat results; similar to a MySQL GROUP BY on several columns
  - 13. Adjacency Matrix Aggregation
  - 14. Global agg: aggregating over all data
  - 15. Significant Terms Aggregation: automatically finds significant keywords
  - 16. Significant Text Aggregation: automatically finds significant keywords in text
  - 17. Sampler Aggregation: aggregating over sampled data
  - 18. Reverse nested Aggregation: aggregating parent data from within a nested agg
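Elasticsearch aggregations keep getting richer. There are currently four families: metric, bucket, pipeline and matrix aggregations.
This post focuses on bucket aggregations. A bucket aggregation works much like a GROUP BY query. Unlike a metric aggregation, it can have sub-aggregations, which means it can be nested. A nested sub-aggregation can be either a bucket agg or a metric agg.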
1. Overview of bucket aggregation query types
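Terms Aggregation: a typical GROUP BY. Documents are bucketed by the value of a field; if the field holds an array, the document is counted in several buckets.
Range Aggregation: usually on a numeric field. You specify several ranges and get one bucket per range.
Date Histogram Aggregation: buckets documents by time using an interval such as one month.
Date Range Aggregation: buckets documents by date ranges, like the range aggregation.
Filter Aggregation: a simple filter, similar to a filter in a query.
Filters Aggregation: several filters, one bucket per filter.
Histogram Aggregation: a histogram (bar chart) aggregation over fixed-width numeric intervals.
Missing Aggregation: counts documents in which a field is missing.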
Adjacency Matrix Aggregation
Auto-interval Date Histogram Aggregation
Children Aggregation
Composite Aggregation
Diversified Sampler Aggregation
Geo Distance Aggregation
GeoHash grid Aggregation
GeoTile Grid Aggregation
Global Aggregation
IP Range Aggregation
Nested Aggregation
Parent Aggregation
Reverse nested Aggregation
Sampler Aggregation
Significant Terms Aggregation
Significant Text Aggregation
2. Data preparation
Concert ticket data:
```
GET seats1028/_search
```
There are more than 30,000 such documents in total.
3. Usage examples
1. Terms Aggregation:
A typical GROUP BY: documents are bucketed by the value of a field. If the field holds an array, the document is counted in every bucket matching one of its values.
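The requests in this post are Elasticsearch console DSL. As a runnable illustration, here is a small Python model of the terms bucketing rules. It is a sketch, not how Elasticsearch implements terms aggregations. `terms_agg` and the sample docs are hypothetical, and the model ignores per-shard approximation (`shard_size`, `doc_count_error_upper_bound`).

The sketch shows three things:
- An array value lands in several buckets.
- Buckets use the default ordering: doc count descending, with ties broken by key ascending. The responses below follow the same order.
- Counts that are not returned add up to `sum_other_doc_count`.

```python
from collections import Counter

def terms_agg(docs, field, size=10, min_doc_count=1):
    """Hypothetical local model of a terms agg: one bucket per distinct value of `field`."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # docs without the field land in no bucket
        values = value if isinstance(value, list) else [value]
        for v in set(values):  # an array puts the doc into one bucket per distinct value
            counts[v] += 1
    eligible = [(k, c) for k, c in counts.items() if c >= min_doc_count]
    eligible.sort(key=lambda kc: (-kc[1], kc[0]))  # _count desc, then key asc
    buckets = [{"key": k, "doc_count": c} for k, c in eligible[:size]]
    returned = sum(b["doc_count"] for b in buckets)
    # everything counted but not returned ends up in sum_other_doc_count
    return {"sum_other_doc_count": sum(counts.values()) - returned, "buckets": buckets}

docs = [{"tags": ["a", "b"]}, {"tags": "a"}, {"tags": ["b", "c"]}, {}]
result = terms_agg(docs, "tags", size=2)
```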
1. A plain terms agg
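```
GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "term_price": {
      "terms": {
        "field": "price",
        "min_doc_count": 13,
        "size": 50
      }
    }
  }
}
```
Response:
```
"aggregations" : {
  "term_price" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 35384,
    "buckets" : [
      { "key" : 910, "doc_count" : 13 },
      { "key" : 3273, "doc_count" : 13 },
      { "key" : 3648, "doc_count" : 13 }
    ]
  }
}
```
Only three prices occur in at least 13 documents (min_doc_count), so only three buckets come back even though size is 50.
2. Nesting a metric agg as a sub-agg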
Group by row, keep the 3 buckets with the most documents, and compute the maximum price within each bucket.
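Here is a rough local model of this bucket-then-metric nesting. It works on in-memory dicts instead of an index, and `terms_with_max` and `seats` are hypothetical names. The model groups by `row`, keeps the `size` largest groups, and then computes a max within each group.

```python
from collections import defaultdict

def terms_with_max(docs, group_field, metric_field, size=3):
    """Bucket by group_field (top `size` by doc count), then take max(metric_field) per bucket."""
    groups = defaultdict(list)
    for doc in docs:
        if group_field in doc:
            groups[doc[group_field]].append(doc)
    top = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))[:size]
    result = []
    for key, members in top:
        values = [d[metric_field] for d in members if metric_field in d]
        result.append({
            "key": key,
            "doc_count": len(members),
            # a max agg over a bucket without values reports null (None here)
            "max": {"value": float(max(values)) if values else None},
        })
    return result

seats = [
    {"row": 1, "price": 100}, {"row": 1, "price": 300},
    {"row": 2, "price": 200},
    {"row": 3, "price": 50}, {"row": 3, "price": 70}, {"row": 3, "price": 60},
]
top_rows = terms_with_max(seats, "row", "price", size=2)
```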
```
GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "term_price": {
      "terms": {
        "field": "row",
        "min_doc_count": 13,
        "size": 3,
        "order": { "_count": "desc" }
      },
      "aggs": {
        "max_price": { "max": { "field": "price" } }
      }
    }
  }
}
```
Response:
```
"aggregations" : {
  "term_price" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 13608,
    "buckets" : [
      { "key" : 2, "doc_count" : 5796, "max_price" : { "value" : 9998.0 } },
      { "key" : 3, "doc_count" : 5796, "max_price" : { "value" : 9999.0 } },
      { "key" : 1, "doc_count" : 5791, "max_price" : { "value" : 9999.0 } }
    ]
  }
}
```
3. Nesting a terms agg as a sub-agg
First bucket by row and keep the 3 row buckets with the most documents. Then split each of those buckets by number and keep the 3 number buckets with the most documents.
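The same idea goes one level deeper here, again as a hypothetical in-memory sketch (`two_level_terms` is not an Elasticsearch API). The inner terms split is computed separately inside each outer bucket, so each row gets its own top numbers.

```python
from collections import Counter, defaultdict

def two_level_terms(docs, outer, inner, outer_size=3, inner_size=3):
    """terms(outer) -> terms(inner), both ordered by doc count desc, then key asc."""
    by_outer = defaultdict(list)
    for d in docs:
        if outer in d:
            by_outer[d[outer]].append(d)
    top_outer = sorted(by_outer.items(), key=lambda kv: (-len(kv[1]), kv[0]))[:outer_size]
    result = []
    for key, members in top_outer:
        # the inner split only sees the docs of this outer bucket
        inner_counts = Counter(d[inner] for d in members if inner in d)
        top_inner = sorted(inner_counts.items(), key=lambda kc: (-kc[1], kc[0]))[:inner_size]
        result.append({
            "key": key,
            "doc_count": len(members),
            "inner": [{"key": k, "doc_count": c} for k, c in top_inner],
        })
    return result

seats = ([{"row": 1, "number": n} for n in (1, 1, 2, 3)]
         + [{"row": 2, "number": n} for n in (5, 5, 5, 6)]
         + [{"row": 3, "number": 7}])
res = two_level_terms(seats, "row", "number", outer_size=2, inner_size=1)
```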
```
GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "term_price": {
      "terms": {
        "field": "row",
        "min_doc_count": 13,
        "size": 3,
        "order": { "_count": "desc" }
      },
      "aggs": {
        "number_term": {
          "terms": {
            "field": "number",
            "size": 3,
            "order": { "_count": "desc" }
          }
        }
      }
    }
  }
}
```
Response:
```
"aggregations" : {
  "term_price" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 13608,
    "buckets" : [
      {
        "key" : 2,
        "doc_count" : 5796,
        "number_term" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 4368,
          "buckets" : [
            { "key" : 1, "doc_count" : 476 },
            { "key" : 2, "doc_count" : 476 },
            { "key" : 3, "doc_count" : 476 }
          ]
        }
      },
      {
        "key" : 3,
        "doc_count" : 5796,
        "number_term" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 4368,
          "buckets" : [
            { "key" : 1, "doc_count" : 476 },
            { "key" : 2, "doc_count" : 476 },
            { "key" : 3, "doc_count" : 476 }
          ]
        }
      },
      {
        "key" : 1,
        "doc_count" : 5791,
        "number_term" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 4363,
          "buckets" : [
            { "key" : 5, "doc_count" : 476 },
            { "key" : 6, "doc_count" : 476 },
            { "key" : 7, "doc_count" : 476 }
          ]
        }
      }
    ]
  }
}
```
2. Range Aggregation:
Usually applied to a numeric field: you specify several ranges and get one bucket per range. Each range includes its from value and excludes its to value.
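Below is a minimal sketch of the range bucketing rule (the `range_agg` helper is hypothetical). It uses plain numbers rather than indexed documents. It also reproduces the key format seen in the response below and the `*` used for an open end.

```python
def range_agg(values, ranges):
    """One bucket per range; `from` is inclusive, `to` is exclusive, either may be omitted."""
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        key = f"{'*' if lo is None else float(lo)}-{'*' if hi is None else float(hi)}"
        count = sum(
            1 for v in values
            if (lo is None or v >= lo) and (hi is None or v < hi)
        )
        buckets.append({"key": key, "doc_count": count})
    return buckets

prices = [4999, 5000, 5500, 5999.5, 6000]
buckets = range_agg(prices, [{"from": 5000, "to": 6000}, {"to": 5000}, {"from": 6000}])
```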
```
GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "price_range": {
      "range": {
        "field": "price",
        "ranges": [
          { "from": 5000, "to": 6000 }
        ]
      }
    }
  }
}
```
Response:
```
"aggregations" : {
  "price_range" : {
    "buckets" : [
      { "key" : "5000.0-6000.0", "from" : 5000.0, "to" : 6000.0, "doc_count" : 3646 }
    ]
  }
}
```
3. Date Histogram Aggregation:
Buckets documents by time. Here calendar_interval is set to month, so there is one bucket per calendar month.
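To show where bucket keys such as `1519862400000` come from, here is a hypothetical sketch (`month_histogram`). It truncates UTC timestamps to the first of the month and reports the keys in epoch milliseconds. One simplification: by default, Elasticsearch also returns the empty months between the first and last bucket with `doc_count` 0, because `min_doc_count` is 0. This sketch skips those empty months.

```python
from collections import Counter
from datetime import datetime, timezone

def month_histogram(timestamps):
    """Bucket timestamps by UTC calendar month, like calendar_interval: month."""
    counts = Counter()
    for ts in timestamps:
        month_start = ts.astimezone(timezone.utc).replace(
            day=1, hour=0, minute=0, second=0, microsecond=0)
        counts[month_start] += 1
    return [
        {
            "key_as_string": start.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
            "key": int(start.timestamp() * 1000),  # epoch millis
            "doc_count": count,
        }
        for start, count in sorted(counts.items())
    ]

buckets = month_histogram([
    datetime(2018, 3, 15, 10, 0, tzinfo=timezone.utc),
    datetime(2018, 3, 31, 23, 59, tzinfo=timezone.utc),
    datetime(2018, 4, 1, 0, 0, tzinfo=timezone.utc),
])
```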
```
GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "price_date_histogram": {
      "date_histogram": {
        "field": "datetime",
        "calendar_interval": "month"
      }
    }
  }
}
```
Response:
```
"aggregations" : {
  "price_date_histogram" : {
    "buckets" : [
      { "key_as_string" : "2018-03-01T00:00:00.000Z", "key" : 1519862400000, "doc_count" : 2310 },
      { "key_as_string" : "2018-04-01T00:00:00.000Z", "key" : 1522540800000, "doc_count" : 3946 },
      { "key_as_string" : "2018-05-01T00:00:00.000Z", "key" : 1525132800000, "doc_count" : 3948 },
      { "key_as_string" : "2018-06-01T00:00:00.000Z", "key" : 1527811200000, "doc_count" : 3948 },
      { "key_as_string" : "2018-07-01T00:00:00.000Z", "key" : 1530403200000, "doc_count" : 3948 }
    ]
  }
}
```
4. Date Range Aggregation
Buckets documents by date ranges, like the range aggregation: from is inclusive and to is exclusive.
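Here is a sketch of the date range rule. It assumes UTC ISO-8601 strings, as in the request below, and `parse_iso` and `date_range_agg` are hypothetical helpers. It reproduces the `from` and `to` millisecond values shown in the response.

```python
from datetime import datetime

def parse_iso(s):
    """Parse strings like 2018-10-01T00:00:00.000Z into aware datetimes."""
    return datetime.fromisoformat(s.replace("Z", "+00:00"))

def date_range_agg(timestamps, ranges):
    """One bucket per range: from inclusive, to exclusive, bounds reported in epoch millis."""
    buckets = []
    for r in ranges:
        start, end = parse_iso(r["from"]), parse_iso(r["to"])
        buckets.append({
            "key": f'{r["from"]}-{r["to"]}',
            "from": start.timestamp() * 1000,
            "to": end.timestamp() * 1000,
            "doc_count": sum(1 for t in timestamps if start <= t < end),
        })
    return buckets

ts = [parse_iso(s) for s in (
    "2018-09-30T23:59:59.000Z", "2018-10-01T00:00:00.000Z",
    "2018-10-31T12:00:00.000Z", "2018-11-01T00:00:00.000Z")]
october = date_range_agg(ts, [{"from": "2018-10-01T00:00:00.000Z", "to": "2018-11-01T00:00:00.000Z"}])[0]
```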
```
GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "price_date_histogram": {
      "date_range": {
        "field": "datetime",
        "ranges": [
          { "from": "2018-10-01T00:00:00.000Z", "to": "2018-11-01T00:00:00.000Z" }
        ]
      }
    }
  }
}
```
Response:
```
"aggregations" : {
  "price_date_histogram" : {
    "buckets" : [
      {
        "key" : "2018-10-01T00:00:00.000Z-2018-11-01T00:00:00.000Z",
        "from" : 1.538352E12,
        "from_as_string" : "2018-10-01T00:00:00.000Z",
        "to" : 1.5410304E12,
        "to_as_string" : "2018-11-01T00:00:00.000Z",
        "doc_count" : 3948
      }
    ]
  }
}
```
5. Filter Aggregation
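A single-bucket aggregation that acts as a simple filter, much like a filter in a query. Only matching documents are passed to its sub-aggregations.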
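Modeled locally, a filter agg is just "filter, then run the sub-aggregations on what is left". The helper names below are hypothetical, and the predicate stands in for the `range` clause on `tip`.

```python
def filter_agg(docs, predicate, metric_field):
    """Single bucket: keep docs matching predicate, then run a max sub-agg on them."""
    matched = [d for d in docs if predicate(d)]
    values = [d[metric_field] for d in matched if metric_field in d]
    return {"doc_count": len(matched), "max": {"value": max(values) if values else None}}

def tip_between_10_and_20(doc):
    # mirrors {"range": {"tip": {"gte": 10, "lte": 20}}}
    return "tip" in doc and 10 <= doc["tip"] <= 20

docs = [{"tip": 5, "price": 100}, {"tip": 10, "price": 300},
        {"tip": 20, "price": 200}, {"tip": 21, "price": 999}]
result = filter_agg(docs, tip_between_10_and_20, "price")
```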
```
GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "sold_filter": {
      "filter": {
        "range": { "tip": { "gte": 10, "lte": 20 } }
      },
      "aggs": {
        "max_price": { "max": { "field": "price" } }
      }
    }
  }
}
```
Response:
```
"aggregations" : {
  "sold_filter" : {
    "doc_count" : 6300,      # doc count after applying the filter
    "max_price" : { "value" : 9996.0 }
  }
}
```
6. Filters Aggregation
Applies several filters. Each filter produces its own bucket, and the sub-aggregations run on each filter's result.
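The difference from a single filter is that each named filter gets its own bucket, and one document can be counted in several of them. Below is a hypothetical in-memory sketch (`filters_agg`), with lambdas standing in for the two range clauses.

```python
def filters_agg(docs, named_filters, metric_field):
    """One bucket per named filter; a doc is counted in every filter it matches."""
    buckets = {}
    for name, predicate in named_filters.items():
        matched = [d for d in docs if predicate(d)]
        values = [d[metric_field] for d in matched if metric_field in d]
        buckets[name] = {"doc_count": len(matched),
                         "max": {"value": max(values) if values else None}}
    return {"buckets": buckets}

named_filters = {
    # -1 acts as a "field missing" sentinel that falls outside both ranges
    "tip_filter": lambda d: 10 <= d.get("tip", -1) <= 20,
    "number_filter": lambda d: 5 <= d.get("number", -1) <= 10,
}
docs = [{"tip": 15, "number": 7, "price": 10},   # matches both filters
        {"tip": 15, "number": 1, "price": 20},
        {"tip": 30, "number": 6, "price": 5}]
result = filters_agg(docs, named_filters, "price")
```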
```
GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "sold_filter": {
      "filters": {
        "filters": {            # the usage here is rather odd
          "tip_filter": { "range": { "tip": { "gte": 10, "lte": 20 } } },
          "number_filter": { "range": { "number": { "gte": 5, "lte": 10 } } }
        }
      },
      "aggs": {
        "max_price": { "max": { "field": "price" } }
      }
    }
  }
}
```
Response:
```
"aggregations" : {
  "sold_filter" : {
    "buckets" : {
      "number_filter" : { "doc_count" : 16072, "max_price" : { "value" : 9999.0 } },
      "tip_filter" : { "doc_count" : 6300, "max_price" : { "value" : 9996.0 } }
    }
  }
}
```
As you can see, each named filter is applied separately and gets its own bucket with its own sub-aggregation result.
7. Histogram Aggregation
A histogram aggregation. The field is usually numeric, which makes it easy to group values into fixed-width intervals.
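Elasticsearch computes the bucket a value falls into as `floor((value - offset) / interval) * interval + offset`, where `offset` defaults to 0. Below is a hypothetical sketch of that rule (`histogram_agg`). Unlike Elasticsearch, it does not fill in empty buckets between the minimum and maximum.

```python
import math
from collections import Counter

def histogram_agg(values, interval, offset=0.0):
    """Bucket key = floor((value - offset) / interval) * interval + offset."""
    counts = Counter(
        math.floor((v - offset) / interval) * interval + offset for v in values
    )
    return [{"key": float(k), "doc_count": c} for k, c in sorted(counts.items())]

buckets = histogram_agg([16, 19.5, 20, 23, 24, 29], interval=4)
```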
```
GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "tip_histogram": {
      "histogram": {
        "field": "tip",
        "interval": 4
      }
    }
  }
}
```
Response:
```
"aggregations" : {
  "tip_histogram" : {
    "buckets" : [
      { "key" : 16.0, "doc_count" : 4200 },
      { "key" : 20.0, "doc_count" : 8400 },
      { "key" : 24.0, "doc_count" : 17808 },
      { "key" : 28.0, "doc_count" : 5794 }
    ]
  }
}
```
8. Missing Aggregation: counts documents in which a field is missing
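This one-function model shows what counts as missing, using plain dicts (`missing_agg` is hypothetical). An absent field, an explicit null and an empty array all count as missing.

```python
def missing_agg(docs, field):
    """Count docs where `field` is absent, null, or an empty array."""
    return {"doc_count": sum(1 for d in docs if d.get(field) is None or d.get(field) == [])}

docs = [{"row": 1}, {"row": None}, {}, {"row": []}, {"row": 0}]
result = missing_agg(docs, "row")
```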
```
GET seats1028/_search
{
  "size": 0,
  "aggs": {
    "miss_f": {
      "missing": { "field": "row" }
    }
  }
}
```
Response:
```
"aggregations" : {
  "miss_f" : { "doc_count" : 1 }
}
```
9. Nested aggs: aggregations over nested documents, usually with a sub-aggregation doing the statistics
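This aggregation is used on nested documents; the statistics themselves are usually computed by a sub-aggregation.
Sample data: each class has a list of students (class.students), and each student has age and name attributes.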
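The sample data itself is not reproduced here, so this sketch uses hypothetical classes. It models what the nested agg does: every object under the nested path (`class.students`) is treated as its own document. That is why a nested agg's `doc_count` counts students rather than classes. `collect_nested` and `nested_min` are illustrative helpers, not Elasticsearch APIs.

```python
def collect_nested(doc, path):
    """Follow a dotted path such as "class.students" and return the nested objects."""
    node = doc
    for part in path.split("."):
        if not isinstance(node, dict):
            return []
        node = node.get(part)
    if node is None:
        return []
    return node if isinstance(node, list) else [node]

def nested_min(docs, path, leaf_field):
    nested = [obj for doc in docs for obj in collect_nested(doc, path)]
    values = [obj[leaf_field] for obj in nested if leaf_field in obj]
    # doc_count counts nested objects (students), not top-level docs (classes)
    return {"doc_count": len(nested), "min": {"value": min(values) if values else None}}

classes = [
    {"class": {"students": [{"name": "a", "age": 21}, {"name": "b", "age": 20}]}},
    {"class": {"students": [{"name": "c", "age": 25}]}},
]
result = nested_min(classes, "class.students", "age")
```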
The corresponding query:
```
GET nest_test/_search
{
  "size": 0,
  "aggs": {
    "nested_agg": {
      "nested": { "path": "class.students" },
      "aggs": {
        "min_age": { "min": { "field": "class.students.age" } }
      }
    }
  }
}
```
Response:
```
"aggregations" : {
  "nested_agg" : {
    "doc_count" : 8,
    "min_age" : { "value" : 20.0 }
  }
}
```
Note that doc_count (8) counts nested student documents, not top-level class documents.
10. Children agg: queries over join-type data
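Data preparation: each classroom (class_room) can teach several subjects, and each student belongs to a class_room, so class_room and student form a parent/child relationship. In a join field a child document has exactly one parent. That is why both students below point to parent 1 and are indexed with routing=1, so that they land on the same shard as their parent.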
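A local Python model of the children lookup, run on the three documents indexed below, may help before the actual requests. It is a sketch under simplifying assumptions: the documents sit in an in-memory dict keyed by `_id`, and `students_per_subject` is a hypothetical helper. Each `subject` bucket collects its class_room parents, and then the children of those parents.

```python
from collections import Counter, defaultdict

# The three documents indexed below, with the join field kept as-is
docs = {
    1: {"subject": ["english", "Chinese", "Russia"], "class_student": {"name": "class_room"}},
    2: {"name": "jack", "class_student": {"name": "student", "parent": 1}},
    3: {"name": "pony", "class_student": {"name": "student", "parent": 1}},
}

def students_per_subject(docs):
    """terms(subject) -> children(student) -> terms(name), modeled locally."""
    parents_by_subject = defaultdict(set)
    children_of = defaultdict(list)
    for doc_id, doc in docs.items():
        relation = doc["class_student"]
        if relation["name"] == "class_room":
            for subject in doc["subject"]:
                parents_by_subject[subject].add(doc_id)
        else:
            children_of[relation["parent"]].append(doc["name"])
    # each child has exactly one parent, so no child is counted twice in a bucket
    return {
        subject: Counter(name for pid in pids for name in children_of[pid])
        for subject, pids in parents_by_subject.items()
    }

result = students_per_subject(docs)
```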
```
PUT join_class
{
  "mappings": {
    "properties": {
      "subject": { "type": "keyword" },
      "class_student": {
        "type": "join",
        "relations": { "class_room": "student" }
      }
    }
  }
}

PUT join_class/_doc/1
{
  "subject": ["english", "Chinese", "Russia"],
  "class_student": { "name": "class_room" },
  "des": "this class room teach english, Chinese, Russia"
}

PUT join_class/_doc/2?routing=1
{
  "class_student": { "name": "student", "parent": 1 },
  "name": "jack"
}

PUT join_class/_doc/3?routing=1
{
  "class_student": { "name": "student", "parent": 1 },
  "name": "pony"
}
```
The following query finds which students belong to each subject:
```
GET join_class/_search
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "subject_term": {
      "terms": { "field": "subject", "size": 10 },
      "aggs": {
        "subject_student": {
          "children": { "type": "student" },
          "aggs": {
            "term_name": {
              "terms": { "field": "name.keyword", "size": 10 }
            }
          }
        }
      }
    }
  }
}
```
Response:
```
"aggregations" : {
  "subject_term" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "Chinese",
        "doc_count" : 1,
        "subject_student" : {
          "doc_count" : 2,
          "term_name" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              { "key" : "jack", "doc_count" : 1 },
              { "key" : "pony", "doc_count" : 1 }
            ]
          }
        }
      },
      {
        "key" : "Russia",
        "doc_count" : 1,
        "subject_student" : {
          "doc_count" : 2,
          "term_name" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              { "key" : "jack", "doc_count" : 1 },
              { "key" : "pony", "doc_count" : 1 }
            ]
          }
        }
      },
      {
        "key" : "english",
        "doc_count" : 1,
        "subject_student" : {
          "doc_count" : 2,
          "term_name" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              { "key" : "jack", "doc_count" : 1 },
              { "key" : "pony", "doc_count" : 1 }
            ]
          }
        }
      }
    ]
  }
}
```
11. Parent agg: queries over join-type data
Continuing with the data above, the request below finds the subjects chosen by each student.
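The parent agg goes the other way: from each child bucket to its distinct parents. Here is a hypothetical in-memory sketch on the same three documents (`subjects_per_student` is not an Elasticsearch API). Using a set means a parent is counted once per bucket, even if several matching children point to it.

```python
from collections import Counter, defaultdict

# The three join_class documents from above
docs = {
    1: {"subject": ["english", "Chinese", "Russia"], "class_student": {"name": "class_room"}},
    2: {"name": "jack", "class_student": {"name": "student", "parent": 1}},
    3: {"name": "pony", "class_student": {"name": "student", "parent": 1}},
}

def subjects_per_student(docs):
    """terms(name) -> parent(type: student) -> terms(subject), modeled locally."""
    parents_by_name = defaultdict(set)
    for doc in docs.values():
        relation = doc["class_student"]
        if relation["name"] == "student":
            parents_by_name[doc["name"]].add(relation["parent"])  # each parent once per bucket
    return {
        name: Counter(s for pid in pids for s in docs[pid]["subject"])
        for name, pids in parents_by_name.items()
    }

result = subjects_per_student(docs)
```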
```
GET join_class/_search
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "student_term": {
      "terms": { "field": "name.keyword", "size": 10 },
      "aggs": {
        "subject_student": {
          "parent": { "type": "student" },
          "aggs": {
            "choose_subject": {
              "terms": { "field": "subject", "size": 10 }
            }
          }
        }
      }
    }
  }
}
```
Response:
```
"aggregations" : {
  "student_term" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "jack",
        "doc_count" : 1,
        "subject_student" : {
          "doc_count" : 1,
          "choose_subject" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              { "key" : "Chinese", "doc_count" : 1 },
              { "key" : "Russia", "doc_count" : 1 },
              { "key" : "english", "doc_count" : 1 }
            ]
          }
        }
      },
      {
        "key" : "pony",
        "doc_count" : 1,
        "subject_student" : {
          "doc_count" : 1,
          "choose_subject" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              { "key" : "Chinese", "doc_count" : 1 },
              { "key" : "Russia", "doc_count" : 1 },
              { "key" : "english", "doc_count" : 1 }
            ]
          }
        }
      }
    ]
  }
}
```
12. Composite Aggregation: combines terms over several dimensions, like nested terms aggs but with flat (non-nested) results; similar to a MySQL GROUP BY on several columns
Data initialization:
```
PUT composite_test
{
  "mappings": {
    "properties": {
      "area": { "type": "keyword" },
      "userid": { "type": "keyword" },
      "sendtime": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
    }
  }
}

POST composite_test/_bulk
{ "index" : {} }
{"area":"33","userid":"400015","sendtime":"2019-01-17 00:00:00"}
{ "index" : {} }
{"area":"33","userid":"400015","sendtime":"2019-01-17 00:00:00"}
{ "index" : {} }
{"area":"35","userid":"400016","sendtime":"2019-01-18 00:00:00"}
{ "index" : {} }
{"area":"35","userid":"400016","sendtime":"2019-01-18 00:00:00"}
{ "index" : {} }
{"area":"33","userid":"400017","sendtime":"2019-01-17 00:00:00"}
```
The query below groups these documents by the three fields area, userid and sendtime.
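As a local illustration of the grouping and of `after_key` paging, here is a Python sketch over the five records above. The names are hypothetical, and `sendtime[:10]` stands in for the `1d` date_histogram source with format `yyyy-MM-dd`, assuming UTC. Buckets are ordered by their composite key, and passing the last key as `after` returns the next page.

```python
from collections import Counter

# The five records indexed above
records = [
    {"area": "33", "userid": "400015", "sendtime": "2019-01-17 00:00:00"},
    {"area": "33", "userid": "400015", "sendtime": "2019-01-17 00:00:00"},
    {"area": "35", "userid": "400016", "sendtime": "2019-01-18 00:00:00"},
    {"area": "35", "userid": "400016", "sendtime": "2019-01-18 00:00:00"},
    {"area": "33", "userid": "400017", "sendtime": "2019-01-17 00:00:00"},
]

# Hypothetical stand-ins for the three composite sources, in order
sources = {
    "area": lambda r: r["area"],
    "userid": lambda r: r["userid"],
    "sendtime": lambda r: r["sendtime"][:10],  # 1d bucket formatted as yyyy-MM-dd (UTC assumed)
}

def composite_page(records, sources, size=10, after=None):
    """Flat GROUP BY over all sources, sorted by key tuple, paged with `after`."""
    counts = Counter(tuple(fn(r) for fn in sources.values()) for r in records)
    keys = sorted(counts)
    if after is not None:
        after_tuple = tuple(after[name] for name in sources)
        keys = [k for k in keys if k > after_tuple]
    page = keys[:size]
    buckets = [{"key": dict(zip(sources, k)), "doc_count": counts[k]} for k in page]
    return {"after_key": buckets[-1]["key"] if buckets else None, "buckets": buckets}

first = composite_page(records, sources, size=2)
second = composite_page(records, sources, size=2, after=first["after_key"])
```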
```
GET composite_test/_search
{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          { "area": { "terms": { "field": "area" } } },
          { "userid": { "terms": { "field": "userid" } } },
          { "sendtime": { "date_histogram": { "field": "sendtime", "fixed_interval": "1d", "format": "yyyy-MM-dd" } } }
        ]
      }
    }
  }
}
```
Response:
```
"aggregations" : {
  "my_buckets" : {
    "after_key" : { "area" : "35", "userid" : "400016", "sendtime" : "2019-01-18" },
    "buckets" : [
      { "key" : { "area" : "33", "userid" : "400015", "sendtime" : "2019-01-17" }, "doc_count" : 2 },
      { "key" : { "area" : "33", "userid" : "400017", "sendtime" : "2019-01-17" }, "doc_count" : 1 },
      { "key" : { "area" : "35", "userid" : "400016", "sendtime" : "2019-01-18" }, "doc_count" : 2 }
    ]
  }
}
```
To fetch the next page of buckets, pass after_key back in the composite aggregation's after parameter.
13. Adjacency Matrix Aggregation
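An adjacency matrix aggregation. The composite aggregation above intersects terms across several dimensions. This one is more limited: you define a set of named filters, and it returns a bucket for each filter and for each non-empty intersection of two filters.
Using the data above, a query with two filters, area_filter (area = 33) and user_id_filter (userid = 400015), returns three counts: one for area=33, one for userid=400015, and one for area=33 & userid=400015.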
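The small sketch below reproduces those three counts from the five records above. `adjacency_matrix` is a hypothetical helper, and the lambdas stand in for term filters. It follows the behavior of the adjacency_matrix aggregation: one bucket per filter, one per pairwise intersection (named with the default `&` separator), and no buckets for empty results.

```python
from itertools import combinations

records = [
    {"area": "33", "userid": "400015"},
    {"area": "33", "userid": "400015"},
    {"area": "35", "userid": "400016"},
    {"area": "35", "userid": "400016"},
    {"area": "33", "userid": "400017"},
]

filters = {
    "area_filter": lambda r: r["area"] == "33",            # stands in for a term filter on area
    "user_id_filter": lambda r: r["userid"] == "400015",   # stands in for a term filter on userid
}

def adjacency_matrix(records, filters, separator="&"):
    matched = {name: {i for i, r in enumerate(records) if pred(r)}
               for name, pred in filters.items()}
    counts = {name: len(ids) for name, ids in matched.items() if ids}
    for a, b in combinations(sorted(matched), 2):
        both = matched[a] & matched[b]
        if both:  # empty intersections produce no bucket
            counts[f"{a}{separator}{b}"] = len(both)
    return [{"key": k, "doc_count": n} for k, n in sorted(counts.items())]

buckets = adjacency_matrix(records, filters)
```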
Response:
```
"aggregations" : {
  "composite_two" : {
    "buckets" : [
      { "key" : "area_filter", "doc_count" : 3 },
      { "key" : "area_filter&user_id_filter", "doc_count" : 2 },
      { "key" : "user_id_filter", "doc_count" : 2 }
    ]
  }
}
```
14. Global agg: aggregating over all data
This ignores the filtering done by the query and runs its sub-aggregations over all documents in the index.
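Here is the contrast in miniature, with hypothetical data. The normal avg sees only the query hits, while the global one sees every document.

```python
def avg(values):
    return sum(values) / len(values) if values else None

docs = [{"row": 5}, {"row": 5}, {"row": 1}, {"row": 3}]
query_hits = [d for d in docs if d.get("row") == 5]   # the query: term row = 5

avg_row02 = avg([d["row"] for d in query_hits])  # normal agg: only the query hits
global_avg_row = avg([d["row"] for d in docs])   # global agg: ignores the query
```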
```
GET seats1028/_search
{
  "size": 0,
  "query": {
    "term": { "row": { "value": 5 } }
  },
  "aggs": {
    "global_row": {
      "global": {},
      "aggs": {
        "avg_row": { "avg": { "field": "row" } }
      }
    },
    "avg_row02": { "avg": { "field": "row" } }
  }
}
```
Response:
```
"aggregations" : {
  "global_row" : {
    "doc_count" : 30992,
    "avg_row" : {
      "value" : 4.333871123874673    # computed over all docs
    }
  },
  "avg_row02" : {
    "value" : 5.0                    # computed over the docs that match the query
  }
}
```
15. Significant Terms Aggregation: automatically finds significant keywords
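This finds significant terms in a keyword field. These are terms that occur unusually often in the current result set (the foreground) compared with the whole index (the background), not simply the most frequent terms.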
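By default, significance is scored with the JLH heuristic. JLH multiplies the absolute change in a term's frequency (foreground minus background) by the relative change (foreground divided by background). Below is a minimal sketch of that formula. The inputs in the usage lines are the counts from the news example below, and the results match the `score` values in that response. `jlh_score` is a hypothetical name.

```python
def jlh_score(subset_freq, subset_size, superset_freq, superset_size):
    """JLH: (fg% - bg%) * (fg% / bg%), and 0 when the term is not over-represented."""
    if 0 in (subset_size, superset_size, subset_freq, superset_freq):
        return 0.0
    fg = subset_freq / subset_size        # e.g. 4 of John Michael's 5 docs are "automobile"
    bg = superset_freq / superset_size    # e.g. 5 of the index's 10 docs are "automobile"
    if fg <= bg:
        return 0.0
    return (fg - bg) * (fg / bg)

john = jlh_score(4, 5, 5, 10)    # John Michael / automobile
robert = jlh_score(3, 5, 4, 10)  # Robert Cann / ai
```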
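An example makes this clearer. Here we use news articles; each article has an author, a title, a topic, and so on.
The query below finds the topic each author focuses on most, i.e. the topic that is over-represented in that author's articles compared with the whole index.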
```
GET news/_search
{
  "size": 0,
  "aggregations": {
    "authors": {
      "terms": { "field": "author" },
      "aggregations": {
        "significant_topic_types": {
          "significant_terms": { "field": "topic" }
        }
      }
    }
  }
}
```
Response:
```
"aggregations" : {
  "authors" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
      {
        "key" : "John Michael",
        "doc_count" : 5,
        "significant_topic_types" : {
          "doc_count" : 5,
          "bg_count" : 10,
          "buckets" : [
            { "key" : "automobile", "doc_count" : 4, "score" : 0.4800000000000001, "bg_count" : 5 }
          ]
        }
      },
      {
        "key" : "Robert Cann",
        "doc_count" : 5,
        "significant_topic_types" : {
          "doc_count" : 5,          # Robert Cann has 5 docs in total
          "bg_count" : 10,          # the index has 10 docs in total
          "buckets" : [
            {
              "key" : "ai",
              "doc_count" : 3,      # 3 of Robert Cann's docs have topic ai
              "score" : 0.2999999999999999,
              "bg_count" : 4        # 4 docs in the whole index have topic ai
            }
          ]
        }
      }
    ]
  }
}
```
These statistics show that John Michael's most significant topic is automobile, while Robert Cann's is ai. See the comments above for what doc_count and bg_count mean.
16. Significant Text Aggregation: automatically finds significant keywords in text
This is similar to the significant_terms aggregation above, but it works on text fields and analyzes (tokenizes) the text first.
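Here is a rough sketch of the extra step. The text is first broken into tokens and the number of documents containing each token is counted. Each token is then scored with JLH (the default heuristic), as for significant_terms. The tokenizer is a crude lowercase-and-split stand-in for a real analyzer, and the titles are hypothetical.

```python
import re
from collections import Counter

def jlh(subset_freq, subset_size, superset_freq, superset_size):
    # JLH heuristic: (fg% - bg%) * (fg% / bg%), 0 when not over-represented
    if 0 in (subset_size, superset_size, subset_freq, superset_freq):
        return 0.0
    fg, bg = subset_freq / subset_size, superset_freq / superset_size
    return (fg - bg) * (fg / bg) if fg > bg else 0.0

def doc_freq(texts):
    """Crude analyzer stand-in: lowercase, split into word tokens, count docs per token."""
    df = Counter()
    for text in texts:
        df.update(set(re.findall(r"\w+", text.lower())))
    return df

def significant_text(foreground, background):
    fg, bg = doc_freq(foreground), doc_freq(background)
    scores = {t: jlh(fg[t], len(foreground), bg[t], len(background)) for t in fg}
    return sorted(((t, s) for t, s in scores.items() if s > 0), key=lambda ts: -ts[1])

foreground = ["AI beats humans at Go", "New AI chip announced", "AI in healthcare"]
background = foreground + [
    "Electric car sales rise", "New car models", "Car prices fall",
    "Football final tonight", "Tennis star retires", "Stock market rally",
    "Weather warning issued",
]
top_term, top_score = significant_text(foreground, background)[0]
```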
Running a significant_text aggregation (named significant_title) over the same news data returns:
```
"aggregations" : {
  "significant_title" : {
    "doc_count" : 3,
    "bg_count" : 10,
    "buckets" : [
      { "key" : "ai", "doc_count" : 3, "score" : 2.3333333333333335, "bg_count" : 3 }
    ]
  }
}
```
17. Sampler Aggregation: aggregating over sampled data
This is typically used together with significant_terms. When the index is very large and the aggregation takes too long, a sampler restricts the sub-aggregations to a sample of the most relevant (top-scoring) documents.
```
POST /stackoverflow/_search?size=0
{
  "query": {
    "query_string": { "query": "tags:kibana OR tags:javascript" }
  },
  "aggs": {
    "sample": {
      "sampler": { "shard_size": 200 },
      "aggs": {
        "keywords": {
          "significant_terms": {
            "field": "tags",
            "exclude": ["kibana", "javascript"]
          }
        }
      }
    }
  }
}
```
shard_size is the number of top-scoring documents sampled on each shard; the default is 100.
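Below is a sketch of the sampling step with hypothetical hits. It keeps the `shard_size` best-scoring hits on each shard and then aggregates only those. Plain tag counting stands in for the significant_terms sub-aggregation, and `sample_per_shard` and `tag_counts` are illustrative names.

```python
import heapq
from collections import Counter

def sample_per_shard(shards, shard_size=100):
    """Keep the top `shard_size` hits by score on each shard, then pool them."""
    sample = []
    for hits in shards:
        sample.extend(heapq.nlargest(shard_size, hits, key=lambda h: h["score"]))
    return sample

def tag_counts(hits, exclude=()):
    return Counter(t for h in hits for t in h["tags"] if t not in exclude)

shards = [
    [{"score": 3.0, "tags": ["kibana", "x"]}, {"score": 1.0, "tags": ["y"]}],
    [{"score": 2.0, "tags": ["javascript", "x"]}, {"score": 5.0, "tags": ["z"]}],
]
sample = sample_per_shard(shards, shard_size=1)
counts = tag_counts(sample, exclude={"kibana", "javascript"})
```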
18. Reverse nested Aggregation: aggregating on parent data from within a nested agg
The reverse_nested aggregation lets an aggregation that is used as a sub-aggregation of a nested aggregation step out of the nested documents and aggregate over the data of the root documents.
Example query:
```
GET /issues/_search
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "comments": {
      "nested": { "path": "comments" },
      "aggs": {
        "top_usernames": {
          "terms": { "field": "comments.username" },
          "aggs": {
            "comment_to_issue": {
              "reverse_nested": {},
              "aggs": {
                "top_tags_per_comment": {
                  "terms": { "field": "tags" }
                }
              }
            }
          }
        }
      }
    }
  }
}
```
Response:
```
"aggregations" : {
  "comments" : {
    "doc_count" : 4,
    "top_usernames" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "jack",
          "doc_count" : 2,
          "comment_to_issue" : {
            "doc_count" : 2,
            "top_tags_per_comment" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 0,
              "buckets" : [
                { "key" : "improve", "doc_count" : 2 },
                { "key" : "advice", "doc_count" : 1 },
                { "key" : "bug", "doc_count" : 1 }
              ]
            }
          }
        },
        {
          "key" : "nacy",
          "doc_count" : 1,
          "comment_to_issue" : {
            "doc_count" : 1,
            "top_tags_per_comment" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 0,
              "buckets" : [
                { "key" : "advice", "doc_count" : 1 },
                { "key" : "improve", "doc_count" : 1 }
              ]
            }
          }
        },
        {
          "key" : "pony",
          "doc_count" : 1,
          "comment_to_issue" : {
            "doc_count" : 1,
            "top_tags_per_comment" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 0,
              "buckets" : [
                { "key" : "bug", "doc_count" : 1 },
                { "key" : "improve", "doc_count" : 1 }
              ]
            }
          }
        }
      ]
    }
  }
}
```
Under a nested aggregation, the sub-aggregations of reverse_nested operate on the root documents of the nested documents in each bucket. For example, jack commented on two issues, so jack's comment_to_issue bucket contains 2 root documents, and the tags of those two issues are counted.
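Given this purpose, reverse_nested is only meaningful as a sub-aggregation of a nested aggregation. Elasticsearch rejects it as a top-level aggregation or under a non-nested aggregation.
By default, reverse_nested joins back to the root document. With multiple levels of nesting, the path parameter selects which nested level to join back to.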
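The issue data for this example is not shown, so the sketch below uses hypothetical issues chosen to be consistent with the response above. It walks the nested comments (the nested agg) and then steps back from each comment to its root issue (reverse_nested). A set ensures each issue is counted once per username bucket. `tags_per_commenter` is an illustrative helper, not an Elasticsearch API.

```python
from collections import Counter, defaultdict

# Hypothetical issues, chosen so the counts match the response above
issues = [
    {"tags": ["improve", "advice"], "comments": [{"username": "jack"}, {"username": "nacy"}]},
    {"tags": ["bug", "improve"], "comments": [{"username": "jack"}, {"username": "pony"}]},
]

def tags_per_commenter(issues):
    roots_by_user = defaultdict(set)
    nested_doc_count = 0
    for issue_id, issue in enumerate(issues):
        for comment in issue["comments"]:          # nested agg: one doc per comment
            nested_doc_count += 1
            roots_by_user[comment["username"]].add(issue_id)  # reverse_nested: back to the root
    per_user = {
        user: {"doc_count": len(roots),
               "tags": Counter(t for i in roots for t in issues[i]["tags"])}
        for user, roots in roots_by_user.items()
    }
    return nested_doc_count, per_user

total_comments, per_user = tags_per_commenter(issues)
```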