The Road to Learning Druid (4): Druid's Data Ingestion Formats
Author: Syn良子 Source: https://www.cnblogs.com/cssdongl/p/9715735.html Please credit the source when reposting.
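Druid's data ingestion formats
Druid can ingest denormalized data in JSON, CSV, or a delimited form such as TSV, and it also supports custom formats. Most of the documentation uses JSON, but configuring Druid to ingest any of the other delimited formats is not difficult.
Currently supported data formats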
JSON
{"timestamp": "2013-08-31T01:02:33Z", "page": "Gypsy Danger", "language" : "en", "user" : "nuclear", "unpatrolled" : "true", "newPage" : "true", "robot": "false", "anonymous": "false", "namespace":"article", "continent":"North America", "country":"United States", "region":"Bay Area", "city":"San Francisco", "added": 57, "deleted": 200, "delta": -143}CSV
2013-08-31T01:02:33Z,"Gypsy Danger","en","nuclear","true","true","false","false","article","North America","United States","Bay Area","San Francisco",57,200,-143TSV
2013-08-31T01:02:33Z "Gypsy Danger" "en" "nuclear" "true" "true" "false" "false" "article" "North America" "United States" "Bay Area" "San Francisco" 57 200 -143需要注意的是CSV,TSV不能包含列頭,這點在數據采集的時候一定要注意
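To make the point about missing headers concrete, here is a minimal Python sketch (not Druid code) that parses the same sample record from all three formats. The `COLUMNS` list and the helper names are assumptions introduced for illustration; the column order is taken from the sample record above.

```python
import csv
import json

# CSV/TSV rows carry no header, so the column names are supplied separately.
COLUMNS = ["timestamp", "page", "language", "user", "unpatrolled", "newPage",
           "robot", "anonymous", "namespace", "continent", "country",
           "region", "city", "added", "deleted", "delta"]

JSON_LINE = (
    '{"timestamp": "2013-08-31T01:02:33Z", "page": "Gypsy Danger", '
    '"language": "en", "user": "nuclear", "unpatrolled": "true", '
    '"newPage": "true", "robot": "false", "anonymous": "false", '
    '"namespace": "article", "continent": "North America", '
    '"country": "United States", "region": "Bay Area", '
    '"city": "San Francisco", "added": 57, "deleted": 200, "delta": -143}'
)
CSV_LINE = (
    '2013-08-31T01:02:33Z,"Gypsy Danger","en","nuclear","true","true",'
    '"false","false","article","North America","United States",'
    '"Bay Area","San Francisco",57,200,-143'
)
# None of the sample values contains a comma, so swapping the delimiter
# is a safe way to build the tab-separated variant here.
TSV_LINE = CSV_LINE.replace(",", "\t")


def parse_delimited(line, delimiter):
    """Parse one header-less delimited row into a dict keyed by COLUMNS."""
    values = next(csv.reader([line], delimiter=delimiter))
    if len(values) != len(COLUMNS):
        raise ValueError("row has %d fields, expected %d"
                         % (len(values), len(COLUMNS)))
    return dict(zip(COLUMNS, values))


def parse_json(line):
    """Parse a JSON record; values are stringified for comparison only."""
    record = json.loads(line)
    return {name: str(record[name]) for name in COLUMNS}


from_json = parse_json(JSON_LINE)
from_csv = parse_delimited(CSV_LINE, ",")
from_tsv = parse_delimited(TSV_LINE, "\t")
```

The JSON record names its own fields, while the delimited rows only become meaningful once paired with an externally supplied column list.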
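Custom formats
Druid supports custom data formats through regex parsing and JavaScript. Note, however, that parsing data this way is less efficient than writing your own Java parser or using an external stream-processing tool.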
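The idea behind regex-based parsing can be sketched in a few lines of Python. This is not Druid's parser; the line format, the pattern, and the column names below are all hypothetical and chosen only to show how capture groups are mapped to named fields.

```python
import re

# Hypothetical custom format (not from the article):
#   <timestamp> | <page> | <added>
PATTERN = re.compile(r"^(\S+) \| (.+) \| (-?\d+)$")
# One column name per capture group, in order.
COLUMNS = ["timestamp", "page", "added"]


def parse_with_regex(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    return dict(zip(COLUMNS, match.groups()))


row = parse_with_regex("2013-08-31T01:02:33Z | Gypsy Danger | 57")
bad = parse_with_regex("not a valid line")
```

Every row goes through a general-purpose regex engine, which is one plausible reason this approach costs more than a purpose-built parser.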
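Configuring the ingestion schema
What is a data schema? It is the metadata that describes the data source for a Druid indexing (ingestion) task. It mainly describes the types of the data to be ingested, which columns the data consists of, which of them are metric columns, which are dimension columns, the time granularity, and so on.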
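As a rough illustration of what such metadata can look like, the sketch below builds a schema as a Python dict. The field names assume the layout of Druid's legacy ingestion spec (`dataSource`, `parser`, `metricsSpec`, `granularitySpec`) and should be checked against the documentation for the Druid version in use; the data source name, the chosen metrics, and the granularities are illustrative assumptions, not values from the article.

```python
import json

data_schema = {
    "dataSource": "wikipedia",  # hypothetical data source name
    "parser": {
        "type": "string",
        "parseSpec": {
            "format": "json",
            "timestampSpec": {"column": "timestamp", "format": "auto"},
            # Dimension columns: values to filter and group by.
            "dimensionsSpec": {"dimensions": ["page", "language", "user"]},
        },
    },
    # Metric columns: values aggregated at ingestion time (assumed choices).
    "metricsSpec": [
        {"type": "count", "name": "count"},
        {"type": "longSum", "name": "added", "fieldName": "added"},
    ],
    # Time granularity of segments and of stored rows (assumed values).
    "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
    },
}


def describe(schema):
    """Summarize dimensions, metrics and granularity from a schema dict."""
    parse_spec = schema["parser"]["parseSpec"]
    return {
        "timestamp_column": parse_spec["timestampSpec"]["column"],
        "dimensions": parse_spec["dimensionsSpec"]["dimensions"],
        "metrics": [m["name"] for m in schema["metricsSpec"]],
        "segment_granularity": schema["granularitySpec"]["segmentGranularity"],
    }


summary = describe(data_schema)
schema_json = json.dumps(data_schema, indent=2)
```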
Taking the CSV format as an example
"parseSpec": { "format" : "csv", "timestampSpec" : {"column" : "timestamp" }, "columns" : ["timestamp","page","language","user","unpatrolled","newPage","robot","anonymous","namespace","continent","country","region","city","added","deleted","delta"], "dimensionsSpec" : {"dimensions" : ["page","language","user","unpatrolled","newPage","robot","anonymous","namespace","continent","country","region","city"] }}parseSpec指明了數據源格式,這里是format中表明是CSV格式,然后說明時間戳字段名是timestamp,數據字段名是columns里面那一堆,dimensionsSpec則代表哪些字段可以作為維度.
Reference: Druid data formats