57 Hive案例(数据ETL)
生活随笔
收集整理的這篇文章主要介紹了
57 Hive案例(数据ETL)
小編覺得挺不錯的,現在分享給大家,幫大家做個參考.
需求
對web點擊流日志基礎數據表進行etl(按照倉庫模型設計)
按各時間維度統計來源域名top10
已有數據表 “t_orgin_weblog” :
| valid | string | |
| remote_addr | string | |
| remote_user | string | |
| time_local | string | |
| request | string | |
| status | string | |
| body_bytes_sent | string | |
| http_referer | string | |
| http_user_agent | string |
數據示例
| true|1.162.203.134| - | 18/Sep/2013:13:47:35| /images/my.jpg | 200| 19939 | "http://www.angularjs.cn/A0d9" | "Mozilla/5.0 (Windows || true|1.202.186.37 | - | 18/Sep/2013:15:39:11| /wp-content/uploads/2013/08/windjs.png| 200| 34613 | "http://cnodejs.org/topic/521a30d4bee8d3cb1272ac0f" | "Mozilla/5.0 (Macintosh;|實現步驟
1、對原始數據進行抽取轉換
–將來訪url分離出host path query query id
2、從前述步驟進一步分離出日期時間形成ETL明細表“t_etl_detail” day tm
drop table if exists t_etl_detail; create table t_etl_detail as select b.*,substring(time_local,0,11) as daystr, substring(time_local,13) as tmstr, substring(time_local,4,3) as month, substring(time_local,0,2) as day, substring(time_local,13,2) as hour from t_etl_referurl b;3、對etl數據進行分區(包含所有數據的結構化信息)
drop table t_etl_detail_prt; create table t_etl_detail_prt( valid string, remote_addr string, remote_user string, time_local string, request string, status string, body_bytes_sent string, http_referer string, http_user_agent string, host string, path string, query string, query_id string, daystr string, tmstr string, month string, day string, hour string) partitioned by (mm string,dd string);導入數據
insert into table t_etl_detail_prt partition(mm='Sep',dd='18') select * from t_etl_detail where daystr='18/Sep/2013';insert into table t_etl_detail_prt partition(mm='Sep',dd='19') select * from t_etl_detail where daystr='19/Sep/2013';分個時間維度統計各referer_host的訪問次數并排序
create table t_refer_host_visit_top_tmp as select referer_host,count(*) as counts,mm,dd,hh from t_display_referer_counts group by hh,dd,mm,referer_host order by hh asc,dd asc,mm asc,counts desc;4、來源訪問次數topn各時間維度URL
取各時間維度的referer_host訪問次數topn
總結
以上是生活随笔為你收集整理的57 Hive案例(数据ETL)的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 一些四六级的事
- 下一篇: 车联网V-2X智能汽车驾驶