[Hive_add_11] Hive 使用 UDTF 实现日志降维
0. 說明
對日志進行降維處理,將日志分為幾個小表
通過編寫 UDTF ,對日志降維,將日志聚合體相關字段抽取出來,形成新表。
?
?
?
1. 操作流程
1.0 日志部分內容
1532868065.518#192.168.23.1#1532868067548#200#{\"appChannel\":\"appstore\",\"appErrorLogs\":[{\"createdAtMs\":1530455040000,\"errorBrief\":\"at cn.lift.appIn.control.CommandUtil.getInfo(CommandUtil.java:67)\",
\"errorDetail\":\"at cn.lift.dfdfdf.control.CommandUtil.getInfo(CommandUtil.java:67) at sun.reflect.DelegatingMethodAccessorImpl.
invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606)\"},{\"createdAtMs\":1530393180000,
\"errorBrief\":\"at cn.lift.appIn.control.CommandUtil.getInfo(CommandUtil.java:67)\",\"errorDetail\":
\"at cn.lift.dfdfdf.control.CommandUtil.getInfo(CommandUtil.java:67)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)\"}],\"appEventLogs\":[{\"createdAtMs\":1530412800000,\"eventId\":
\"share\",\"logType\":\"event\",\"mark\":\"4\",\"musicID\":\"傲紅塵\"}],\"appPageLogs\":[{\"createdAtMs\":1530449520000,
\"logType\":\"page\",\"nextPage\":\"list.html\",\"pageId\":\"list.html\",\"pageViewCntInSession\":0,\"visitIndex\":\"2\"}],
\"appPlatform\":\"ios\",\"appStartupLogs\":[{\"brand\":\"聯想\",\"carrier\":\"中國聯通\",\"country\":\"china\",\"createdAtMs\":
1530385560000,\"logType\":\"startup\",\"network\":\"cell\",\"province\":\"hebei\",\"screenSize\":\"960 * 640\"},
{\"brand\":\"魅族\",\"carrier\":\"中國鐵通\",\"country\":\"america\",\"createdAtMs\":1530412860000,\"logType\":\"startup\",
\"network\":\"3g\",\"province\":\"guangxi\",\"screenSize\":\"480 * 320\"}],\"appUsageLogs\":[{\"createdAtMs\":1530379200000,
\"logType\":\"usage\",\"singleDownloadTraffic\":\"12800\",\"singleUploadTraffic\":\"128\",\"singleUseDurationSecs\":\"123\"}],
\"appVersion\":\"1.0.0\",\"deviceId\":\"Device000099\",\"deviceStyle\":\"oppo 1\",\"osType\":\"1.4.0\"}
?
1.1 創建 logAgg表
創建 logAgg表,分區表 => year, month, day
create table logAgg(serverTime string,remoteIp string,clientTime string,status string, json string)partitioned by(year string, month string, day string)row format delimitedfields terminated by '#' ;?
1.2 load 數據到 logAgg表
load data local inpath '/home/centos/files/2018-07-01.log' into table logagg partition(year='2018',month='07',day='01');?
1.3 降維處理
1. 代碼編寫
ParseJsonUtil.java
?
ParseEvent.java
?
?
2. 上傳并同步
先打包再放入 /soft/hive/lib 中
cp /soft/hive/lib/myhive-1.0-SNAPSHOT.jar /soft/hadoop/share/hadoop/common/lib/xsync.sh /soft/hadoop/share/hadoop/common/lib/myhive-1.0-SNAPSHOT.jar
?
3. 注冊臨時函數
create temporary function parseEvent as 'com.share.udtf.ParseEvent';?
4.測試
select parseEvent(json) from logAgg;?
1.4 創建?logEvent表
create table logevent(deviceId string, createdAtMs string, eventId string, logType string , mark string, musicID string)stored as parquet tblproperties('parquet.compression'='GZIP');?
1.5 轉儲
insert into logevent select parseEvent(json) from logaggwhere year='2018' and month='07' and day='01';?
1.6?對 logEvent表進行操作
1. 計算每個用戶對每首歌的評分
select deviceid, musicid, sum(cast(mark as int)) as sum from logevent where musicId is not null group by deviceid, musicid;?
2.?計算每個用戶對每首歌的評分與最高評分
select deviceid , musicid, sum, max(sum)over(partition by deviceid) as sum2 from ( select deviceid, musicid, sum(cast(mark as int)) as sum from logevent where musicId is not null group by deviceid, musicid )a;?
3. 使用 sql 計算出,每個用戶最喜歡(評分最高)的歌曲,及其評分
select deviceid , musicid, sum from ( select deviceid , musicid, sum, max(sum)over(partition by deviceid) as sum2 from ( select deviceid, musicid, sum(cast(mark as int)) as sum from logevent where musicId is not null group by deviceid, musicid )a )b where sum=sum2 ;?
?
2. 附加內容?
2.1 Hive 加載 jar 方式
1. 將 jar 放在 Hive 的 lib 下,并重啟 Hive
/soft/hive/lib
2. 通過配置文件指定 jar,并重啟 Hive
hive.aux.jars.path=/x/x/x.jar
3. 臨時加載 jar
hive> add jar /x/x/x.jar
2.2 注冊函數方法
1. create function as ''; // 永久
2. create temporary function as ''; // 臨時
3. create function xxx as '' using jar 'hdfs://mycluster/xxx.jar'; // 將 jar 包放入 HDFS 中,避免重啟
?
?
轉載于:https://www.cnblogs.com/share23/p/10361382.html
總結
以上是生活随笔為你收集整理的[Hive_add_11] Hive 使用 UDTF 实现日志降维的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 物联网安全的多元化
- 下一篇: 如何用阿里云容灾服务(HDR)为本地数据