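Hive Script Notes
Count occurrences of “taobao.com” in one day's Weibo posts, recording the count once per hour (the zero-padded range {00..23} requires bash 4 or later):
for i in {00..23}; do hive -e "select count(*) from mds_anti_sass_log where source=3 and type=1 and args['content'] like '%taobao.com%' and dt=20131028 and hour=$i;" >> log.txt; done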
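The loop above starts the Hive CLI 24 times. As a rough alternative, the same numbers can probably be obtained from one statement that groups by hour. The sketch below only builds and prints that statement; the helper name hourly_count_query is hypothetical, and it assumes hour is a column (or partition) of mds_anti_sass_log, as the original loop implies.

```shell
# Hypothetical helper: print one HiveQL statement that returns a row per hour.
# Assumption: `hour` is a column or partition of mds_anti_sass_log.
hourly_count_query() {
  # $1: dt value such as 20131028
  # $2: substring to look for in args['content']
  printf "select hour, count(*) from mds_anti_sass_log where source=3 and type=1 and args['content'] like '%%%s%%' and dt=%s group by hour order by hour;\n" "$2" "$1"
}

# Print the generated statement so it can be inspected before running it.
hourly_count_query 20131028 taobao.com

# On a host with the hive CLI, the statement could be run like this:
#   hive -e "$(hourly_count_query 20131028 taobao.com)" >> log.txt
```

One difference from the loop: an hour with no matching rows produces no output row at all, whereas the loop would record a 0 for it.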
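List the uids of Weibo posts containing “taobao.com” in a given hour (one row per post, so the number of rows is the number of occurrences):
hive -e "select args['uid'] from mds_anti_sass_log where source=3 and type=1 and args['content'] like '%taobao.com%' and dt=20131028 and hour=10;"
Find the top 20 users whose Weibo posts contained “taobao.com” on a given day: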
hive -e "select args['uid'] from mds_anti_sass_log where source=3 and type=1 and args['content'] like '%taobao.com%' and dt=20131028;" | sort -rn | uniq -c | sort -rnk1 | head -n20
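The counting half of that pipeline can be tried without Hive. The sketch below wraps it in a hypothetical function, top_uids, and feeds it made-up uids; it also normalizes the padded output of uniq -c into "uid count" pairs.

```shell
# Hypothetical helper: read one uid per line on stdin and print the N most
# frequent uids as "uid count", most frequent first.
top_uids() {
  # $1: how many uids to keep (defaults to 20)
  sort \
    | uniq -c \
    | sort -k1,1nr -k2,2 \
    | head -n "${1:-20}" \
    | awk '{print $2, $1}'
}

# Sample data standing in for the output of the hive query.
printf '%s\n' 111 222 111 333 222 111 | top_uids 2
# prints:
# 111 3
# 222 2
```

The second sort key (-k2,2) only makes ties come out in a stable order; the original sort -rnk1 gives the same ranking by count.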
Extract the log records of a given uid:
hive -e "select * from mds_anti_sass_log where source=3 and type=1 and args['uid']=2106332597 and dt=20131028;"
Extract the log records of a given uid's successfully sent Weibo posts:
hive -e "select * from mds_anti_sass_log where source=3 and type=1 and args['uid']=2106332597 and rcode like '%:1}%' and dt=20131028;"
Number of Weibo posts whose source is “Taobao” (淘寶網):
hive -e "select count(*) from mds_anti_sass_log where source=3 and type=1 and args['appid']=804659 and dt=20131028;"
Trend of taobao.com from October 1 to October 9:
for i in {1..9}; do echo 2013100$i;hive -e "select count(*) from mds_anti_sass_log where source=3 and type=1 and args['content'] like '%taobao.com%' and dt=2013100$i;" >> log_1_9.txt; done
Trend of taobao.com from October 10 to October 30:
for i in {10..30}; do echo 201310$i;hive -e "select count(*) from mds_anti_sass_log where source=3 and type=1 and args['content'] like '%taobao.com%' and dt=201310$i;" >> log.txt; done
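The two loops above are split only because single-digit days need a leading zero. A small sketch of one way to cover any day range with a single loop follows; dates_in_month is a hypothetical helper, and the hive call is left as a comment so the snippet runs on a machine without Hive.

```shell
# Hypothetical helper: print yyyymmdd values for a range of days in one month.
dates_in_month() {
  # $1: yyyymm prefix, $2: first day, $3: last day
  for d in $(seq "$2" "$3"); do
    printf '%s%02d\n' "$1" "$d"
  done
}

for dt in $(dates_in_month 201310 8 11); do
  query="select count(*) from mds_anti_sass_log where source=3 and type=1 and args['content'] like '%taobao.com%' and dt=$dt;"
  echo "$dt"
  # On a host with the hive CLI: hive -e "$query" >> log.txt
done
```

The helper does not validate the day range, so asking for day 31 of a 30-day month simply yields a dt value with no data.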
Number of distinct uids that posted Weibo posts containing “taobao.com”, per day from October 10 to October 30:
for i in {10..30}; do echo 201310$i;hive -e "select args['uid'] from mds_anti_sass_log where source=3 and type=1 and args['content'] like '%taobao.com%' and dt=201310$i;" | sort -rn | uniq -c | wc -l >> uid_num.txt ; done
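The sort -rn | uniq -c | wc -l chain counts distinct lines, which sort -u does in one step. A minimal sketch on made-up uids, using a hypothetical function name count_unique:

```shell
# Hypothetical helper: count distinct lines read from stdin.
count_unique() {
  # tr strips the padding that some wc implementations add.
  sort -u | wc -l | tr -d ' '
}

# Sample data standing in for the output of the hive query.
printf '%s\n' 111 222 111 333 222 | count_unique
# prints: 3

# The count can also be computed inside Hive, for example:
#   select count(distinct args['uid']) from mds_anti_sass_log where ... ;
```

Counting inside Hive avoids shipping every uid row to the client, at the cost of a heavier query.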
Reposted from: https://www.cnblogs.com/ericliu/p/3396954.html