生活随笔
收集整理的這篇文章主要介紹了
Hive分析窗口函数(一) SUM,AVG,MIN,MAX
小編覺得挺不錯的,現在分享給大家,幫大家做個參考.
Hive分析窗口函數(一) SUM,AVG,MIN,MAX
Hive分析窗口函數(一) SUM,AVG,MIN,MAX
Hive中提供了越來越多的分析函數,用于完成負責的統計分析。抽時間將所有的分析窗口函數理一遍,將陸續發布。
今天先看幾個基礎的,SUM、AVG、MIN、MAX。
用于實現分組內所有和連續累積的統計。
數據準備
CREATE EXTERNAL TABLE lxw1234?(cookieid?string,createtime?string,?--daypv INT)?ROW FORMAT DELIMITEDFIELDS TERMINATED BY?','stored?as?textfile location?'/tmp/lxw11/';?DESC lxw1234;cookieid STRINGcreatetime STRINGpv INT?hive>?select?*?from?lxw1234;OKcookie1?2015-04-10?1cookie1?2015-04-11?5cookie1?2015-04-12?7cookie1?2015-04-13?3cookie1?2015-04-14?2cookie1?2015-04-15?4cookie1?2015-04-16?4SUM — 注意,結果和ORDER BY相關,默認為升序
SELECT cookieid,createtime,pv,SUM(pv)?OVER(PARTITION BY cookieid ORDER BY createtime)?AS pv1,?--?默認為從起點到當前行SUM(pv)?OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)?AS pv2,?--從起點到當前行,結果同pv1SUM(pv)?OVER(PARTITION BY cookieid)?AS pv3,?--分組內所有行SUM(pv)?OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN?3?PRECEDING AND CURRENT ROW)?AS pv4,?--當前行+往前3行SUM(pv)?OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN?3?PRECEDING AND?1?FOLLOWING)?AS pv5,?--當前行+往前3行+往后1行SUM(pv)?OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)?AS pv6?---當前行+往后所有行FROM lxw1234;?cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6-----------------------------------------------------------------------------cookie1?2015-04-10?1?1?1?26?1?6?26cookie1?2015-04-11?5?6?6?26?6?13?25cookie1?2015-04-12?7?13?13?26?13?16?20cookie1?2015-04-13?3?16?16?26?16?18?13cookie1?2015-04-14?2?18?18?26?17?21?10cookie1?2015-04-15?4?22?22?26?16?20?8cookie1?2015-04-16?4?26?26?26?13?13?4?
pv1: 分組內從起點到當前行的pv累積,如,11號的pv1=10號的pv+11號的pv, 12號=10號+11號+12號
pv2: 同pv1
pv3: 分組內(cookie1)所有的pv累加
pv4: 分組內當前行+往前3行,如,11號=10號+11號, 12號=10號+11號+12號, 13號=10號+11號+12號+13號, 14號=11號+12號+13號+14號
pv5: 分組內當前行+往前3行+往后1行,如,14號=11號+12號+13號+14號+15號=5+7+3+2+4=21
pv6: 分組內當前行+往后所有行,如,13號=13號+14號+15號+16號=3+2+4+4=13,14號=14號+15號+16號=2+4+4=10
?
如果不指定ROWS BETWEEN,默認為從起點到當前行;
如果不指定ORDER BY,則將分組內所有值累加;
關鍵是理解ROWS BETWEEN含義,也叫做WINDOW子句:
PRECEDING:往前
FOLLOWING:往后
CURRENT ROW:當前行
UNBOUNDED:起點,UNBOUNDED PRECEDING 表示從前面的起點, UNBOUNDED FOLLOWING:表示到后面的終點
–其他AVG,MIN,MAX,和SUM用法一樣。
--AVGSELECT cookieid,createtime,pv,AVG(pv)?OVER(PARTITION BY cookieid ORDER BY createtime)?AS pv1,?--?默認為從起點到當前行AVG(pv)?OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)?AS pv2,?--從起點到當前行,結果同pv1AVG(pv)?OVER(PARTITION BY cookieid)?AS pv3, --分組內所有行AVG(pv)?OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN?3?PRECEDING AND CURRENT ROW)?AS pv4,?--當前行+往前3行AVG(pv)?OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN?3?PRECEDING AND?1?FOLLOWING)?AS pv5,?--當前行+往前3行+往后1行AVG(pv)?OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)?AS pv6?---當前行+往后所有行FROM lxw1234;cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6-----------------------------------------------------------------------------cookie1?2015-04-10?1?1.0?1.0?3.7142857142857144?1.0?3.0?3.7142857142857144cookie1?2015-04-11?5?3.0?3.0?3.7142857142857144?3.0?4.333333333333333?4.166666666666667cookie1?2015-04-12?7?4.333333333333333?4.333333333333333?3.7142857142857144?4.333333333333333?4.0?4.0cookie1?2015-04-13?3?4.0?4.0?3.7142857142857144?4.0?3.6?3.25cookie1?2015-04-14?2?3.6?3.6?3.7142857142857144?4.25?4.2?3.3333333333333335cookie1?2015-04-15?4?3.6666666666666665?3.6666666666666665?3.7142857142857144?4.0?4.0?4.0cookie1?2015-04-16?4?3.7142857142857144?3.7142857142857144?3.7142857142857144?3.25?3.25?4.0--MINSELECT cookieid,createtime,pv,MIN(pv)?OVER(PARTITION BY cookieid ORDER BY createtime)?AS pv1,?--?默認為從起點到當前行MIN(pv)?OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)?AS pv2,?--從起點到當前行,結果同pv1MIN(pv)?OVER(PARTITION BY cookieid)?AS pv3, --分組內所有行MIN(pv)?OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN?3?PRECEDING AND CURRENT ROW)?AS pv4,?--當前行+往前3行MIN(pv)?OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN?3?PRECEDING AND?1?FOLLOWING)?AS pv5,?--當前行+往前3行+往后1行MIN(pv)?OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)?AS pv6?---當前行+往后所有行FROM lxw1234;?cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6-----------------------------------------------------------------------------cookie1?2015-04-10?1?1?1?1?1?1?1cookie1?2015-04-11?5?1?1?1?1?1?2cookie1?2015-04-12?7?1?1?1?1?1?2cookie1?2015-04-13?3?1?1?1?1?1?2cookie1?2015-04-14?2?1?1?1?2?2?2cookie1?2015-04-15?4?1?1?1?2?2?4cookie1?2015-04-16?4?1?1?1?2?2?4----MAXSELECT cookieid,createtime,pv,MAX(pv)?OVER(PARTITION BY cookieid ORDER BY createtime)?AS pv1,?--?默認為從起點到當前行MAX(pv)?OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)?AS pv2,?--從起點到當前行,結果同pv1MAX(pv)?OVER(PARTITION BY cookieid)?AS pv3, --分組內所有行MAX(pv)?OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN?3?PRECEDING AND CURRENT ROW)?AS pv4,?--當前行+往前3行MAX(pv)?OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN?3?PRECEDING AND?1?FOLLOWING)?AS pv5,?--當前行+往前3行+往后1行MAX(pv)?OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)?AS pv6?---當前行+往后所有行FROM lxw1234;?cookieid createtime pv pv1 pv2 pv3 pv4 pv5 pv6-----------------------------------------------------------------------------cookie1?2015-04-10?1?1?1?7?1?5?7cookie1?2015-04-11?5?5?5?7?5?7?7cookie1?2015-04-12?7?7?7?7?7?7?7cookie1?2015-04-13?3?7?7?7?7?7?4cookie1?2015-04-14?2?7?7?7?7?7?4cookie1?2015-04-15?4?7?7?7?7?7?4cookie1?2015-04-16?4?7?7?7?4?4?4其他函數的介紹將陸續整理發布。。
轉載于:https://www.cnblogs.com/mobiwangyue/p/8087031.html
總結
以上是生活随笔為你收集整理的Hive分析窗口函数(一) SUM,AVG,MIN,MAX的全部內容,希望文章能夠幫你解決所遇到的問題。
如果覺得生活随笔網站內容還不錯,歡迎將生活随笔推薦給好友。