Hive 窗口函数详解
目錄
- 1、hive窗口函數(shù)
- 2、在日常的開發(fā)中常用那些類型函數(shù)?
- 3、了解哪些窗口函數(shù),都是什么意思?
- 窗口函數(shù)功能
- over語法
- 1、物理窗口(真實往上下移動多少行rows between):
- 2、 邏輯窗口(滿足條件上下多少行):(金融行業(yè)、保險行業(yè)、p2p等)
- 4、窗口聚合函數(shù)
- count開窗函數(shù)
- sum開窗函數(shù)
- avg開窗函數(shù)
- min開窗函數(shù)
- max開窗函數(shù)
- 5、窗口分析函數(shù)
- first_value開窗函數(shù)
- last_value開窗函數(shù)
- lag開窗函數(shù)
- lead開窗函數(shù)
- cume_dist開窗函數(shù)
- 6、窗口排序函數(shù)
- rank開窗函數(shù)
- dense_rank開窗函數(shù)
- ntile開窗函數(shù)
- row_number開窗函數(shù)
- percent_rank開窗函數(shù)
1、hive窗口函數(shù)
窗口函數(shù)是什么鬼?
窗口函數(shù)指定了函數(shù)工作的數(shù)據(jù)窗口大小(當前行的上下多少行),這個數(shù)據(jù)窗口大小可能會隨著行的變化而變化。
窗口函數(shù)和聚合函數(shù)區(qū)別?
窗口函數(shù)對于每個組返回多行,組內(nèi)每一行對應(yīng)返回一行值。
聚合函數(shù)對于每個組只返回一行。
2、在日常的開發(fā)中常用那些類型函數(shù)?
- 1、字符串操作函數(shù)?split、concat、ifnull、substr、substring、cast
- 2、聚合函數(shù) : hive適用于分析,所以常用
- 3、時間函數(shù) : 數(shù)倉的特征隨時間變化而變化,所以時間也特別多
- 4、窗口函數(shù):sum() over() 、 count() over() 、 排名函數(shù)
接下來,著重講解hive的窗口函數(shù)。
3、了解哪些窗口函數(shù),都是什么意思?
窗口函數(shù)功能
- sum(col) over() : 分組對col累計求和
- count(col) over() : 分組對col累計
- min(col) over() : 分組對col求最小值
- max(col) over() : 分組求col的最大值
- avg(col) over() : 分組求col列的平均值
- first_value(col) over() : 某分組排序后的第一個col值
- last_value(col) over() : 某分組排序后的最后一個col值
- lag(col,n,DEFAULT) : 統(tǒng)計往前n行的col值,n可選,默認為1,DEFAULT當往上第n行為NULL時候,取默認值,如不指定,則為NULL
- lead(col,n,DEFAULT) : 統(tǒng)計往后n行的col值,n可選,默認為1,DEFAULT當往下第n行為NULL時候,取默認值,如不指定,則為NULL
- ntile(n) : 用于將分組數(shù)據(jù)按照順序切分成n片,返回當前切片值。注意:n必須為int類型
- row_number() over() : 排名函數(shù),不會重復(fù),適合于生成主鍵或者不并列排名
- rank() over() : 排名函數(shù),有并列名次,名次不連續(xù)。如:1,1,3
- dense_rank() over() : 排名函數(shù),有并列名次,名次連續(xù)。如:1,1,2
over語法
over(分組 排序 窗口) 中的order by后的語法:
(ROWS | RANGE) BETWEEN (UNBOUNDED | [num]) PRECEDING AND ([num] PRECEDING | CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING) (ROWS | RANGE) BETWEEN CURRENT ROW AND (CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING) (ROWS | RANGE) BETWEEN [num] FOLLOWING AND (UNBOUNDED | [num]) FOLLOWING1、物理窗口(真實往上下移動多少行rows between):
CURRENT ROW | UNBOUNDED PRECEDING | [num] PRECEDING AND UNBOUNDED FOLLOWING | [num] FOLLOWING| CURRENT ROW如: over(partition by col order by 排序字段 rows between 1 preceding and 1 fllowing)
2、 邏輯窗口(滿足條件上下多少行):(金融行業(yè)、保險行業(yè)、p2p等)
range between [num] PRECEDING AND [num] FOLLOWING如: over(partition by col order by 排序字段 range between 5 preceding and 5 fllowing)
注意: 窗口函數(shù)一般不和group by搭配使用。
4、窗口聚合函數(shù)
創(chuàng)建測試表,存放當天每半小時的店鋪銷售數(shù)據(jù)
CREATE TABLE IF NOT EXISTS shop_data(shop_id INT comment '店鋪id', stat_date STRING comment '時間', ordamt DOUBLE comment '銷售額' ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE;插入數(shù)據(jù):
insert into shop_data values (10026,'201901230030',5170), (10026,'201901230100',5669), (10026,'201901230130',2396), (10026,'201901230200',1498), (10026,'201901230230',1997), (10026,'201901230300',1188), (10026,'201901230330',598), (10026,'201901230400',479), (10026,'201901230430',1587), (10026,'201901230530',799), (10027,'201901230030',2170), (10027,'201901230100',1623), (10027,'201901230130',3397), (10027,'201901230200',1434), (10027,'201901230230',1001), (10028,'201901230300',1687), (10028,'201901230330',1298), (10028,'201901230400',149), (10029,'201901230430',2587), (10029,'201901230530',589);count開窗函數(shù)
select shop_id,stat_date,ordamt, -- 以符合條件的所有行作為窗口 count(shop_id) over() as count1,-- 以按shop_id分組的所有行作為窗口 count(shop_id) over(partition by shop_id) as count2,-- 以按shop_id分組、按stat_date排序的所有行作為窗口 count(shop_id) over(partition by shop_id order by stat_date) as count3,-- 以按shop_id分組、按stat_date排序、按當前行+往前1行+往后2行的行作為窗口 count(ordamt) over(partition by shop_id order by stat_date rows between 1 preceding and 2 following) as count4,-- 以按shop_id分組、按stat_date排序、按從起點到末尾,默認從起點到末尾和count2結(jié)果相同 count(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and unbounded following) as count5,-- 以按shop_id分組、按stat_date排序、按從起點到當前行的前一行 count(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and 1 preceding) as count6,-- 以按shop_id分組、按stat_date排序、按從起點到當前行 count(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and current row) as count7,-- 以按shop_id分組、按stat_date排序、按從當前行到末尾 count(ordamt) over(partition by shop_id order by stat_date rows between current row and unbounded following) as count8,-- 以按shop_id分組、按stat_date排序、按從當前行往后一行到末尾 count(ordamt) over(partition by shop_id order by stat_date rows between 1 following and unbounded following) as count9,-- 以按shop_id分組、按stat_date排序、按從當前行往后一行到當前行往后2行 count(ordamt) over(partition by shop_id order by stat_date rows between 1 following and 2 following) as count10 from shop_data;OK shop_id stat_date ordamt count1 count2 count3 count4 count5 count6 count7 count8 count9 count10 10026 201901230030 5170.0 20 10 1 3 10 0 1 10 9 2 10026 201901230100 5669.0 20 10 2 4 10 1 2 9 8 2 10026 201901230130 2396.0 20 10 3 4 10 2 3 8 7 2 10026 201901230200 1498.0 20 10 4 4 10 3 4 7 6 2 10026 201901230230 1997.0 20 10 5 4 10 4 5 6 5 2 10026 201901230300 1188.0 20 10 6 4 10 5 6 5 4 2 10026 201901230330 598.0 20 10 7 4 10 6 7 4 3 2 10026 201901230400 479.0 20 10 8 4 10 7 8 3 2 2 10026 201901230430 1587.0 20 10 9 3 10 8 9 2 1 1 10026 201901230530 799.0 20 10 10 2 10 9 10 1 0 0 10027 201901230030 2170.0 20 5 1 3 5 0 1 5 4 2 10027 201901230100 1623.0 20 5 2 4 5 1 2 4 3 2 10027 201901230130 3397.0 20 5 3 4 5 2 3 3 2 2 10027 201901230200 1434.0 20 5 4 3 5 3 4 2 1 1 10027 201901230230 1001.0 20 5 5 2 5 4 5 1 0 0 10028 201901230300 1687.0 20 3 1 3 3 0 1 3 2 2 10028 201901230330 1298.0 20 3 2 3 3 1 2 2 1 1 10028 201901230400 149.0 20 3 3 2 3 2 3 1 0 0 10029 201901230430 2587.0 20 2 1 2 2 0 1 2 1 1 10029 201901230530 589.0 20 2 2 2 2 1 2 1 0 0sum開窗函數(shù)
select shop_id, stat_date, ordamt,-- 以按shop_id分組、按stat_date排序、統(tǒng)計每個商品截止到當前時間的銷售總額,默認從起點到當前行 sum(ordamt) over(partition by shop_id order by stat_date) as sum_amt1,-- 以按shop_id分組、按stat_date排序、統(tǒng)計每個商品前半小時到后一小時的銷售額(按當前行+往前1行+往后2行的行作為窗口) sum(ordamt) over(partition by shop_id order by stat_date rows between 1 preceding and 2 following) as sum_amt2,-- 以按shop_id分組、按stat_date排序、統(tǒng)計每個商品的銷售總額(從起點到末尾) sum(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and unbounded following) as sum_amt3,-- 以按shop_id分組、按stat_date排序、統(tǒng)計截止到前半小時的銷售總額(從起點到當前行的前一行) sum(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and 1 preceding) as sum_amt4,-- 以按shop_id分組、按stat_date排序、統(tǒng)計每個商品截止到當前時間的銷售總額,默認從起點到當前行(從起點到當前行) sum(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and current row) as sum_amt5,-- 以按shop_id分組、按stat_date排序、統(tǒng)計當前時間及之后的銷售總額(從當前行的末尾) sum(ordamt) over(partition by shop_id order by stat_date rows between current row and unbounded following) as sum_amt6,-- 以按shop_id分組、按stat_date排序、統(tǒng)計當前時間的后半小時及之后的銷售額(當前行后一行到末尾) sum(ordamt) over(partition by shop_id order by stat_date rows between 1 following and unbounded following) as sum_amt7,-- 以按shop_id分組、按stat_date排序、統(tǒng)計當前時間后半小時到后一小時之間的銷售額(按從當前行往后一行到當前行往后2行) sum(ordamt) over(partition by shop_id order by stat_date rows between 1 following and 2 following) as sum_amt8 from shop_data;OK shop_id stat_date ordamt sum_amt1 sum_amt2 sum_amt3 sum_amt4 sum_amt5 sum_amt6 sum_amt7 sum_amt8 10026 201901230030 5170.0 5170.0 13235.0 21381.0 NULL 5170.0 21381.0 16211.0 8065.0 10026 201901230100 5669.0 10839.0 14733.0 21381.0 5170.0 10839.0 16211.0 10542.0 3894.0 10026 201901230130 2396.0 13235.0 11560.0 21381.0 10839.0 13235.0 10542.0 8146.0 3495.0 10026 201901230200 1498.0 14733.0 7079.0 21381.0 13235.0 14733.0 8146.0 6648.0 3185.0 10026 201901230230 1997.0 16730.0 5281.0 21381.0 14733.0 16730.0 6648.0 4651.0 1786.0 10026 201901230300 1188.0 17918.0 4262.0 21381.0 16730.0 17918.0 4651.0 3463.0 1077.0 10026 201901230330 598.0 18516.0 3852.0 21381.0 17918.0 18516.0 3463.0 2865.0 2066.0 10026 201901230400 479.0 18995.0 3463.0 21381.0 18516.0 18995.0 2865.0 2386.0 2386.0 10026 201901230430 1587.0 20582.0 2865.0 21381.0 18995.0 20582.0 2386.0 799.0 799.0 10026 201901230530 799.0 21381.0 2386.0 21381.0 20582.0 21381.0 799.0 NULL NULL 10027 201901230030 2170.0 2170.0 7190.0 9625.0 NULL 2170.0 9625.0 7455.0 5020.0 10027 201901230100 1623.0 3793.0 8624.0 9625.0 2170.0 3793.0 7455.0 5832.0 4831.0 10027 201901230130 3397.0 7190.0 7455.0 9625.0 3793.0 7190.0 5832.0 2435.0 2435.0 10027 201901230200 1434.0 8624.0 5832.0 9625.0 7190.0 8624.0 2435.0 1001.0 1001.0 10027 201901230230 1001.0 9625.0 2435.0 9625.0 8624.0 9625.0 1001.0 NULL NULL 10028 201901230300 1687.0 1687.0 3134.0 3134.0 NULL 1687.0 3134.0 1447.0 1447.0 10028 201901230330 1298.0 2985.0 3134.0 3134.0 1687.0 2985.0 1447.0 149.0 149.0 10028 201901230400 149.0 3134.0 1447.0 3134.0 2985.0 3134.0 149.0 NULL NULL 10029 201901230430 2587.0 2587.0 3176.0 3176.0 NULL 2587.0 3176.0 589.0 589.0 10029 201901230530 589.0 3176.0 3176.0 3176.0 2587.0 3176.0 589.0 NULL NULLavg開窗函數(shù)
select shop_id, stat_date, ordamt,-- 以按shop_id分組、按stat_date排序、 round(avg(ordamt) over(partition by shop_id order by stat_date),2) as avg_amt1,-- 以按shop_id分組、按stat_date排序、按當前行+往前1行+往后2行的行作為窗口的平均值 round(avg(ordamt) over(partition by shop_id order by stat_date rows between 1 preceding and 2 following), 2) as avg_amt2,-- 以按shop_id分組、按stat_date排序、從起點到末尾的平均值 round(avg(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and unbounded following), 2) as avg_amt3,-- 以按shop_id分組、按stat_date排序、從起點到當前行的前一行的平均值 round(avg(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and 1 preceding), 2) as avg_amt4,-- 以按shop_id分組、按stat_date排序、從起點到當前行的平均值 round(avg(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and current row), 2) as avg_amt5,-- 以按shop_id分組、按stat_date排序、從當前行的末尾的平均值 round(avg(ordamt) over(partition by shop_id order by stat_date rows between current row and unbounded following), 2) as avg_amt6,-- 以按shop_id分組、按stat_date排序、當前行后一行到末尾的平均值 round(avg(ordamt) over(partition by shop_id order by stat_date rows between 1 following and unbounded following), 2) as avg_amt7,-- 以按shop_id分組、按stat_date排序、按從當前行往后一行到當前行往后2行的平均值 round(avg(ordamt) over(partition by shop_id order by stat_date rows between 1 following and 2 following), 2) as avg_amt8 from shop_data;OK shop_id stat_date ordamt avg_amt1 avg_amt2 avg_amt3 avg_amt4 avg_amt5 avg_amt6 avg_amt7 avg_amt8 10026 201901230030 5170.0 5170.0 4411.67 2138.1 NULL 5170.0 2138.1 1801.22 4032.5 10026 201901230100 5669.0 5419.5 3683.25 2138.1 5170.0 5419.5 1801.22 1317.75 1947.0 10026 201901230130 2396.0 4411.67 2890.0 2138.1 5419.5 4411.67 1317.75 1163.71 1747.5 10026 201901230200 1498.0 3683.25 1769.75 2138.1 4411.67 3683.25 1163.71 1108.0 1592.5 10026 201901230230 1997.0 3346.0 1320.25 2138.1 3683.25 3346.0 1108.0 930.2 893.0 10026 201901230300 1188.0 2986.33 1065.5 2138.1 3346.0 2986.33 930.2 865.75 538.5 10026 201901230330 598.0 2645.14 963.0 2138.1 2986.33 2645.14 865.75 955.0 1033.0 10026 201901230400 479.0 2374.38 865.75 2138.1 2645.14 2374.38 955.0 1193.0 1193.0 10026 201901230430 1587.0 2286.89 955.0 2138.1 2374.38 2286.89 1193.0 799.0 799.0 10026 201901230530 799.0 2138.1 1193.0 2138.1 2286.89 2138.1 799.0 NULL NULL 10027 201901230030 2170.0 2170.0 2396.67 1925.0 NULL 2170.0 1925.0 1863.75 2510.0 10027 201901230100 1623.0 1896.5 2156.0 1925.0 2170.0 1896.5 1863.75 1944.0 2415.5 10027 201901230130 3397.0 2396.67 1863.75 1925.0 1896.5 2396.67 1944.0 1217.5 1217.5 10027 201901230200 1434.0 2156.0 1944.0 1925.0 2396.67 2156.0 1217.5 1001.0 1001.0 10027 201901230230 1001.0 1925.0 1217.5 1925.0 2156.0 1925.0 1001.0 NULL NULL 10028 201901230300 1687.0 1687.0 1044.67 1044.67 NULL 1687.0 1044.67 723.5 723.5 10028 201901230330 1298.0 1492.5 1044.67 1044.67 1687.0 1492.5 723.5 149.0 149.0 10028 201901230400 149.0 1044.67 723.5 1044.67 1492.5 1044.67 149.0 NULL NULL 10029 201901230430 2587.0 2587.0 1588.0 1588.0 NULL 2587.0 1588.0 589.0 589.0 10029 201901230530 589.0 1588.0 1588.0 1588.0 2587.0 1588.0 589.0 NULL NULLmin開窗函數(shù)
select shop_id, stat_date, ordamt,-- 以按shop_id分組、按stat_date排序、 min(ordamt) over(partition by shop_id order by stat_date) as min_amt1,-- 以按shop_id分組、按stat_date排序、按當前行+往前1行+往后2行的行作為窗口的最小數(shù) min(ordamt) over(partition by shop_id order by stat_date rows between 1 preceding and 2 following) as min_amt2,-- 以按shop_id分組、按stat_date排序、從起點到末尾的最小數(shù) min(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and unbounded following) as min_amt3,-- 以按shop_id分組、按stat_date排序、從起點到當前行的前一行的最小數(shù) min(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and 1 preceding) as min_amt4,-- 以按shop_id分組、按stat_date排序、從起點到當前行的最小數(shù) min(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and current row) as min_amt5,-- 以按shop_id分組、按stat_date排序、從當前行的末尾的最小數(shù) min(ordamt) over(partition by shop_id order by stat_date rows between current row and unbounded following) as min_amt6,-- 以按shop_id分組、按stat_date排序、當前行后一行到末尾的最小數(shù) min(ordamt) over(partition by shop_id order by stat_date rows between 1 following and unbounded following) as min_amt7,-- 以按shop_id分組、按stat_date排序、按從當前行往后一行到當前行往后2行的最小數(shù) min(ordamt) over(partition by shop_id order by stat_date rows between 1 following and 2 following) as min_amt8 from shop_data;OK shop_id stat_date ordamt min_amt1 min_amt2 min_amt3 min_amt4 min_amt5 min_amt6 min_amt7 min_amt8 10026 201901230030 5170.0 5170.0 2396.0 479.0 NULL 5170.0 479.0 479.0 2396.0 10026 201901230100 5669.0 5170.0 1498.0 479.0 5170.0 5170.0 479.0 479.0 1498.0 10026 201901230130 2396.0 2396.0 1498.0 479.0 5170.0 2396.0 479.0 479.0 1498.0 10026 201901230200 1498.0 1498.0 1188.0 479.0 2396.0 1498.0 479.0 479.0 1188.0 10026 201901230230 1997.0 1498.0 598.0 479.0 1498.0 1498.0 479.0 479.0 598.0 10026 201901230300 1188.0 1188.0 479.0 479.0 1498.0 1188.0 479.0 479.0 479.0 10026 201901230330 598.0 598.0 479.0 479.0 1188.0 598.0 479.0 479.0 479.0 10026 201901230400 479.0 479.0 479.0 479.0 598.0 479.0 479.0 799.0 799.0 10026 201901230430 1587.0 479.0 479.0 479.0 479.0 479.0 799.0 799.0 799.0 10026 201901230530 799.0 479.0 799.0 479.0 479.0 479.0 799.0 NULL NULL 10027 201901230030 2170.0 2170.0 1623.0 1001.0 NULL 2170.0 1001.0 1001.0 1623.0 10027 201901230100 1623.0 1623.0 1434.0 1001.0 2170.0 1623.0 1001.0 1001.0 1434.0 10027 201901230130 3397.0 1623.0 1001.0 1001.0 1623.0 1623.0 1001.0 1001.0 1001.0 10027 201901230200 1434.0 1434.0 1001.0 1001.0 1623.0 1434.0 1001.0 1001.0 1001.0 10027 201901230230 1001.0 1001.0 1001.0 1001.0 1434.0 1001.0 1001.0 NULL NULL 10028 201901230300 1687.0 1687.0 149.0 149.0 NULL 1687.0 149.0 149.0 149.0 10028 201901230330 1298.0 1298.0 149.0 149.0 1687.0 1298.0 149.0 149.0 149.0 10028 201901230400 149.0 149.0 149.0 149.0 1298.0 149.0 149.0 NULL NULL 10029 201901230430 2587.0 2587.0 589.0 589.0 NULL 2587.0 589.0 589.0 589.0 10029 201901230530 589.0 589.0 589.0 589.0 2587.0 589.0 589.0 NULL NULLmax開窗函數(shù)
select shop_id, stat_date, ordamt,-- 以按shop_id分組、按stat_date排序、 max(ordamt) over(partition by shop_id order by stat_date) as max_amt1,-- 以按shop_id分組、按stat_date排序、按當前行+往前1行+往后2行的行作為窗口的最大數(shù) max(ordamt) over(partition by shop_id order by stat_date rows between 1 preceding and 2 following) as max_amt2,-- 以按shop_id分組、按stat_date排序、從起點到末尾的最大數(shù) max(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and unbounded following) as max_amt3,-- 以按shop_id分組、按stat_date排序、從起點到當前行的前一行的最大數(shù) max(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and 1 preceding) as max_amt4,-- 以按shop_id分組、按stat_date排序、從起點到當前行的最大數(shù) max(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and current row) as max_amt5,-- 以按shop_id分組、按stat_date排序、從當前行的末尾的最大數(shù) max(ordamt) over(partition by shop_id order by stat_date rows between current row and unbounded following) as max_amt6,-- 以按shop_id分組、按stat_date排序、當前行后一行到末尾的最大數(shù) max(ordamt) over(partition by shop_id order by stat_date rows between 1 following and unbounded following) as max_amt7,-- 以按shop_id分組、按stat_date排序、按從當前行往后一行到當前行往后2行的最大數(shù) max(ordamt) over(partition by shop_id order by stat_date rows between 1 following and 2 following) as max_amt8 from shop_data;OK shop_id stat_date ordamt max_amt1 max_amt2 max_amt3 max_amt4 max_amt5 max_amt6 max_amt7 max_amt8 10026 201901230030 5170.0 5170.0 5669.0 5669.0 NULL 5170.0 5669.0 5669.0 5669.0 10026 201901230100 5669.0 5669.0 5669.0 5669.0 5170.0 5669.0 5669.0 2396.0 2396.0 10026 201901230130 2396.0 5669.0 5669.0 5669.0 5669.0 5669.0 2396.0 1997.0 1997.0 10026 201901230200 1498.0 5669.0 2396.0 5669.0 5669.0 5669.0 1997.0 1997.0 1997.0 10026 201901230230 1997.0 5669.0 1997.0 5669.0 5669.0 5669.0 1997.0 1587.0 1188.0 10026 201901230300 1188.0 5669.0 1997.0 5669.0 5669.0 5669.0 1587.0 1587.0 598.0 10026 201901230330 598.0 5669.0 1587.0 5669.0 5669.0 5669.0 1587.0 1587.0 1587.0 10026 201901230400 479.0 5669.0 1587.0 5669.0 5669.0 5669.0 1587.0 1587.0 1587.0 10026 201901230430 1587.0 5669.0 1587.0 5669.0 5669.0 5669.0 1587.0 799.0 799.0 10026 201901230530 799.0 5669.0 1587.0 5669.0 5669.0 5669.0 799.0 NULL NULL 10027 201901230030 2170.0 2170.0 3397.0 3397.0 NULL 2170.0 3397.0 3397.0 3397.0 10027 201901230100 1623.0 2170.0 3397.0 3397.0 2170.0 2170.0 3397.0 3397.0 3397.0 10027 201901230130 3397.0 3397.0 3397.0 3397.0 2170.0 3397.0 3397.0 1434.0 1434.0 10027 201901230200 1434.0 3397.0 3397.0 3397.0 3397.0 3397.0 1434.0 1001.0 1001.0 10027 201901230230 1001.0 3397.0 1434.0 3397.0 3397.0 3397.0 1001.0 NULL NULL 10028 201901230300 1687.0 1687.0 1687.0 1687.0 NULL 1687.0 1687.0 1298.0 1298.0 10028 201901230330 1298.0 1687.0 1687.0 1687.0 1687.0 1687.0 1298.0 149.0 149.0 10028 201901230400 149.0 1687.0 1298.0 1687.0 1687.0 1687.0 149.0 NULL NULL 10029 201901230430 2587.0 2587.0 2587.0 2587.0 NULL 2587.0 2587.0 589.0 589.0 10029 201901230530 589.0 2587.0 2587.0 2587.0 2587.0 2587.0 589.0 NULL NULL5、窗口分析函數(shù)
first_value開窗函數(shù)
select shop_id, stat_date, ordamt,-- 以按shop_id分組、按stat_date排序、 first_value(ordamt) over(partition by shop_id order by stat_date) as first_amt1,-- 以按shop_id分組、按stat_date排序、按當前行+往前1行+往后2行的行作為窗口的第一個值 first_value(ordamt) over(partition by shop_id order by stat_date rows between 1 preceding and 2 following) as first_amt2,-- 以按shop_id分組、按stat_date排序、從起點到末尾的第一個值 first_value(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and unbounded following) as first_amt3,-- 以按shop_id分組、按stat_date排序、從起點到當前行的前一行的第一個值 first_value(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and 1 preceding) as first_amt4,-- 以按shop_id分組、按stat_date排序、從起點到當前行的第一個值 first_value(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and current row) as first_amt5,-- 以按shop_id分組、按stat_date排序、從當前行的末尾的第一個值 first_value(ordamt) over(partition by shop_id order by stat_date rows between current row and unbounded following) as first_amt6,-- 以按shop_id分組、按stat_date排序、當前行后一行到末尾的第一個值 first_value(ordamt) over(partition by shop_id order by stat_date rows between 1 following and unbounded following) as first_amt7,-- 以按shop_id分組、按stat_date排序、按從當前行往后一行到當前行往后2行的第一個值 first_value(ordamt) over(partition by shop_id order by stat_date rows between 1 following and 2 following) as first_amt8 from shop_data;OK shop_id stat_date ordamt first_amt1 first_amt2 first_amt3 first_amt4 first_amt5 first_amt6 first_amt7 first_amt8 10026 201901230030 5170.0 5170.0 5170.0 5170.0 NULL 5170.0 5170.0 5669.0 5669.0 10026 201901230100 5669.0 5170.0 5170.0 5170.0 5170.0 5170.0 5669.0 2396.0 2396.0 10026 201901230130 2396.0 5170.0 5669.0 5170.0 5170.0 5170.0 2396.0 1498.0 1498.0 10026 201901230200 1498.0 5170.0 2396.0 5170.0 5170.0 5170.0 1498.0 1997.0 1997.0 10026 201901230230 1997.0 5170.0 1498.0 5170.0 5170.0 5170.0 1997.0 1188.0 1188.0 10026 201901230300 1188.0 5170.0 1997.0 5170.0 5170.0 5170.0 1188.0 598.0 598.0 10026 201901230330 598.0 5170.0 1188.0 5170.0 5170.0 5170.0 598.0 479.0 479.0 10026 201901230400 479.0 5170.0 598.0 5170.0 5170.0 5170.0 479.0 1587.0 1587.0 10026 201901230430 1587.0 5170.0 479.0 5170.0 5170.0 5170.0 1587.0 799.0 799.0 10026 201901230530 799.0 5170.0 1587.0 5170.0 5170.0 5170.0 799.0 NULL NULL 10027 201901230030 2170.0 2170.0 2170.0 2170.0 NULL 2170.0 2170.0 1623.0 1623.0 10027 201901230100 1623.0 2170.0 2170.0 2170.0 2170.0 2170.0 1623.0 3397.0 3397.0 10027 201901230130 3397.0 2170.0 1623.0 2170.0 2170.0 2170.0 3397.0 1434.0 1434.0 10027 201901230200 1434.0 2170.0 3397.0 2170.0 2170.0 2170.0 1434.0 1001.0 1001.0 10027 201901230230 1001.0 2170.0 1434.0 2170.0 2170.0 2170.0 1001.0 NULL NULL 10028 201901230300 1687.0 1687.0 1687.0 1687.0 NULL 1687.0 1687.0 1298.0 1298.0 10028 201901230330 1298.0 1687.0 1687.0 1687.0 1687.0 1687.0 1298.0 149.0 149.0 10028 201901230400 149.0 1687.0 1298.0 1687.0 1687.0 1687.0 149.0 NULL NULL 10029 201901230430 2587.0 2587.0 2587.0 2587.0 NULL 2587.0 2587.0 589.0 589.0 10029 201901230530 589.0 2587.0 2587.0 2587.0 2587.0 2587.0 589.0 NULL NULLlast_value開窗函數(shù)
select shop_id, stat_date, ordamt,-- 以按shop_id分組、按stat_date排序、 last_value(ordamt) over(partition by shop_id order by stat_date) as last_amt1,-- 以按shop_id分組、按stat_date排序、按當前行+往前1行+往后2行的行作為窗口的最后一個值 last_value(ordamt) over(partition by shop_id order by stat_date rows between 1 preceding and 2 following) as last_amt2,-- 以按shop_id分組、按stat_date排序、從起點到末尾的最后一個值 last_value(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and unbounded following) as last_amt3,-- 以按shop_id分組、按stat_date排序、從起點到當前行的前一行的最后一個值 last_value(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and 1 preceding) as last_amt4,-- 以按shop_id分組、按stat_date排序、從起點到當前行的最后一個值 last_value(ordamt) over(partition by shop_id order by stat_date rows between unbounded preceding and current row) as last_amt5,-- 以按shop_id分組、按stat_date排序、從當前行的末尾的最后一個值 last_value(ordamt) over(partition by shop_id order by stat_date rows between current row and unbounded following) as last_amt6,-- 以按shop_id分組、按stat_date排序、當前行后一行到末尾的最后一個值 last_value(ordamt) over(partition by shop_id order by stat_date rows between 1 following and unbounded following) as last_amt7,-- 以按shop_id分組、按stat_date排序、按從當前行往后一行到當前行往后2行的最后一個值 last_value(ordamt) over(partition by shop_id order by stat_date rows between 1 following and 2 following) as last_amt8 from shop_data;OK shop_id stat_date ordamt last_amt1 last_amt2 last_amt3 last_amt4 last_amt5 last_amt6 last_amt7 last_amt8 10026 201901230030 5170.0 5170.0 2396.0 799.0 NULL 5170.0 799.0 799.0 2396.0 10026 201901230100 5669.0 5669.0 1498.0 799.0 5170.0 5669.0 799.0 799.0 1498.0 10026 201901230130 2396.0 2396.0 1997.0 799.0 5669.0 2396.0 799.0 799.0 1997.0 10026 201901230200 1498.0 1498.0 1188.0 799.0 2396.0 1498.0 799.0 799.0 1188.0 10026 201901230230 1997.0 1997.0 598.0 799.0 1498.0 1997.0 799.0 799.0 598.0 10026 201901230300 1188.0 1188.0 479.0 799.0 1997.0 1188.0 799.0 799.0 479.0 10026 201901230330 598.0 598.0 1587.0 799.0 1188.0 598.0 799.0 799.0 1587.0 10026 201901230400 479.0 479.0 799.0 799.0 598.0 479.0 799.0 799.0 799.0 10026 201901230430 1587.0 1587.0 799.0 799.0 479.0 1587.0 799.0 799.0 799.0 10026 201901230530 799.0 799.0 799.0 799.0 1587.0 799.0 799.0 NULL NULL 10027 201901230030 2170.0 2170.0 3397.0 1001.0 NULL 2170.0 1001.0 1001.0 3397.0 10027 201901230100 1623.0 1623.0 1434.0 1001.0 2170.0 1623.0 1001.0 1001.0 1434.0 10027 201901230130 3397.0 3397.0 1001.0 1001.0 1623.0 3397.0 1001.0 1001.0 1001.0 10027 201901230200 1434.0 1434.0 1001.0 1001.0 3397.0 1434.0 1001.0 1001.0 1001.0 10027 201901230230 1001.0 1001.0 1001.0 1001.0 1434.0 1001.0 1001.0 NULL NULL 10028 201901230300 1687.0 1687.0 149.0 149.0 NULL 1687.0 149.0 149.0 149.0 10028 201901230330 1298.0 1298.0 149.0 149.0 1687.0 1298.0 149.0 149.0 149.0 10028 201901230400 149.0 149.0 149.0 149.0 1298.0 149.0 149.0 NULL NULL 10029 201901230430 2587.0 2587.0 589.0 589.0 NULL 2587.0 589.0 589.0 589.0 10029 201901230530 589.0 589.0 589.0 589.0 2587.0 589.0 589.0 NULL NULLlag開窗函數(shù)
注意:
1、使用第三個參數(shù)設(shè)置默認值時,默認值的數(shù)據(jù)類型需要和列的數(shù)據(jù)類型保持一致,否則設(shè)置不生效。
2、使用lag() 和 lead() 不能對窗口限定邊界,必須是 unbounded 無界的,如果設(shè)置了邊界,會出現(xiàn)如下報錯信息。
lead開窗函數(shù)
select shop_id, stat_date, ordamt, -- 以按shop_id分組、按stat_date排序、注意第三個參數(shù)默認值的類型需要和列類型匹配,否則不生效 lead(ordamt, 1, 0) over(partition by shop_id order by stat_date) as last_amt1, lead(ordamt, 2, 'NA') over(partition by shop_id order by stat_date rows between unbounded preceding and unbounded following) as last_amt2 from shop_data;OK shop_id stat_date ordamt last_amt1 last_amt2 10026 201901230030 5170 5669 2396 10026 201901230100 5669 2396 1498 10026 201901230130 2396 1498 1997 10026 201901230200 1498 1997 1188 10026 201901230230 1997 1188 598 10026 201901230300 1188 598 479 10026 201901230330 598 479 1587 10026 201901230400 479 1587 799 10026 201901230430 1587 799 0 10026 201901230530 799 0 0 10027 201901230030 2170 1623 3397 10027 201901230100 1623 3397 1434 10027 201901230130 3397 1434 1001 10027 201901230200 1434 1001 0 10027 201901230230 1001 0 0 10028 201901230300 1687 1298 149 10028 201901230330 1298 149 0 10028 201901230400 149 0 0 10029 201901230430 2587 589 0 10029 201901230530 589 0 0cume_dist開窗函數(shù)
計算某個窗口或分區(qū)中某個值的累積分布。假定升序排序,則使用以下公式確定累積分布:
小于等于當前值x的行數(shù) / 窗口或partition分區(qū)內(nèi)的總行數(shù)。其中,x 等于 order by 子句中指定的列的當前行中的值。
6、窗口排序函數(shù)
rank開窗函數(shù)
rank 開窗函數(shù)基于 over 子句中的 order by 確定一組值中一個值的排名。
如果存在partition by ,則為每個分區(qū)組中的每個值排名。排名可能不是連續(xù)的。例如,如果兩個行的排名為 1,則下一個排名為 3。
dense_rank開窗函數(shù)
dense_rank與rank有一點不同,當排名一樣的時候,接下來的行是連續(xù)的。如兩個行的排名為 1,則下一個排名為 2。
select shop_id, stat_date, ordamt,-- 以按shop_id排序 dense_rank() over(order by shop_id) as dense_amt1,-- 以按shop_id分區(qū)、按stat_date排序 dense_rank() over(partition by shop_id order by stat_date) as dense_amt2 from shop_data;OK shop_id stat_date ordamt dense_amt1 dense_amt2 10026 201901230030 5170.0 1 1 10026 201901230100 5669.0 1 2 10026 201901230130 2396.0 1 3 10026 201901230200 1498.0 1 4 10026 201901230230 1997.0 1 5 10026 201901230300 1188.0 1 6 10026 201901230330 598.0 1 7 10026 201901230400 479.0 1 8 10026 201901230430 1587.0 1 9 10026 201901230530 799.0 1 10 10027 201901230030 2170.0 2 1 10027 201901230100 1623.0 2 2 10027 201901230130 3397.0 2 3 10027 201901230200 1434.0 2 4 10027 201901230230 1001.0 2 5 10028 201901230300 1687.0 3 1 10028 201901230330 1298.0 3 2 10028 201901230400 149.0 3 3 10029 201901230430 2587.0 4 1 10029 201901230530 589.0 4 2ntile開窗函數(shù)
將分區(qū)中已排序的行劃分為大小盡可能相等的指定數(shù)量的排名的組,并返回給定行所在的組的排名。
select shop_id, stat_date, ordamt,-- 以按shop_id分成兩組、按stat_date排序 ntile(2) over(partition by shop_id order by stat_date) as ntile_amt1,-- 以按shop_id分成三組、按stat_date排序 ntile(3) over(partition by shop_id order by stat_date) as ntile_amt2 from shop_data;OK shop_id stat_date ordamt ntile_amt1 ntile_amt2 10026 201901230030 5170.0 1 1 10026 201901230100 5669.0 1 1 10026 201901230130 2396.0 1 1 10026 201901230200 1498.0 1 1 10026 201901230230 1997.0 1 2 10026 201901230300 1188.0 2 2 10026 201901230330 598.0 2 2 10026 201901230400 479.0 2 3 10026 201901230430 1587.0 2 3 10026 201901230530 799.0 2 3 10027 201901230030 2170.0 1 1 10027 201901230100 1623.0 1 1 10027 201901230130 3397.0 1 2 10027 201901230200 1434.0 2 2 10027 201901230230 1001.0 2 3 10028 201901230300 1687.0 1 1 10028 201901230330 1298.0 1 2 10028 201901230400 149.0 2 3 10029 201901230430 2587.0 1 1 10029 201901230530 589.0 2 2row_number開窗函數(shù)
從1開始對分區(qū)內(nèi)的數(shù)據(jù)排序。
select shop_id, stat_date, ordamt,-- 以按shop_id分區(qū)、按stat_date排序 row_number() over(partition by shop_id order by stat_date) as row_amt from shop_data;OK shop_id stat_date ordamt row_amt 10026 201901230030 5170.0 1 10026 201901230100 5669.0 2 10026 201901230130 2396.0 3 10026 201901230200 1498.0 4 10026 201901230230 1997.0 5 10026 201901230300 1188.0 6 10026 201901230330 598.0 7 10026 201901230400 479.0 8 10026 201901230430 1587.0 9 10026 201901230530 799.0 10 10027 201901230030 2170.0 1 10027 201901230100 1623.0 2 10027 201901230130 3397.0 3 10027 201901230200 1434.0 4 10027 201901230230 1001.0 5 10028 201901230300 1687.0 1 10028 201901230330 1298.0 2 10028 201901230400 149.0 3 10029 201901230430 2587.0 1 10029 201901230530 589.0 2percent_rank開窗函數(shù)
計算給定行的百分比排名。可以用來計算超過了百分之多少的人。如360小助手開機速度超過了百分之多少的人。
(當前行的rank值-1)/(分組內(nèi)的總行數(shù)-1)
總結(jié)
以上是生活随笔為你收集整理的Hive 窗口函数详解的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 学习iOS从object-c开始-语法(
- 下一篇: DM8 Out of space,错误码