hive 如何将数组转成字符串_hive详细笔记(八)-Hive之列转行和行转列(附带讲解视频 )...
1 行轉列
1.1 函數
CONCAT(string A/col, string B/col…):返回輸入字符串連接后的結果,支持任意個輸入字符串;
CONCAT_WS(separator, str1, str2,...):它是一個特殊形式的 CONCAT()。第一個參數剩余參數間的分隔符。分隔符可以是與剩余參數一樣的字符串。如果分隔符是 NULL,返回值也將為 NULL。這個函數會跳過分隔符參數后的任何 NULL 和空字符串。分隔符將被加到被連接的字符串之間;
COLLECT_SET(col):函數只接受基本數據類型,它的主要作用是將某字段的值進行去重匯總,產生array類型字段。?將某列數據轉換成數組
1.1.1 concat字符串的拼接
0: jdbc:hive2://linux01:10000> desc fromatted concat ;
FAILED: SemanticException [Error 10001]: Table not found fromatted
Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Table not found fromatted (state=42S02,code=10001)
0: jdbc:hive2://linux01:10000> desc formatted concat ;
FAILED: SemanticException [Error 10001]: Table not found concat
Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Table not found concat (state=42S02,code=10001)
0: jdbc:hive2://linux01:10000> desc function concat ;
OK
+----------------------------------------------------+
| tab_name |
+----------------------------------------------------+
| concat(str1, str2, ... strN) - returns the concatenation of str1, str2, ... strN or concat(bin1, bin2, ... binN) - returns the concatenation of bytes in binary data bin1, bin2, ... binN |
+----------------------------------------------------+
示例? 實現字符串的拼接
0: jdbc:hive2://linux01:10000> select concat("a" , "-->","b","-->","c")
. . . . . . . . . . . . . . .> ;
OK
+------------+
| _c0 |
+------------+
| a-->b-->c |
+------------+
concat的執行時機時行數據? 將查詢的表中的一行中的多個字段拼接
0: jdbc:hive2://linux01:10000> select concat(ename,":",job,":",sal) from tb_emp;
OK
+--------------------------+
| _c0 |
+--------------------------+
| SMITH:CLERK:800.0 |
| ALLEN:SALESMAN:1600.0 |
| WARD:SALESMAN:1250.0 |
| JONES:MANAGER:2975.0 |
| MARTIN:SALESMAN:1250.0 |
| BLAKE:MANAGER:2850.0 |
| CLARK:MANAGER:2450.0 |
| SCOTT:ANALYST:3000.0 |
| KING:PRESIDENT:5000.0 |
| TURNER:SALESMAN:1500.0 |
| ADAMS:CLERK:1100.0 |
| JAMES:CLERK:950.0 |
| FORD:ANALYST:3000.0 |
| MILLER:CLERK:1300.0 |
| HUGUANYU:HANGGE:18000.0 |
+--------------------------+
1.1.2?CONCAT_WS(separator, str1, str2,...)
concat_ws 比 concat 可以自定字段的分隔符
concat_ws (參數一(分隔符) , str1 , str2....)
concat_ws (參數一(分隔符) , 數組)
示例
0: jdbc:hive2://linux01:10000> select concat_ws("_" , "tom","cat" ,"jim" ,"jerry") ;
OK
+--------------------+
| _c0 |
+--------------------+
| tom_cat_jim_jerry |
+--------------------+
0: jdbc:hive2://linux01:10000> select concat_ws(":" , ename ,job , sal) from tb_emp ;
FAILED: SemanticException [Error 10016]: Line 1:36 Argument type mismatch 'sal': Argument 4 of function CONCAT_WS must be "string or array", but "double" was found.
Error: Error while compiling statement: FAILED: SemanticException [Error 10016]: Line 1:36 Argument type mismatch 'sal': Argument 4 of function CONCAT_WS must be "string or array", but "double" was found. (state=42000,code=10016)
需要類型轉換 將double轉換成string? 語法
cast(變量? AS 數據類型) 強制類型轉換
cast(sal as string)
select concat_ws(":" , ename ,job , cast(sal as string)) from tb_emp ;
OK
+--------------------------+
| _c0 |
+--------------------------+
| SMITH:CLERK:800.0 |
| ALLEN:SALESMAN:1600.0 |
| WARD:SALESMAN:1250.0 |
| JONES:MANAGER:2975.0 |
| MARTIN:SALESMAN:1250.0 |
| BLAKE:MANAGER:2850.0 |
| CLARK:MANAGER:2450.0 |
| SCOTT:ANALYST:3000.0 |
| KING:PRESIDENT:5000.0 |
| TURNER:SALESMAN:1500.0 |
| ADAMS:CLERK:1100.0 |
| JAMES:CLERK:950.0 |
| FORD:ANALYST:3000.0 |
| MILLER:CLERK:1300.0 |
| HUGUANYU:HANGGE:18000.0 |
+--------------------------+
1.1.3?COLLECT_SET(col) 將內容收集成set集合
desc function collect_set ;
OK
+----------------------------------------------------+
| tab_name |
+----------------------------------------------------+
| collect_set(x) - Returns a set of objects with duplicate elements eliminated |
+----------------------------------------------------+
對表中的某個字段列操作
select deptno from tb_emp ;
OK
+---------+
| deptno |
+---------+
| 20 |
| 30 |
| 30 |
| 20 |
| 30 |
| 30 |
| 10 |
| 20 |
| 10 |
| 30 |
| 20 |
| 30 |
| 20 |
| 10 |
| 50 |
+---------+
select conllect_set(deptno) from tb_emp ;--->去重重復元素的數組
+----------------+
| _c0 |
+----------------+
| [20,30,10,50] |
+----------------+
collect_list(col)? 不會去重數據
select collect_list(deptno) as deptno_list from tb_emp ;
+-------------------------------------------------+
| deptno_list |
+-------------------------------------------------+
| [20,30,30,20,30,30,10,20,10,30,20,30,20,10,50] |
+-------------------------------------------------+
1.2 行轉列
結果如下:
射手座,A ???????????娜娜|鳳姐
白羊座,A ???????????孫悟空|豬八戒
白羊座,B ???????????宋宋
數據
孫悟空? ?白羊座 A
娜娜 ?????射手座 A
宋宋 ?????白羊座 B
豬八戒????白羊座 A
鳳姐 ?????射手座 A
1.2.1 建表導入數據
create table if not exists tb_star(
name string ,
star string ,
dname string
)
row format delimited fields terminated by "\t" ;
load data local inpath "/hive/data/star.txt" into table tb_star ;
0: jdbc:hive2://linux01:10000> select * from tb_star ;
OK
+---------------+---------------+----------------+
| tb_star.name | tb_star.star | tb_star.dname |
+---------------+---------------+----------------+
| 孫悟空 | 白羊座 | A |
| 娜娜 | 射手座 | A |
| 宋宋 | 白羊座 | B |
| 豬八戒 | 白羊座 | A |
| 鳳姐 | 射手座 | A |
+---------------+---------------+----------------+
1.2.2 代碼實現方式一
1) 先將星座和部門拼接 concat_ws
select
concat_ws("," , star , dname) as star_and_dname ,
name
from
tb_star ;
+-----------------+-------+
| star_and_dname | name |
+-----------------+-------+
| 白羊座,A | 孫悟空 |
| 射手座,A | 娜娜 |
| 白羊座,B | 宋宋 |
| 白羊座,A | 豬八戒 |
| 射手座,A | 鳳姐 |
+-----------------+-------+
2) 分組 收集姓名
with t1 as
(select
concat_ws("," , star , dname) as star_and_dname ,
name
from
tb_star)
select
star_and_dname,
collect_set(name)
from
t1
group by star_and_dname ;
+-----------------+----------------+
| star_and_dname | _c1 |
+-----------------+----------------+
| 射手座,A | ["娜娜","鳳姐"] |
| 白羊座,A | ["孫悟空","豬八戒"] |
| 白羊座,B | ["宋宋"] |
+-----------------+----------------+
3) 獲取結果
select
star_and_dname ,
concat_ws("|" ,name_arr )
from
(select
star_and_dname,
collect_set(name) as name_arr
from
(select
concat_ws("," , star , dname) as star_and_dname ,
name
from
tb_star) t
group by star_and_dname) t2 ;
+-----------------+----------+
| star_and_dname | _c1 |
+-----------------+----------+
| 射手座,A | 娜娜|鳳姐 |
| 白羊座,A | 孫悟空|豬八戒 |
| 白羊座,B | 宋宋 |
+-----------------+----------+
1.2.3 代碼實現方式二
select
concat(star , "," , dname) ,
concat_ws("|" , collect_set(name))
from
tb_star
group by star , dname
;
+--------+----------+
| _c0 | _c1 |
+--------+----------+
| 射手座,A | 娜娜|鳳姐 |
| 白羊座,A | 孫悟空|豬八戒 |
| 白羊座,B | 宋宋 |
+--------+----------+
2 列轉行
2.1 關鍵函數
split(str , 分隔符)? 返回一個數組
0: jdbc:hive2://linux01:10000> select split("hello,jim,yonggge,tom,cat" , ",") ;
OK
+----------------------------------------+
| _c0 |
+----------------------------------------+
| ["hello","jim","yonggge","tom","cat"] |
+----------------------------------------+
explode()? 炸裂函數? 將數組中的每個元素顯示在每行中
0: jdbc:hive2://linux01:10000> select explode (split("hello,jim,yonggge,tom,cat" , ",")) ;
OK
+----------+
| col |
+----------+
| hello |
| jim |
| yonggge |
| tom |
| cat |
+----------+
lateral view? 側窗口函數
2.2 示例
2.2.1 數據和需求
數據
《疑犯追蹤》?? ?懸疑,動作,科幻,劇情
《Lie to me》?? ?懸疑,警匪,動作,心理,劇情
《戰狼2》?? ?戰爭,動作,災難
需求 獲取如下結構
《疑犯追蹤》 ? ? ?懸疑
《疑犯追蹤》 ? ? ?動作
《疑犯追蹤》 ? ? ?科幻
《疑犯追蹤》 ? ? ?劇情
《Lie to me》 ? 懸疑
《Lie to me》 ? 警匪
《Lie to me》 ? 動作
《Lie to me》 ? 心理
《Lie to me》 ? 劇情
《戰狼2》 ? ? ? ?戰爭
《戰狼2》 ? ? ? ?動作
《戰狼2》 ? ? ? ?災難
2.2.2 建表導入數據
create table tb_movie(
name string ,
types string
)
row format delimited fields terminated by "\t" ;
load data local inpath "/hive/data/movie.txt" into table tb_movie ;
+----------------+-----------------+
| tb_movie.name | tb_movie.types |
+----------------+-----------------+
| 《疑犯追蹤》 | 懸疑,動作,科幻,劇情 |
| 《Lie to me》 | 懸疑,警匪,動作,心理,劇情 |
| 《戰狼2》 | 戰爭,動作,災難 |
+----------------+-----------------+
2.2.3 實現
1 將類型轉換成數組
select
split(types , ",") types_arr
from
tb_movie ;
2 炸裂
select
explode(split(types , ",")) f_type
from
tb_movie ;
+---------+
| f_type |
+---------+
| 懸疑 |
| 動作 |
| 科幻 |
| 劇情 |
| 懸疑 |
| 警匪 |
| 動作 |
| 心理 |
| 劇情 |
| 戰爭 |
| 動作 |
| 災難 |
+---------+
3 使用側窗口函數拼接
select
name ,
f_type
from
tb_movie
lateral view
explode(split(types ,",")) t as f_type ;
+--------------+---------+
| name | f_type |
+--------------+---------+
| 《疑犯追蹤》 | 懸疑 |
| 《疑犯追蹤》 | 動作 |
| 《疑犯追蹤》 | 科幻 |
| 《疑犯追蹤》 | 劇情 |
| 《Lie to me》 | 懸疑 |
| 《Lie to me》 | 警匪 |
| 《Lie to me》 | 動作 |
| 《Lie to me》 | 心理 |
| 《Lie to me》 | 劇情 |
| 《戰狼2》 | 戰爭 |
| 《戰狼2》 | 動作 |
| 《戰狼2》 | 災難 |
+--------------+---------+
總結
以上是生活随笔為你收集整理的hive 如何将数组转成字符串_hive详细笔记(八)-Hive之列转行和行转列(附带讲解视频 )...的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 336计算机考研怎么做到啊,【图片】20
- 下一篇: python字符串数组切片性能_pyth