Business Analysis After ETL Data Processing (Part 1)
Continued from https://georgedage.blog.csdn.net/article/details/102996488
Data processing after the ETL step!!!
Create the tables:
Video information data:
create table gulivideo_ori(
    videoId string,
    uploader string,
    age int,
    category array<string>,
    length int,
    views int,
    rate float,
    ratings int,
    comments int,
    relatedId array<string>)
row format delimited fields terminated by "\t"
collection items terminated by "&"
stored as textfile;

User information data:
create table gulivideo_user_ori(
    uploader string,
    videos int,
    friends int)
row format delimited fields terminated by "\t"
stored as textfile;

show tables:
+---------------------+--+
|      tab_name       |
+---------------------+--+
| gulivideo_ori       |
| gulivideo_user_ori  |
+---------------------+--+

Import the data:
load data inpath "/guiliVideo/output/video/2008/0222" into table gulivideo_ori;?
load data inpath "/guiliVideo/user/2008/0903" into table gulivideo_user_ori;驗證一下:
Business analysis:
First, an overview diagram:
#1. Top 10 videos by view count
select videoId,views from gulivideo_ori order by views desc limit 10;
Result:
#2. Top 10 video categories by popularity
That is, count how many videos fall into each category and show the 10 categories containing the most videos.
Following the experience from https://georgedage.blog.csdn.net/article/details/102905208,
we need a column-to-row conversion (exploding the category array); a minimal illustration of explode() follows, and then the real query.
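As a side note, this is a minimal, self-contained sketch of what lateral view explode does to an array column; the sample category values are made up and not from the gulivideo dataset:

-- Hypothetical example: explode() emits one output row per array element
select explode(array('Music', 'Comedy')) as category_name;
-- returns two rows: Music and Comedy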
select videoId,category_name from gulivideo_ori lateral view explode(category) table_tmp as category_name limit 10;
Result of the column-to-row conversion:
Then we build on that statement:
select category_name as category, count(t.videoId) as hot
from (select videoId, category_name from gulivideo_ori lateral view explode(category) table_tmp as category_name) t
group by t.category_name
order by hot desc
limit 10;
Result:
A problem came up here that I have not solved yet; I will update this post later!!!
Submitting the job from beeline fails with: Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
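Return code 2 from MapRedTask only says that the underlying MapReduce job failed; the actual cause has to be dug out of the YARN container logs. The sketch below lists common checks; the memory values are illustrative assumptions, not settings from the original post:

-- Pull the logs of the failed job (substitute the real application id)
--   yarn logs -applicationId <application_id>

-- If the logs show containers killed for exceeding memory limits,
-- raising the MapReduce container sizes in the Hive session often helps.
-- The numbers below are examples only.
set mapreduce.map.memory.mb=2048;
set mapreduce.reduce.memory.mb=2048;
set mapreduce.map.java.opts=-Xmx1638m;
set mapreduce.reduce.java.opts=-Xmx1638m;

-- Then re-run the category-hotness query above.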
Summary