Hive tuning: fetch-mode conversion, converting empty join keys to random values, and MapReduce dynamic partition settings
1. Fetch configuration
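The session below exercises three settings one by one. As a quick reference, they can all be set per session from beeline; this is a minimal sketch, with the value lists and default thresholds taken from the Hive 1.x documentation (verify them against your release):

```sql
-- Fetch task conversion: 'none' forces a MapReduce job for everything,
-- 'minimal' fetches only SELECT *, partition-column filters and LIMIT,
-- 'more' (the Hive 1.x default) also fetches simple projections/filters.
set hive.fetch.task.conversion=more;

-- Local mode: run small jobs inside a local JVM instead of on YARN.
set hive.exec.mode.local.auto=true;
-- A job only runs locally when its input is below these thresholds
-- (defaults shown):
set hive.exec.mode.local.auto.inputbytes.max=134217728;  -- 128 MB
set hive.exec.mode.local.auto.input.files.max=4;
```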
[cevent@hadoop207 ~]$ cd /opt/module/hive-1.2.1/
[cevent@hadoop207 hive-1.2.1]$ ll
total 524
drwxrwxr-x. 3 cevent cevent   4096 Apr 30 15:59 bin
drwxrwxr-x. 2 cevent cevent   4096 May  9 18:40 conf
-rw-rw-r--. 1 cevent cevent  20403 Jun  5 13:48 derby.log
drwxrwxr-x. 4 cevent cevent   4096 Apr 30 15:59 examples
drwxrwxr-x. 7 cevent cevent   4096 Apr 30 15:59 hcatalog
-rw-rw-r--. 1 cevent cevent     23 May  9 13:37 hive01.sql
drwxrwxr-x. 4 cevent cevent   4096 May  7 13:51 lib
-rw-rw-r--. 1 cevent cevent  24754 Apr 30  2015 LICENSE
drwxrwxr-x. 2 cevent cevent   4096 Jun  7 13:22 logs
drwxrwxr-x. 5 cevent cevent   4096 Apr 30 16:44 metastore_db
-rw-rw-r--. 1 cevent cevent    397 Jun 19  2015 NOTICE
-rw-rw-r--. 1 cevent cevent   4366 Jun 19  2015 README.txt
-rw-rw-r--. 1 cevent cevent 421129 Jun 19  2015 RELEASE_NOTES.txt
-rw-rw-r--. 1 cevent cevent     11 May  9 13:27 result.txt
drwxrwxr-x. 3 cevent cevent   4096 Apr 30 15:59 scripts
-rw-rw-r--. 1 cevent cevent    171 May  9 13:24 server.log
-rw-rw-r--. 1 cevent cevent      5 May  8 14:05 server.pid
-rw-rw-r--. 1 cevent cevent    326 May 22 13:34 student.txt
[cevent@hadoop207 hive-1.2.1]$ bin/beeline
Beeline version 1.2.1 by Apache Hive
beeline> !connect jdbc:hive2://hadoop207.cevent.com:10000
Connecting to jdbc:hive2://hadoop207.cevent.com:10000
Enter username for jdbc:hive2://hadoop207.cevent.com:10000: cevent
Enter password for jdbc:hive2://hadoop207.cevent.com:10000: ******
Connected to: Apache Hive (version 1.2.1)
Driver: Hive JDBC (version 1.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoop207.cevent.com:10000> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| cevent01       |
| default        |
+----------------+--+
2 rows selected (1.695 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> use cevent01;
No rows affected (0.073 seconds)

Set fetch conversion to none, so even a plain SELECT runs as a MapReduce job:

0: jdbc:hive2://hadoop207.cevent.com:10000> set hive.fetch.task.conversion=none;
No rows affected (0.03 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> select * from c_emp;
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1591539871344_0001
INFO  : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1591539871344_0001/
INFO  : Starting Job = job_1591539871344_0001, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1591539871344_0001/
INFO  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job -kill job_1591539871344_0001
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO  : 2020-06-07 22:30:41,881 Stage-1 map = 0%, reduce = 0%
INFO  : 2020-06-07 22:30:53,412 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.11 sec
INFO  : MapReduce Total cumulative CPU time: 1 seconds 110 msec
INFO  : Ended Job = job_1591539871344_0001
| c_emp.empno | c_emp.ename | c_emp.jobs | c_emp.manager | c_emp.hiredata | c_emp.salary | c_emp.comment | c_emp.depno |
| NULL | cevent | NULL      | NULL | NULL       | NULL   | NULL   | 619 |
| 7369 | SMITH  | CLERK     | 7902 | 1980-12-17 | 800.0  | NULL   | 20  |
| 7499 | ALLEN  | SALESMAN  | 7698 | 1981-2-20  | 1600.0 | 300.0  | 30  |
| 7521 | WARD   | SALESMAN  | 7698 | 1981-2-22  | 1250.0 | 500.0  | 30  |
| 7566 | JONES  | MANAGER   | 7839 | 1981-4-2   | 2975.0 | NULL   | 20  |
| 7654 | MARTIN | SALESMAN  | 7698 | 1981-9-28  | 1250.0 | 1400.0 | 30  |
| 7698 | BLAKE  | MANAGER   | 7839 | 1981-5-1   | 2850.0 | NULL   | 30  |
| 7782 | CLARK  | MANAGER   | 7839 | 1981-6-9   | 2450.0 | NULL   | 10  |
| 7788 | SCOTT  | ANALYST   | 7566 | 1987-4-19  | 3000.0 | NULL   | 20  |
| 7839 | KING   | PRESIDENT | NULL | 1981-11-17 | 5000.0 | NULL   | 10  |
| 7844 | TURNER | SALESMAN  | 7698 | 1981-9-8   | 1500.0 | 0.0    | 30  |
| 7876 | ADAMS  | CLERK     | 7788 | 1987-5-23  | 1100.0 | NULL   | 20  |
| 7900 | JAMES  | CLERK     | 7698 | 1981-12-3  | 950.0  | NULL   | 30  |
| 7902 | FORD   | ANALYST   | 7566 | 1981-12-3  | 3000.0 | NULL   | 20  |
| 7934 | MILLER | CLERK     | 7782 | 1982-1-23  | 1300.0 | NULL   | 10  |
15 rows selected (30.546 seconds)

Set fetch conversion back to more: a simple SELECT is now fetched directly, with no MapReduce job, and is far faster:

0: jdbc:hive2://hadoop207.cevent.com:10000> set hive.fetch.task.conversion=more;
No rows affected (0.006 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> select * from c_emp;
(the same 15 rows as above, fetched directly)
15 rows selected (0.513 seconds)

Enable local mode: MapReduce still runs, but in a local JVM. It is faster than a cluster job, yet still slower than a direct fetch, so use it deliberately:

0: jdbc:hive2://hadoop207.cevent.com:10000> set hive.exec.mode.local.auto=true;
No rows affected (0.005 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> select count(*) from c_emp;
INFO  : Number of reduce tasks determined at compile time: 1
INFO  : In order to change the average load for a reducer (in bytes):
INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO  : In order to limit the maximum number of reducers:
INFO  :   set hive.exec.reducers.max=<number>
INFO  : In order to set a constant number of reducers:
INFO  :   set mapreduce.job.reduces=<number>
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_local1010738823_0001
INFO  : The url to track the job: http://localhost:8080/
INFO  : Job running in-process (local Hadoop)
INFO  : 2020-06-07 22:33:37,811 Stage-1 map = 100%, reduce = 100%
INFO  : Ended Job = job_local1010738823_0001
+------+--+
| _c0  |
+------+--+
| 15   |
+------+--+
1 row selected (1.613 seconds)
2. Metadata preparation
3. Changing the default number of MapReduce reducers Hive launches
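Before editing the config file below, note that the same override can be applied per session instead of being hard-wired into hive-site.xml. A sketch of the related knobs, with defaults as documented for Hive 1.x (verify for your version):

```sql
-- Fix the reducer count for this session only (-1 lets Hive estimate it):
set mapreduce.job.reduces=4;

-- When the count is not fixed, Hive estimates it from the input size:
set hive.exec.reducers.bytes.per.reducer=256000000;  -- bytes handled per reducer
set hive.exec.reducers.max=1009;                     -- upper bound on reducers
```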
[cevent@hadoop207 module]$ cd hive-1.2.1/
[cevent@hadoop207 hive-1.2.1]$ cd conf/
[cevent@hadoop207 conf]$ ll
total 192
-rw-rw-r--. 1 cevent cevent   1139 Apr 30  2015 beeline-log4j.properties.template
-rw-rw-r--. 1 cevent cevent 168431 Jun 19  2015 hive-default.xml.template
-rw-rw-r--. 1 cevent cevent   2464 Apr 30 16:06 hive-env.sh
-rw-rw-r--. 1 cevent cevent   2662 Apr 30  2015 hive-exec-log4j.properties.template
-rw-rw-r--. 1 cevent cevent   3093 May  9 18:12 hive-log4j.properties
-rw-rw-r--. 1 cevent cevent   1354 May  9 18:40 hive-site.xml
-rw-rw-r--. 1 cevent cevent   1593 Apr 30  2015 ivysettings.xml

Inspect the current Hive configuration:

[cevent@hadoop207 conf]$ cat hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- MySQL metastore location -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop207.cevent.com:3306/metastore?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <!-- MySQL username and password -->
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>cevent</value>
    <description>password to use against metastore database</description>
  </property>
  <!-- customize what the Hive CLI prints -->
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>
  <!-- override of the default reducer count, to be removed -->
  <property>
    <name>mapreduce.job.reduces</name>
    <value>4</value>
  </property>
</configuration>

Edit the file and delete the mapreduce.job.reduces override, so only the metastore and CLI display properties remain:

[cevent@hadoop207 conf]$ vim hive-site.xml
4. Inserting data
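In the join test below, map join is deliberately turned off to get a reduce-side baseline. In normal use the auto-conversion is left on, so a small enough table is broadcast to every mapper and the shuffle/reduce phase for the join disappears. A sketch of the relevant settings (the size threshold and its default are from the Hive 1.x documentation):

```sql
-- Let Hive rewrite a common join into a map join when one side is small
-- (on by default since Hive 0.11):
set hive.auto.convert.join=true;
-- A table below this many bytes counts as "small" (default 25 MB):
set hive.mapjoin.smalltable.filesize=25000000;
```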
0: jdbc:hive2://hadoop207.cevent.com:10000> create table big_join(
  id bigint,
  time bigint,
  uid string,
  keyword string,
  url_rank int,
  click_num int,
  click_url string)
  row format delimited fields terminated by '\t';
No rows affected (0.467 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> create table small_join(
  id bigint,
  time bigint,
  uid string,
  keyword string,
  url_rank int,
  click_num int,
  click_url string)
  row format delimited fields terminated by '\t';
No rows affected (0.1 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> create table join_table(
  id bigint,
  time bigint,
  uid string,
  keyword string,
  url_rank int,
  click_num int,
  click_url string)
  row format delimited fields terminated by '\t';
No rows affected (0.1 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> load data local inpath '/opt/module/datas/bigtable' into table big_join;
INFO  : Loading data to table cevent01.big_join from file:/opt/module/datas/bigtable
INFO  : Table cevent01.big_join stats: [numFiles=1, totalSize=120734753]
No rows affected (14.891 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> load data local inpath '/opt/module/datas/smalltable' into table small_join;
INFO  : Loading data to table cevent01.small_join from file:/opt/module/datas/smalltable
INFO  : Table cevent01.small_join stats: [numFiles=1, totalSize=12018355]
No rows affected (1.723 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> select * from big_join limit 5;
| big_join.id | big_join.time | big_join.uid | big_join.keyword | big_join.url_rank | big_join.click_num | big_join.click_url |
| 0 | 20111230000005 | 57375476989eea12893c0c3811607bcf | 奇藝高清 | 1 | 1 | http://www.qiyi.com/ |
| 0 | 20111230000005 | 66c5bb7774e31d0a22278249b26bc83a | 凡人修仙傳 | 3 | 1 | http://www.booksky.org/BookDetail.aspx?BookID=1050804&Level=1 |
| 0 | 20111230000007 | b97920521c78de70ac38e3713f524b50 | 本本聯盟 | 1 | 1 | http://www.bblianmeng.com/ |
| 0 | 20111230000008 | 6961d0c97fe93701fc9c0d861d096cd9 | 華南師范大學圖書館 | 1 | 1 | http://lib.scnu.edu.cn/ |
| 0 | 20111230000008 | f2f5a21c764aebde1e8afcc2871e086f | 在線代理 | 2 | 1 | http://proxyie.cn/ |
5 rows selected (0.613 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> select * from small_join limit 5;
| small_join.id | small_join.time | small_join.uid | small_join.keyword | small_join.url_rank | small_join.click_num | small_join.click_url |
| 95048 | 20111230000005 | 57375476989eea12893c0c3811607bcf | 奇藝高清 | 1 | 1 | http://www.qiyi.com/ |
| 95096 | 20111230000005 | 66c5bb7774e31d0a22278249b26bc83a | 凡人修仙傳 | 3 | 1 | http://www.booksky.org/BookDetail.aspx?BookID=1050804&Level=1 |
| 95144 | 20111230000007 | b97920521c78de70ac38e3713f524b50 | 本本聯盟 | 1 | 1 | http://www.bblianmeng.com/ |
| 95192 | 20111230000008 | 6961d0c97fe93701fc9c0d861d096cd9 | 華南師范大學圖書館 | 1 | 1 | http://lib.scnu.edu.cn/ |
| 95192 | 20111230000008 | f2f5a21c764aebde1e8afcc2871e086f | 在線代理 | 2 | 1 | http://proxyie.cn/ |
5 rows selected (0.118 seconds)

Disable map join for this baseline (leaving it at its default of true is more efficient):

0: jdbc:hive2://hadoop207.cevent.com:10000> set hive.auto.convert.join=false;
No rows affected (0.017 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> insert overwrite table join_table
  select b.id,b.time,b.uid,b.keyword,b.url_rank,b.click_num,b.click_url
  from small_join s
  join big_join b
  on b.id=s.id;
INFO  : Number of reduce tasks not specified. Estimated from input data size: 1
INFO  : number of splits:2
INFO  : Submitting tokens for job: job_1591593753790_0001
INFO  : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1591593753790_0001/
INFO  : Starting Job = job_1591593753790_0001, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1591593753790_0001/
INFO  : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job -kill job_1591593753790_0001
INFO  : Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
INFO  : 2020-06-08 13:41:29,289 Stage-1 map = 0%, reduce = 0%
INFO  : 2020-06-08 13:42:12,682 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 13.84 sec
INFO  : 2020-06-08 13:42:33,048 Stage-1 map = 100%, reduce = 99%, Cumulative CPU 17.94 sec
INFO  : 2020-06-08 13:42:34,442 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 23.18 sec
INFO  : MapReduce Total cumulative CPU time: 23 seconds 180 msec
INFO  : Ended Job = job_1591593753790_0001
INFO  : Loading data to table cevent01.join_table from hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/join_table/.hive-staging_hive_2020-06-08_13-41-10_494_1851985601125007593-1/-ext-10000
INFO  : Table cevent01.join_table stats: [numFiles=1, numRows=999990, totalSize=123817216, rawDataSize=122817226]
No rows affected (95.171 seconds)
5. SQL breakdown
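To see how Hive actually plans any of these statements, prefix them with EXPLAIN. For example, this shows whether the join compiles to a reduce-side Join Operator or a Map Join Operator, without running the job:

```sql
-- Inspect the plan; look for "Map Join Operator" versus a reduce-side
-- "Join Operator" in the STAGE PLANS section of the output.
explain
select b.id, b.time, b.uid, b.keyword, b.url_rank, b.click_num, b.click_url
from small_join s
join big_join b on b.id = s.id;
```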
-- Hive table optimization
-- 1. join
-- create the big table
create table big_join(
  id bigint,
  time bigint,
  uid string,
  keyword string,
  url_rank int,
  click_num int,
  click_url string)
row format delimited fields terminated by '\t';

-- create the small table
create table small_join(
  id bigint,
  time bigint,
  uid string,
  keyword string,
  url_rank int,
  click_num int,
  click_url string)
row format delimited fields terminated by '\t';

-- create the table that receives the join result
create table join_table(
  id bigint,
  time bigint,
  uid string,
  keyword string,
  url_rank int,
  click_num int,
  click_url string)
row format delimited fields terminated by '\t';

-- load the data
load data local inpath '/opt/module/datas/bigtable' into table big_join;
load data local inpath '/opt/module/datas/smalltable' into table small_join;

-- turn map join off: set hive.auto.convert.join=false;
-- turn local mode off, since it distorts the comparison: set hive.exec.mode.local.auto=false;

-- query-insert: small table driving the join against the big table
insert overwrite table join_table
select b.id,b.time,b.uid,b.keyword,b.url_rank,b.click_num,b.click_url
from small_join s
join big_join b
on b.id=s.id;

-- query-insert: big table driving the join against the small table
insert overwrite table join_table
select b.id,b.time,b.uid,b.keyword,b.url_rank,b.click_num,b.click_url
from big_join b
join small_join s
on s.id=b.id;
6. Preparing the original dataset (ori)
7. Running the inserts
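The runs below handle skewed null keys by hand, first filtering them out and then salting them with rand(). For comparison, Hive can also detect and split skewed join keys automatically; a hedged sketch of those settings, with defaults as documented for Hive 1.x:

```sql
-- Automatic skew handling: keys whose row count exceeds the threshold are
-- processed by a follow-up map-join stage instead of one overloaded reducer.
set hive.optimize.skewjoin=true;  -- default false
set hive.skewjoin.key=100000;     -- rows per key before it counts as skewed
```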
[cevent@hadoop207 hive-1.2.1]$ bin/beeline Beeline version 1.2.1 by Apache Hivebeeline>!connect jdbc:hive2://hadoop207.cevent.com:10000Connecting to jdbc:hive2://hadoop207.cevent.com:10000Enter username forjdbc:hive2://hadoop207.cevent.com:10000: ceventEnter password for jdbc:hive2://hadoop207.cevent.com:10000:******Connected to: Apache Hive (version 1.2.1)Driver: Hive JDBC (version 1.2.1)Transaction isolation:TRANSACTION_REPEATABLE_READ0:jdbc:hive2://hadoop207.cevent.com:10000> showdatabases;+----------------+--+| database_name |+----------------+--+| cevent01 || default |+----------------+--+2 rows selected (1.77 seconds)0:jdbc:hive2://hadoop207.cevent.com:10000> usecevent01;No rows affected (0.053 seconds) 插入join表數據0: jdbc:hive2://hadoop207.cevent.com:10000> insert overwrite table join_table0:jdbc:hive2://hadoop207.cevent.com:10000> selectb.id,b.time,b.uid,b.keyword,b.url_rank,b.click_num,b.click_url0:jdbc:hive2://hadoop207.cevent.com:10000> fromsmall_join s0: jdbc:hive2://hadoop207.cevent.com:10000> left join big_join b0:jdbc:hive2://hadoop207.cevent.com:10000> onb.id=s.id;INFO : Stage-1 is selected bycondition resolver.INFO : Number of reduce tasksnot specified. 
Estimated from input data size: 1INFO : In order to change theaverage load for a reducer (in bytes):INFO : sethive.exec.reducers.bytes.per.reducer=<number>INFO : In order to limit themaximum number of reducers:INFO : set hive.exec.reducers.max=<number>INFO : In order to set aconstant number of reducers:INFO : set mapreduce.job.reduces=<number>INFO : number of splits:2INFO : Submitting tokens forjob: job_1591675698957_0001INFO : The url to track thejob: http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0001/INFO : Starting Job = job_1591675698957_0001,Tracking URL =http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0001/INFO : Kill Command =/opt/module/hadoop-2.7.2/bin/hadoop job -kill job_1591675698957_0001INFO : Hadoop job informationfor Stage-1: number of mappers: 2; number of reducers: 1INFO : 2020-06-09 12:13:16,419Stage-1 map = 0%, reduce = 0%INFO : 2020-06-09 12:13:43,229Stage-1 map = 50%, reduce = 0%,Cumulative CPU 5.68 secINFO : 2020-06-09 12:13:57,087Stage-1 map = 100%, reduce = 0%, CumulativeCPU 16.96 secINFO : 2020-06-09 12:14:12,299Stage-1 map = 100%, reduce = 80%,Cumulative CPU 22.19 secINFO : 2020-06-09 12:14:15,774Stage-1 map = 100%, reduce = 98%,Cumulative CPU 25.13 secINFO : 2020-06-09 12:14:24,366Stage-1 map = 100%, reduce = 100%,Cumulative CPU 32.81 secINFO : MapReduce Totalcumulative CPU time: 32 seconds 810 msecINFO : Ended Job =job_1591675698957_0001INFO : Loading data to tablecevent01.join_table from hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/join_table/.hive-staging_hive_2020-06-09_12-12-53_365_6693250539177613475-1/-ext-10000INFO : Tablecevent01.join_table stats: [numFiles=1, numRows=999990, totalSize=123817216,rawDataSize=122817226]No rows affected (96.166 seconds)0: jdbc:hive2://hadoop207.cevent.com:10000>create table join_original( 創建原始數據表0: jdbc:hive2://hadoop207.cevent.com:10000> id bigint,0:jdbc:hive2://hadoop207.cevent.com:10000> timebigint,0:jdbc:hive2://hadoop207.cevent.com:10000> 
uidstring,0: jdbc:hive2://hadoop207.cevent.com:10000> keyword string,0:jdbc:hive2://hadoop207.cevent.com:10000> url_rankint,0:jdbc:hive2://hadoop207.cevent.com:10000> click_numint,0:jdbc:hive2://hadoop207.cevent.com:10000> click_urlstring0: jdbc:hive2://hadoop207.cevent.com:10000> )0: jdbc:hive2://hadoop207.cevent.com:10000>row format delimited fields terminated by '\t'; No rows affected (0.608 seconds)0:jdbc:hive2://hadoop207.cevent.com:10000> createtable join_original_null_id( 創建空key表0: jdbc:hive2://hadoop207.cevent.com:10000>id bigint,0: jdbc:hive2://hadoop207.cevent.com:10000>time bigint,0:jdbc:hive2://hadoop207.cevent.com:10000> uidstring,0:jdbc:hive2://hadoop207.cevent.com:10000> keywordstring,0: jdbc:hive2://hadoop207.cevent.com:10000>url_rank int,0: jdbc:hive2://hadoop207.cevent.com:10000>click_num int,0:jdbc:hive2://hadoop207.cevent.com:10000> click_urlstring0:jdbc:hive2://hadoop207.cevent.com:10000> )0:jdbc:hive2://hadoop207.cevent.com:10000> rowformat delimited fields terminated by '\t'; No rows affected (0.135 seconds)0:jdbc:hive2://hadoop207.cevent.com:10000> createtable join_original_table( 創建join表0:jdbc:hive2://hadoop207.cevent.com:10000> idbigint,0:jdbc:hive2://hadoop207.cevent.com:10000> timebigint,0: jdbc:hive2://hadoop207.cevent.com:10000> uid string,0:jdbc:hive2://hadoop207.cevent.com:10000> keywordstring,0:jdbc:hive2://hadoop207.cevent.com:10000> url_rankint,0:jdbc:hive2://hadoop207.cevent.com:10000> click_numint,0: jdbc:hive2://hadoop207.cevent.com:10000>click_url string0: jdbc:hive2://hadoop207.cevent.com:10000>)0:jdbc:hive2://hadoop207.cevent.com:10000> rowformat delimited fields terminated by '\t'; No rows affected (0.122 seconds)0: jdbc:hive2://hadoop207.cevent.com:10000> load data local inpath '/opt/module/datas/ori' into tablejoin_original;INFO : Loading data to table cevent01.join_original fromfile:/opt/module/datas/oriINFO : Table cevent01.join_original stats: [numFiles=1,totalSize=121734744]No rows affected (6.715 seconds)0: 
jdbc:hive2://hadoop207.cevent.com:10000> load data local inpath '/opt/module/datas/nullid' intotable join_original_null_id;INFO : Loading data to table cevent01.join_original_null_id fromfile:/opt/module/datas/nullidINFO : Table cevent01.join_original_null_id stats: [numFiles=1, totalSize=118645854]No rows affected (13.959 seconds) 插入join表數據,全插0:jdbc:hive2://hadoop207.cevent.com:10000> insertoverwrite table join_original_table 0: jdbc:hive2://hadoop207.cevent.com:10000> select n.* from 0: jdbc:hive2://hadoop207.cevent.com:10000> join_original_null_id n left join join_original o onn.id=o.id;INFO : Stage-1 is selected bycondition resolver.INFO : Number of reduce tasksnot specified. Estimated from input data size: 1INFO : In order to change theaverage load for a reducer (in bytes):INFO : sethive.exec.reducers.bytes.per.reducer=<number>INFO : In order to limit themaximum number of reducers:INFO : set hive.exec.reducers.max=<number>INFO : In order to set aconstant number of reducers:INFO : set mapreduce.job.reduces=<number>INFO : number of splits:2INFO : Submitting tokens forjob: job_1591675698957_0002INFO : The url to track thejob: http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0002/INFO : Starting Job =job_1591675698957_0002, Tracking URL =http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0002/INFO : Kill Command =/opt/module/hadoop-2.7.2/bin/hadoop job -kill job_1591675698957_0002INFO : Hadoop job informationfor Stage-1: number of mappers: 2; number of reducers: 1INFO : 2020-06-09 13:26:11,449Stage-1 map = 0%, reduce = 0%INFO : 2020-06-09 13:27:00,291Stage-1 map = 100%, reduce = 0%,Cumulative CPU 16.19 secINFO : 2020-06-09 13:27:21,791Stage-1 map = 100%, reduce = 69%,Cumulative CPU 20.71 secINFO : 2020-06-09 13:27:25,027Stage-1 map = 100%, reduce = 77%,Cumulative CPU 22.42 secINFO : 2020-06-09 13:27:27,828Stage-1 map = 100%, reduce = 90%,Cumulative CPU 22.42 secINFO : 2020-06-09 13:27:33,724Stage-1 map = 100%, reduce = 100%, 
CumulativeCPU 27.9 secINFO : MapReduce Totalcumulative CPU time: 27 seconds 900 msecINFO : Ended Job =job_1591675698957_0002INFO : Loading data to table cevent01.join_original_tablefrom hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/join_original_table/.hive-staging_hive_2020-06-09_13-26-01_145_7407125718177019072-1/-ext-10000INFO : Tablecevent01.join_original_table stats: [numFiles=1, numRows=1000000,totalSize=118645854, rawDataSize=117645854]No rows affected (97.176 seconds)0:jdbc:hive2://hadoop207.cevent.com:10000> insertoverwrite table join_original_table 不為空查詢0:jdbc:hive2://hadoop207.cevent.com:10000> selectn.* from0:jdbc:hive2://hadoop207.cevent.com:10000> (select *from join_original_null_id where id is not null) 0:jdbc:hive2://hadoop207.cevent.com:10000> n leftjoin join_original o on n.id=o.id;INFO : Stage-1 is selected bycondition resolver.INFO : Number of reduce tasksnot specified. Estimated from input data size: 1INFO : In order to change theaverage load for a reducer (in bytes):INFO : sethive.exec.reducers.bytes.per.reducer=<number>INFO : In order to limit themaximum number of reducers:INFO : set hive.exec.reducers.max=<number>INFO : In order to set a constantnumber of reducers:INFO : set mapreduce.job.reduces=<number>INFO : number of splits:2INFO : Submitting tokens forjob: job_1591675698957_0003INFO : The url to track thejob: http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0003/INFO : Starting Job = job_1591675698957_0003,Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0003/INFO : Kill Command =/opt/module/hadoop-2.7.2/bin/hadoop job -kill job_1591675698957_0003INFO : Hadoop job informationfor Stage-1: number of mappers: 2; number of reducers: 1INFO : 2020-06-09 13:29:56,277Stage-1 map = 0%, reduce = 0%INFO : 2020-06-09 13:30:25,427Stage-1 map = 100%, reduce = 0%,Cumulative CPU 12.49 secINFO : 2020-06-09 13:30:48,717Stage-1 map = 100%, reduce = 100%, CumulativeCPU 19.14 secINFO : 
MapReduce Totalcumulative CPU time: 19 seconds 140 msecINFO : Ended Job =job_1591675698957_0003INFO : Loading data to tablecevent01.join_original_table from hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/join_original_table/.hive-staging_hive_2020-06-09_13-29-44_221_5080146984779335117-1/-ext-10000INFO : Tablecevent01.join_original_table stats: [numFiles=1, numRows=200001,totalSize=24409561, rawDataSize=24209560]No rows affected (67.163 seconds)0:jdbc:hive2://hadoop207.cevent.com:10000> setmapreduce.job.reduces=3; 設置reduce的數量No rows affected (0.226 seconds)0:jdbc:hive2://hadoop207.cevent.com:10000> insertoverwrite table join_original_table 只匹配有id的0: jdbc:hive2://hadoop207.cevent.com:10000> select n.* from0:jdbc:hive2://hadoop207.cevent.com:10000> join_original_null_idn left join join_original o on n.id=o.id;INFO : Stage-1 is selected bycondition resolver.INFO : Number of reduce tasksnot specified. Defaulting to jobconf value of: 3INFO : In order to change theaverage load for a reducer (in bytes):INFO : sethive.exec.reducers.bytes.per.reducer=<number>INFO : In order to limit themaximum number of reducers:INFO : set hive.exec.reducers.max=<number>INFO : In order to set aconstant number of reducers:INFO : set mapreduce.job.reduces=<number>INFO : number of splits:2INFO : Submitting tokens forjob: job_1591675698957_0004INFO : The url to track the job:http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0004/INFO : Starting Job =job_1591675698957_0004, Tracking URL =http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0004/INFO : Kill Command =/opt/module/hadoop-2.7.2/bin/hadoop job -kill job_1591675698957_0004INFO : Hadoop job informationfor Stage-1: number of mappers: 2; number of reducers: 3INFO : 2020-06-09 13:37:54,461Stage-1 map = 0%, reduce = 0%INFO : 2020-06-09 13:38:20,067Stage-1 map = 50%, reduce = 0%,Cumulative CPU 15.17 secINFO : 2020-06-09 13:38:25,554 Stage-1map = 100%, reduce = 0%, CumulativeCPU 18.33 secINFO : 
2020-06-09 13:38:46,507Stage-1 map = 100%, reduce = 50%,Cumulative CPU 22.28 secINFO : 2020-06-09 13:39:12,015Stage-1 map = 100%, reduce = 76%,Cumulative CPU 31.73 secINFO : 2020-06-09 13:39:16,761Stage-1 map = 100%, reduce = 100%,Cumulative CPU 43.24 secINFO : MapReduce Totalcumulative CPU time: 43 seconds 240 msecINFO : Ended Job = job_1591675698957_0004INFO : Loading data to tablecevent01.join_original_table from hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/join_original_table/.hive-staging_hive_2020-06-09_13-37-42_754_4173942603345328108-1/-ext-10000INFO : Tablecevent01.join_original_table stats: [numFiles=3, numRows=1000000,totalSize=118645854, rawDataSize=117645854]No rows affected (110.936 seconds)0:jdbc:hive2://hadoop207.cevent.com:10000> insertoverwrite table join_original_table 設置空key為隨機數0: jdbc:hive2://hadoop207.cevent.com:10000>select n.* from0: jdbc:hive2://hadoop207.cevent.com:10000>join_original_null_id n full join join_original o on 0:jdbc:hive2://hadoop207.cevent.com:10000> case whenn.id is null0: jdbc:hive2://hadoop207.cevent.com:10000>then concat('hive',rand())0: jdbc:hive2://hadoop207.cevent.com:10000>else n.id end=o.id;INFO : Number of reduce tasksnot specified. 
Defaulting to jobconf value of: 3INFO : In order to change theaverage load for a reducer (in bytes):INFO : sethive.exec.reducers.bytes.per.reducer=<number>INFO : In order to limit themaximum number of reducers:INFO : set hive.exec.reducers.max=<number>INFO : In order to set aconstant number of reducers:INFO : set mapreduce.job.reduces=<number>INFO : number of splits:2INFO : Submitting tokens forjob: job_1591675698957_0005INFO : The url to track thejob: http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0005/INFO : Starting Job =job_1591675698957_0005, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0005/INFO : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoopjob -kill job_1591675698957_0005INFO : Hadoop job informationfor Stage-1: number of mappers: 2; number of reducers: 3INFO : 2020-06-09 13:43:57,701Stage-1 map = 0%, reduce = 0%INFO : 2020-06-09 13:44:18,174Stage-1 map = 33%, reduce = 0%,Cumulative CPU 12.1 secINFO : 2020-06-09 13:44:19,621Stage-1 map = 50%, reduce = 0%,Cumulative CPU 14.39 secINFO : 2020-06-09 13:44:26,173Stage-1 map = 84%, reduce = 0%,Cumulative CPU 24.9 secINFO : 2020-06-09 13:44:30,664Stage-1 map = 100%, reduce = 0%,Cumulative CPU 26.35 secINFO : 2020-06-09 13:44:45,291Stage-1 map = 100%, reduce = 28%,Cumulative CPU 28.45 secINFO : 2020-06-09 13:44:52,047Stage-1 map = 100%, reduce = 56%,Cumulative CPU 31.54 secINFO : 2020-06-09 13:45:03,169Stage-1 map = 100%, reduce = 76%,Cumulative CPU 37.31 secINFO : 2020-06-09 13:45:04,818Stage-1 map = 100%, reduce = 89%,Cumulative CPU 42.86 secINFO : 2020-06-09 13:45:12,375Stage-1 map = 100%, reduce = 99%,Cumulative CPU 53.09 secINFO : 2020-06-09 13:45:13,851Stage-1 map = 100%, reduce = 100%,Cumulative CPU 53.26 secINFO : MapReduce Totalcumulative CPU time: 53 seconds 260 msecINFO : Ended Job =job_1591675698957_0005INFO : Loading data to tablecevent01.join_original_table 
from hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/join_original_table/.hive-staging_hive_2020-06-09_13-43-42_102_6854456494005290452-1/-ext-10000
INFO : Table cevent01.join_original_table stats: [numFiles=3, numRows=1799999, totalSize=135445833, rawDataSize=133645834]
No rows affected (97.97 seconds)

8. Link test
Cluster link test: http://hadoop207.cevent.com:8088/cluster
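In the full join above, every row whose n.id is NULL would hash to the same reducer, so a large batch of NULL keys overloads one reducer while the others sit idle. concat('hive', rand()) turns each NULL into a distinct string that matches nothing in o, which keeps the full-join result intact while spreading those rows evenly across reducers. When the NULL-keyed rows are not needed in the output at all, the simpler companion technique is to filter them out before the join. A sketch reusing the transcript's table names; note that, unlike the salted full join, this deliberately drops the NULL rows:

```sql
-- Null-key filtering: discard rows with a NULL join key before the
-- shuffle instead of salting them. Only valid when those rows are
-- not wanted in the result.
INSERT OVERWRITE TABLE join_original_table
SELECT n.*
FROM (SELECT * FROM join_original_null_id WHERE id IS NOT NULL) n
JOIN join_original o
  ON n.id = o.id;
```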
9. Dynamic partitions
0: jdbc:hive2://hadoop207.cevent.com:10000> select b.id from big_join b  -- query rows with id <= 10
0: jdbc:hive2://hadoop207.cevent.com:10000> join (select id from join_original where id<=10) o
0: jdbc:hive2://hadoop207.cevent.com:10000> on b.id=o.id;
INFO : Stage-1 is selected by condition resolver.
INFO : Number of reduce tasks not specified. Estimated from input data size: 1
INFO : In order to change the average load for a reducer (in bytes):
INFO :   set hive.exec.reducers.bytes.per.reducer=<number>
INFO : In order to limit the maximum number of reducers:
INFO :   set hive.exec.reducers.max=<number>
INFO : In order to set a constant number of reducers:
INFO :   set mapreduce.job.reduces=<number>
INFO : number of splits:2
INFO : Submitting tokens for job: job_1591675698957_0014
INFO : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0014/
INFO : Starting Job = job_1591675698957_0014, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0014/
INFO : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job -kill job_1591675698957_0014
INFO : Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
INFO : 2020-06-09 17:00:28,898 Stage-1 map = 0%, reduce = 0%
INFO : 2020-06-09 17:00:49,450 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 6.22 sec
INFO : 2020-06-09 17:00:51,358 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 10.76 sec
INFO : 2020-06-09 17:01:04,748 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 14.9 sec
INFO : MapReduce Total cumulative CPU time: 14 seconds 900 msec
INFO : Ended Job = job_1591675698957_0014
+-------+--+
| b.id  |
+-------+--+
| 1     |
| 1     |
(each id from 1 through 10 appears 10 times)
| 10    |
| 10    |
+-------+--+
100 rows selected (48.722 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> set hive.exec.dynamic.partition=true;  -- enable dynamic partitioning
No rows affected (0.018 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> set hive.exec.dynamic.partition.mode=nonstrict;  -- switch to non-strict mode
No rows affected (0.587 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> set hive.exec.max.dynamic.partitions=1000;  -- at most 1000 dynamic partitions across all MapReduce nodes
No rows affected (0.007 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> set hive.exec.max.dynamic.partitions.pernode=100;  -- at most 100 dynamic partitions per node in each MapReduce job
No rows affected (0.009 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> set hive.exec.max.created.files=100000;  -- maximum number of HDFS files that may be created; default 100000
No rows affected (0.006 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> set hive.error.on.empty.partition=false;  -- whether to throw an exception when an empty partition is generated
No rows affected (0.007 seconds)

Create the partitioned table:
0: jdbc:hive2://hadoop207.cevent.com:10000> create table join_dep_partition(
0: jdbc:hive2://hadoop207.cevent.com:10000> id int,
0: jdbc:hive2://hadoop207.cevent.com:10000> name string
0: jdbc:hive2://hadoop207.cevent.com:10000> )
0: jdbc:hive2://hadoop207.cevent.com:10000> partitioned by (location int) row format delimited fields terminated by '\t';
No rows affected (2.525 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> insert into table join_dep_partition  -- insert query results; partition values matched automatically
0: jdbc:hive2://hadoop207.cevent.com:10000> partition(location)
0: jdbc:hive2://hadoop207.cevent.com:10000> select deptno,dname,locations from c_dept;
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : number of splits:1
INFO : Submitting tokens for job: job_1591675698957_0015
INFO : The url to track the job: http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0015/
INFO : Starting Job = job_1591675698957_0015, Tracking URL = http://hadoop207.cevent.com:8088/proxy/application_1591675698957_0015/
INFO : Kill Command = /opt/module/hadoop-2.7.2/bin/hadoop job -kill job_1591675698957_0015
INFO : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO : 2020-06-09 18:18:43,415 Stage-1 map = 0%, reduce = 0%
INFO : 2020-06-09 18:18:53,201 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.95 sec
INFO : MapReduce Total cumulative CPU time: 1 seconds 950 msec
INFO : Ended Job = job_1591675698957_0015
INFO : Stage-4 is selected by condition resolver.
INFO : Stage-3 is filtered out by condition resolver.
INFO : Stage-5 is filtered out by condition resolver.
INFO : Moving data to: hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/join_dep_partition/.hive-staging_hive_2020-06-09_18-18-27_297_470948791343936680-7/-ext-10000 from hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/join_dep_partition/.hive-staging_hive_2020-06-09_18-18-27_297_470948791343936680-7/-ext-10002
INFO : Loading data to table cevent01.join_dep_partition partition (location=null) from hdfs://hadoop207.cevent.com:8020/user/hive/warehouse/cevent01.db/join_dep_partition/.hive-staging_hive_2020-06-09_18-18-27_297_470948791343936680-7/-ext-10000
INFO : Time taken for load dynamic partitions : 2375
INFO : Loading partition {location=1800}
INFO : Loading partition {location=1900}
INFO : Loading partition {location=1700}
INFO : Time taken for adding to write entity : 2
INFO : Partition cevent01.join_dep_partition{location=1700} stats: [numFiles=1, numRows=2, totalSize=28, rawDataSize=26]
INFO : Partition cevent01.join_dep_partition{location=1800} stats: [numFiles=1, numRows=1, totalSize=12, rawDataSize=11]
INFO : Partition cevent01.join_dep_partition{location=1900} stats: [numFiles=1, numRows=1, totalSize=9, rawDataSize=8]
No rows affected (32.8 seconds)
0: jdbc:hive2://hadoop207.cevent.com:10000> show partitions join_dep_partition;  -- list partitions
+----------------+--+
|   partition    |
+----------------+--+
| location=1700  |
| location=1800  |
| location=1900  |
+----------------+--+
3 rows selected (0.502 seconds)

10. Link verification
http://hadoop207.cevent.com:50070/explorer.html#/user/hive/warehouse/cevent01.db
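Collected in one place, the dynamic-partition session above reduces to the following sequence. This is a sketch reusing the transcript's table, column, and parameter values; the limits shown are the values set in that session, not universal requirements:

```sql
SET hive.exec.dynamic.partition=true;              -- enable dynamic partitioning
SET hive.exec.dynamic.partition.mode=nonstrict;    -- allow every partition column to be resolved dynamically
SET hive.exec.max.dynamic.partitions=1000;         -- cap on dynamic partitions across all nodes
SET hive.exec.max.dynamic.partitions.pernode=100;  -- cap on dynamic partitions per node
SET hive.exec.max.created.files=100000;            -- cap on HDFS files created (default 100000)
SET hive.error.on.empty.partition=false;           -- do not fail when an empty partition is produced

CREATE TABLE join_dep_partition (
  id   INT,
  name STRING
)
PARTITIONED BY (location INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- No static partition value is given: Hive resolves the partition
-- column (location) from the last column of the SELECT list.
INSERT INTO TABLE join_dep_partition PARTITION (location)
SELECT deptno, dname, locations FROM c_dept;

SHOW PARTITIONS join_dep_partition;  -- location=1700 / 1800 / 1900 in the session above
```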
Summary
The above is the full content of "hive调优fetch模式抓取,join表测试空key自动转换random,mapreduce动态分区设置" as collected and organized by 生活随笔; we hope it helps you solve the problems you ran into.