Apache Hive on Apache Tez
Tez is a DAG-capable computing framework newly open-sourced by Apache. It derives directly from the MapReduce framework; its core idea is to decompose the Map and Reduce operations further: Map is split into Input, Processor, Sort, Merge, and Output, while Reduce is split into Input, Shuffle, Sort, Merge, Processor, and Output. These decomposed element operations can then be combined flexibly into new operations, and once a controlling program assembles them, they form one large DAG job. In summary, Tez has the following characteristics:
(1) An Apache incubator-level open-source project (its source code had only just been released when this was written)
(2) Runs on top of YARN
(3) Targets DAG (directed acyclic graph) applications (like Impala, Dremel, and Drill, it can serve as a replacement for Hive/Pig and the like)
The third point deserves a brief explanation. Apache already has the top-level project Oozie for designing DAG jobs, but Oozie works at a relatively high level (the job level): it only provides a way to express dependencies among jobs of multiple types (e.g. MR programs, Hive, Pig) and submits those jobs according to the declared dependencies. Tez is different: it exposes DAG programming interfaces at a much lower level, and users write their programs directly against those interfaces. This lower-level programming model brings higher efficiency.
A detailed introduction: http://dongxicheng.org/mapreduce-nextgen/apache-tez-newest-progress/
Tez has the following distinguishing features:
(1) A rich dataflow (dataflow, NOT streaming!) programming interface;
(2) A highly extensible "Input-Processor-Output" runtime model;
(3) Simplified data-processing deployment (it fully leverages the YARN framework; Tez itself is only a client-side programming library, so no services need to be deployed in advance);
(4) Better performance than MapReduce;
(5) Optimized resource management (it runs directly on top of the YARN resource management system);
(6) Dynamic generation of the physical dataflow.
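The Input-Processor-Sort-Merge-Output decomposition described above can be sketched in miniature. This is an illustrative Python model only, not the real Tez API; every name in it (`pipeline`, `map_vertex`, the element operations) is hypothetical:

```python
# Illustrative sketch (NOT the real Tez API): models how Tez splits a
# classic "Map" into element operations (Input, Processor, Sort, ...)
# that can be recombined flexibly into a DAG vertex.
from typing import Callable, Iterable

Op = Callable[[Iterable], Iterable]

def pipeline(*ops: Op) -> Op:
    """Compose element operations left-to-right into one vertex."""
    def run(data: Iterable) -> Iterable:
        for op in ops:
            data = op(data)
        return data
    return run

# Element operations a monolithic "Map" would be split into:
read_input: Op = lambda rows: (r.strip() for r in rows)
process: Op = lambda rows: ((w, 1) for r in rows for w in r.split())
sort_: Op = lambda pairs: sorted(pairs)

# Recombine them into a word-count "map vertex":
map_vertex = pipeline(read_input, process, sort_)
print(list(map_vertex(["hello world", "hello tez"])))
# → [('hello', 1), ('hello', 1), ('tez', 1), ('world', 1)]
```

In real Tez the composition is expressed as a DAG of vertices and edges rather than a linear pipeline, but the design point is the same: small element operations that can be freely reassembled.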
Operation log:
http://zh.hortonworks.com/hadoop-tutorial/supercharging-interactive-queries-hive-tez/
hive> select a.buildingid, b.buildingmgr, max(a.targettemp-a.actualtemp)
    > from hvac a join building b
    > on a.buildingid = b.buildingid
    > group by a.buildingid, b.buildingmgr limit 10;
Query ID = root_20141030001616_e16c39f3-8cc6-4809-96f1-c76a55543a72
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (application id: application_1414630785017_0014)
Map 1: -/- Map 2: -/- Reducer 3: 0/1
Map 1: 0/1 Map 2: 0/1 Reducer 3: 0/1
Map 1: 0/1 Map 2: 0/1 Reducer 3: 0/1
Map 1: 0/1 Map 2: 0/1 Reducer 3: 0/1
Map 1: 0/1 Map 2: 0/1 Reducer 3: 0/1
Map 1: 1/1 Map 2: 0/1 Reducer 3: 0/1
Map 1: 1/1 Map 2: 1/1 Reducer 3: 0/1
Map 1: 1/1 Map 2: 1/1 Reducer 3: 1/1
Status: Finished successfully
OK
1 M1 14
2 M2 15
3 M3 15
4 M4 15
5 M5 15
6 M6 15
7 M7 14
8 M8 15
9 M9 14
10 M10 15
Time taken: 37.644 seconds, Fetched: 10 row(s)
hive> select h.*, b.country, b.hvacproduct, b.buildingage, b.buildingmgr
    > from building b join hvac h
    > on b.buildingid = h.buildingid limit 10;
Query ID = root_20141030001919_7ef87e96-9848-404b-9f7c-2e847efbda72
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1414630785017_0014)
Map 1: -/- Map 2: -/-
Map 1: 0/1 Map 2: 0/1
Map 1: 0/1 Map 2: 0/1
Map 1: 0/1 Map 2: 0/1
Map 1: 0/1 Map 2: 0/1
Map 1: 0/1 Map 2: 0/1
Map 1: 0/1 Map 2: 0/1
Map 1: 0/1 Map 2: 0/1
Map 1: 0/1 Map 2: 0/1
Map 1: 1/1 Map 2: 0/1
Map 1: 1/1 Map 2: 1/1
Status: Finished successfully
OK
6/1/13 0:00:01 66 58 13 20 4 Finland GG1919 17 M4
6/2/13 1:00:01 69 68 3 20 17 Egypt FN39TG 11 M17
6/3/13 2:00:01 70 73 17 20 18 Indonesia JDNS77 25 M18
6/4/13 3:00:01 67 63 2 23 15 Israel ACMAX22 19 M15
6/5/13 4:00:01 68 74 16 9 3 Brazil JDNS77 28 M3
6/6/13 5:00:01 67 56 13 28 4 Finland GG1919 17 M4
6/7/13 6:00:01 70 58 12 24 2 France FN39TG 27 M2
6/8/13 7:00:01 70 73 20 26 16 Turkey AC1000 23 M16
6/9/13 8:00:01 66 69 16 9 9 Mexico GG1919 11 M9
6/10/13 9:00:01 65 57 6 5 12 Finland FN39TG 26 M12
Time taken: 30.373 seconds, Fetched: 10 row(s)
hive> set hive.execution.engine=mr;
hive> select h.*, b.country, b.hvacproduct, b.buildingage, b.buildingmgr
    > from building b join hvac h
    > on b.buildingid = h.buildingid limit 10;
Query ID = root_20141030002121_82b6c25d-3efe-4edb-97f0-6ee104a44793
Total jobs = 1
14/10/30 00:21:17 WARN conf.Configuration: file:/tmp/root/hive_2014-10-30_00-21-05_917_4670481632857759726-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
14/10/30 00:21:17 WARN conf.Configuration: file:/tmp/root/hive_2014-10-30_00-21-05_917_4670481632857759726-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
Execution log at: /tmp/root/root_20141030002121_82b6c25d-3efe-4edb-97f0-6ee104a44793.log
2014-10-30 12:21:20 Starting to launch local task to process map join; maximum memory = 260177920
2014-10-30 12:21:26 Dump the side-table into file: file:/tmp/root/hive_2014-10-30_00-21-05_917_4670481632857759726-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile40--.hashtable
2014-10-30 12:21:26 Uploaded 1 File to: file:/tmp/root/hive_2014-10-30_00-21-05_917_4670481632857759726-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile40--.hashtable (1044 bytes)
2014-10-30 12:21:26 End of local task; Time Taken: 6.094 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1414630785017_0015, Tracking URL = http://sandbox.hortonworks.com:8088/proxy/application_1414630785017_0015/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1414630785017_0015
Hadoop job information for Stage-3: number of mappers: 0; number of reducers: 0
2014-10-30 00:21:40,956 Stage-3 map = 0%, reduce = 0%
2014-10-30 00:21:51,576 Stage-3 map = 100%, reduce = 0%
Ended Job = job_1414630785017_0015
MapReduce Jobs Launched:
Job 0: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
6/1/13 0:00:01 66 58 13 20 4 Finland GG1919 17 M4
6/2/13 1:00:01 69 68 3 20 17 Egypt FN39TG 11 M17
6/3/13 2:00:01 70 73 17 20 18 Indonesia JDNS77 25 M18
6/4/13 3:00:01 67 63 2 23 15 Israel ACMAX22 19 M15
6/5/13 4:00:01 68 74 16 9 3 Brazil JDNS77 28 M3
6/6/13 5:00:01 67 56 13 28 4 Finland GG1919 17 M4
6/7/13 6:00:01 70 58 12 24 2 France FN39TG 27 M2
6/8/13 7:00:01 70 73 20 26 16 Turkey AC1000 23 M16
6/9/13 8:00:01 66 69 16 9 9 Mexico GG1919 11 M9
6/10/13 9:00:01 65 57 6 5 12 Finland FN39TG 26 M12
Time taken: 45.935 seconds, Fetched: 10 row(s)
hive> select a.buildingid, b.buildingmgr, max(a.targettemp-a.actualtemp)
    > from hvac a join building b
    > on a.buildingid = b.buildingid
    > group by a.buildingid, b.buildingmgr limit 10;
Query ID = root_20141030002323_041fbe30-9942-4a05-b307-425a8865fe30
Total jobs = 1
14/10/30 00:23:16 WARN conf.Configuration: file:/tmp/root/hive_2014-10-30_00-23-12_351_424724314401221979-1/-local-10007/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
14/10/30 00:23:16 WARN conf.Configuration: file:/tmp/root/hive_2014-10-30_00-23-12_351_424724314401221979-1/-local-10007/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
Execution log at: /tmp/root/root_20141030002323_041fbe30-9942-4a05-b307-425a8865fe30.log
2014-10-30 12:23:17 Starting to launch local task to process map join; maximum memory = 260177920
2014-10-30 12:23:19 Dump the side-table into file: file:/tmp/root/hive_2014-10-30_00-23-12_351_424724314401221979-1/-local-10004/HashTable-Stage-2/MapJoin-mapfile51--.hashtable
2014-10-30 12:23:19 Uploaded 1 File to: file:/tmp/root/hive_2014-10-30_00-23-12_351_424724314401221979-1/-local-10004/HashTable-Stage-2/MapJoin-mapfile51--.hashtable (714 bytes)
2014-10-30 12:23:19 End of local task; Time Taken: 1.254 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number>
In order to set a constant number of reducers: set mapreduce.job.reduces=<number>
Starting Job = job_1414630785017_0016, Tracking URL = http://sandbox.hortonworks.com:8088/proxy/application_1414630785017_0016/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1414630785017_0016
Hadoop job information for Stage-2: number of mappers: 0; number of reducers: 0
2014-10-30 00:23:27,599 Stage-2 map = 0%, reduce = 0%
2014-10-30 00:23:37,062 Stage-2 map = 100%, reduce = 0%
2014-10-30 00:23:44,578 Stage-2 map = 100%, reduce = 100%
Ended Job = job_1414630785017_0016
MapReduce Jobs Launched:
Job 0: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
1 M1 14
2 M2 15
3 M3 15
4 M4 15
5 M5 15
6 M6 15
7 M7 14
8 M8 15
9 M9 14
10 M10 15
Time taken: 32.685 seconds, Fetched: 10 row(s)
Query Vectorization
Step 1:
hive> create table hvac_orc stored as orc as select * from hvac;
Query ID = root_20141030002727_5b56b417-f2d7-405c-bb2b-48a74971fa14
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1414630785017_0017, Tracking URL = http://sandbox.hortonworks.com:8088/proxy/application_1414630785017_0017/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1414630785017_0017
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2014-10-30 00:27:56,573 Stage-1 map = 0%, reduce = 0%
2014-10-30 00:28:08,218 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1414630785017_0017
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://sandbox.hortonworks.com:8020/tmp/hive-root/hive_2014-10-30_00-27-32_394_2850138279224601231-1/-ext-10001
Moving data to: hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/hvac_orc
Table default.hvac_orc stats: [numFiles=1, numRows=8000, totalSize=26593, rawDataSize=1768000]
MapReduce Jobs Launched:
Job 0: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 37.248 seconds
Step 2:
Run the following statement to enable Tez.
hive> set hive.execution.engine=tez;
Step 3:
hive> select date, count(buildingid) from hvac group by date;
Query ID = root_20141030003131_d3ff96e0-a7d7-42b2-ab4a-bcfa6507a5dd
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1414630785017_0019)
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 1/1
Status: Finished successfully
OK
6/1/13 267
6/10/13 267
6/11/13 267
6/12/13 267
6/13/13 267
6/14/13 267
6/15/13 267
6/16/13 267
6/17/13 267
6/18/13 267
6/19/13 267
6/2/13 267
6/20/13 267
6/21/13 266
6/22/13 266
6/23/13 266
6/24/13 266
6/25/13 266
6/26/13 266
6/27/13 266
6/28/13 266
6/29/13 266
6/3/13 267
6/30/13 266
6/4/13 267
6/5/13 267
6/6/13 267
6/7/13 267
6/8/13 267
6/9/13 267
Time taken: 25.633 seconds, Fetched: 30 row(s)
Step 4:
hive> select date, count(buildingid) from hvac_orc group by date;
Query ID = root_20141030003131_fcf22ec7-ff7f-4a01-8316-ad07929929d8
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1414630785017_0019)
Map 1: -/- Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 1/1
Status: Finished successfully
OK
6/1/13 267
6/10/13 267
6/11/13 267
6/12/13 267
6/13/13 267
6/14/13 267
6/15/13 267
6/16/13 267
6/17/13 267
6/18/13 267
6/19/13 267
6/2/13 267
6/20/13 267
6/21/13 266
6/22/13 266
6/23/13 266
6/24/13 266
6/25/13 266
6/26/13 266
6/27/13 266
6/28/13 266
6/29/13 266
6/3/13 267
6/30/13 266
6/4/13 267
6/5/13 267
6/6/13 267
6/7/13 267
6/8/13 267
6/9/13 267
Time taken: 10.683 seconds, Fetched: 30 row(s)
Step 5:
Now let’s run the following steps to enable vectorization:
set hive.vectorized.execution.enabled;
hive> set hive.vectorized.execution.enabled;
hive.vectorized.execution.enabled=true
and then run the SQL query from the previous step:
select date, count(buildingid) from hvac_orc group by date;
hive> select date, count(buildingid) from hvac_orc group by date;
Query ID = root_20141030003232_da026977-69c2-4dd8-8648-56c73434b4d0
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1414630785017_0019)
Map 1: -/- Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 1/1
Status: Finished successfully
OK
6/1/13 267
6/10/13 267
6/11/13 267
6/12/13 267
6/13/13 267
6/14/13 267
6/15/13 267
6/16/13 267
6/17/13 267
6/18/13 267
6/19/13 267
6/2/13 267
6/20/13 267
6/21/13 266
6/22/13 266
6/23/13 266
6/24/13 266
6/25/13 266
6/26/13 266
6/27/13 266
6/28/13 266
6/29/13 266
6/3/13 267
6/30/13 266
6/4/13 267
6/5/13 267
6/6/13 267
6/7/13 267
6/8/13 267
6/9/13 267
Time taken: 6.904 seconds, Fetched: 30 row(s)
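A rough intuition for what hive.vectorized.execution.enabled buys: instead of pushing one row at a time through the operator tree, a vectorized engine processes a batch of values (Hive uses batches of around 1024 rows) per operator call, amortizing per-row interpretation overhead. A minimal Python sketch of the idea, using the same group-by-count shape as the query above; the function names are hypothetical and this is not Hive's implementation:

```python
# Sketch: row-at-a-time vs batched ("vectorized") grouped counting.
# The two produce identical results; the batched version makes one
# call per batch instead of one call per row.
from collections import Counter

def count_per_key_rowwise(rows):
    counts = {}
    for key in rows:                       # one dict update per row
        counts[key] = counts.get(key, 0) + 1
    return counts

def count_per_key_vectorized(rows, batch_size=1024):
    counts = Counter()
    for i in range(0, len(rows), batch_size):
        counts.update(rows[i:i + batch_size])   # one call per batch
    return dict(counts)

# Mirrors the per-date counts seen in the query output above:
rows = ["6/1/13"] * 267 + ["6/2/13"] * 267
assert count_per_key_rowwise(rows) == count_per_key_vectorized(rows) == {"6/1/13": 267, "6/2/13": 267}
```

In Hive the benefit also comes from tight, type-specialized loops over columnar ORC data, which is why the tutorial creates hvac_orc first: vectorized execution requires a columnar format like ORC.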
Step 6:
Let’s look at the ‘explain’ plan to confirm that it is indeed using a vectorized query plan:
explain select date, count(buildingid) from hvac_orc group by date;
hive> explain select date, count(buildingid) from hvac_orc group by date;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: root_20141030003333_eef5ffbf-8427-4d9b-b4be-4a8994926a77:8
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: hvac_orc
                  Statistics: Num rows: 8000 Data size: 1768000 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: date (type: string), buildingid (type: bigint)
                    outputColumnNames: date, buildingid
                    Statistics: Num rows: 8000 Data size: 1768000 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: count(buildingid)
                      keys: date (type: string)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 8000 Data size: 1768000 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        Statistics: Num rows: 8000 Data size: 1768000 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: bigint)
            Execution mode: vectorized
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 4000 Data size: 884000 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: string), _col1 (type: bigint)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 4000 Data size: 884000 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 4000 Data size: 884000 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  Stage: Stage-0
    Fetch Operator
      limit: -1
Time taken: 0.936 seconds, Fetched: 57 row(s)
Stats & Cost Based Optimization (CBO)
The Cost-Based Optimization (CBO) engine uses statistics stored for Hive tables to produce optimal query plans.
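For reference, these are the statistics and CBO statements exercised in the session below, collected in one place. They are taken from the transcript itself; note that, as the transcript also shows, typing the property name without the leading `set` is a parse error.

```sql
-- Gather table- and column-level statistics for the hvac table
analyze table hvac compute STATISTICS;
ANALYZE TABLE hvac COMPUTE STATISTICS FOR COLUMNS targettemp, actualtemp, buildingid;
-- Tell the planner to use the gathered statistics, and enable CBO
set hive.compute.query.using.stats = true;
set hive.stats.fetch.column.stats = true;
set hive.stats.fetch.partition.stats = true;
set hive.cbo.enable = true;
```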
hive> desc hvac;
OK
date string
time string
targettemp bigint
actualtemp bigint
system bigint
systemage bigint
buildingid bigint
Time taken: 1.951 seconds, Fetched: 7 row(s)
hive> explain select buildingid, max(targettemp-actualtemp) from hvac group by buildingid;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: root_20141030003838_da1cb621-bd61-4533-b131-fe88a004fb9b:9
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: hvac
                  Statistics: Num rows: 10022 Data size: 240531 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: buildingid (type: bigint), targettemp (type: bigint), actualtemp (type: bigint)
                    outputColumnNames: buildingid, targettemp, actualtemp
                    Statistics: Num rows: 10022 Data size: 240531 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: max((targettemp - actualtemp))
                      keys: buildingid (type: bigint)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 10022 Data size: 240531 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: bigint)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: bigint)
                        Statistics: Num rows: 10022 Data size: 240531 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: bigint)
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: max(VALUE._col0)
                keys: KEY._col0 (type: bigint)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 5011 Data size: 120265 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: bigint), _col1 (type: bigint)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 5011 Data size: 120265 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 5011 Data size: 120265 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  Stage: Stage-0
    Fetch Operator
      limit: -1
Time taken: 0.374 seconds, Fetched: 56 row(s)
hive> analyze table hvac compute STATISTICS;
Query ID = root_20141030003939_a34fc478-076b-4563-9f6a-f1c4c8703acf
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (application id: application_1414630785017_0020)
Map 1: -/-
Map 1: -/-
Map 1: -/-
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 1/1
Status: Finished successfully
Table default.hvac stats: [numFiles=1, numRows=8000, totalSize=240531, rawDataSize=232532]
OK
Time taken: 89.063 seconds
hive> ANALYZE TABLE hvac COMPUTE STATISTICS FOR COLUMNS targettemp,actualtemp,buildingid;
Query ID = root_20141030004343_cff30a2c-aa1c-48ff-af7c-ba92a4718693
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1414630785017_0020)
Map 1: -/- Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 1/1
Status: Finished successfully
OK
Time taken: 26.089 seconds
hive> hive.compute.query.using.stats = true;
NoViableAltException(26@[])
	at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:999)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:408)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:976)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1041)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:912)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: ParseException line 1:0 cannot recognize input near 'hive' '.' 'compute'
hive> set hive.compute.query.using.stats = true;
hive> set hive.stats.fetch.column.stats = true;
hive> set hive.stats.fetch.partition.stats = true;
hive> set hive.cbo.enable = true;
hive> explain select buildingid, max(targettemp-actualtemp) from hvac group by buildingid;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: root_20141030004545_b1fff813-64cb-4254-a526-9d492c1f0cfd:12
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: hvac
                  Statistics: Num rows: 8000 Data size: 232532 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: buildingid (type: bigint), targettemp (type: bigint), actualtemp (type: bigint)
                    outputColumnNames: buildingid, targettemp, actualtemp
                    Statistics: Num rows: 8000 Data size: 232532 Basic stats: COMPLETE Column stats: COMPLETE
                    Group By Operator
                      aggregations: max((targettemp - actualtemp))
                      keys: buildingid (type: bigint)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 8000 Data size: 64000 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: bigint)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: bigint)
                        Statistics: Num rows: 8000 Data size: 64000 Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col1 (type: bigint)
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: max(VALUE._col0)
                keys: KEY._col0 (type: bigint)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 4000 Data size: 32000 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: _col0 (type: bigint), _col1 (type: bigint)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 4000 Data size: 64000 Basic stats: COMPLETE Column stats: COMPLETE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 4000 Data size: 64000 Basic stats: COMPLETE Column stats: COMPLETE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  Stage: Stage-0
    Fetch Operator
      limit: -1
Time taken: 3.874 seconds, Fetched: 56 row(s)
hive> select buildingid, max(targettemp-actualtemp) from hvac group by buildingid;
Query ID = root_20141030004545_c7a5f1a2-4f62-4a72-b36e-7e3d7f9c0673
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1414630785017_0020)
Map 1: -/- Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 1/1
Status: Finished successfully
OK
1 14
2 15
3 15
4 15
5 15
6 15
7 14
8 15
9 14
10 15
11 14
12 15
13 15
14 15
15 15
16 15
17 15
18 15
19 15
20 15
Time taken: 56.247 seconds, Fetched: 20 row(s)