Apache Hive on Apache Tez
Tez is a DAG-capable computing framework newly open-sourced by Apache. It derives directly from the MapReduce framework; its core idea is to decompose the Map and Reduce operations further: Map is split into Input, Processor, Sort, Merge, and Output, while Reduce is split into Input, Shuffle, Sort, Merge, Processor, and Output. These decomposed element operations can then be combined flexibly into new operations, and once a controlling program assembles them, they form one large DAG job. In summary, Tez has the following characteristics:
(1) An Apache incubator-level open-source project (its source code had only just been released when this was written)
(2) Runs on top of YARN
(3) Targets DAG (directed acyclic graph) applications (like Impala, Dremel, and Drill, it can serve as a replacement for Hive/Pig and the like)
The third point deserves a brief explanation. Apache already has the top-level project Oozie for designing DAG jobs, but Oozie works at a relatively high level (the job level): it only provides a way to express dependencies among jobs of multiple types (e.g. MR programs, Hive, Pig) and submits those jobs according to the declared dependencies. Tez is different: it exposes DAG programming interfaces at a much lower level, and users write their programs directly against those interfaces. This lower-level programming model brings higher efficiency.
A detailed introduction: http://dongxicheng.org/mapreduce-nextgen/apache-tez-newest-progress/
Tez has the following distinguishing features:
(1) A rich dataflow (dataflow, NOT streaming!) programming interface;
(2) A highly extensible "Input-Processor-Output" runtime model;
(3) Simplified data-processing deployment (it fully leverages the YARN framework; Tez itself is only a client-side programming library, so no services need to be deployed in advance);
(4) Better performance than MapReduce;
(5) Optimized resource management (it runs directly on top of the YARN resource management system);
(6) Dynamic generation of the physical dataflow.
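The Input-Processor-Sort-Merge-Output decomposition described above can be sketched in miniature. This is an illustrative Python model only, not the real Tez API; every name in it (`pipeline`, `map_vertex`, the element operations) is hypothetical:

```python
# Illustrative sketch (NOT the real Tez API): models how Tez splits a
# classic "Map" into element operations (Input, Processor, Sort, ...)
# that can be recombined flexibly into a DAG vertex.
from typing import Callable, Iterable

Op = Callable[[Iterable], Iterable]

def pipeline(*ops: Op) -> Op:
    """Compose element operations left-to-right into one vertex."""
    def run(data: Iterable) -> Iterable:
        for op in ops:
            data = op(data)
        return data
    return run

# Element operations a monolithic "Map" would be split into:
read_input: Op = lambda rows: (r.strip() for r in rows)
process: Op = lambda rows: ((w, 1) for r in rows for w in r.split())
sort_: Op = lambda pairs: sorted(pairs)

# Recombine them into a word-count "map vertex":
map_vertex = pipeline(read_input, process, sort_)
print(list(map_vertex(["hello world", "hello tez"])))
# → [('hello', 1), ('hello', 1), ('tez', 1), ('world', 1)]
```

In real Tez the composition is expressed as a DAG of vertices and edges rather than a linear pipeline, but the design point is the same: small element operations that can be freely reassembled.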
Operation log:
http://zh.hortonworks.com/hadoop-tutorial/supercharging-interactive-queries-hive-tez/
hive> select a.buildingid, b.buildingmgr, max(a.targettemp-a.actualtemp)
    > from hvac a join building b
    > on a.buildingid = b.buildingid
    > group by a.buildingid, b.buildingmgr limit 10;
Query ID = root_20141030001616_e16c39f3-8cc6-4809-96f1-c76a55543a72
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (application id: application_1414630785017_0014)
Map 1: -/- Map 2: -/- Reducer 3: 0/1
Map 1: 0/1 Map 2: 0/1 Reducer 3: 0/1
Map 1: 0/1 Map 2: 0/1 Reducer 3: 0/1
Map 1: 0/1 Map 2: 0/1 Reducer 3: 0/1
Map 1: 0/1 Map 2: 0/1 Reducer 3: 0/1
Map 1: 1/1 Map 2: 0/1 Reducer 3: 0/1
Map 1: 1/1 Map 2: 1/1 Reducer 3: 0/1
Map 1: 1/1 Map 2: 1/1 Reducer 3: 1/1
Status: Finished successfully
OK
1 M1 14
2 M2 15
3 M3 15
4 M4 15
5 M5 15
6 M6 15
7 M7 14
8 M8 15
9 M9 14
10 M10 15
Time taken: 37.644 seconds, Fetched: 10 row(s)
hive> select h.*, b.country, b.hvacproduct, b.buildingage, b.buildingmgr
    > from building b join hvac h
    > on b.buildingid = h.buildingid limit 10;
Query ID = root_20141030001919_7ef87e96-9848-404b-9f7c-2e847efbda72
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1414630785017_0014)
Map 1: -/- Map 2: -/-
Map 1: 0/1 Map 2: 0/1
Map 1: 0/1 Map 2: 0/1
Map 1: 0/1 Map 2: 0/1
Map 1: 0/1 Map 2: 0/1
Map 1: 0/1 Map 2: 0/1
Map 1: 0/1 Map 2: 0/1
Map 1: 0/1 Map 2: 0/1
Map 1: 0/1 Map 2: 0/1
Map 1: 1/1 Map 2: 0/1
Map 1: 1/1 Map 2: 1/1
Status: Finished successfully
OK
6/1/13 0:00:01 66 58 13 20 4 Finland GG1919 17 M4
6/2/13 1:00:01 69 68 3 20 17 Egypt FN39TG 11 M17
6/3/13 2:00:01 70 73 17 20 18 Indonesia JDNS77 25 M18
6/4/13 3:00:01 67 63 2 23 15 Israel ACMAX22 19 M15
6/5/13 4:00:01 68 74 16 9 3 Brazil JDNS77 28 M3
6/6/13 5:00:01 67 56 13 28 4 Finland GG1919 17 M4
6/7/13 6:00:01 70 58 12 24 2 France FN39TG 27 M2
6/8/13 7:00:01 70 73 20 26 16 Turkey AC1000 23 M16
6/9/13 8:00:01 66 69 16 9 9 Mexico GG1919 11 M9
6/10/13 9:00:01 65 57 6 5 12 Finland FN39TG 26 M12
Time taken: 30.373 seconds, Fetched: 10 row(s)
hive> set hive.execution.engine=mr;
hive> select h.*, b.country, b.hvacproduct, b.buildingage, b.buildingmgr
    > from building b join hvac h
    > on b.buildingid = h.buildingid limit 10;
Query ID = root_20141030002121_82b6c25d-3efe-4edb-97f0-6ee104a44793
Total jobs = 1
14/10/30 00:21:17 WARN conf.Configuration: file:/tmp/root/hive_2014-10-30_00-21-05_917_4670481632857759726-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
14/10/30 00:21:17 WARN conf.Configuration: file:/tmp/root/hive_2014-10-30_00-21-05_917_4670481632857759726-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
Execution log at: /tmp/root/root_20141030002121_82b6c25d-3efe-4edb-97f0-6ee104a44793.log
2014-10-30 12:21:20 Starting to launch local task to process map join; maximum memory = 260177920
2014-10-30 12:21:26 Dump the side-table into file: file:/tmp/root/hive_2014-10-30_00-21-05_917_4670481632857759726-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile40--.hashtable
2014-10-30 12:21:26 Uploaded 1 File to: file:/tmp/root/hive_2014-10-30_00-21-05_917_4670481632857759726-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile40--.hashtable (1044 bytes)
2014-10-30 12:21:26 End of local task; Time Taken: 6.094 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1414630785017_0015, Tracking URL = http://sandbox.hortonworks.com:8088/proxy/application_1414630785017_0015/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1414630785017_0015
Hadoop job information for Stage-3: number of mappers: 0; number of reducers: 0
2014-10-30 00:21:40,956 Stage-3 map = 0%, reduce = 0%
2014-10-30 00:21:51,576 Stage-3 map = 100%, reduce = 0%
Ended Job = job_1414630785017_0015
MapReduce Jobs Launched:
Job 0: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
6/1/13 0:00:01 66 58 13 20 4 Finland GG1919 17 M4
6/2/13 1:00:01 69 68 3 20 17 Egypt FN39TG 11 M17
6/3/13 2:00:01 70 73 17 20 18 Indonesia JDNS77 25 M18
6/4/13 3:00:01 67 63 2 23 15 Israel ACMAX22 19 M15
6/5/13 4:00:01 68 74 16 9 3 Brazil JDNS77 28 M3
6/6/13 5:00:01 67 56 13 28 4 Finland GG1919 17 M4
6/7/13 6:00:01 70 58 12 24 2 France FN39TG 27 M2
6/8/13 7:00:01 70 73 20 26 16 Turkey AC1000 23 M16
6/9/13 8:00:01 66 69 16 9 9 Mexico GG1919 11 M9
6/10/13 9:00:01 65 57 6 5 12 Finland FN39TG 26 M12
Time taken: 45.935 seconds, Fetched: 10 row(s)
hive> select a.buildingid, b.buildingmgr, max(a.targettemp-a.actualtemp)
    > from hvac a join building b
    > on a.buildingid = b.buildingid
    > group by a.buildingid, b.buildingmgr limit 10;
Query ID = root_20141030002323_041fbe30-9942-4a05-b307-425a8865fe30
Total jobs = 1
14/10/30 00:23:16 WARN conf.Configuration: file:/tmp/root/hive_2014-10-30_00-23-12_351_424724314401221979-1/-local-10007/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
14/10/30 00:23:16 WARN conf.Configuration: file:/tmp/root/hive_2014-10-30_00-23-12_351_424724314401221979-1/-local-10007/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
Execution log at: /tmp/root/root_20141030002323_041fbe30-9942-4a05-b307-425a8865fe30.log
2014-10-30 12:23:17 Starting to launch local task to process map join; maximum memory = 260177920
2014-10-30 12:23:19 Dump the side-table into file: file:/tmp/root/hive_2014-10-30_00-23-12_351_424724314401221979-1/-local-10004/HashTable-Stage-2/MapJoin-mapfile51--.hashtable
2014-10-30 12:23:19 Uploaded 1 File to: file:/tmp/root/hive_2014-10-30_00-23-12_351_424724314401221979-1/-local-10004/HashTable-Stage-2/MapJoin-mapfile51--.hashtable (714 bytes)
2014-10-30 12:23:19 End of local task; Time Taken: 1.254 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number>
In order to set a constant number of reducers: set mapreduce.job.reduces=<number>
Starting Job = job_1414630785017_0016, Tracking URL = http://sandbox.hortonworks.com:8088/proxy/application_1414630785017_0016/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1414630785017_0016
Hadoop job information for Stage-2: number of mappers: 0; number of reducers: 0
2014-10-30 00:23:27,599 Stage-2 map = 0%, reduce = 0%
2014-10-30 00:23:37,062 Stage-2 map = 100%, reduce = 0%
2014-10-30 00:23:44,578 Stage-2 map = 100%, reduce = 100%
Ended Job = job_1414630785017_0016
MapReduce Jobs Launched:
Job 0: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
1 M1 14
2 M2 15
3 M3 15
4 M4 15
5 M5 15
6 M6 15
7 M7 14
8 M8 15
9 M9 14
10 M10 15
Time taken: 32.685 seconds, Fetched: 10 row(s)
Query Vectorization
Step 1:
hive> create table hvac_orc stored as orc as select * from hvac;
Query ID = root_20141030002727_5b56b417-f2d7-405c-bb2b-48a74971fa14
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1414630785017_0017, Tracking URL = http://sandbox.hortonworks.com:8088/proxy/application_1414630785017_0017/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1414630785017_0017
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2014-10-30 00:27:56,573 Stage-1 map = 0%, reduce = 0%
2014-10-30 00:28:08,218 Stage-1 map = 100%, reduce = 0%
Ended Job = job_1414630785017_0017
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://sandbox.hortonworks.com:8020/tmp/hive-root/hive_2014-10-30_00-27-32_394_2850138279224601231-1/-ext-10001
Moving data to: hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/hvac_orc
Table default.hvac_orc stats: [numFiles=1, numRows=8000, totalSize=26593, rawDataSize=1768000]
MapReduce Jobs Launched:
Job 0: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 37.248 seconds
Step 2:
Run the following statement to enable Tez.
hive> set hive.execution.engine=tez;
Step 3:
hive> select date, count(buildingid) from hvac group by date;
Query ID = root_20141030003131_d3ff96e0-a7d7-42b2-ab4a-bcfa6507a5dd
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1414630785017_0019)
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 1/1
Status: Finished successfully
OK
6/1/13 267
6/10/13 267
6/11/13 267
6/12/13 267
6/13/13 267
6/14/13 267
6/15/13 267
6/16/13 267
6/17/13 267
6/18/13 267
6/19/13 267
6/2/13 267
6/20/13 267
6/21/13 266
6/22/13 266
6/23/13 266
6/24/13 266
6/25/13 266
6/26/13 266
6/27/13 266
6/28/13 266
6/29/13 266
6/3/13 267
6/30/13 266
6/4/13 267
6/5/13 267
6/6/13 267
6/7/13 267
6/8/13 267
6/9/13 267
Time taken: 25.633 seconds, Fetched: 30 row(s)
Step 4:
hive> select date, count(buildingid) from hvac_orc group by date;
Query ID = root_20141030003131_fcf22ec7-ff7f-4a01-8316-ad07929929d8
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1414630785017_0019)
Map 1: -/- Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 1/1
Status: Finished successfully
OK
6/1/13 267
6/10/13 267
6/11/13 267
6/12/13 267
6/13/13 267
6/14/13 267
6/15/13 267
6/16/13 267
6/17/13 267
6/18/13 267
6/19/13 267
6/2/13 267
6/20/13 267
6/21/13 266
6/22/13 266
6/23/13 266
6/24/13 266
6/25/13 266
6/26/13 266
6/27/13 266
6/28/13 266
6/29/13 266
6/3/13 267
6/30/13 266
6/4/13 267
6/5/13 267
6/6/13 267
6/7/13 267
6/8/13 267
6/9/13 267
Time taken: 10.683 seconds, Fetched: 30 row(s)
Step 5:
Now let’s run the following steps to enable vectorization:
set hive.vectorized.execution.enabled;
hive> set hive.vectorized.execution.enabled;
hive.vectorized.execution.enabled=true
and then run the SQL query from the previous step:
select date, count(buildingid) from hvac_orc group by date;
hive> select date, count(buildingid) from hvac_orc group by date;
Query ID = root_20141030003232_da026977-69c2-4dd8-8648-56c73434b4d0
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1414630785017_0019)
Map 1: -/- Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 1/1
Status: Finished successfully
OK
6/1/13 267
6/10/13 267
6/11/13 267
6/12/13 267
6/13/13 267
6/14/13 267
6/15/13 267
6/16/13 267
6/17/13 267
6/18/13 267
6/19/13 267
6/2/13 267
6/20/13 267
6/21/13 266
6/22/13 266
6/23/13 266
6/24/13 266
6/25/13 266
6/26/13 266
6/27/13 266
6/28/13 266
6/29/13 266
6/3/13 267
6/30/13 266
6/4/13 267
6/5/13 267
6/6/13 267
6/7/13 267
6/8/13 267
6/9/13 267
Time taken: 6.904 seconds, Fetched: 30 row(s)
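A rough intuition for what hive.vectorized.execution.enabled buys: instead of pushing one row at a time through the operator tree, a vectorized engine processes a batch of values (Hive uses batches of around 1024 rows) per operator call, amortizing per-row interpretation overhead. A minimal Python sketch of the idea, using the same group-by-count shape as the query above; the function names are hypothetical and this is not Hive's implementation:

```python
# Sketch: row-at-a-time vs batched ("vectorized") grouped counting.
# The two produce identical results; the batched version makes one
# call per batch instead of one call per row.
from collections import Counter

def count_per_key_rowwise(rows):
    counts = {}
    for key in rows:                       # one dict update per row
        counts[key] = counts.get(key, 0) + 1
    return counts

def count_per_key_vectorized(rows, batch_size=1024):
    counts = Counter()
    for i in range(0, len(rows), batch_size):
        counts.update(rows[i:i + batch_size])   # one call per batch
    return dict(counts)

# Mirrors the per-date counts seen in the query output above:
rows = ["6/1/13"] * 267 + ["6/2/13"] * 267
assert count_per_key_rowwise(rows) == count_per_key_vectorized(rows) == {"6/1/13": 267, "6/2/13": 267}
```

In Hive the benefit also comes from tight, type-specialized loops over columnar ORC data, which is why the tutorial creates hvac_orc first: vectorized execution requires a columnar format like ORC.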
Step 6:
Let’s look at the ‘explain’ plan to confirm that it is indeed using a vectorized query plan:
explain select date, count(buildingid) from hvac_orc group by date;
hive> explain select date, count(buildingid) from hvac_orc group by date;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: root_20141030003333_eef5ffbf-8427-4d9b-b4be-4a8994926a77:8
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: hvac_orc
                  Statistics: Num rows: 8000 Data size: 1768000 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: date (type: string), buildingid (type: bigint)
                    outputColumnNames: date, buildingid
                    Statistics: Num rows: 8000 Data size: 1768000 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: count(buildingid)
                      keys: date (type: string)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 8000 Data size: 1768000 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        Statistics: Num rows: 8000 Data size: 1768000 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: bigint)
            Execution mode: vectorized
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 4000 Data size: 884000 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: string), _col1 (type: bigint)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 4000 Data size: 884000 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 4000 Data size: 884000 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  Stage: Stage-0
    Fetch Operator
      limit: -1
Time taken: 0.936 seconds, Fetched: 57 row(s)
Stats & Cost Based Optimization (CBO)
The Cost-Based Optimization (CBO) engine uses statistics stored for Hive tables to produce optimal query plans.
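For reference, these are the statistics and CBO statements exercised in the session below, collected in one place. They are taken from the transcript itself; note that, as the transcript also shows, typing the property name without the leading `set` is a parse error.

```sql
-- Gather table- and column-level statistics for the hvac table
analyze table hvac compute STATISTICS;
ANALYZE TABLE hvac COMPUTE STATISTICS FOR COLUMNS targettemp, actualtemp, buildingid;
-- Tell the planner to use the gathered statistics, and enable CBO
set hive.compute.query.using.stats = true;
set hive.stats.fetch.column.stats = true;
set hive.stats.fetch.partition.stats = true;
set hive.cbo.enable = true;
```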
hive> desc hvac;
OK
date string
time string
targettemp bigint
actualtemp bigint
system bigint
systemage bigint
buildingid bigint
Time taken: 1.951 seconds, Fetched: 7 row(s)
hive> explain select buildingid, max(targettemp-actualtemp) from hvac group by buildingid;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: root_20141030003838_da1cb621-bd61-4533-b131-fe88a004fb9b:9
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: hvac
                  Statistics: Num rows: 10022 Data size: 240531 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: buildingid (type: bigint), targettemp (type: bigint), actualtemp (type: bigint)
                    outputColumnNames: buildingid, targettemp, actualtemp
                    Statistics: Num rows: 10022 Data size: 240531 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: max((targettemp - actualtemp))
                      keys: buildingid (type: bigint)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 10022 Data size: 240531 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: bigint)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: bigint)
                        Statistics: Num rows: 10022 Data size: 240531 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: bigint)
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: max(VALUE._col0)
                keys: KEY._col0 (type: bigint)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 5011 Data size: 120265 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: bigint), _col1 (type: bigint)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 5011 Data size: 120265 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 5011 Data size: 120265 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  Stage: Stage-0
    Fetch Operator
      limit: -1
Time taken: 0.374 seconds, Fetched: 56 row(s)
hive> analyze table hvac compute STATISTICS;
Query ID = root_20141030003939_a34fc478-076b-4563-9f6a-f1c4c8703acf
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (application id: application_1414630785017_0020)
Map 1: -/-
Map 1: -/-
Map 1: -/-
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 1/1
Status: Finished successfully
Table default.hvac stats: [numFiles=1, numRows=8000, totalSize=240531, rawDataSize=232532]
OK
Time taken: 89.063 seconds
hive> ANALYZE TABLE hvac COMPUTE STATISTICS FOR COLUMNS targettemp,actualtemp,buildingid;
Query ID = root_20141030004343_cff30a2c-aa1c-48ff-af7c-ba92a4718693
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1414630785017_0020)
Map 1: -/- Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 1/1
Status: Finished successfully
OK
Time taken: 26.089 seconds
hive> hive.compute.query.using.stats = true;
NoViableAltException(26@[])
	at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:999)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:408)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:976)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1041)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:912)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: ParseException line 1:0 cannot recognize input near 'hive' '.' 'compute'
hive> set hive.compute.query.using.stats = true;
hive> set hive.stats.fetch.column.stats = true;
hive> set hive.stats.fetch.partition.stats = true;
hive> set hive.cbo.enable = true;
hive> explain select buildingid, max(targettemp-actualtemp) from hvac group by buildingid;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: root_20141030004545_b1fff813-64cb-4254-a526-9d492c1f0cfd:12
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: hvac
                  Statistics: Num rows: 8000 Data size: 232532 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: buildingid (type: bigint), targettemp (type: bigint), actualtemp (type: bigint)
                    outputColumnNames: buildingid, targettemp, actualtemp
                    Statistics: Num rows: 8000 Data size: 232532 Basic stats: COMPLETE Column stats: COMPLETE
                    Group By Operator
                      aggregations: max((targettemp - actualtemp))
                      keys: buildingid (type: bigint)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 8000 Data size: 64000 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: bigint)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: bigint)
                        Statistics: Num rows: 8000 Data size: 64000 Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col1 (type: bigint)
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: max(VALUE._col0)
                keys: KEY._col0 (type: bigint)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 4000 Data size: 32000 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: _col0 (type: bigint), _col1 (type: bigint)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 4000 Data size: 64000 Basic stats: COMPLETE Column stats: COMPLETE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 4000 Data size: 64000 Basic stats: COMPLETE Column stats: COMPLETE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  Stage: Stage-0
    Fetch Operator
      limit: -1
Time taken: 3.874 seconds, Fetched: 56 row(s)
hive> select buildingid, max(targettemp-actualtemp) from hvac group by buildingid;
Query ID = root_20141030004545_c7a5f1a2-4f62-4a72-b36e-7e3d7f9c0673
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1414630785017_0020)
Map 1: -/- Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 1/1
Status: Finished successfully
OK
1 14
2 15
3 15
4 15
5 15
6 15
7 14
8 15
9 14
10 15
11 14
12 15
13 15
14 15
15 15
16 15
17 15
18 15
19 15
20 15
Time taken: 56.247 seconds, Fetched: 20 row(s)