Hadoop: installing and configuring Pig, and the problems encountered (still unresolved)
1. Prerequisite: the Hadoop cluster has already been configured and starts up normally. My configuration is as follows:
First, edit the hosts file with vim /etc/hosts:
192.168.1.64 xuegod64
192.168.1.65 xuegod65
192.168.1.63 xuegod63
(Copy the configured file to the other two machines. I did the configuration on xuegod64 and copied it with scp /etc/hosts xuegod63:/etc/. This step assumes passwordless SSH login is already set up; I will not go into the SSH key setup here.)
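A minimal sketch of pushing the file to both of the other nodes and checking that the names resolve (this assumes the hadoop user is allowed to write to /etc/ on the targets; otherwise do it as root):
[hadoop@xuegod64 ~]$ scp /etc/hosts xuegod63:/etc/   # copy to xuegod63
[hadoop@xuegod64 ~]$ scp /etc/hosts xuegod65:/etc/   # copy to xuegod65
[hadoop@xuegod64 ~]$ ping -c 1 xuegod65              # confirm the hostname now resolves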
2. Prepare the installation packages, listed below:
[hadoop@xuegod64 ~]$ ls
hadoop-2.4.1.tar.gz
pig-0.15.0.tar.gz
jdk-8u66-linux-x64.rpm
zookeeper-3.4.7.tar.gz (optional, not needed for Pig itself)
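The extraction step is not shown in the original post; as a sketch, assuming the Pig tarball is unpacked under /home/hadoop so that it matches the PIG_HOME path used in /etc/profile below:
[hadoop@xuegod64 ~]$ tar -zxvf pig-0.15.0.tar.gz   # unpacks to /home/hadoop/pig-0.15.0
[hadoop@xuegod64 ~]$ ls -d pig-0.15.0              # verify the directory that PIG_HOME will point to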
3. Configure /etc/profile
[hadoop@xuegod64 ~]$ vim /etc/profile  # prerequisite: as root, give the hadoop user permission to edit this file
export JAVA_HOME=/usr/java/jdk1.8.0_66/
export HADOOP_HOME=/home/hadoop/hadoop-2.4.1/
export HBASE_HOME=/home/hadoop/hbase-1.1.2/
export ZOOKEEPER_HOME=/home/hadoop/zookeeper-3.4.7/
export PIG_HOME=/home/hadoop/pig-0.15.0/
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$PIG_HOME/bin:$PATH
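The new variables do not take effect in the current shell until the file is re-read; a quick check (assuming the directories above exist):
[hadoop@xuegod64 ~]$ source /etc/profile
[hadoop@xuegod64 ~]$ echo $PIG_HOME   # should print /home/hadoop/pig-0.15.0/
[hadoop@xuegod64 ~]$ which pig        # should resolve to a path under $PIG_HOME/bin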
4. Verify that Pig is configured correctly
[hadoop@xuegod64 ~]$ pig -help
Apache Pig version 0.15.0 (r1682971)
compiled Jun 01 2015, 11:44:35
USAGE: Pig [options] [-] : Run interactively in grunt shell.
Pig [options] -e[xecute] cmd [cmd ...] : Run cmd(s).
Pig [options] [-f[ile]] file : Run cmds found in file.
... (remaining help output omitted)
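If you only want to confirm which Pig version is on the PATH, the -version flag prints just the banner shown at the top of the help output:
[hadoop@xuegod64 ~]$ pig -version
Apache Pig version 0.15.0 (r1682971)
compiled Jun 01 2015, 11:44:35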
5. Pig execution modes
Pig has two execution modes:
1) Local mode: Pig runs in a single JVM and accesses the local filesystem. This mode is suitable for small data sets and for learning.
Run the following command to start Pig in local mode:
pig -x local
2) MapReduce mode: Pig translates queries into MapReduce jobs and submits them to Hadoop (either a real cluster or a pseudo-distributed setup). Check that the Pig release you are using supports your Hadoop version; each Pig release only supports specific Hadoop versions, and the compatibility information is available on the Pig website.
Pig uses the HADOOP_HOME environment variable. If it is not set, Pig falls back to its bundled Hadoop libraries, but then there is no guarantee that the bundled libraries are compatible with the Hadoop version you actually run, so it is better to set HADOOP_HOME explicitly. You also need to set the following variable:
export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop
Pig's default mode is mapreduce; you can also select it explicitly with the following command:
[hadoop@xuegod64 ~]$ pig -x mapreduce
(output omitted)
grunt>
Next, Pig needs to know the NameNode and JobTracker (ResourceManager under YARN) of the Hadoop cluster it should use. Normally, once Hadoop is installed and configured correctly, this information is already available to Pig and no extra configuration is needed.
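A quick way to confirm that Pig has picked up the cluster configuration is to list HDFS from that Grunt shell; the fs commands are passed through to the Hadoop filesystem shell:
grunt> fs -ls /   # should list the HDFS root rather than the local filesystem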
6. Running Pig programs
There are three ways to run a Pig program:
1) Script mode
Run a file containing Pig Latin statements directly. For example, the following command runs all the statements in the local file scripts.pig:
pig scripts.pig
2) Grunt mode
a) Grunt is an interactive shell in which you can edit and run commands at the command line.
b) Grunt keeps a command history, which you can browse with the up and down arrow keys.
c) Grunt supports command auto-completion. For example, if you type a = foreach b g and press Tab, the line is completed to a = foreach b generate. You can even customize the auto-completion behavior; see the documentation for details.
3) Embedded mode
Pig programs can also be run from Java, much like running SQL through JDBC.
(I am not familiar with this mode.)
7. Start the cluster
[hadoop@xuegod64 ~]$ start-all.sh
[hadoop@xuegod64 ~]$ jps
4722 DataNode
5062 DFSZKFailoverController
5159 ResourceManager
4905 JournalNode
5321 Jps
4618 NameNode
2428 QuorumPeerMain
5279 NodeManager
[hadoop@xuegod64 ~]$ ssh xuegod63
Last login: Sat Jan 2 23:10:21 2016 from xuegod64
[hadoop@xuegod63 ~]$ jps
2130 QuorumPeerMain
3125 Jps
2982 NodeManager
2886 JournalNode
2795 DataNode
[hadoop@xuegod64 ~]$ ssh xuegod65
Last login: Sat Jan 2 15:11:33 2016 from xuegod64
[hadoop@xuegod65 ~]$ jps
3729 Jps
2401 QuorumPeerMain
3415 JournalNode
3484 DFSZKFailoverController
3325 DataNode
3583 NodeManager
3590 SecondNameNode
8. A simple example
As an example, we use Pig to find the highest temperature recorded in each year. Assume a data file with one record per line, fields separated by tabs (the sample data itself is not reproduced here).
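Purely as an illustration of the expected format (these values are made up, not the real data set; each line is year, a single tab, then temperature), such a file might look like:
1990	21
1990	23
1991	21
1991	18
1992	30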
Start Pig in local mode and enter the following statements one by one (note that each statement ends with a semicolon):
[hadoop@xuegod64 ~]$ pig -x local
grunt> records = load '/home/hadoop/zuigaoqiwen.txt' as (year:chararray, temperature:int);
2016-01-02 16:12:05,700 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2016-01-02 16:12:05,701 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> dump records;
(1930:28:1,)
(1930:0:1,)
(1930:22:1,)
(1930:22:1,)
(1930:22:1,)
(1930:22:1,)
(1930:28:1,)
(1930:0:1,)
(1930:0:1,)
(1930:0:1,)
(1930:11:1,)
(1930:0:1,)
... (remaining output omitted)
grunt> describe records;
records: {year: chararray,temperature: int}
grunt> valid_records = filter records by temperature!=999;
grunt> grouped_records = group valid_records by year;
grunt> describe grouped_records;
grouped_records: {group: chararray,valid_records: {(year: chararray,temperature: int)}}
grunt> dump grouped_records;
... (log output omitted)
2016-01-02 16:16:02,974 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1
Total Length = 7347344
Input split[0]:
Length = 7347344
ClassName: org.apache.hadoop.mapreduce.lib.input.FileSplit
Locations:
2016-01-02 16:16:08,011 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2016-01-02 16:16:08,012 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:
HadoopVersion  PigVersion  UserId  StartedAt  FinishedAt  Features
2.4.1 0.15.0 hadoop 2016-01-02 16:16:02 2016-01-02 16:16:08 GROUP_BY,FILTER
Success!
Job Stats (time in seconds):
JobId  Maps  Reduces  MaxMapTime  MinMapTime  AvgMapTime  MedianMapTime  MaxReduceTime  MinReduceTime  AvgReduceTime  MedianReduceTime  Alias  Feature  Outputs
job_local798558500_0002 1 1 n/a n/a n/a n/a n/a n/a n/a n/a grouped_records,records,valid_records GROUP_BY file:/tmp/temp-206603117/tmp-1002834084,
Input(s):
Successfully read 642291 records from: "/home/hadoop/zuigaoqiwen.txt"
Output(s):
Successfully stored 0 records in: "file:/tmp/temp-206603117/tmp-1002834084"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_local798558500_0002
grunt> describe grouped_records;
grouped_records: {group: chararray,valid_records: {(year: chararray,temperature: int)}}
grunt> max_temperature = foreach grouped_records generate group,MAX(valid_records.temperature);
grunt> dump max_temperature;
(1990,23)
(1991,21)
(1992,30)
grunt> quit
2016-01-02 16:24:25,303 [main] INFO org.apache.pig.Main - Pig script completed in 14 minutes, 27 seconds and 123 milliseconds (867123 ms)
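For reference, the same pipeline can be saved as a standalone script and run non-interactively (script mode from section 6). A sketch, with max_temp.pig as a placeholder file name and a placeholder output path:
-- max_temp.pig: yearly maximum temperature, same statements as the Grunt session above
records = load '/home/hadoop/zuigaoqiwen.txt' as (year:chararray, temperature:int);
valid_records = filter records by temperature != 999;
grouped_records = group valid_records by year;
max_temperature = foreach grouped_records generate group, MAX(valid_records.temperature);
store max_temperature into '/home/hadoop/max_temperature_out';   -- or: dump max_temperature;
Run it in local mode with:
[hadoop@xuegod64 ~]$ pig -x local max_temp.pig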
There were some problems along the way that I could not resolve.
Warning messages printed during the run:
2016-01-02 16:18:28,049 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2016-01-02 16:18:28,050 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2016-01-02 16:18:28,050 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2016-01-02 16:18:28,055 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning ACCESSING_NON_EXISTENT_FIELD 642291 time(s).
2016-01-02 16:18:28,055 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2016-01-02 16:18:28,055 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2016-01-02 16:18:28,056 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2016-01-02 16:18:28,056 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2016-01-02 16:18:28,246 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-01-02 16:18:28,246 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
Error log (stack trace):
Pig Stack Trace
at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1082)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:505)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
at org.apache.pig.Main.run(Main.java:565)
at org.apache.pig.Main.main(Main.java:177)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
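A guess rather than a verified fix: the records dumped earlier look like (1930:28:1,), i.e. the whole line landed in the year field and the temperature field is empty, and the ACCESSING_NON_EXISTENT_FIELD warning fired 642291 times, exactly the number of records read, so every record of that run appears to have had a null temperature. That pattern would occur if the fields in zuigaoqiwen.txt are not actually tab-separated. If they are separated by ':' instead, telling the loader the delimiter explicitly might parse them correctly (the third field name, quality, is only a placeholder guess):
grunt> records = load '/home/hadoop/zuigaoqiwen.txt' using PigStorage(':') as (year:chararray, temperature:int, quality:int);
grunt> describe records;
grunt> dump records;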
==============
Reposted from: https://www.cnblogs.com/zd520pyx1314/p/6534005.html