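Pig Installation and Usage
Table of contents
- Pig installation
- Basic usage
- Bug
- Dataset operations
Pig installation
1. Install the software on the client host and extract it.
2. Edit the configuration parameters.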
hadoop@ddai-desktop:~$ cd /opt/pig-0.17.0/
hadoop@ddai-desktop:/opt/pig-0.17.0$ cd conf/
hadoop@ddai-desktop:/opt/pig-0.17.0/conf$ mv log4j.properties.template log4j.properties
hadoop@ddai-desktop:/opt/pig-0.17.0/conf$ vim pig.properties
pig.logfile=/opt/pig-0.17.0/logs
log4jconf=/opt/pig-0.17.0/conf/log4j.properties
exectype=mapreduce
3. Update the environment variables and apply them.
hadoop@ddai-desktop:~$ vim /home/hadoop/.profile
export PIG_HOME=/opt/pig-0.17.0
export PATH=$PATH:$PIG_HOME/bin
hadoop@ddai-desktop:~$ source /home/hadoop/.profile
Running Pig
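1. Start the Hadoop services on the master node.
2. Start Pig on the client host.
Basic usage
(1) Create a test directory and upload it to HDFS.
(2) Load A.txt into variable a; variable b is the sum of columns $0 and $1 of a.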
grunt> a = load '/test/A.txt' using PigStorage(',') as (c1:int,c2:double,c3:float);
grunt> b = foreach a generate $0+$1 as b1;
604000 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_DOUBLE 1 time(s).
21/08/13 16:29:47 WARN newplan.BaseOperatorPlan: Encountered Warning IMPLICIT_CAST_TO_DOUBLE 1 time(s).
grunt> dump b;
(1.0)
(4.0)
grunt> describe b;
b: {b1: double}
(3) Variable c is column b1 of b minus 1.
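A minimal sketch of step (3), assuming b is the relation built in step (2). The output column name c1 is an illustrative choice and does not come from the original.

```pig
-- Subtract 1 from every value in column b1 of relation b.
-- b1 is a double, so Pig implicitly casts the integer literal 1 to double.
c = foreach b generate b1 - 1 as c1;

-- With the dump of b shown in step (2), this should print (0.0) and (3.0).
dump c;
```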
(4) Variable d tests the first column of a: where it is 0, output (c1, c2); otherwise output (c1, c3).
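One way to write step (4) is with Pig's bincond operator (`condition ? x : y`). This is a sketch, assuming a was loaded with the schema (c1:int, c2:double, c3:float) from step (2). The explicit cast and the column name picked are assumptions added for illustration.

```pig
-- For each row of a, emit c1 together with c2 when c1 is 0, else with c3.
-- c3 is a float, so it is cast to double to give both branches the same type.
d = foreach a generate c1, ((c1 == 0) ? c2 : (double)c3) as picked;

dump d;
```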
(5) Variable f contains the rows of a where c1 > 0 and c2 > 1.
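Step (5) maps directly onto the FILTER operator. A minimal sketch, assuming a carries the column names c1 and c2 from step (2):

```pig
-- Keep only the rows of a that satisfy both conditions.
f = filter a by c1 > 0 and c2 > 1;

dump f;
```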
(6) Load the tuple data in TP.txt into variable tp; variable g is the output generated from tp.
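The original does not show the layout of TP.txt, so the following is only a sketch under stated assumptions: the file sits at /test/TP.txt, each line holds one tuple literal such as `(1,2,3)`, and the field names t, t1, t2, and t3 are hypothetical.

```pig
-- Load each line as a single tuple-typed field (assumed layout: (1,2,3)).
tp = load '/test/TP.txt' as (t:tuple(t1:int, t2:int, t3:int));

-- Project individual fields out of the tuple with the dot operator.
g = foreach tp generate t.t1, t.t2;

dump g;
describe g;
```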
(7) Group g, producing bag data in variable bg.
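A sketch of step (7), assuming g has at least one column. Grouping on the first column ($0) is an assumption, since the original does not name the key.

```pig
-- GROUP emits one row per key: the key itself (named group)
-- and a bag holding every row of g that shares that key.
bg = group g by $0;

dump bg;
describe bg;
```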
(8) Load the map data in MP.txt into variable mp; variable h is the output generated from mp.
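As with TP.txt, the contents of MP.txt are not shown, so this sketch rests on assumptions: the file sits at /test/MP.txt, each line is one map literal such as `[name#alice,age#20]`, and the field name m and the keys name and age are hypothetical.

```pig
-- Load each line as a single map-typed field (assumed layout: [key#value,...]).
mp = load '/test/MP.txt' as (m:map[]);

-- Look up values by key with the # operator.
h = foreach mp generate m#'name', m#'age';

dump h;
```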
Bug
bin/hadoop dfsadmin -safemode leave   // run via the bin directory
// if the environment variables are configured, use the following command instead
hadoop dfsadmin -safemode leave
After the fix:
Dataset operations
(1) Load the data.
grunt> a = load '/test/A.txt' using PigStorage(',') as (a1:int, a2:int, a3:int);
grunt> b = load '/test/B.txt' using PigStorage(',') as (b1:int, b2:int, b3:int);
(2) Take the union of a and b.
grunt> c = union a, b;
grunt> dump c;
(3) Split c into d and e, where the first column of d is 0 and the first column of e is 1 ($0 denotes the first column of the dataset).
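Step (3) corresponds to the SPLIT operator. A minimal sketch, assuming c is the union built in step (2):

```pig
-- Route each row of c by the value of its first column.
-- Rows whose first column is neither 0 nor 1 land in neither relation.
split c into d if $0 == 0, e if $0 == 1;

dump d;
dump e;
```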
(4) Select a subset of the data in c.
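The original does not state which rows are selected, so the condition below is purely illustrative. The sketch uses FILTER and assumes c is the union from step (2); both the name f and the threshold are hypothetical.

```pig
-- Keep the rows of c whose second column exceeds 3 (hypothetical condition).
f = filter c by $1 > 3;

dump f;
```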
(5) Group the data.
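A sketch of step (5) using GROUP on c. The grouping key (the third column, $2) and the name g are assumptions, because the original does not show the statement.

```pig
-- Group the rows of c by their third column; each output row is
-- (group, {bag of the rows of c that share that key}).
g = group c by $2;

dump g;
```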
(6) Collect all elements together.
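Collecting every row into a single group is what GROUP ... ALL does. The sketch below assumes the result is stored in h, which matches the relation counted in step (7).

```pig
-- Put every row of c into one bag; the result has a single row
-- of the form (all, {all rows of c}).
h = group c all;

dump h;
```

Because the bag is the second field of h, it is the field that `COUNT($1)` reads.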
(7) Count the elements in h.
grunt> i = foreach h generate COUNT($1);
grunt> dump i;
(8) Join the two relations on the condition a.$2 == b.$2.
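The join condition in step (8) can be sketched with the JOIN operator, assuming a and b are the relations loaded in step (1). The name j is an illustrative choice.

```pig
-- Inner join: pair each row of a with each row of b
-- that has the same value in the third column.
j = join a by $2, b by $2;

dump j;
```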
(9) Variable k outputs $1 and $1 * $2 from c.
grunt> k = foreach c generate $1, $1 * $2;
grunt> dump k;
Summary