Quickly Setting Up a Hadoop Development Environment with HDP | Debugo
This post is a quick record of the whole process of building a Hadoop development/test environment with VMware Workstation 10, CentOS, and the HDP 2.0.6 (Hadoop 2.2) distribution. I hit quite a few problems and lost a fair amount of time along the way, so I am writing it up in the hope that it helps others.
This post uses two virtual machines to build a real cluster; the operating system is CentOS 6.5. VMware Workstation's Easy Install mode can be used for the installation.
0. Install the CentOS 6.5 virtual machines
Follow the wizard to set the system user, CPU, memory, disk, and network. To let yum reach the Internet, bridged networking has to be selected here.
Then wait for the installation to finish (less than 10 minutes on an SSD); VMware Tools is installed automatically during this step. Now the system and HDP configuration can begin.
1. Basic server settings
vim /etc/hosts
192.168.1.210 hdp01
192.168.1.220 hdp02

vim /etc/selinux/config
SELINUX=disabled

vim /etc/sysconfig/network
HOSTNAME=hdp01    # the hostnames are hdp01 and hdp02 respectively
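The addresses in /etc/hosts have to match what the VMs actually get, so it is usually worth pinning a static IP instead of relying on DHCP. A minimal sketch for CentOS 6 follows; eth0, the gateway, and the DNS address are assumptions — adjust them to your bridged network:

vim /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.1.210     # use 192.168.1.220 on hdp02
NETMASK=255.255.255.0
GATEWAY=192.168.1.1      # assumption: your router/bridge gateway
DNS1=192.168.1.1         # assumption

service network restart  # apply the change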
Disable unnecessary services:
chkconfig NetworkManager off
chkconfig abrt-ccpp off
chkconfig abrtd off
chkconfig acpid off
chkconfig atd off
chkconfig bluetooth off
chkconfig cpuspeed off
chkconfig ip6tables off
chkconfig iptables off
chkconfig netconsole off
chkconfig netfs off
chkconfig postfix off
chkconfig restorecond off
chkconfig httpd off
Reboot once this is done.
2. Install Ambari on hdp01
(1). Download the HDP repo
Download the yum repo file provided by HDP and copy it into /etc/yum.repos.d:
[root@hdp01 ~]# wget http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/1.4.1.61/ambari.repo
--2014-03-10 04:57:58--  http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/1.4.1.61/ambari.repo
Resolving public-repo-1.hortonworks.com... 54.230.127.224, 205.251.212.150, 54.230.124.207, ...
Connecting to public-repo-1.hortonworks.com|54.230.127.224|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 770 [binary/octet-stream]
Saving to: “ambari.repo”
100%[======================================>] 770         --.-K/s   in 0s
2014-03-10 04:58:01 (58.8 MB/s) - “ambari.repo” saved [770/770]
[root@hdp01 ~]# cp ambari.repo /etc/yum.repos.d/

(2). Install ambari-server with yum

[root@hdp01 ~]# yum -y install ambari-server
...
Total download size: 49 M
Installed size: 113 M
...
Installed:
  ambari-server.noarch 0:1.4.1.61-1
Dependency Installed:
  postgresql.x86_64 0:8.4.20-1.el6_5
  postgresql-libs.x86_64 0:8.4.20-1.el6_5
  postgresql-server.x86_64 0:8.4.20-1.el6_5
Complete!
3. Configure passwordless root SSH between the nodes
Generate a key pair on hdp01 and hdp02, then copy it to both hdp01 and hdp02 with ssh-copy-id.
[root@hdp01 ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
...
[root@hdp02 .ssh]# ssh-copy-id hdp01
The authenticity of host 'hdp01 (192.168.1.210)' can't be established.
RSA key fingerprint is 90:3b:db:2d:c4:34:49:03:e6:d7:cc:cb:b7:60:4d:d0.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hdp01,192.168.1.210' (RSA) to the list of known hosts.
root@hdp01's password:
Now try logging into the machine, with "ssh 'hdp01'", and check in:
  .ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
[root@hdp02 .ssh]# ssh-copy-id hdp02
The authenticity of host 'hdp02 (192.168.1.220)' can't be established.
RSA key fingerprint is 11:cb:c9:9e:b6:c0:a1:95:98:fa:42:aa:95:5f:cf:98.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hdp02,192.168.1.220' (RSA) to the list of known hosts.
root@hdp02's password:
Now try logging into the machine, with "ssh 'hdp02'", and check in:
  .ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
4. Configure the Ambari server
Apache Ambari is a web-based tool for automatically deploying, managing, and monitoring Apache Hadoop. Here the Ambari server's metadata store uses the bundled PostgreSQL database.
[root@hdp01 ~]# ambari-server setup
Using python  /usr/bin/python2.6
Initializing...
Setup ambari-server
Checking SELinux...
SELinux status is 'disabled'
Customize user account for ambari-server daemon [y/n] (n)?
Adjusting ambari-server permissions and ownership...
Checking iptables...
Checking JDK...
To download the Oracle JDK you must accept the license terms found at http://www.oracle.com/technetwork/java/javase/terms/license/index.html and not accepting will cancel the Ambari Server setup.
Do you accept the Oracle Binary Code License Agreement [y/n] (y)?
Downloading JDK from http://public-repo-1.hortonworks.com/ARTIFACTS/jdk-6u31-linux-x64.bin to /var/lib/ambari-server/resources/jdk-6u31-linux-x64.bin
JDK distribution size is 85581913 bytes
dk-6u31-linux-x64.bin... 100% (81.6 MB of 81.6 MB)
Successfully downloaded JDK distribution to /var/lib/ambari-server/resources/jdk-6u31-linux-x64.bin
Installing JDK to /usr/jdk64
Successfully installed JDK to /usr/jdk64/jdk1.6.0_31
Downloading JCE Policy archive from http://public-repo-1.hortonworks.com/ARTIFACTS/jce_policy-6.zip to /var/lib/ambari-server/resources/jce_policy-6.zip
Successfully downloaded JCE Policy archive to /var/lib/ambari-server/resources/jce_policy-6.zip
Completing setup...
Configuring database...
Enter advanced database configuration [y/n] (n)? y
==============================================================================
Choose one of the following options:
[1] - PostgreSQL (Embedded)
[2] - Oracle
==============================================================================
Enter choice (1): 1
Database Name (ambari):
Username (ambari):
Enter Database Password (bigdata):
Default properties detected. Using built-in database.
Checking PostgreSQL...
Running initdb: This may take upto a minute.
About to start PostgreSQL
Configuring local database...
Connecting to the database. Attempt 1...
Configuring PostgreSQL...
Restarting PostgreSQL
Ambari Server 'setup' completed successfully.
Start the Ambari server as root:
[root@hdp01 ~]$ ambari-server start
Using python  /usr/bin/python2.6
Starting ambari-server
Unable to check iptables status when starting without root privileges.
Please do not forget to disable or adjust iptables if needed
Unable to check PostgreSQL server status when starting without root privileges.
Please do not forget to start PostgreSQL server.
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
Ambari Server 'start' completed successfully.
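The web UI should now be listening on port 8080 (used in step 6 below); a quick way to confirm it is up, as a sketch:

netstat -tnlp | grep 8080              # the ambari-server java process should be listening
curl -sI http://hdp01:8080/ | head -1  # expect an HTTP response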
5. Install MySQL
mysql-server is used to store the Hive metastore.
First install the remi repository (so that MySQL 5.5 can be installed through yum):
[root@hdp01 ~]# yum install -y epel-release
Installed:
  epel-release.noarch 0:6-8
Complete!
[root@hdp01 ~]# rpm -Uvh http://rpms.famillecollet.com/enterprise/remi-release-6.rpm
Retrieving http://rpms.famillecollet.com/enterprise/remi-release-6.rpm
warning: /var/tmp/rpm-tmp.JSZuMv: Header V3 DSA/SHA1 Signature, key ID 00f97f56: NOKEY
Preparing...                ########################################### [100%]
   1:remi-release           ########################################### [100%]
[root@hdp01 ~]# yum install -y mysql-server
......
Total download size: 12 M
......
[root@hdp01 ~]# yum --enablerepo=remi,remi-test list mysql mysql-server
Loaded plugins: fastestmirror, refresh-packagekit, security
Loading mirror speeds from cached hostfile
......
Available Packages
mysql.x86_64           5.5.36-1.el6.remi
mysql-server.x86_64    5.5.36-1.el6.remi
[root@hdp01 ~]# yum --enablerepo=remi,remi-test install mysql mysql-server
Loaded plugins: fastestmirror, refresh-packagekit, security
Loading mirror speeds from cached hostfile
......
Total download size: 20 M
......
[root@hdp01 ~]# chkconfig --level 235 mysqld on
[root@hdp01 ~]# service mysqld start
Starting mysqld:                                           [  OK  ]
[root@hdp01 ~]# /usr/bin/mysql_secure_installation
......
Enter current password for root (enter for none):
OK, successfully used password, moving on...
Change the root password? [Y/n] n
 ... skipping.
Remove anonymous users? [Y/n] Y
 ... Success!
Disallow root login remotely? [Y/n] Y
 ... Success!
Remove test database and access to it? [Y/n] Y
 - Dropping test database...
 ... Success!
 - Removing privileges on test database...
 ... Success!
Reload privilege tables now? [Y/n] Y
 ... Success!
All done!  If you've completed all of the above steps, your MySQL installation should now be secure.
Thanks for using MySQL!
Next, create the Hive database and user:
[root@hdp01 ~]# mysql -u root -p
mysql> create database hive;
Query OK, 1 row affected (0.00 sec)
mysql> create user "hive" identified by "hive123";
Query OK, 0 rows affected (0.00 sec)
mysql> grant all privileges on hive.* to hive;
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
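Before pointing Hive at this database, a quick sanity check (a sketch) that the new account can log in over the network — run it from whichever node will host the Hive Metastore:

mysql -h hdp01 -u hive -phive123 -e "show databases;"   # the hive database should be listed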
6. Open the Ambari web UI in a browser and log in with admin/admin
http://hdp01:8080/#/login
Name your cluster: debugo_test
Stack: HDP 2.0.6
Target Hosts: hdp01,hdp02
Host Registration Information:
Since root SSH trust was configured earlier, choose the id_rsa private key file under /root/.ssh here, then click Register and Confirm to continue:
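If you would rather paste the key into the registration form than browse for the file, just print it and copy the output:

cat /root/.ssh/id_rsa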
If host registration then fails with "Local OS is not compatible with cluster primary OS" because the os_type_check.sh script exits with an error, this is a known bug; the workaround is to edit os_type_check.sh so that RES is forced to 0 just before the result is printed, making the check always pass.
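A sketch of applying that workaround on an affected host — the script's location differs between Ambari versions, so no fixed path is assumed here:

find / -name os_type_check.sh 2>/dev/null   # locate the check script first
# then edit the file that find reports and insert a line "RES=0"
# immediately before the result is echoed, so the OS check always passes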
Once registration succeeds, ambari-agent is installed on each host and can be controlled with the ambari-agent command:
[root@hdp02 Desktop]# ambari-agent status
ambari-agent currently not running
Usage: /usr/sbin/ambari-agent {start|stop|restart|status}
# enable ambari-agent at boot on both hdp01 and hdp02
[root@hdp02 Desktop]# chkconfig --level 35 ambari-agent on
The next step is choosing the components to install; here Nagios, Ganglia, and Oozie are left out. For Hive, point it at the mysql-server instance installed earlier:
Also set YARN's yarn.acl.enable to false, and then move on to Deploy. This is an extremely long process; if a failure shows up along the way, just retry it. The installation finished after roughly an hour:
Clicking Next brings up the long-awaited Dashboard, and at this point all of the installed components are already running.
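A quick smoke test from hdp01 once everything is green (a sketch, assuming the default hdfs service user created by Ambari):

su - hdfs -c "hdfs dfsadmin -report"   # both DataNodes should be listed as live
su - hdfs -c "hdfs dfs -ls /"          # HDFS should answer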
7. Configuring the development environment
Download Eclipse 4.3 (Kepler) and Maven 3.2.1 to /opt, and set the environment variables:
[root@hdp01 opt]# vim /etc/profile
export JAVA_HOME=/usr/jdk64/jdk1.6.0_31
export MAVEN_HOME=/opt/apache-maven-3.2.1
export PATH=$PATH:$JAVA_HOME/bin:$MAVEN_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
[root@hdp01 opt]# useradd hadoop
[root@hdp01 opt]# echo "hadoop" | passwd --stdin hadoop
[root@hdp01 opt]# chgrp -R hadoop apache-maven-3.2.1/ eclipse/ workspace/
Open Eclipse -> Help -> Install New Software and install the Maven plugin from http://download.eclipse.org/m2e-wtp/releases/kepler/. After the installation, restart Eclipse and the Hadoop journey can officially begin.
8. Building WordCount
(1). Create a new Maven project
(2). Create a simple project (skip archetype selection)
(3). If a JRE-related warning appears:
Build path specifies execution environment J2SE-1.5. There are no JREs installed in the workspace that are strictly compatible with this environment.
it can be fixed in the project's Properties by removing the J2SE-1.5 entry, then Add Library -> JRE System Library -> Workspace default JRE.
(4). WordCount.java
Create the WordCount class in the com.debugo.hadoop.mapred package:
package com.debugo.hadoop.mapred;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Mapper: tokenize each input line and emit (word, 1)
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as combiner): sum the counts for each word
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Edit pom.xml and add the dependencies; the coordinates can be looked up in the Maven repository (http://mvnrepository.com/artifact/org.apache.hadoop):
<dependencies>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.3.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.3.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    <version>2.3.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.3.0</version>
  </dependency>
</dependencies>
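With the POM in place, a plain build (a sketch) produces the project jar — by default under target/, here mr-0.0.1-SNAPSHOT.jar — which the next step installs into the local repository:

mvn clean install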
Note that running the job directly at this point fails with the map tasks unable to find WordCount's inner classes, so after mvn install the project's own jar has to be installed into the local Maven repository and pulled back in as a dependency:
mvn install:install-file -DgroupId=com.debugo.hadoop -DartifactId=mr -Dpackaging=jar -Dversion=0.1 -Dfile=mr-0.0.1-SNAPSHOT.jar -DgeneratePom=true
Then add:
<dependency>
  <groupId>com.debugo.hadoop</groupId>
  <artifactId>mr</artifactId>
  <version>0.1</version>
</dependency>
The approach described at http://www.cnblogs.com/spork/archive/2010/04/21/1717592.html is also a good solution.
Edit the Run Configuration and set the program arguments to "/input /output".
Then create the /input directory: hdfs dfs -mkdir /input
Upload some text into it with hdfs dfs -put a.txt /input.
Finally, run the project; on success the result is written to the /output directory in HDFS.
[2014-03-13 09:52:20,282] INFO 19952[main] - org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1380) - Counters: 49
	File System Counters
		FILE: Number of bytes read=5263
		FILE: Number of bytes written=183603
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=6739
		HDFS: Number of bytes written=3827
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=3075
		Total time spent by all reduces in occupied slots (ms)=6294
		Total time spent by all map tasks (ms)=3075
		Total time spent by all reduce tasks (ms)=3147
		Total vcore-seconds taken by all map tasks=3075
		Total vcore-seconds taken by all reduce tasks=3147
		Total megabyte-seconds taken by all map tasks=4723200
		Total megabyte-seconds taken by all reduce tasks=9667584
	Map-Reduce Framework
		Map input records=144
		Map output records=960
		Map output bytes=10358
		Map output materialized bytes=5263
		Input split bytes=104
		Combine input records=960
		Combine output records=361
		Reduce input groups=361
		Reduce shuffle bytes=5263
		Reduce input records=361
		Reduce output records=361
		Spilled Records=722
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=26
		CPU time spent (ms)=2290
		Physical memory (bytes) snapshot=1309593600
		Virtual memory (bytes) snapshot=8647901184
		Total committed heap usage (bytes)=2021654528
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=6635
	File Output Format Counters
		Bytes Written=3827
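After the job completes, the result can also be checked from the shell (a sketch; part-r-00000 is the default name of the single reducer's output file):

hdfs dfs -ls /output
hdfs dfs -cat /output/part-r-00000 | head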
^^
References:
Installing MySQL 5.5 with yum: http://www.linuxidc.com/Linux/2012-07/65098.htm
The official HDP documentation
Canon's guide to building Hadoop 1.x projects with Maven: http://blog.fens.me/hadoop-maven-eclipse/