Hadoop Program Development --- Java
1. Create a Maven project
If you are not sure how to configure Maven, see the linked guide.
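If you would rather create the project from the command line than from the IDE wizard, a minimal sketch using the Maven quick-start archetype looks like this (the groupId and artifactId below are placeholders chosen to match the package used later in this article):

mvn archetype:generate -DgroupId=com.test -DartifactId=wordcount \
    -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false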
2. Add the jar dependencies to pom.xml
<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-client-common -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-common</artifactId>
        <version>2.8.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.8.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.8.4</version>
    </dependency>
</dependencies>
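Optionally, you can also pin the Java compiler level in pom.xml so the project builds consistently; this is not required by the original steps, but assuming JDK 1.8 it would look like:

<properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
</properties>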
3. Create the source program
src -> main -> java -> com -> test -> WordCount.java
WordCount.java
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.test;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * This is an example Hadoop Map/Reduce application.
 * It reads the text input files, breaks each line into words
 * and counts them. The output is a locally sorted list of words and the
 * count of how often they occurred.
 *
 * To run: bin/hadoop jar build/hadoop-examples.jar wordcount
 *            [-m <i>maps</i>] [-r <i>reduces</i>] <i>in-dir</i> <i>out-dir</i>
 */
public class WordCount extends Configured implements Tool {

  /**
   * Counts the words in each line.
   * For each line of input, break the line into words and emit them as
   * (<b>word</b>, <b>1</b>).
   */
  public static class MapClass extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
      String line = value.toString();
      StringTokenizer itr = new StringTokenizer(line);
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        output.collect(word, one);
      }
    }
  }

  /**
   * A reducer class that just emits the sum of the input values.
   */
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  static int printUsage() {
    System.out.println("wordcount [-m <maps>] [-r <reduces>] <input> <output>");
    ToolRunner.printGenericCommandUsage(System.out);
    return -1;
  }

  /**
   * The main driver for the word count map/reduce program.
   * Invoke this method to submit the map/reduce job.
   * @throws IOException When there are communication problems with the job tracker.
   */
  public int run(String[] args) throws Exception {
    JobConf conf = new JobConf(getConf(), WordCount.class);
    conf.setJobName("wordcount");

    // the keys are words (strings)
    conf.setOutputKeyClass(Text.class);
    // the values are counts (ints)
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(MapClass.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    List<String> other_args = new ArrayList<String>();
    for (int i = 0; i < args.length; ++i) {
      try {
        if ("-m".equals(args[i])) {
          conf.setNumMapTasks(Integer.parseInt(args[++i]));
        } else if ("-r".equals(args[i])) {
          conf.setNumReduceTasks(Integer.parseInt(args[++i]));
        } else {
          other_args.add(args[i]);
        }
      } catch (NumberFormatException except) {
        System.out.println("ERROR: Integer expected instead of " + args[i]);
        return printUsage();
      } catch (ArrayIndexOutOfBoundsException except) {
        System.out.println("ERROR: Required parameter missing from " + args[i - 1]);
        return printUsage();
      }
    }
    // Make sure there are exactly 2 parameters left.
    if (other_args.size() != 2) {
      System.out.println("ERROR: Wrong number of parameters: " +
          other_args.size() + " instead of 2.");
      return printUsage();
    }
    FileInputFormat.setInputPaths(conf, other_args.get(0));
    FileOutputFormat.setOutputPath(conf, new Path(other_args.get(1)));
    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new WordCount(), args);
    System.exit(res);
  }
}
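As a quick illustration of what the job produces (assuming the default output format): if the input file contains the two lines "hello hadoop" and "hello world", the output part file lists each word with its count, separated by a tab and sorted by word:

hadoop	1
hello	2
world	1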
4. Package WordCount.java into a jar file
(1) Basic configuration
This is done in the IDEA artifact settings, typically File -> Project Structure -> Artifacts -> + -> JAR -> From modules with dependencies, choosing WordCount as the main class. When you have finished selecting, click Apply -> OK.
(2) Start packaging
Build -> Build Artifacts -> XXX.jar -> Build
(3) Locate the generated jar file
It is in the folder out -> artifacts -> WordCount_jar.
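As an optional sanity check (assuming the jar is named WordCount.jar, as in the next step), you can list the jar contents to confirm the class was packaged under the expected package path:

jar tf WordCount.jar | grep WordCount

The listing should include com/test/WordCount.class.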
5. Run
Here I have uploaded WordCount.jar to the /usr/local/hadoop-jar directory.
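Before submitting the job, the input must exist on HDFS. A minimal sketch, assuming a local text file named words.txt (a placeholder name) and an HDFS input directory /input:

hdfs dfs -mkdir -p /input
hdfs dfs -put words.txt /input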
Run the command:
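A typical invocation looks like the following (the /input and /output paths are placeholders; the output directory must not already exist before the job runs):

hadoop jar /usr/local/hadoop-jar/WordCount.jar com.test.WordCount /input /output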
Important: the class name must be prefixed with its package name, which here is com.test.
6. Conclusion
That completes the example. To develop other programs, write your own Java files, then package, upload, and run them in the same way.
If you repost this article, please credit the source and support original work.
QQ study group: 779133600