WordCount (Word Counting)
Course link: Hadoop大數據平臺架構與實踐--基礎篇 (Hadoop Big Data Platform Architecture and Practice: Basics)
Task: count how often each word appears in a file; the output is sorted alphabetically by word.
Map phase: split each input line into words and emit intermediate (word, 1) key-value pairs.
Reduce phase: intermediate pairs are hash-partitioned during the shuffle so that all pairs for the same word reach the same node, where their counts are merged and summed.
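For concreteness, here is how a small made-up input would flow through the two phases (the input lines are illustrative, not from the course):

Input lines:    hello world
                hello hadoop
Map output:     (hello, 1) (world, 1) (hello, 1) (hadoop, 1)
Shuffle/sort:   hadoop -> [1]   hello -> [1, 1]   world -> [1]
Reduce output:  hadoop 1
                hello 2
                world 1

Because the framework sorts keys before handing them to the reducers, the final counts come out in alphabetical order, as required.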
Steps:
WordCount.java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class WordCountMap extends
            Mapper<LongWritable, Text, Text, IntWritable> {

        private final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Split each input line into tokens and emit (word, 1) per token.
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer token = new StringTokenizer(line);
            while (token.hasMoreTokens()) {
                word.set(token.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class WordCountReduce extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        // All counts for the same word arrive together; sum them up.
        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job.getInstance replaces the deprecated new Job(conf) constructor.
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(WordCountMap.class);
        job.setReducerClass(WordCountReduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
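One plausible way to compile and run the job from a shell; the jar name and HDFS paths here are illustrative assumptions, not from the original post:

# Compile against the Hadoop classpath and package into a jar
javac -classpath $(hadoop classpath) WordCount.java
jar cf wordcount.jar WordCount*.class

# Put some input into HDFS, run the job, and inspect the result
hdfs dfs -put input.txt /user/hadoop/input/
hadoop jar wordcount.jar WordCount /user/hadoop/input /user/hadoop/output
hdfs dfs -cat /user/hadoop/output/part-r-00000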
Example: sorting numbers with MapReduce. The framework already sorts keys within each reduce partition, so a custom range Partitioner that routes smaller numbers to lower-numbered partitions is enough to make the concatenated partition outputs globally sorted.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Sort {

    public static class Map extends
            Mapper<Object, Text, IntWritable, IntWritable> {

        private static IntWritable data = new IntWritable();

        // Each input line holds one integer; emit it as the key so the
        // framework sorts it during the shuffle. The value is a dummy 1.
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            data.set(Integer.parseInt(line));
            context.write(data, new IntWritable(1));
        }
    }

    public static class Reduce extends
            Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

        private static IntWritable linenum = new IntWritable(1);

        // Keys arrive in sorted order; write one line per occurrence,
        // prefixed with a running rank (per reducer, if several run).
        @Override
        public void reduce(IntWritable key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            for (IntWritable val : values) {
                context.write(linenum, key);
                linenum = new IntWritable(linenum.get() + 1);
            }
        }
    }

    public static class Partition extends
            Partitioner<IntWritable, IntWritable> {

        // Range partitioner: keys in [bound*(i-1), bound*i) go to partition
        // i-1, so the partition files are globally ordered when concatenated.
        // The original loop started at i = 0, which could return -1 for
        // negative keys and dropped the highest range into partition 0.
        @Override
        public int getPartition(IntWritable key, IntWritable value,
                int numPartitions) {
            int maxNumber = 65223; // assumed upper bound on the input values
            int bound = maxNumber / numPartitions + 1;
            int keynumber = key.get();
            for (int i = 1; i <= numPartitions; i++) {
                if (keynumber < bound * i && keynumber >= bound * (i - 1)) {
                    return i - 1;
                }
            }
            // Keys outside [0, maxNumber] fall into the last partition.
            return numPartitions - 1;
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs =
                new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Sort <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "Sort");
        job.setJarByClass(Sort.class);
        job.setMapperClass(Map.class);
        job.setPartitionerClass(Partition.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
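To check the range partitioner's arithmetic, suppose the job runs with numPartitions = 3 (a hypothetical setting; the code assumes a maximum input value of 65223):

bound = 65223 / 3 + 1 = 21742
partition 0: keys in [0, 21742)
partition 1: keys in [21742, 43484)
partition 2: keys in [43484, 65226)

For example, key 50000 satisfies 43484 <= 50000 < 65226, so it lands in partition 2. Concatenating part-r-00000 through part-r-00002 then yields one globally sorted list. Note that the rank counter (linenum) restarts in each reducer, so the ranks are only globally consecutive when a single reducer is used.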
Reposted from: https://www.cnblogs.com/exciting/p/9211536.html