WordCount (Word Counting)
Course link: Hadoop大數據平臺架構與實踐--基礎篇 (Hadoop Big Data Platform Architecture and Practice: Basics)
Task: count how often each word appears in a file; the output is sorted alphabetically by word.
Map phase: split each input line into words and emit intermediate (word, 1) key-value pairs.
Reduce phase: intermediate pairs are hash-partitioned during the shuffle so that all pairs for the same word reach the same node, where their counts are merged and summed.
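For concreteness, here is how a small made-up input would flow through the two phases (the input lines are illustrative, not from the course):

Input lines:    hello world
                hello hadoop
Map output:     (hello, 1) (world, 1) (hello, 1) (hadoop, 1)
Shuffle/sort:   hadoop -> [1]   hello -> [1, 1]   world -> [1]
Reduce output:  hadoop 1
                hello 2
                world 1

Because the framework sorts keys before handing them to the reducers, the final counts come out in alphabetical order, as required.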
Steps:
WordCount.java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class WordCountMap extends
            Mapper<LongWritable, Text, Text, IntWritable> {

        private final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Split each input line into tokens and emit (word, 1) per token.
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer token = new StringTokenizer(line);
            while (token.hasMoreTokens()) {
                word.set(token.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class WordCountReduce extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        // All counts for the same word arrive together; sum them up.
        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job.getInstance replaces the deprecated new Job(conf) constructor.
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(WordCountMap.class);
        job.setReducerClass(WordCountReduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
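One plausible way to compile and run the job from a shell; the jar name and HDFS paths here are illustrative assumptions, not from the original post:

# Compile against the Hadoop classpath and package into a jar
javac -classpath $(hadoop classpath) WordCount.java
jar cf wordcount.jar WordCount*.class

# Put some input into HDFS, run the job, and inspect the result
hdfs dfs -put input.txt /user/hadoop/input/
hadoop jar wordcount.jar WordCount /user/hadoop/input /user/hadoop/output
hdfs dfs -cat /user/hadoop/output/part-r-00000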
Example: sorting numbers with MapReduce. The framework already sorts keys within each reduce partition, so a custom range Partitioner that routes smaller numbers to lower-numbered partitions is enough to make the concatenated partition outputs globally sorted.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Sort {

    public static class Map extends
            Mapper<Object, Text, IntWritable, IntWritable> {

        private static IntWritable data = new IntWritable();

        // Each input line holds one integer; emit it as the key so the
        // framework sorts it during the shuffle. The value is a dummy 1.
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            data.set(Integer.parseInt(line));
            context.write(data, new IntWritable(1));
        }
    }

    public static class Reduce extends
            Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

        private static IntWritable linenum = new IntWritable(1);

        // Keys arrive in sorted order; write one line per occurrence,
        // prefixed with a running rank (per reducer, if several run).
        @Override
        public void reduce(IntWritable key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            for (IntWritable val : values) {
                context.write(linenum, key);
                linenum = new IntWritable(linenum.get() + 1);
            }
        }
    }

    public static class Partition extends
            Partitioner<IntWritable, IntWritable> {

        // Range partitioner: keys in [bound*(i-1), bound*i) go to partition
        // i-1, so the partition files are globally ordered when concatenated.
        // The original loop started at i = 0, which could return -1 for
        // negative keys and dropped the highest range into partition 0.
        @Override
        public int getPartition(IntWritable key, IntWritable value,
                int numPartitions) {
            int maxNumber = 65223; // assumed upper bound on the input values
            int bound = maxNumber / numPartitions + 1;
            int keynumber = key.get();
            for (int i = 1; i <= numPartitions; i++) {
                if (keynumber < bound * i && keynumber >= bound * (i - 1)) {
                    return i - 1;
                }
            }
            // Keys outside [0, maxNumber] fall into the last partition.
            return numPartitions - 1;
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs =
                new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Sort <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "Sort");
        job.setJarByClass(Sort.class);
        job.setMapperClass(Map.class);
        job.setPartitionerClass(Partition.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
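To check the range partitioner's arithmetic, suppose the job runs with numPartitions = 3 (a hypothetical setting; the code assumes a maximum input value of 65223):

bound = 65223 / 3 + 1 = 21742
partition 0: keys in [0, 21742)
partition 1: keys in [21742, 43484)
partition 2: keys in [43484, 65226)

For example, key 50000 satisfies 43484 <= 50000 < 65226, so it lands in partition 2. Concatenating part-r-00000 through part-r-00002 then yields one globally sorted list. Note that the rank counter (linenum) restarts in each reducer, so the ranks are only globally consecutive when a single reducer is used.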
Reposted from: https://www.cnblogs.com/exciting/p/9211536.html