MapReduce Example: WordCount (Counting Words)
Process Analysis

Count the words: for each distinct word in the input data, tally how many times it appears.

Process diagram (image from the web):
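The flow in the diagram can be sketched in plain Java, without Hadoop. This is a hypothetical, simplified simulation of the map → shuffle/sort → reduce phases; the class and method names are illustrative, not part of the Hadoop API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// A minimal, Hadoop-free sketch of the WordCount data flow described above.
public class WordCountFlow {

    // Map phase: emit one (word, 1) pair per token of an input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.split(" ")) {
            pairs.add(Map.entry(word, 1));
        }
        return pairs;
    }

    // Shuffle/sort + reduce phase: group pairs by key and sum the values.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by key, like Hadoop's shuffle
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : new String[] {"hello world", "hello hadoop"}) {
            pairs.addAll(map(line));
        }
        System.out.println(reduce(pairs)); // {hadoop=1, hello=2, world=1}
    }
}
```

The real job below does the same thing, except the framework handles the shuffle, and the map and reduce steps run as separate distributed tasks.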
Implementing the Mapper, Reducer, and Driver

WordCountMapper:
```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private Text k = new Text();
    private IntWritable v = new IntWritable(1);

    /**
     * Override the map method.
     * @param key   byte offset of the line within the file
     * @param value the line's contents
     */
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Get one line of data
        String valueString = value.toString();
        // Split the line into words
        String[] strings = valueString.split(" ");
        // Emit a K-V pair for each word
        for (String string : strings) {
            k.set(string);
            context.write(k, v);
        }
    }
}
```
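One caveat not covered by the mapper above: `split(" ")` produces empty tokens when words are separated by more than one space, so messy input is usually safer to tokenize with the whitespace regex `\s+`. A quick illustration:

```java
public class SplitDemo {
    public static void main(String[] args) {
        String line = "hello  world"; // two spaces between the words
        // Splitting on a single space leaves an empty token in the middle.
        System.out.println(line.split(" ").length);    // 3 -> "hello", "", "world"
        // Splitting on runs of whitespace yields only the words.
        System.out.println(line.split("\\s+").length); // 2 -> "hello", "world"
    }
}
```

With the single-space split, the empty strings would be counted as a "word" of their own in the reduce phase.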
WordCountReduce:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable v = new IntWritable(0);

    /**
     * The reduce (merge) step.
     * @param key    the key
     * @param values the list of values sharing this key
     */
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Running count
        int count = 0;
        // Sum the values for this key
        for (IntWritable value : values) {
            count += value.get();
        }
        // Store the result
        v.set(count);
        // Emit the total
        context.write(key, v);
    }
}
```
WordCountDriver:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        // Get a Job object
        System.setProperty("hadoop.home.dir", "E:\\hadoop-2.7.1");
        Configuration configuration = new Configuration();
        FileSystem fs = FileSystem.get(configuration);
        //configuration.set("mapreduce.framework.name", "local");
        //configuration.set("fs.defaultFS", "file:///");
        Job job = Job.getInstance(configuration);
        // Set the jar's main class
        job.setJarByClass(WordCountDriver.class);
        // Set the map and reduce classes
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReduce.class);
        // Set the mapper output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Set the final output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Set the input file and output directory
        FileInputFormat.setInputPaths(job, new Path("E:\\hdfs\\input\\word.txt"));
        Path outPath = new Path("E:\\hdfs\\output");
        // Delete the output directory if it already exists,
        // otherwise the job fails on startup
        if (fs.exists(outPath)) {
            fs.delete(outPath, true);
        }
        FileOutputFormat.setOutputPath(job, outPath);
        boolean waitForCompletion = job.waitForCompletion(true);
        System.out.println(waitForCompletion);
        System.exit(waitForCompletion ? 0 : 1);
    }
}
```
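When the job succeeds, Hadoop's default TextOutputFormat writes one `word<TAB>count` line per key into a part file (e.g. part-r-00000) under the output directory. A small sketch of parsing such a line; the sample line here is hypothetical, not real job output:

```java
public class OutputLineDemo {
    public static void main(String[] args) {
        // TextOutputFormat's default key/value separator is a tab character.
        String line = "hello\t2"; // example output line
        String[] kv = line.split("\t");
        String word = kv[0];
        int count = Integer.parseInt(kv[1]);
        System.out.println(word + " appeared " + count + " times");
    }
}
```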
pom.xml
```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.xing</groupId>
    <artifactId>MapReduce</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.1</version>
        </dependency>
    </dependencies>
</project>
```
I developed this on a local Windows environment. If you don't know how to set up a local development environment, see my post 【Spark】Windows運(yùn)行本地spark程序——JAVA版本; the setup process is the same.
Result: