Upload a random-number file to the HDFS directory /logs and sum it with MapReduce
The random-number file:
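As a minimal sketch (not from the original post; the file name, the count of 100 numbers, and the value range are assumptions chosen to match the input path used by the driver below), a file of whitespace-separated random integers could be generated like this and then uploaded with hdfs dfs -put log_20191027212349 /logs:

import java.io.IOException;
import java.io.PrintWriter;
import java.util.Random;

// Sketch: write 100 random integers, separated by spaces, into a local file.
// File name, count, and value range are illustrative assumptions only.
public class RandomNumberFile {
    public static void main(String[] args) throws IOException {
        Random random = new Random();
        try (PrintWriter writer = new PrintWriter("log_20191027212349", "UTF-8")) {
            for (int i = 0; i < 100; i++) {
                writer.print(random.nextInt(100));
                writer.print(i < 99 ? " " : "\n");
            }
        }
    }
}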
The pom.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.henu</groupId>
    <artifactId>henu</artifactId>
    <version>1.0-SNAPSHOT</version>

    <name>henu</name>
    <!-- FIXME change it to the project's website -->
    <url>http://www.example.com</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>RELEASE</version>
        </dependency>
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.17</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.2</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>utf-8</encoding>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

The code:
package com.henu;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * @author George
 * @description Sum a file of random numbers with MapReduce.
 *
 * Idea: 1. In the map phase, emit every number under one fixed key.
 *       2. In the reduce phase, add up all the values that arrive under that key.
 */
public class Sum {

    public static class SumMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        Text k1 = new Text();
        IntWritable v1 = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // split each line on whitespace and emit every number under the same key "1"
            String line = value.toString();
            String[] strings = line.split("\\s+");
            for (String s : strings) {
                k1.set("1");
                v1.set(Integer.parseInt(s));
                context.write(k1, v1);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        int result;
        IntWritable v2 = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            // every number arrives under the same key, so this sum is the grand total
            result = 0;
            for (IntWritable value : values) {
                result += value.get();
            }
            v2.set(result);
            context.write(key, v2);
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(Sum.class);
        job.setMapperClass(SumMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path("/logs/log_20191027212349"));
        // the output directory must not already exist when the job runs
        FileOutputFormat.setOutputPath(job, new Path("/result"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
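One optional tweak that is not in the original post: because integer addition is associative and the reducer's input and output types are identical, SumReducer can also be registered as a combiner, so partial sums are computed on the map side and less data crosses the shuffle. A one-line addition to the driver would look like this:

// Optional: pre-aggregate map output locally; SumReducer works unchanged as a combiner.
job.setCombinerClass(SumReducer.class);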
Package the project into a jar; the packaging steps are covered in an earlier post: https://blog.csdn.net/qq_41946557/article/details/102785927
Then copy the jar to the Linux machine.
Run it:

[root@henu1 ~]# yarn jar henu-1.0-SNAPSHOT.jar com.henu.Sum
View the result:
[root@henu1 ~]# hdfs dfs -cat /result/part-r-00000

That's it. Corrections are welcome; I actually still have one small question about this. (⊙o⊙)…