Hadoop MapReduce program: SingletonTableJoin
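Requirement: a single-table join (self-join) problem. From the child-parent relationships in a file, derive the grandchild-grandparent relationships.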
Sample input (child-parent.txt):
    xiaoming daxiong
    daxiong alice
    daxiong jack
Expected output:
    xiaoming alice
    xiaoming jack
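The requirement can be written as a join of the child-parent table T with itself, matching the parent column of one copy against the child column of the other. The predicate notation below is a shorthand introduced here for illustration, not something from the original post:

```latex
\text{grandparent}(c, g) \iff \exists\, p :\ (c, p) \in T \ \wedge\ (p, g) \in T
```

With the sample data, (xiaoming, daxiong) and (daxiong, alice) are both in T, which yields the pair (xiaoming, alice); (xiaoming, jack) follows the same way.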
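Analysis and design:
Mapper design:
1. <k1,v1>: k1 is the position (byte offset) of a line in the input file, v1 is the line itself.
2. Left table: <k2,v2>: k2 is the parent's name, v2 is (1, child's name); the 1 marks a left-table record.
3. Right table: <k3,v3>: k3 is the child's name, v3 is (2, parent's name); the 2 marks a right-table record.
Because every line is emitted once into each table, all children (tag 1) and all parents (tag 2) of the same person arrive under the same key.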
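To make the tagging concrete, here is a minimal, Hadoop-free sketch of the map step on the sample lines. The class name MapPhaseSketch and the String[] {key, value} pair representation are hypothetical choices for illustration; the real mapper writes Text pairs through context.write.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, Hadoop-free sketch of the map step (not the author's code). */
public class MapPhaseSketch {

    /** Emits {key, value} pairs for one "child parent" line. */
    static List<String[]> map(String line) {
        List<String[]> out = new ArrayList<>();
        String[] fields = line.trim().split("\\s+");
        // Skip blank or malformed lines and an optional "child parent" header
        if (fields.length < 2 || fields[0].equals("child")) {
            return out;
        }
        String child = fields[0];
        String parent = fields[1];
        // Left table: key = parent, value tagged "1" carries the child
        out.add(new String[] {parent, "1 " + child});
        // Right table: key = child, value tagged "2" carries the parent
        out.add(new String[] {child, "2 " + parent});
        return out;
    }

    public static void main(String[] args) {
        String[] lines = {"xiaoming daxiong", "daxiong alice", "daxiong jack"};
        for (String line : lines) {
            for (String[] kv : map(line)) {
                System.out.println(kv[0] + "\t" + kv[1]);
            }
        }
    }
}
```

Running main prints six pairs; the three keyed by daxiong ("1 xiaoming", "2 alice", "2 jack") are the ones that meet in a single reduce call.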
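Reducer design:
4. <k4,v4>: k4 is the shared key (a person's name), v4 is the list<String> of tagged values for that key.
5. Take the Cartesian product to produce <k5,v5>: k5 is the grandchild's name, v5 is the grandparent's name. The key's children (tag 1) are the grandchildren and the key's parents (tag 2) are the grandparents.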
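The whole design can be checked locally with a small in-memory simulation: map each line into tagged values, group them by key (a TreeMap stands in for Hadoop's shuffle and sort), then take the per-key Cartesian product. The class SelfJoinSketch and its run method are hypothetical names for this sketch; it does not use the Hadoop API. Output rows use a tab between grandchild and grandparent, mirroring the default separator of Hadoop's TextOutputFormat.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, Hadoop-free simulation of map, shuffle and reduce for the self-join. */
public class SelfJoinSketch {

    static List<String> run(List<String> lines) {
        // Map + shuffle: group tagged values by key (TreeMap sorts keys like the sort phase)
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 2 || f[0].equals("child")) {
                continue; // skip blank lines and an optional header
            }
            groups.computeIfAbsent(f[1], k -> new ArrayList<>()).add("1 " + f[0]);
            groups.computeIfAbsent(f[0], k -> new ArrayList<>()).add("2 " + f[1]);
        }
        // Reduce: per key, split values by tag and take the Cartesian product
        List<String> result = new ArrayList<>();
        for (List<String> values : groups.values()) {
            List<String> grandChildren = new ArrayList<>();
            List<String> grandParents = new ArrayList<>();
            for (String v : values) {
                String[] rec = v.split(" ", 2);
                if (rec[0].equals("1")) {
                    grandChildren.add(rec[1]);
                } else {
                    grandParents.add(rec[1]);
                }
            }
            // Empty on either side means this key produces no output
            for (String gc : grandChildren) {
                for (String gp : grandParents) {
                    result.add(gc + "\t" + gp);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("xiaoming daxiong", "daxiong alice", "daxiong jack");
        for (String row : run(input)) {
            System.out.println(row);
        }
    }
}
```

On the sample input this prints "xiaoming	alice" and "xiaoming	jack", matching the expected output. Only the key daxiong has both tags, so it is the only group that emits anything.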
Program code:
SingletonTableJoinMapper class:
package com.cn.singletonTableJoin;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SingletonTableJoinMapper extends Mapper<Object, Text, Text, Text> {

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line "child parent" into tokens
        StringTokenizer itr = new StringTokenizer(value.toString());
        if (itr.countTokens() < 2) {
            return; // skip blank or malformed lines instead of failing on them
        }
        String childName = itr.nextToken();
        String parentName = itr.nextToken();
        // Skip the header line "child parent" if the file has one
        if ("child".equals(childName)) {
            return;
        }
        // Left table: key = parent, value = "1 child"
        context.write(new Text(parentName), new Text("1 " + childName));
        // Right table: key = child, value = "2 parent"
        context.write(new Text(childName), new Text("2 " + parentName));
    }
}
SingletonTableJoinReduce class:
package com.cn.singletonTableJoin;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SingletonTableJoinReduce extends Reducer<Text, Text, Text, Text> {

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Children of the key (tag "1") are grandchildren,
        // parents of the key (tag "2") are grandparents
        List<String> grandChild = new ArrayList<String>();
        List<String> grandParent = new ArrayList<String>();
        for (Text value : values) {
            // Split into tag and name; limit 2 keeps the name intact
            String[] record = value.toString().split(" ", 2);
            if (record.length < 2) {
                continue; // skip malformed records
            }
            if ("1".equals(record[0])) {
                grandChild.add(record[1]);
            } else if ("2".equals(record[0])) {
                grandParent.add(record[1]);
            }
        }
        // Cartesian product; writes nothing if either list is empty
        for (String grandchild : grandChild) {
            for (String grandparent : grandParent) {
                context.write(new Text(grandchild), new Text(grandparent));
            }
        }
    }
}
SingletonTableJoin class:
package com.cn.singletonTableJoin;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

/**
 * Single-table join (self-join)
 * @author root
 */
public class SingletonTableJoin {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: SingletonTableJoin <in> <out>");
            System.exit(2);
        }
        // Create a job (Job.getInstance replaces the deprecated new Job(conf, name))
        Job job = Job.getInstance(conf, "SingletonTableJoin");
        job.setJarByClass(SingletonTableJoin.class);
        // Set the input and output paths
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        // Set the mapper and reducer classes
        job.setMapperClass(SingletonTableJoinMapper.class);
        job.setReducerClass(SingletonTableJoinReduce.class);
        // Set the output key/value types (the map output uses the same Text/Text types)
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Submit the job and wait for it to finish
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Make summarizing a habit.

Reposted from: https://www.cnblogs.com/xubiao/p/5759422.html