Renmin University Cloud Computing Programming Online Judge -- Solution Report 1004-1007
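I wrote up eight problems in one go and the article got too long, so I split it into two posts to make it easier to read.
Picking up from the previous post, let's continue our MapReduce programming journey~~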
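1004: Problem
Single Table Join
Description
The input file contains a child-parent table. Write a program that takes this file as input and outputs the grandchild-grandparent table implied by the child-parent table.
Input
The input is a file containing a child-parent table.
Output
The output is a file containing the grandchild-grandparent relationships, derived from the child-parent table.
Sample input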
child parent
Tom Lucy
Tom Jack
Jone Lucy
Jone Jack
Lucy Mary
Lucy Ben
Jack Alice
Jack Jesse
Terry Alice
Terry Jesse
Philip Terry
Philip Alma
Mark Terry
Mark Alma
Sample output
grandchild  grandparent
Jone        Alice
Jone        Jesse
Tom         Alice
Tom         Jesse
Jone        Mary
Jone        Ben
Tom         Mary
Tom         Ben
Mark        Jesse
Mark        Alice
Philip      Jesse
Philip      Alice
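1004: Solution approach
A single-table join; this one is rather interesting~~ Of course it may just be my own skill level, which is why my solution came out somewhat complicated.
First, I defined a custom data type, TextPair. I won't say much about custom data types here; you can look them up on Baidu, or read Hadoop: The Definitive Guide, which explains them.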
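For readers who have not written a custom data type before, here is a minimal sketch of the contract such a type has to honour: write its fields out in a fixed order, and read them back in exactly the same order into an object created with a no-argument constructor. To keep it runnable without a cluster it uses only java.io; the class TaggedName and the helper roundTrip are hypothetical stand-ins for illustration, not Hadoop classes and not the TextPair from my submission.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class TaggedNameDemo {
    /** Hypothetical stand-in for TextPair: a name plus an integer tag. */
    static class TaggedName {
        String name = "";
        int tag;

        TaggedName() {}  // no-arg constructor, so an empty instance can be created and then filled

        TaggedName(String name, int tag) {
            this.name = name;
            this.tag = tag;
        }

        // Serialize the fields in a fixed order.
        void write(DataOutput out) throws IOException {
            out.writeUTF(name);
            out.writeInt(tag);
        }

        // Deserialize the fields in exactly the order they were written.
        void readFields(DataInput in) throws IOException {
            name = in.readUTF();
            tag = in.readInt();
        }
    }

    /** Serializes a value to bytes and reads it back into a fresh object. */
    static TaggedName roundTrip(TaggedName original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            TaggedName copy = new TaggedName();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TaggedName copy = roundTrip(new TaggedName("Lucy", 1));
        System.out.println(copy.name + " " + copy.tag);  // prints "Lucy 1"
    }
}
```

If write and readFields disagree on the field order, the copy comes back scrambled, which is the most common mistake with this kind of type.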
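Next: as the input shows, children and parents are written in the same file, and what we want is the grandchild-grandparent relationship, so a person listed as a parent can also appear in the child column. To tell the two roles apart correctly, we use the custom data type.
Here is the code, with detailed comments inside: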
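[java]
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyMapre {
    public static class wordcountMapper extends Mapper<LongWritable, Text, TextPair, TextPair> {
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String key1 = "";    // first column: the child
            String value1 = "";  // second column: the parent
            StringTokenizer itr = new StringTokenizer(value.toString());
            if (itr.hasMoreElements()) {
                key1 = itr.nextToken();
            }
            if (itr.hasMoreElements()) {
                value1 = itr.nextToken();
            }
            // Skip blank or incomplete lines.
            if (key1.isEmpty() || value1.isEmpty()) {
                return;
            }
            // Emit every record twice. Tag 0 marks a child, tag 1 marks a parent.
            // Under the child's name, the value is one of that person's parents.
            context.write(new TextPair(key1, 0), new TextPair(value1, 1));
            // Under the parent's name, the value is one of that person's children.
            context.write(new TextPair(value1, 1), new TextPair(key1, 0));
        }
    }

    public static class wordcountReduce extends Reducer<TextPair, TextPair, Text, Text> {
        @Override
        public void reduce(TextPair key, Iterable<TextPair> values, Context context) throws IOException, InterruptedException {
            // All values here belong to one person: split them into that person's children and parents.
            List<String> child = new ArrayList<String>();
            List<String> parent = new ArrayList<String>();
            for (TextPair str : values) {
                if (str.getSecond().get() == 0) {
                    child.add(str.getFirst().toString());
                } else {
                    parent.add(str.getFirst().toString());
                }
            }
            // Every (child, parent) combination of this person is a (grandchild, grandparent) pair.
            if (child.size() != 0 && parent.size() != 0) {
                for (int i = 0; i < child.size(); i++) {
                    for (int j = 0; j < parent.size(); j++) {
                        context.write(new Text(child.get(i)), new Text(parent.get(j)));
                    }
                }
            }
        }
    }

    // Custom data type: a name plus a tag (0 = child, 1 = parent).
    public static class TextPair implements WritableComparable<TextPair> {
        private Text first;
        private IntWritable second;

        public TextPair() {
            set(new Text(), new IntWritable());
        }

        public TextPair(String first, int second) {
            set(new Text(first), new IntWritable(second));
        }

        public TextPair(Text first, IntWritable second) {
            set(first, second);
        }

        public void set(Text first, IntWritable second) {
            this.first = first;
            this.second = second;
        }

        public Text getFirst() {
            return first;
        }

        public String toString() {
            return (first.toString());
        }

        public IntWritable getSecond() {
            return second;
        }

        public void write(DataOutput out) throws IOException {
            first.write(out);
            second.write(out);
        }

        public void readFields(DataInput in) throws IOException {
            first.readFields(in);
            second.readFields(in);
        }

        // Compare on the name only, so both roles of one person land in the same reduce group.
        public int compareTo(TextPair tp) {
            int cmp = first.compareTo(tp.first);
            return cmp;
        }

        // Hash and equality on the name only, consistent with compareTo, so the default
        // hash partitioner sends equal names to the same reducer when several reducers run.
        @Override
        public int hashCode() {
            return first.hashCode();
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof TextPair && first.equals(((TextPair) o).first);
        }
    }

    public static void main(String args[]) throws Exception {
        Configuration conf = new Configuration();
        // new Job(conf, name) is deprecated in Hadoop 2 and later in favour of Job.getInstance(conf, name).
        Job job = new Job(conf, "SingleJoin");
        job.setJarByClass(MyMapre.class);
        job.setMapOutputKeyClass(TextPair.class);
        job.setMapOutputValueClass(TextPair.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(wordcountMapper.class);
        job.setReducerClass(wordcountReduce.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}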
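If you want to check the join idea without a cluster, the same logic can be simulated in plain Java. This is only a sketch under my own assumptions: an in-memory TreeMap plays the role of the shuffle that groups records by name, and the class SelfJoinSketch, its Group helper and the method grandparents are hypothetical names made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SelfJoinSketch {
    /** Everything grouped under one person's name: what one reduce call would see. */
    static class Group {
        final List<String> children = new ArrayList<>();
        final List<String> parents = new ArrayList<>();
    }

    /** Returns "grandchild grandparent" pairs derived from "child parent" lines. */
    static List<String> grandparents(List<String> lines) {
        Map<String, Group> groups = new TreeMap<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 2) {
                continue;  // skip blank or incomplete lines
            }
            String child = cols[0];
            String parent = cols[1];
            // "Map" step: file the record under both names, tagged by role.
            groups.computeIfAbsent(child, k -> new Group()).parents.add(parent);
            groups.computeIfAbsent(parent, k -> new Group()).children.add(child);
        }
        // "Reduce" step: for each person, pair every child with every parent.
        List<String> pairs = new ArrayList<>();
        for (Group g : groups.values()) {
            for (String grandchild : g.children) {
                for (String grandparent : g.parents) {
                    pairs.add(grandchild + " " + grandparent);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("Tom Lucy", "Lucy Mary", "Lucy Ben");
        for (String pair : grandparents(table)) {
            System.out.println(pair);  // prints "Tom Mary" then "Tom Ben"
        }
    }
}
```

Only Lucy has both a child and parents in this tiny table, so she is the only key that produces output; the header row of the real input joins with nothing for the same reason.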
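1005: Problem
Multi-table Join
Description
There are two input files. One, named factory, contains a table of factory names and their address IDs; the other, named address, contains a table of address names and their IDs. Write a program that outputs each factory name together with the name of its address.
Input
There are two input files: the first lists factory names with their address IDs, and the second lists address names with their IDs.
Output
The output is a file containing each factory name and the name of its address.
Sample input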
input:
factory:
factoryname addressID
Beijing Red Star 1
Shenzhen Thunder 3
Guangzhou Honda 2
Beijing Rising 1
Guangzhou Development Bank 2
Tencent 3
Bank of Beijing 1
address:
addressID addressname
1 Beijing
2 Guangzhou
3 Shenzhen
4 Xian
Sample output
output:
factoryname addressname
Bank of Beijing Beijing
Beijing Red Star Beijing
Beijing Rising Beijing
Guangzhou Development Bank Guangzhou
Guangzhou Honda Guangzhou
Shenzhen Thunder Shenzhen
Tencent Shenzhen
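1005: Solution approach
This problem follows much the same idea as 1004; if you can solve 1004, then 1005 is no trouble at all.
We reuse the custom data type TextPair from 1004. The records we read fall into two kinds (factory records and address records), so we use TextPair to tell them apart.
On to the code again, with detailed comments inside: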
[java]
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyMapre {
    public static class wordcountMapper extends Mapper<LongWritable, Text, Text, TextPair> {
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String str = "";
            String id = "";
            String value1 = "";
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreElements()) {
                str = itr.nextToken();
                // "[0-9]+" so that IDs with more than one digit are recognized too.
                if (!str.matches("[0-9]+")) {
                    // Not a number: part of a (possibly multi-word) name.
                    value1 += str;
                    value1 += " ";
                } else {
                    id = str;
                    // A name seen before the ID means this is a factory record (tag 1).
                    if (!value1.isEmpty()) {
                        context.write(new Text(id), new TextPair(value1.trim(), 1));
                        return;
                    }
                }
            }
            // The ID came first: this is an address record (tag 0).
            // Header and blank lines have no ID; they end up under the empty key and never join.
            context.write(new Text(id), new TextPair(value1.trim(), 0));
        }
    }

    public static class wordcountReduce extends Reducer<Text, TextPair, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<TextPair> values, Context context) throws IOException, InterruptedException {
            // All values here share one address ID: split them into factory names and address names.
            List<String> factor = new ArrayList<String>();
            List<String> address = new ArrayList<String>();
            for (TextPair str : values) {
                if (str.getSecond().get() == 1) {
                    factor.add(str.getFirst().toString());
                } else {
                    address.add(str.getFirst().toString());
                }
            }
            // Pair every factory with the address name(s) of this ID.
            if (factor.size() != 0 && address.size() != 0) {
                for (int i = 0; i < address.size(); i++) {
                    for (int j = 0; j < factor.size(); j++) {
                        context.write(new Text(factor.get(j)), new Text(address.get(i)));
                    }
                }
            }
        }
    }

    // Custom data type: a name plus a tag (1 = factory name, 0 = address name).
    public static class TextPair implements WritableComparable<TextPair> {
        private Text first;
        private IntWritable second;

        public TextPair() {
            set(new Text(), new IntWritable());
        }

        public TextPair(String first, int second) {
            set(new Text(first), new IntWritable(second));
        }

        public TextPair(Text first, IntWritable second) {
            set(first, second);
        }

        public void set(Text first, IntWritable second) {
            this.first = first;
            this.second = second;
        }

        public Text getFirst() {
            return first;
        }

        public String toString() {
            return (first.toString());
        }

        public IntWritable getSecond() {
            return second;
        }

        public void write(DataOutput out) throws IOException {
            first.write(out);
            second.write(out);
        }

        public void readFields(DataInput in) throws IOException {
            first.readFields(in);
            second.readFields(in);
        }

        public int compareTo(TextPair tp) {
            int cmp = first.compareTo(tp.first);
            return cmp;
        }
    }

    public static void main(String args[]) throws Exception {
        Configuration conf = new Configuration();
        // new Job(conf, name) is deprecated in Hadoop 2 and later in favour of Job.getInstance(conf, name).
        Job job = new Job(conf, "MultiTableJoin");
        job.setJarByClass(MyMapre.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(TextPair.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(wordcountMapper.class);
        job.setReducerClass(wordcountReduce.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
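The two-table join can also be tried out in plain Java before submitting anything. The sketch below is not the submitted program: it assumes, as the sample data suggests, that a factory line ends with the numeric ID and an address line starts with it, and it swaps the shuffle for two in-memory maps. TwoTableJoinSketch and join are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TwoTableJoinSketch {
    /** Joins factory lines ("name ... ID") with address lines ("ID name ...") on the ID. */
    static List<String> join(List<String> lines) {
        Map<String, List<String>> factoriesById = new TreeMap<>();
        Map<String, String> addressById = new HashMap<>();
        for (String line : lines) {
            String[] t = line.trim().split("\\s+");
            if (t.length < 2) {
                continue;  // blank or incomplete line
            }
            String last = t[t.length - 1];
            if (t[0].matches("\\d+")) {
                // Address record: the ID comes first, the rest is the address name.
                addressById.put(t[0], String.join(" ", Arrays.copyOfRange(t, 1, t.length)));
            } else if (last.matches("\\d+")) {
                // Factory record: the ID comes last, everything before it is the factory name.
                String name = String.join(" ", Arrays.copyOfRange(t, 0, t.length - 1));
                factoriesById.computeIfAbsent(last, k -> new ArrayList<>()).add(name);
            }
            // Header lines contain no numeric ID and are ignored.
        }
        List<String> joined = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : factoriesById.entrySet()) {
            String address = addressById.get(e.getKey());
            if (address == null) {
                continue;  // factory whose ID has no matching address
            }
            for (String factory : e.getValue()) {
                joined.add(factory + "\t" + address);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "factoryname addressID", "Beijing Red Star 1", "Tencent 3",
            "addressID addressname", "1 Beijing", "3 Shenzhen", "4 Xian");
        for (String row : join(lines)) {
            System.out.println(row);
        }
    }
}
```

An address with no factory (Xian in the sample) simply produces no row, which matches the sample output.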
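1006: Problem
Sum
Description
The input is a set of text files. Each file contains many lines, and each line is a digit string representing a very large number. Note that the low-order digits are at the start of the string and the high-order digits are at the end. Write a program that computes and outputs the sum of all the numbers in the input files.
Input
The input consists of many files. Each file has many lines, and each line is a digit string representing one number.
Output
The output is a single file. On its first line, the first number is the line label and the second number is the sum of all the numbers in the input files.
Sample input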
input:
file1:
1235546665312
112344569882
326434546462
21346546846
file2:
3654354655
3215456463
21235465463
321265465
65465463
32
file3:
31654
654564564
3541231564
351646846
3164646
3163
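Sample output
output:
1 8685932816082
Notes:
1 There is only one output file;
2 The first line of the output file consists of the line label "1" and the sum of all the numbers;
3 Every number is a positive integer or zero. Numbers may be more than 50 digits long, so the usual numeric data types cannot hold them;
4 The low-order digits are on the left of the digit string and the high-order digits are on the right. For example, the first line of the first sample input file represents the number 2135666455321.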
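To make note 4 concrete, here is a small sketch that converts between the low-digit-first strings used by this problem and ordinary numbers. It leans on java.math.BigInteger purely for illustration (my submission adds the digit strings by hand instead); LittleEndianDigits, parse and format are hypothetical names.

```java
import java.math.BigInteger;

public class LittleEndianDigits {
    /** Reads a digit string whose low-order digit comes first. */
    static BigInteger parse(String lowDigitFirst) {
        return new BigInteger(new StringBuilder(lowDigitFirst).reverse().toString());
    }

    /** Writes a non-negative number with its low-order digit first. */
    static String format(BigInteger value) {
        return new StringBuilder(value.toString()).reverse().toString();
    }

    public static void main(String[] args) {
        // First line of file1 in the sample input.
        System.out.println(parse("1235546665312"));  // prints 2135666455321
        // Adding two of the short sample numbers: 23 + 3613 = 3636, written low digit first.
        System.out.println(format(parse("32").add(parse("3163"))));  // prints 6363
    }
}
```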
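1006: Solution approach
1006 comes down to two problems: first, big-number addition; second, bringing all the data together in one place.
The first is a standard technique, so I won't dwell on it. As for the second: because we need a single grand total at the end, all the keys have to end up in one group. We could of course define a custom grouping; Hadoop: The Definitive Guide covers that, and I'll write it up later if the need arises. Here I used a simple workaround instead: every map output gets the same key. (If a simple approach works, use the simple approach.)
Let me explain it alongside the code: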
[java]
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyMapre {
    public static class wordcountMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString().trim();
            if (line.isEmpty()) {
                return;  // skip blank lines
            }
            // Give every number the same key (1) so that all of them reach a single reduce call.
            context.write(new LongWritable(1), new Text(line));
        }
    }

    public static class wordcountReduce extends Reducer<LongWritable, Text, LongWritable, Text> {
        @Override
        public void reduce(LongWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            String tem = "0";  // running total, low-order digit first
            for (Text str : values) {
                tem = Sum(tem, str.toString());
            }
            // The key 1 doubles as the line label required in the output.
            context.write(key, new Text(tem));
        }
    }

    // Adds two non-negative numbers written as digit strings with the low-order digit first.
    public static String Sum(String a, String b) {
        StringBuilder c = new StringBuilder();
        int jin = 0;  // carry
        int len = Math.max(a.length(), b.length());
        for (int i = 0; i < len; i++) {
            // Treat missing digits of the shorter number as 0.
            int a_digit = i < a.length() ? a.charAt(i) - '0' : 0;
            int b_digit = i < b.length() ? b.charAt(i) - '0' : 0;
            int temp = a_digit + b_digit + jin;
            jin = temp / 10;
            c.append(temp % 10);
        }
        // The carry must keep propagating through the longer number, and a final carry
        // becomes a new highest digit. (The earlier version appended "10" when a leftover
        // digit 9 met a carry.)
        if (jin != 0) {
            c.append(jin);
        }
        return c.toString();
    }

    public static void main(String args[]) throws Exception {
        Configuration conf = new Configuration();
        // new Job(conf, name) is deprecated in Hadoop 2 and later in favour of Job.getInstance(conf, name).
        Job job = new Job(conf, "Sum");
        job.setJarByClass(MyMapre.class);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(wordcountMapper.class);
        job.setReducerClass(wordcountReduce.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
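The two ideas of this problem, carry-based addition on low-digit-first strings and folding every line into one total as a single reduce group would, can be checked on the sample input without Hadoop. This is a sketch only: LittleEndianSum, add, sumAll and SAMPLE are hypothetical names, and the list stands in for the values that arrive under the one shared key.

```java
import java.util.Arrays;
import java.util.List;

public class LittleEndianSum {
    /** All lines of file1, file2 and file3 from the sample input. */
    static final List<String> SAMPLE = Arrays.asList(
        "1235546665312", "112344569882", "326434546462", "21346546846",
        "3654354655", "3215456463", "21235465463", "321265465", "65465463", "32",
        "31654", "654564564", "3541231564", "351646846", "3164646", "3163");

    /** Adds two non-negative numbers written with the low-order digit first. */
    static String add(String a, String b) {
        StringBuilder sum = new StringBuilder();
        int carry = 0;
        int len = Math.max(a.length(), b.length());
        // Keep going while digits remain or a carry is still pending.
        for (int i = 0; i < len || carry != 0; i++) {
            int da = i < a.length() ? a.charAt(i) - '0' : 0;
            int db = i < b.length() ? b.charAt(i) - '0' : 0;
            int t = da + db + carry;
            sum.append(t % 10);
            carry = t / 10;
        }
        return sum.toString();
    }

    /** Folds all lines into one total, as the single reduce call does. */
    static String sumAll(List<String> lines) {
        String total = "0";
        for (String line : lines) {
            total = add(total, line.trim());
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(add("99", "1"));     // 99 + 1 = 100, prints 001
        System.out.println("1 " + sumAll(SAMPLE));  // prints 1 8685932816082
    }
}
```

The case 99 + 1 is worth keeping as a test, because the carry has to travel through digits of the longer number after the shorter one has run out.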
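1007: Problem
WordCount Plus
Description
The WordCount example takes text files as input and counts how many times each word occurs. Now there is a WordCount 2.0: in this version you must handle input files that contain characters such as "/.',"{}[]:;" and so on. When splitting words, "declare," should be cut to "declare"; likewise "Hello!" should be cut to "Hello", while "can't" should stay "can't".
Input
The input is text files containing many words.
Output
The output is a text file in which each line contains a word and the number of times that word occurs across all input files. The words in the output file are sorted in dictionary order.
Sample input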
input1:
hello world, bye world.
input2:
hello hadoop, bye hadoop!
Sample output
bye 2
hadoop 2
hello 2
world 2
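1007: Solution approach
1007 is mainly about filtering characters, which a regular expression handles nicely. Nothing difficult here~~
As before, let's go through it alongside the code: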
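[java]
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyMapre {
    public static class wordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        // Regular expression matching every character that is not a word character
        // (letter, digit, underscore) or an apostrophe.
        private String pattern = "[^\\w']";

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Lower-case the line so that counting is case-insensitive.
            String line = value.toString().toLowerCase();
            // Replace every filtered character with a space, then split on whitespace.
            line = line.replaceAll(pattern, " ");
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreElements()) {
                context.write(new Text(itr.nextToken()), one);
            }
        }
    }

    public static class wordcountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            // Add up the 1s emitted for this word.
            int sum = 0;
            for (IntWritable str : values) {
                sum += str.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String args[]) throws Exception {
        Configuration conf = new Configuration();
        // new Job(conf, name) is deprecated in Hadoop 2 and later in favour of Job.getInstance(conf, name).
        Job job = new Job(conf, "Plus");
        job.setJarByClass(MyMapre.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(wordcountMapper.class);
        job.setReducerClass(wordcountReduce.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}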
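The filtering step is easy to try on its own. The sketch below applies the pattern [^\w'] (written "[^\\w']" in a Java string) to the sample input and counts words in a TreeMap, which stands in for the sorted reducer output; WordCountPlusSketch and count are hypothetical names, and lower-casing follows what my mapper does rather than anything the problem statement demands.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountPlusSketch {
    /** Counts words after replacing everything except word characters and apostrophes. */
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();  // TreeMap keeps keys in dictionary order
        for (String line : lines) {
            String cleaned = line.toLowerCase().replaceAll("[^\\w']", " ");
            StringTokenizer itr = new StringTokenizer(cleaned);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("hello world, bye world.", "hello hadoop, bye hadoop!");
        for (Map.Entry<String, Integer> e : count(input).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
        // prints:
        // bye 2
        // hadoop 2
        // hello 2
        // world 2
    }
}
```

Because the apostrophe is kept in the character class, "can't" survives as one token, while "declare," loses its comma.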
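Finally done. Of course, what I've written here is only my own approach to these problems; if any of you have better ideas, please share them so we can all be happy together. All of the programs above were accepted on submission.
That said, I can't rule out oversights or mistakes in my programs (the test data may not be comprehensive). If you can point any out, I'd be very grateful~~
One last note: I pulled the programs straight from the site's submission archive, so the formatting isn't pretty. My apologies~~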