Hadoop Big Data: the total-order sorting mechanism in MapReduce

There are several ways to produce globally sorted output from a MapReduce job:
(1) Use a single reduce task. The output is globally ordered, but parallelism is lost and that one node carries the entire load.
(2) Use a hand-written range partitioner with a matching number of reduce tasks. This also yields global order, but it is hard to avoid uneven key distribution (data skew): some reduce tasks end up overloaded while others sit nearly idle.
(3) First run a separate job to measure the key distribution and derive good range boundaries, then sort with the range partitioner from (2). This works, but the extra full pass over the data set is costly.
(4) Use Hadoop's built-in input sampler to sample the data set and derive the range boundaries, then let Hadoop's built-in TotalOrderPartitioner carry out the global sort.
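The range-partitioning idea behind approaches (2) and (3) can be sketched in plain Java. This is only an illustration, not Hadoop's implementation (the real TotalOrderPartitioner additionally builds a trie for binary-comparable keys); the split points below are hypothetical:

```java
import java.util.Arrays;

// Sketch of range partitioning: pre-chosen split points divide the key
// space into ordered ranges, and each range goes to one reduce task.
public class RangePartitionSketch {
    // Two hypothetical split points -> three partitions:
    // keys < "g" -> 0, "g" <= keys < "p" -> 1, keys >= "p" -> 2
    static final String[] SPLIT_POINTS = {"g", "p"};

    static int getPartition(String key) {
        int idx = Arrays.binarySearch(SPLIT_POINTS, key);
        // If the key equals a split point, it belongs to the partition
        // after it; otherwise binarySearch returns -(insertionPoint) - 1.
        return idx >= 0 ? idx + 1 : -idx - 1;
    }
}
```

Because every key in partition i compares less than every key in partition i+1, concatenating the sorted reducer outputs in partition order gives a globally sorted result. Approach (4) below automates the choice of the split points.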
The job below implements approach (4). Compared with the original listing, the missing imports and output key/value classes have been added, an unused field has been dropped, and setNumReduceTasks is called before writePartitionFile (the sampler needs the reducer count to know how many split points to produce):

```java
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

public class TotalSort {

    // Identity mapper: keys pass through unchanged, so the framework's
    // per-reducer sort plus the TotalOrderPartitioner yield global order.
    static class TotalSortMapper extends Mapper<Text, Text, Text, Text> {
        @Override
        protected void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    // Identity reducer: writes every value under its (already sorted) key.
    static class TotalSortReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text v : values) {
                context.write(key, v);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(TotalSort.class);
        job.setMapperClass(TotalSortMapper.class);
        job.setReducerClass(TotalSortReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // The sampler reads the input as key/value pairs, so the input
        // is a SequenceFile with Text keys and Text values.
        job.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Must be set before sampling: the number of reduce tasks
        // determines the number of split points (here 3 tasks -> 2 points).
        job.setNumReduceTasks(3);
        job.setPartitionerClass(TotalOrderPartitioner.class);

        // Sample each record with probability 0.1, keep at most 100
        // samples, read at most 10 input splits, then write the derived
        // split points to the partition file.
        InputSampler.RandomSampler<Text, Text> randomSampler =
                new InputSampler.RandomSampler<>(0.1, 100, 10);
        InputSampler.writePartitionFile(job, randomSampler);

        // Ship the partition file to every task via the distributed cache
        // so each mapper's partitioner can load the split points.
        Configuration conf2 = job.getConfiguration();
        String partitionFile = TotalOrderPartitioner.getPartitionFile(conf2);
        job.addCacheFile(new URI(partitionFile));

        job.waitForCompletion(true);
    }
}
```
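What InputSampler.RandomSampler and writePartitionFile do can be approximated in plain Java. This is a simplified sketch of the idea, not the Hadoop API (the method names and the in-memory record list are illustrative; the real sampler also replaces earlier samples once its buffer is full):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch: sample records with probability `freq` up to `numSamples`,
// then sort the sample and take evenly spaced keys as split points.
public class SamplerSketch {

    static List<String> sample(List<String> records, double freq,
                               int numSamples, long seed) {
        Random rnd = new Random(seed);
        List<String> sample = new ArrayList<>();
        for (String r : records) {
            if (sample.size() >= numSamples) break;      // buffer full
            if (rnd.nextDouble() < freq) sample.add(r);  // keep w.p. freq
        }
        return sample;
    }

    // For numPartitions reducers, pick numPartitions - 1 boundaries
    // at evenly spaced ranks of the sorted sample.
    static String[] splitPoints(List<String> sample, int numPartitions) {
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        String[] points = new String[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            points[i - 1] = sorted.get(i * sorted.size() / numPartitions);
        }
        return points;
    }
}
```

Because the split points are taken from a random sample of the real keys, each reducer receives roughly the same number of records, which is exactly what fixes the data-skew problem of the hand-tuned range partitioner in approach (2).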