當前位置：首頁 > 编程资源 > 编程问答 >内容正文

编程问答

海量数据找中位数

發布時間：2024/9/30 编程问答 23 豆豆

生活随笔收集整理的這篇文章主要介紹了海量数据找中位数小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

有幾百億的整數，分布的存儲到幾百臺通過網絡連接的計算機上，你能否開發出一個算法和系統，找出這幾百億數據的中值？就是在一組排序好的數據中居于中間的數。顯然，一臺機器是裝不下所有的數據。也盡量少用網絡帶寬。

m1:?

將0-2^32劃分為若干個桶，每個電腦放一個桶，每個桶內有個計數器，記錄有多少數放在桶內

從第一個桶的計數器開始加，一直加到大于n/2

所以中位數肯定在這個桶內

m2:

http://matpalm.com/median/distributing.html

http://matpalm.com/median/erlang_multi.html

利用快排的思想，但是只計數不交換位置，節省移動操作。

in the multi process implementation splits the list of numbers to consider into a sub lists
each sub list can be handled by a seperate erlang process, potentially across different machines
(these lists don't have to be the same size)

single [1,2,3,4,5,6,7,8,9] multi [[1,2,3,4],[5,6],[7,8,9]]

recall there a number of operations required in?distributed?case
includes things like determining total number of elements, determining minimum value, etc
each of these operations can be done against each sub list individually with the results aggregated

eg to determine total number of elements

single -> length( [1,2,3,4,5,6,7,8,9] ) = 9 multi -> sum ( length([1,2,3,4]), length([5,6]), length([7,8,9]) ) = sum([4,2,3]) = 9

eg to determine minimum value

single -> min( [1,2,3,4,5,6,7,8,9] ) = 1 multi -> min( min([1,2,3,4]), min([5,6]), min([7,8,9]) ) = min([1,5,7]) = 1

other changes required

rotation

in the single list case we pick the pivot as the first value.
in the multi list case we pick the pivot as the first value of the first list.
recall in the algorithm that?rotation?is sometimes required.
this is to ensure all potential values for a pivot are explored.
so in the multi list case the rotation needs to operate at two levels; rotate the first list and then rotate the list of lists

before [[1,2,3],[4,5,6],[7,8,9]] after [[4,5,6],[7,8,9],[2,3,1]]

給你1T(10^12)的int64整數，和僅有的1GB(10^9)的內存，如何設計算法和程序來找到它們的中值

大數據處理的題，都是采用分塊的。int64bit=8bytes.所以1G內存只能表示1GB/8B=2^27個數。1.先將1G內存分成M=（2^27-1）個塊，每塊數范圍為range=(2^64)/M; 2.創建一大小為M的數組A。A[0]表示數范圍是0-(range-1)。3.從頭掃描一次，數組A元素++；3.這樣就可以縮小范圍。之后再繼續縮小范圍

總結

以上是生活随笔為你收集整理的海量数据找中位数的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇：基础题
下一篇：给定key值，在Binary Searc

3atv精品不卡视频,97人人超碰国产精品最新,中文字幕av一区二区三区人妻少妇,久久久精品波多野结衣,日韩一区二区三区精品

编程问答

海量数据找中位数

other changes required

rotation

總結