[MapReduce] Google's Troika: GFS, MapReduce, and Bigtable
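Note: this article is reposted from the blog of the Blog Development Team, with respect for the original work. It is suitable as background reading before studying distributed systems.
Any discussion of distributed systems has to mention Google's troika: Google FS [1], MapReduce [2], and Bigtable [3].
Although Google has not released the source code of these three products, it has published detailed design papers for each of them. In addition, the Yahoo-funded Hadoop project provides open-source Java implementations that follow the three papers: Hadoop corresponds to MapReduce, the Hadoop Distributed File System (HDFS) corresponds to Google FS, and HBase corresponds to Bigtable. In terms of performance, however, Hadoop lags far behind Google's systems; see Table 1.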
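Table 1: Performance comparison of HBase and Bigtable (source: http://wiki.apache.org/lucene-hadoop/Hbase/PerformanceEvaluation). The Bigtable column matches the single-tablet-server figures reported in [3], which count 1000-byte values read or written per second.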
| Experiment | HBase20070916 | BigTable |
| --- | --- | --- |
| random reads | 272 | 1212 |
| random reads (mem) | Not implemented | 10811 |
| random writes | 1460 | 8850 |
| sequential reads | 267 | 4425 |
| sequential writes | 1278 | 8547 |
| Scans | 3692 | 15385 |
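The three products are introduced in turn below:
1. Google FS
GFS is a scalable distributed file system for large, distributed, data-intensive applications. It runs on inexpensive commodity hardware and provides fault tolerance.
Figure 1 GFS Architecture
(1) GFS architecture
1. The GFS architecture is shown in Figure 1. It consists of a single master and a large number of chunkservers.
2. Unlike the masterless design of Amazon Dynamo, Google uses a master to hold the directory and index information. This choice simplifies the system structure and improves performance, but it risks making the master a single point of failure or a bottleneck. To keep the master from becoming a bottleneck, Google makes each chunk very large (64 MB). Because applications tend to access data with good locality, a large chunk reduces the number of interactions between the application and the master, and the bulk of the data traffic flows directly between the application and the chunkservers.
3. In addition, the master keeps all of its metadata in memory. Chunk replica locations are not persisted; the master collects them from the chunkservers at startup, while the namespaces and the file-to-chunk mapping are also recorded in an operation log, as described in [1]. Keeping metadata in memory improves the master's performance and throughput, and it also makes it easy to promote a standby machine to master when the master goes down.
4. Neither the client nor the chunkserver maintains its own cache of file data. Chunkservers simply rely on the Linux file system's buffer cache.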
“The master stores three major types of metadata: the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk’s replicas.”
“Having a single master vastly simplifies our design and enables the master to make sophisticated chunk placement and replication decisions using global knowledge. However, we must minimize its involvement in reads and writes so that it does not become a bottleneck. Clients never read and write file data through the master. Instead, a client asks the master which chunkservers it should contact. It caches this information for a limited time and interacts with the chunkservers directly for many subsequent operations.”
“Neither the client nor the chunkserver caches file data. Client caches offer little benefit because most applications stream through huge files or have working sets too large to be cached. Not having them simplifies the client and the overall system by eliminating cache coherence issues. (Clients do cache metadata, however.) Chunkservers need not cache file data because chunks are stored as local files and so Linux’s buffer cache already keeps frequently accessed data in memory.”
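The read path described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToyMaster`, `ToyClient`, the file path, and the chunkserver names are hypothetical, and the expiry of cached locations ("for a limited time" in the paper) is left out. It shows how a 64 MB chunk size plus client-side metadata caching keeps the number of master lookups small.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunk size, as stated in the text


class ToyMaster:
    """Hypothetical in-memory master: maps (file, chunk index) to replica locations."""

    def __init__(self, chunk_table):
        self.chunk_table = chunk_table
        self.lookups = 0  # counts how often a client had to contact the master

    def locate(self, path, chunk_index):
        self.lookups += 1
        return self.chunk_table[(path, chunk_index)]


class ToyClient:
    """Hypothetical client that caches chunk locations (metadata), never file data."""

    def __init__(self, master):
        self.master = master
        self.location_cache = {}

    def locate(self, path, offset):
        chunk_index = offset // CHUNK_SIZE  # which chunk holds this byte offset
        key = (path, chunk_index)
        if key not in self.location_cache:
            # Only a cache miss costs a round trip to the master.
            self.location_cache[key] = self.master.locate(path, chunk_index)
        return chunk_index, self.location_cache[key]


# Made-up chunk table: each chunk has three replicas.
master = ToyMaster({
    ("/logs/web.log", 0): ["cs1", "cs2", "cs3"],
    ("/logs/web.log", 1): ["cs2", "cs4", "cs5"],
})
client = ToyClient(master)

# Three reads inside the first 64 MB, then one read in the second chunk.
offsets = [0, 10 * 1024 * 1024, CHUNK_SIZE - 1, CHUNK_SIZE]
results = [client.locate("/logs/web.log", off) for off in offsets]
print(results)
print("master lookups:", master.lookups)
```

Four reads touch only two chunks, so the master is consulted twice; the actual data transfer would then go straight to one of the listed chunkservers.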
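(2) GFS replication
GFS typically replicates each chunk to 3 machines; see Figure 2.
Figure 2 Control flow and data flow of a write operation
(3) External interface
Like an ordinary file system, GFS provides create, delete, open, close, read, and write operations. In addition, GFS adds two new operations, snapshot and record append. The paper explains them as follows: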
“Moreover, GFS has snapshot and record append operations. Snapshot creates a copy of a file or a directory tree at low cost.
Record append allows multiple clients to append data to the same file concurrently while guaranteeing the atomicity of each individual client’s append.”
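The atomicity guarantee of record append can be illustrated with a single-process analogue. This is a sketch, not the GFS protocol: `ToyAppendFile` is a hypothetical class, threads stand in for clients, and a lock stands in for the coordination GFS performs across replicas. The real operation is at-least-once and may leave padding or duplicate records, which this sketch does not model.

```python
import threading


class ToyAppendFile:
    """Hypothetical file whose append chooses the offset itself, one record at a time."""

    def __init__(self):
        self._data = bytearray()
        self._lock = threading.Lock()

    def record_append(self, record: bytes) -> int:
        # The lock makes each append atomic, so records never interleave.
        with self._lock:
            offset = len(self._data)
            self._data.extend(record)
            return offset  # the caller learns where its record landed

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self._data[offset:offset + length])


log = ToyAppendFile()
offsets = {}  # (client name, sequence number) -> (offset, length)


def writer(name: str) -> None:
    for i in range(100):
        record = f"{name}-{i:03d};".encode()
        offsets[(name, i)] = (log.record_append(record), len(record))


threads = [threading.Thread(target=writer, args=(f"client{n}",)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every record can be read back whole from the offset the file chose for it.
intact = all(
    log.read(off, length) == f"{name}-{i:03d};".encode()
    for (name, i), (off, length) in offsets.items()
)
print("records:", len(offsets), "all intact:", intact)
```

Note that the clients never pass an offset. The file picks it, which is what lets many writers share one file without coordinating among themselves.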
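2. MapReduce
MapReduce is a programming model for distributed parallel computing.
Any discussion of parallel computing should mention the article that Microsoft's Herb Sutter published in 2005, "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software" [6]. Its main point is that the era of speeding up programs simply by raising CPU clock frequency is coming to an end, and that CPU design is moving toward concurrency in the form of multiple cores, hyper-threading, and similar techniques. Existing programs do not benefit from multiple cores automatically; only programs written to be concurrent can actually take advantage of them. The same holds for distributed computing.
Figure 3 MapReduce Execution overview
1) MapReduce consists of Map and Reduce, two primitives borrowed from Lisp. Map applies a user-supplied function to the input on many workers in parallel, and Reduce merges the results computed by the Map workers. (See Figure 3.)
2) Google's MapReduce implementation uses GFS to store its data.
3) MapReduce can be used for Distributed Grep, Count of URL Access Frequency, Reverse Web-Link Graph, Distributed Sort, and Inverted Index.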
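As an illustration of the model, here is a minimal single-process sketch of the "Count of URL Access Frequency" example from the list above. The function names (`map_fn`, `reduce_fn`, `run_mapreduce`) and the log format are assumptions made for this example. A real MapReduce runs the map and reduce calls on many workers and reads its input from GFS; this sketch only shows the shape of the two user functions and the grouping step between them.

```python
from collections import defaultdict


def map_fn(log_line):
    """Map: emit (url, 1) for one request line; the URL is assumed to be the 2nd field."""
    fields = log_line.split()
    if len(fields) >= 2:
        yield fields[1], 1


def reduce_fn(url, counts):
    """Reduce: merge all values that were emitted for the same key."""
    return url, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    # Grouping ("shuffle") step: collect intermediate values by key.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    # One reduce call per distinct key, in sorted key order.
    return dict(reduce_fn(key, values) for key, values in sorted(groups.items()))


# Made-up request log for illustration.
logs = [
    "GET /index.html",
    "GET /about.html",
    "GET /index.html",
    "POST /login",
    "GET /index.html",
]
url_counts = run_mapreduce(logs, map_fn, reduce_fn)
print(url_counts)
```

Because each `map_fn` call looks at one record and each `reduce_fn` call looks at one key, both phases can be spread across workers without the user code changing.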
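3. Bigtable
Just as a file system needs a database on top of it to store structured data, GFS needs Bigtable to store structured data.
1) Bigtable is built on top of GFS, the Scheduler, the Lock Service (Chubby), and MapReduce.
2) Each Table is a sparse, multidimensional map, indexed by row key, column key, and timestamp.
3) To manage very large Tables, each Table is split by row, and the resulting pieces are called Tablets. Each Tablet holds roughly 100-200 MB, and each machine stores around 100 Tablets. The underlying storage layer is GFS. Because GFS is a distributed file system, the Tablet mechanism gives good load balancing. For example, frequently accessed Tablets can be moved to other, idle machines and rebuilt there quickly.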
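The data model and the row-based split can be sketched together. This is a toy model under stated assumptions: `ToyTable`, its methods, the split keys, and the sample rows are all hypothetical, tablets are split at fixed row keys rather than by size, and nothing is persisted to GFS. It shows a sparse map keyed by (row, column, timestamp) and how a row key determines which tablet holds a cell.

```python
import bisect


class ToyTable:
    """Hypothetical sparse table: only cells that were written take up space."""

    def __init__(self, split_keys):
        # Sorted row keys at which the table is cut. Tablet i holds the rows
        # below split_keys[i]; the last tablet holds everything from the final
        # split key upward.
        self.split_keys = sorted(split_keys)
        self.tablets = [dict() for _ in range(len(self.split_keys) + 1)]

    def tablet_index(self, row):
        return bisect.bisect_right(self.split_keys, row)

    def put(self, row, column, timestamp, value):
        self.tablets[self.tablet_index(row)][(row, column, timestamp)] = value

    def get_latest(self, row, column):
        tablet = self.tablets[self.tablet_index(row)]
        versions = [
            (ts, value)
            for (r, c, ts), value in tablet.items()
            if r == row and c == column
        ]
        # Return the value with the highest timestamp, or None if never written.
        return max(versions, key=lambda pair: pair[0])[1] if versions else None


table = ToyTable(split_keys=["com.g", "com.n"])
table.put("com.cnn.www", "contents:", 3, "<html>v3")
table.put("com.cnn.www", "contents:", 5, "<html>v5")
table.put("com.google.www", "anchor:cnn.com", 9, "Google")
table.put("org.apache", "contents:", 1, "<html>")

sizes = [len(tablet) for tablet in table.tablets]
latest = table.get_latest("com.cnn.www", "contents:")
missing = table.get_latest("com.cnn.www", "anchor:cnn.com")
print(sizes, latest, missing)
```

Since a tablet is just a contiguous row range, rebalancing amounts to reassigning one entry of `tablets` to another machine, which is the load-balancing property the text describes.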
References
[1] The Google File System; http://labs.google.com/papers/gfs-sosp2003.pdf
[2] MapReduce: Simplified Data Processing on Large Clusters; http://labs.google.com/papers/mapreduce-osdi04.pdf
[3] Bigtable: A Distributed Storage System for Structured Data; http://labs.google.com/papers/bigtable-osdi06.pdf
[4] Hadoop; http://lucene.apache.org/hadoop/
[5] Hbase: Bigtable-like structured storage for Hadoop HDFS; http://wiki.apache.org/lucene-hadoop/Hbase
[6] The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software; http://www.gotw.ca/publications/concurrency-ddj.htm
Reposted from: https://www.cnblogs.com/maybe2030/p/4568541.html