An Analysis of How HBase Replication Works
This article walks through the overall replication flow only. Many details are left out; the next article will look at them more closely.

ReplicationSource startup

org.apache.hadoop.hbase.regionserver.HRegionServer#startServiceThreads ->
org.apache.hadoop.hbase.replication.regionserver.Replication#startReplicationService ->
    // initializes the replicationManager
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager#init ->
    // init loops over all replicationPeers and adds each one as a source, so every replicationPeer has its own source and several slave clusters can be attached; the replicationPeers are read from the ZooKeeper path /hbase/replication/peers
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager#addSource ->
    // addSource creates a ReplicationSource and starts it; ReplicationSource is itself a thread
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource#startup
    // the ReplicationSource thread starts and enters its while loop

Rough workflow of ReplicationSource

- The main loop runs under while(isAlive()).
- It reads a List<WAL.Entry> from the WAL file.
- It calls shipEdits to send the data.
- shipEdits calls the replicate method of the replicationEndpoint.
- In the end admin.replicateWALEntry sends the data over RPC.

How a regionserver picks regionservers in the slave cluster as replication targets

- Replication has to connect to the peer (the slave cluster), so it first fetches all live regionservers of that peer.
- With the full list in hand, it decides which regionservers will act as replication targets (sinks).
- The number of sinks is the peer's live regionserver count * ratio (default 0.1), rounded up. The regionserver list is shuffled first and then truncated to that length.
- If zero sinks are selected, which is what happens when the slave cluster has not been started, the source keeps waiting for the peer to come online.

The following source code shows how regionservers are chosen as sinks:

    private void connectToPeers() {
      getRegionServers();

      int sleepMultiplier = 1;

      // Connect to peer cluster first, unless we have to stop
      while (this.isRunning() && replicationSinkMgr.getSinks().size() == 0) {
        replicationSinkMgr.chooseSinks();
        if (this.isRunning() && replicationSinkMgr.getSinks().size() == 0) {
          if (sleepForRetries("Waiting for peers", sleepMultiplier)) {
            // The multiplier is capped at the configured maximum (300 by default),
            // so the longest single sleep is 300 seconds.
            sleepMultiplier++;
          }
        }
      }
    }

    void chooseSinks() {
      List<ServerName> slaveAddresses = endpoint.getRegionServers();
      Collections.shuffle(slaveAddresses, random);
      int numSinks = (int) Math.ceil(slaveAddresses.size() * ratio);
      sinks = slaveAddresses.subList(0, numSinks);
      lastUpdateToPeers = System.currentTimeMillis();
      badReportCounts.clear();
    }

    /**
     * Do the sleeping logic
     * @param msg Why we sleep
     * @param sleepMultiplier by how many times the default sleeping time is augmented
     * @return True if <code>sleepMultiplier</code> is < <code>maxRetriesMultiplier</code>
     */
    protected boolean sleepForRetries(String msg, int sleepMultiplier) {
      try {
        if (LOG.isTraceEnabled()) {
          LOG.trace(msg + ", sleeping " + sleepForRetries + " times " + sleepMultiplier);
        }
        Thread.sleep(this.sleepForRetries * sleepMultiplier);
      } catch (InterruptedException e) {
        LOG.debug("Interrupted while sleeping between retries");
      }
      return sleepMultiplier < maxRetriesMultiplier;
    }

    this.maxRetriesMultiplier = this.conf.getInt("replication.source.maxretriesmultiplier", 300);
    this.ratio = conf.getFloat("replication.source.ratio", DEFAULT_REPLICATION_SOURCE_RATIO);

- Each slave cluster has its own replicationSource thread, so replication to one slave does not interfere with replication to another.
- Each replicationSource ships data on a single thread; shipping concurrently on multiple threads might work better.
- Data is sent over RPC by calling the replicateWALEntry method of RSRpcServices on a regionserver in the slave cluster.
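To make the sink-selection arithmetic and the retry sleep concrete, here is a minimal stand-alone sketch in Java. It is not HBase code: the class SinkSelectionSketch, its method names, and the example host names are hypothetical, and the 1-second base sleep is an assumption taken from the 300-second figure mentioned above. It only imitates the shuffle, ceil(count * ratio) and multiplier steps from chooseSinks and connectToPeers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; none of these names exist in HBase.
class SinkSelectionSketch {

  // Number of sinks for a given live-server count: ceil(count * ratio).
  static int numSinks(int liveServers, float ratio) {
    return (int) Math.ceil(liveServers * ratio);
  }

  // Shuffle a copy of the server list and keep the first numSinks entries.
  static List<String> chooseSinks(List<String> servers, float ratio, Random random) {
    List<String> shuffled = new ArrayList<>(servers);
    Collections.shuffle(shuffled, random);
    return new ArrayList<>(shuffled.subList(0, numSinks(shuffled.size(), ratio)));
  }

  // Sleep time for one retry: base sleep multiplied by the current multiplier.
  static long sleepMillis(long baseSleepMillis, int sleepMultiplier) {
    return baseSleepMillis * sleepMultiplier;
  }

  // The multiplier grows by one per retry until it reaches the maximum.
  static int nextMultiplier(int sleepMultiplier, int maxRetriesMultiplier) {
    return sleepMultiplier < maxRetriesMultiplier ? sleepMultiplier + 1 : sleepMultiplier;
  }

  public static void main(String[] args) {
    List<String> servers = new ArrayList<>();
    for (int i = 1; i <= 15; i++) {
      servers.add("rs" + i + ".example.com"); // made-up host names
    }

    // 15 live servers * 0.1 = 1.5, rounded up to 2 sinks.
    List<String> sinks = chooseSinks(servers, 0.1f, new Random());
    System.out.println("sinks: " + sinks);

    // Assumed base sleep of 1000 ms; the multiplier stops growing at 300.
    int multiplier = 1;
    for (int retry = 0; retry < 400; retry++) {
      multiplier = nextMultiplier(multiplier, 300);
    }
    System.out.println("longest sleep (ms): " + sleepMillis(1000L, multiplier));
  }
}
```

Because the count is rounded up, a peer with even a single live regionserver still yields one sink at ratio 0.1. The selection comes back empty only when the peer reports no live regionservers, which is the case in which the waiting loop keeps retrying.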
                        
                        
Reposted from: https://www.cnblogs.com/yueweimian/p/6520390.html
