Hadoop loadBalance source code analysis
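Our project's HBase database showed a very strange assignment: a region was moved with src and dest being the same regionserver, only with different timestamps. Only one regionserver had been started, so it is unclear how two timestamps came about.
Let's analyze the source code to track it down.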
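There is only one loadbalance implementation, org.apache.hadoop.hbase.master.DefaultLoadBalancer. HMaster starts a thread, an org.apache.hadoop.hbase.Chore, that runs every hbase.balancer.period (default 300000 ms, i.e. five minutes). It iterates over all tables and balances each table according to how many of that table's regions each regionserver holds. The balance factor is hbase.regions.slop (default 0.2): the average region count is computed from the total number of regions, avg × (1 - slop) = avg × 0.8 rounded to an integer is the minimum, and avg × (1 + slop) = avg × 1.2 rounded to an integer is the maximum. A regionserver above the maximum must move regions away, and one below the minimum receives regions; otherwise the current balance state is just logged.
The assignmentManager uses the RegionPlans generated by these steps to move each region from src to dest; src and dest are both ServerName objects.
When HMaster starts, it waits for the region servers to register with serverManager:
// Wait for region servers to report in.
this.serverManager.waitForRegionServers(status);
// Check zk for regionservers that are up but didn't register
for (ServerName sn: this.regionServerTracker.getOnlineServers()) {
  if (!this.serverManager.isServerOnline(sn)) {
    // Not registered; add it.
    LOG.info("Registering server found up in zk but who has not yet " +
      "reported in: " + sn);
    this.serverManager.recordNewServer(sn, HServerLoad.EMPTY_HSERVERLOAD);
  }
}
The serverManager thread sleeps for a while, waiting for the HRegionServers to register.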
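To make the slop arithmetic concrete, here is a minimal Java sketch of the per-table check. It is not DefaultLoadBalancer itself: the class and method names are hypothetical, server names are plain strings, and it assumes the minimum is rounded down (floor) and the maximum rounded up (ceil), which the description above only calls "rounded".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the per-table slop check; not DefaultLoadBalancer.
// Assumption: min = floor(avg * (1 - slop)), max = ceil(avg * (1 + slop)).
final class SlopCheckSketch {

    /** Returns {min, max}, the allowed region count per server for one table. */
    static int[] bounds(int totalRegions, int numServers, double slop) {
        double avg = (double) totalRegions / numServers;
        int min = (int) Math.floor(avg * (1 - slop));
        int max = (int) Math.ceil(avg * (1 + slop));
        return new int[] {min, max};
    }

    /** Servers holding more than max regions: they should move regions away. */
    static List<String> overloaded(Map<String, Integer> regionsPerServer, double slop) {
        int max = boundsFor(regionsPerServer, slop)[1];
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : regionsPerServer.entrySet()) {
            if (e.getValue() > max) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Servers holding fewer than min regions: they should receive regions. */
    static List<String> underloaded(Map<String, Integer> regionsPerServer, double slop) {
        int min = boundsFor(regionsPerServer, slop)[0];
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : regionsPerServer.entrySet()) {
            if (e.getValue() < min) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Bounds from the table's total region count and the number of servers.
    private static int[] boundsFor(Map<String, Integer> regionsPerServer, double slop) {
        int total = 0;
        for (int n : regionsPerServer.values()) {
            total += n;
        }
        return bounds(total, regionsPerServer.size(), slop);
    }
}
```

For example, 12 regions on 3 servers with slop 0.2 give avg = 4, min = floor(3.2) = 3 and max = ceil(4.8) = 5, so a server holding 8 regions would shed regions and one holding 1 would receive them.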
HRegionServer.java:
// Try and register with the Master; tell it we are here. Break if
// server is stopped or the clusterup flag is down or hdfs went wacky.
while (keepLooping()) {
  MapWritable w = reportForDuty();
  if (w == null) {
    LOG.warn("reportForDuty failed; sleeping and then retrying.");
    this.sleeper.sleep();
  } else {
    handleReportForDutyResponse(w);
    break;
  }
}
After registering, HRegionServer enters its main loop:
// The main run loop.
while (!this.stopped && isHealthy()) {
  long now = System.currentTimeMillis();
  if ((now - lastMsg) >= msgInterval) {
    doMetrics();
    tryRegionServerReport();
    lastMsg = System.currentTimeMillis();
  }
}
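Every hbase.regionserver.msginterval (3 seconds by default) the region server sends a report to the master, which effectively acts as a registration attempt. If the ServerName (IP, port and start code) is not yet in the master's list of registered servers, the ServerName is added to the map.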
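The timing rule of the main loop can be sketched as below. It is a hypothetical, deterministic stand-in rather than HRegionServer code: the loop's clock readings are passed in as a list instead of calling System.currentTimeMillis(), msgIntervalMs plays the role of hbase.regionserver.msginterval, and lastMsg is assumed to start at 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, deterministic sketch of the report timing in the main run loop.
// Clock readings are passed in instead of calling System.currentTimeMillis().
final class ReportTimingSketch {

    /**
     * Returns the clock values at which a report would be sent.
     * msgIntervalMs stands in for hbase.regionserver.msginterval.
     */
    static List<Long> reportTimes(List<Long> clockReadings, long msgIntervalMs) {
        List<Long> reports = new ArrayList<>();
        long lastMsg = 0L; // assumption: no report has been sent yet
        for (long now : clockReadings) {
            if (now - lastMsg >= msgIntervalMs) {
                reports.add(now); // doMetrics(); tryRegionServerReport();
                // The real loop re-reads the clock here; reusing 'now' keeps it deterministic.
                lastMsg = now;
            }
        }
        return reports;
    }
}
```

With msgIntervalMs = 3000 and clock readings 1000, 3000, 4000, 6500, 7000, 9600, reports go out at 3000, 6500 and 9600: a report is sent only once at least one full interval has passed since the previous one.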
ServerManager.java
void regionServerReport(ServerName sn, HServerLoad hsl)
throws YouAreDeadException, PleaseHoldException {
  checkIsDead(sn, "REPORT");
  if (!this.onlineServers.containsKey(sn)) {
    // Already have this host+port combo and its just different start code?
    checkAlreadySameHostPort(sn);
    // Just let the server in. Presume master joining a running cluster.
    // recordNewServer is what happens at the end of reportServerStartup.
    // The only thing we are skipping is passing back to the regionserver
    // the ServerName to use. Here we presume a master has already done
    // that so we'll press on with whatever it gave us for ServerName.
    recordNewServer(sn, hsl);
  } else {
    this.onlineServers.put(sn, hsl);
  }
}
recordNewServer logs the IP, port and timestamp (start code) of the ServerName object.
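The branch on onlineServers.containsKey(sn) only treats a report as coming from a known server if the whole key matches. The sketch below illustrates that with a hypothetical Sn class standing in for ServerName, assumed to compare host, port and start code (as the "different start code" comment in the code above suggests); checkIsDead and checkAlreadySameHostPort are left out, and the load is a plain String instead of HServerLoad.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: Sn stands in for ServerName and is assumed to compare
// host, port and start code. Only the containsKey branch of regionServerReport
// is modelled; checkIsDead and checkAlreadySameHostPort are left out.
final class OnlineServersSketch {

    static final class Sn {
        final String host;
        final int port;
        final long startcode;

        Sn(String host, int port, long startcode) {
            this.host = host;
            this.port = port;
            this.startcode = startcode;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Sn)) {
                return false;
            }
            Sn other = (Sn) o;
            return port == other.port && startcode == other.startcode
                    && host.equals(other.host);
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port, startcode);
        }
    }

    // Server -> last reported load (a String here instead of HServerLoad).
    final Map<Sn, String> onlineServers = new HashMap<>();

    /** Returns true when sn was not known and had to be recorded as a new server. */
    boolean report(Sn sn, String load) {
        boolean isNew = !onlineServers.containsKey(sn);
        // Both branches end with the server in the map: recordNewServer(sn, hsl)
        // for a new server, onlineServers.put(sn, hsl) for a known one.
        onlineServers.put(sn, load);
        return isNew;
    }
}
```

Under this model, two reports from the same host and port with start codes 100 and 200 produce two entries in onlineServers, i.e. one machine appearing as two servers, unless the same-host-port check removes one of them.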
Do the ServerName objects registered by the same region server always carry the same timestamp?
this.startcode = System.currentTimeMillis();
...
result = this.hbaseMaster.regionServerStartup(port, this.startcode, now);
...
this.serverNameFromMasterPOV = new ServerName(hostnameFromMasterPOV,
    this.isa.getPort(), this.startcode);
...
this.hbaseMaster.regionServerReport(
    this.serverNameFromMasterPOV.getVersionedBytes(), hsl);
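The startCode is fixed once when the region server starts, so following this flow there should never be region servers online with the same IP and port but different timestamps.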
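If two region servers with the same host and port register (for example, two start-ups on the same machine), the one with the smaller timestamp is removed, and the one with the larger timestamp is added on a later report.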
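The rule above (expire the entry with the smaller timestamp and admit the newer one on a later report) can be sketched like this. Again Sn is a hypothetical stand-in for ServerName, the online servers are kept in a plain list, and expiry is modelled as simply removing the stale entry; in HBase, expiring a server involves further processing that is not modelled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of the same-host-and-port rule: the entry with the smaller
// start code is expired and the newer server gets in on its next report.
// Sn stands in for ServerName; expiry is modelled as removal from a list.
final class SameHostPortSketch {

    static final class Sn {
        final String host;
        final int port;
        final long startcode;

        Sn(String host, int port, long startcode) {
            this.host = host;
            this.port = port;
            this.startcode = startcode;
        }

        boolean sameHostPort(Sn other) {
            return port == other.port && host.equals(other.host);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Sn)) {
                return false;
            }
            Sn other = (Sn) o;
            return sameHostPort(other) && startcode == other.startcode;
        }

        @Override
        public int hashCode() {
            return Objects.hash(host, port, startcode);
        }
    }

    final List<Sn> online = new ArrayList<>();

    /** Handles one report from sn; returns true if sn is online afterwards. */
    boolean report(Sn sn) {
        if (online.contains(sn)) {
            return true; // already registered with this exact start code
        }
        Sn existing = null;
        for (Sn s : online) {
            if (s.sameHostPort(sn)) {
                existing = s;
                break;
            }
        }
        if (existing == null) {
            online.add(sn); // first server seen on this host and port
            return true;
        }
        if (existing.startcode < sn.startcode) {
            online.remove(existing); // older start code looks stale: "expire" it
        }
        return false; // turned away this time; admitted on a later report
    }
}
```

Under this model an older registration can never replace a newer one: a report carrying a smaller start code than the registered entry is simply turned away.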
The problem we ran into is that regionservers differing only in timestamp were both registered with the master, and regions were moved between them.
Reposted from: https://www.cnblogs.com/shenguanpu/archive/2012/07/30/2615214.html