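A deep dive into YARN's application max attempts: smooth ApplicationMaster restarts for long-running services
When developing a long-running service on YARN, fault tolerance needs attention. This article analyzes one parameter that affects smooth ApplicationMaster (AM) restarts and explains how to set it so that the AM can be restarted smoothly.
yarn-site.xml has the following parameter: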
/**
 * The maximum number of application attempts.
 * It's a global setting for all application masters.
 */
yarn.resourcemanager.am.max-attempts
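This is the global limit on ApplicationMaster attempts. When submitting an application to YARN, you can also set a maximum number of attempts for that individual application: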
/**
 * Set the number of max attempts of the application to be submitted. WARNING:
 * it should be no larger than the global number of max attempts in the Yarn
 * configuration.
 * @param maxAppAttempts the number of max attempts of the application
 *                       to be submitted.
 */
@Public
@Stable
public abstract void setMaxAppAttempts(int maxAppAttempts);
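When an attempt fails and the application was submitted with the keep-containers-across-attempts option (getKeepContainersAcrossApplicationAttempts() on the submission context), the ResourceManager decides whether the containers of the previous attempt are retained: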
boolean keepContainersAcrossAppAttempts = false;
switch (finalAttemptState) {
  case FINISHED:
  {
    appEvent = new RMAppFinishedAttemptEvent(applicationId,
        appAttempt.getDiagnostics());
  }
    break;
  case KILLED:
  {
    // don't leave the tracking URL pointing to a non-existent AM
    appAttempt.setTrackingUrlToRMAppPage();
    appAttempt.invalidateAMHostAndPort();
    appEvent =
        new RMAppFailedAttemptEvent(applicationId,
            RMAppEventType.ATTEMPT_KILLED,
            "Application killed by user.", false);
  }
    break;
  case FAILED:
  {
    // don't leave the tracking URL pointing to a non-existent AM
    appAttempt.setTrackingUrlToRMAppPage();
    appAttempt.invalidateAMHostAndPort();
    if (appAttempt.submissionContext
        .getKeepContainersAcrossApplicationAttempts()
        && !appAttempt.submissionContext.getUnmanagedAM()) {
      // See if we should retain containers for non-unmanaged applications
      if (!appAttempt.shouldCountTowardsMaxAttemptRetry()) {
        // Premption, hardware failures, NM resync doesn't count towards
        // app-failures and so we should retain containers.
        keepContainersAcrossAppAttempts = true;
      } else if (!appAttempt.maybeLastAttempt) {
        // Not preemption, hardware failures or NM resync.
        // Not last-attempt too - keep containers.
        keepContainersAcrossAppAttempts = true;
      }
    }
    appEvent =
        new RMAppFailedAttemptEvent(applicationId,
            RMAppEventType.ATTEMPT_FAILED, appAttempt.getDiagnostics(),
            keepContainersAcrossAppAttempts);
  }
}
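The FAILED branch above reduces to a small boolean rule, which may be easier to reason about in isolation. The following is a minimal sketch of that rule only, not the ResourceManager's code: the class KeepContainersDecision and its parameter names are hypothetical and simply stand in for the four conditions read in the snippet.

```java
// Simplified model of the FAILED branch quoted above.
// KeepContainersDecision and all parameter names are hypothetical;
// they only mirror the four conditions the RM code consults.
final class KeepContainersDecision {

    static boolean shouldKeep(boolean keepRequested,
                              boolean unmanagedAM,
                              boolean countsTowardsMaxAttempts,
                              boolean maybeLastAttempt) {
        // Containers are only ever kept if the application asked for it
        // and the AM is managed by YARN.
        if (!keepRequested || unmanagedAM) {
            return false;
        }
        // Failures such as preemption, hardware errors or NM resync are
        // not counted against the application, so containers are kept.
        if (!countsTowardsMaxAttempts) {
            return true;
        }
        // A counted failure keeps containers unless this attempt was
        // flagged as possibly the last one.
        return !maybeLastAttempt;
    }

    public static void main(String[] args) {
        System.out.println("counted failure, not last attempt: "
                + shouldKeep(true, false, true, false));
        System.out.println("counted failure, maybe last attempt: "
                + shouldKeep(true, false, true, true));
        System.out.println("uncounted failure, maybe last attempt: "
                + shouldKeep(true, false, false, true));
    }
}
```

The only case in which an application that requested container retention loses its containers is a counted failure on an attempt flagged as possibly the last, which is why the maybeLastAttempt flag matters for long-running services.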
Note the appAttempt.maybeLastAttempt variable. How does the ResourceManager (RM) determine whether this attempt is the last one?
private void createNewAttempt() {
  ApplicationAttemptId appAttemptId =
      ApplicationAttemptId.newInstance(applicationId, attempts.size() + 1);
  RMAppAttempt attempt =
      new RMAppAttemptImpl(appAttemptId, rmContext, scheduler, masterService,
          submissionContext, conf,
          // The newly created attempt maybe last attempt if (number of
          // previously failed attempts(which should not include Preempted,
          // hardware error and NM resync) + 1) equal to the max-attempt
          // limit.
          maxAppAttempts == (getNumFailedAppAttempts() + 1), amReq);
  attempts.put(appAttemptId, attempt);
  currentAttempt = attempt;
}
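Each time a new attempt is constructed, maxAppAttempts == (getNumFailedAppAttempts() + 1) checks whether the number of already failed attempts plus one has reached the maxAppAttempts limit. If it has, the new attempt is flagged as possibly the last one.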
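To see which attempt ends up flagged, the check maxAppAttempts == (getNumFailedAppAttempts() + 1) can be tried out on its own. This is only an illustrative sketch: LastAttemptCheck and isMaybeLastAttempt are hypothetical names, and the failed-attempt count is passed in directly instead of being derived from real attempt records.

```java
// Stand-alone model of the "maybe last attempt" check quoted above.
// LastAttemptCheck and isMaybeLastAttempt are hypothetical names.
final class LastAttemptCheck {

    // numFailedAppAttempts is assumed to exclude failures caused by
    // preemption, hardware errors and NM resync, as the quoted
    // source comment describes.
    static boolean isMaybeLastAttempt(int maxAppAttempts,
                                      int numFailedAppAttempts) {
        return maxAppAttempts == numFailedAppAttempts + 1;
    }

    public static void main(String[] args) {
        int maxAppAttempts = 3;
        // Walk through the counted failures seen before each new attempt.
        for (int failed = 0; failed < maxAppAttempts; failed++) {
            System.out.println("failed so far=" + failed
                    + ", new attempt maybeLast="
                    + isMaybeLastAttempt(maxAppAttempts, failed));
        }
    }
}
```

With a limit of 3, only the attempt created after two counted failures is flagged. With a very large limit, the flag is practically never set.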
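The maxAppAttempts value itself is derived from the global and the per-application settings. When the per-application value is valid, this amounts to taking the smaller of the two: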
int globalMaxAppAttempts = conf.getInt(YarnConfiguration.RM_AM_MAX_ATTEMPTS,
    YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS);
int individualMaxAppAttempts = submissionContext.getMaxAppAttempts();
if (individualMaxAppAttempts <= 0 ||
    individualMaxAppAttempts > globalMaxAppAttempts) {
  this.maxAppAttempts = globalMaxAppAttempts;
  LOG.warn("The specific max attempts: " + individualMaxAppAttempts
      + " for application: " + applicationId.getId()
      + " is invalid, because it is out of the range [1, "
      + globalMaxAppAttempts + "]. Use the global max attempts instead.");
} else {
  this.maxAppAttempts = individualMaxAppAttempts;
}
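The resolution rule above can be exercised with a few concrete values to see why a large per-application setting has no effect while the global limit stays small. The sketch below restates the rule as a pure function; MaxAttemptsResolver and resolve are hypothetical names, and the logging done by the original code is left out.

```java
// Pure-function restatement of the resolution rule quoted above.
// MaxAttemptsResolver and resolve are hypothetical names.
final class MaxAttemptsResolver {

    static int resolve(int globalMaxAppAttempts, int individualMaxAppAttempts) {
        // Values outside [1, globalMaxAppAttempts] are treated as invalid
        // and replaced by the global limit.
        if (individualMaxAppAttempts <= 0
                || individualMaxAppAttempts > globalMaxAppAttempts) {
            return globalMaxAppAttempts;
        }
        return individualMaxAppAttempts;
    }

    public static void main(String[] args) {
        // A small global limit caps a large per-application request.
        System.out.println("global=2, individual=10000 -> " + resolve(2, 10000));
        // Once the global limit is raised, the per-application value wins.
        System.out.println("global=10000, individual=10000 -> " + resolve(10000, 10000));
        // An unset (non-positive) per-application value falls back to global.
        System.out.println("global=10000, individual=0 -> " + resolve(10000, 0));
    }
}
```

In other words, asking for 10000 attempts at submit time is silently reduced to the global value unless yarn.resourcemanager.am.max-attempts has been raised as well.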
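Summary:
If you want the ApplicationMaster to be restarted again and again while taking over the containers of the previous attempt, raise yarn.resourcemanager.am.max-attempts as far as practical, for example to 10000, because the per-application value is capped by it. Then, when submitting the application, set the maximum number of attempts on the submission context, together with the failure-count reset window (the attempt failures validity interval). With these in place the failure count rarely reaches the limit, so an attempt is rarely treated as the last one and containers are kept across AM restarts. That covers most of what a long-running service needs when running on YARN.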