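A bug in how YARN chooses the directory for writing container tokens
When the NodeManager (NM) starts a container, one step writes the current tokens to a local directory. By default, the method that does this is startLocalizer in the DefaultContainerExecutor class: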
```java
  public synchronized void startLocalizer(Path nmPrivateContainerTokensPath,
      InetSocketAddress nmAddr, String user, String appId, String locId,
      List<String> localDirs, List<String> logDirs)
      throws IOException, InterruptedException {
    ContainerLocalizer localizer =
        new ContainerLocalizer(lfs, user, appId, locId, getPaths(localDirs),
            RecordFactoryProvider.getRecordFactory(getConf()));
    // Initialize the local directories for a particular user:
    // create $local.dir/usercache/$user and its immediate parent
    createUserLocalDirs(localDirs, user);
    // Initialize the local cache directories for a particular user:
    // $local.dir/usercache/$user, $local.dir/usercache/$user/appcache,
    // $local.dir/usercache/$user/filecache
    createUserCacheDirs(localDirs, user);
    // Initialize the local directories for a particular user:
    // $local.dir/usercache/$user/appcache/$appi
    createAppDirs(localDirs, user, appId);
    // Create application log directories on all disks: create $log.dir/$appid
    createAppLogDirs(appId, logDirs);
    // TODO: Why pick first app dir. The same in LCE why not random?
    Path appStorageDir = getFirstApplicationDir(localDirs, user, appId);
    String tokenFn = String.format(ContainerLocalizer.TOKEN_FILE_NAME_FMT, locId);
    Path tokenDst = new Path(appStorageDir, tokenFn);
    lfs.util().copy(nmPrivateContainerTokensPath, tokenDst);
    LOG.info("Copying from " + nmPrivateContainerTokensPath + " to " + tokenDst);
    lfs.setWorkingDirectory(appStorageDir);
    LOG.info("CWD set to " + appStorageDir + " = " + lfs.getWorkingDirectory());
    // TODO: DO it over RPC for maintaining similarity?
    localizer.runLocalization(nmAddr);
  }
```
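The part to focus on is the call to getFirstApplicationDir(localDirs, user, appId) and the lines after it: the code first builds the token file name, then calls copy to copy the token file into YARN's local working directory.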
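The name-then-copy step can be sketched outside Hadoop with plain java.nio. This is only an illustration: the class TokenCopySketch, the helper copyToken, and the "%s.tokens" format string are assumptions made for this sketch, standing in for ContainerLocalizer.TOKEN_FILE_NAME_FMT and lfs.util().copy, and are not taken from the Hadoop source.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TokenCopySketch {
    // Assumed format string; a stand-in for ContainerLocalizer.TOKEN_FILE_NAME_FMT.
    static final String TOKEN_FILE_NAME_FMT = "%s.tokens";

    // Hypothetical helper: build the token file name from the localizer id,
    // then copy the NM-private token file into the chosen application directory.
    static Path copyToken(Path nmPrivateTokens, Path appStorageDir, String locId)
            throws IOException {
        String tokenFn = String.format(TOKEN_FILE_NAME_FMT, locId);
        Path tokenDst = appStorageDir.resolve(tokenFn);
        Files.createDirectories(appStorageDir);
        return Files.copy(nmPrivateTokens, tokenDst, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for the NM-private token file and the app directory.
        Path src = Files.createTempFile("nm-private", ".tokens");
        Files.write(src, new byte[] {1, 2, 3});
        Path appDir = Files.createTempDirectory("appStorageDir");
        Path dst = copyToken(src, appDir, "container_0001");
        System.out.println("Copying from " + src + " to " + dst);
    }
}
```

Whatever directory is passed as appStorageDir receives the token file, so the choice of that directory decides which disk takes the write.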
The first argument passed to getFirstApplicationDir is the list of directories where YARN writes its temporary data, which is controlled by the yarn.nodemanager.local-dirs setting (List of directories to store localized files in.):
```java
  private Path getFirstApplicationDir(List<String> localDirs, String user,
      String appId) {
    return getApplicationDir(new Path(localDirs.get(0)), user, appId);
  }
```
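To make the effect of taking index 0 concrete, here is a minimal standalone sketch. The class FirstAppDirSketch, its helpers, and the example directory names are hypothetical; the path layout follows the $local.dir/usercache/$user/appcache/... pattern quoted in the startLocalizer comments, and java.nio paths replace Hadoop's Path.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class FirstAppDirSketch {
    // Assumed layout: <local dir>/usercache/<user>/appcache/<appId>
    static Path applicationDir(Path base, String user, String appId) {
        return base.resolve("usercache").resolve(user).resolve("appcache").resolve(appId);
    }

    // Same shape as getFirstApplicationDir: only the first configured dir is ever used.
    static Path firstApplicationDir(List<String> localDirs, String user, String appId) {
        return applicationDir(Paths.get(localDirs.get(0)), user, appId);
    }

    public static void main(String[] args) {
        // Example directory names, made up for illustration.
        List<String> localDirs =
            Arrays.asList("/data1/yarn/local", "/data2/yarn/local", "/data3/yarn/local");
        for (String appId : Arrays.asList("application_0001", "application_0002")) {
            // Every application resolves under the first entry of localDirs.
            System.out.println(firstApplicationDir(localDirs, "alice", appId));
        }
    }
}
```

However many directories are configured, every application directory returned here sits under the first entry.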
Note that this uses localDirs.get(0). Next, look at where localDirs comes from.
It is obtained in the run method of LocalizerRunner, an inner class of ResourceLocalizationService:
```java
  private LocalDirsHandlerService dirsHandler;
  // ....
        List<String> localDirs = dirsHandler.getLocalDirs();
        List<String> logDirs = dirsHandler.getLogDirs();
```
This calls into the LocalDirsHandlerService class:
```java
  /** Local dirs to store localized files in */
  private DirectoryCollection localDirs = null;
  /** storage for container logs*/
  private DirectoryCollection logDirs = null;

      localDirs = new DirectoryCollection(
          validatePaths(conf.getTrimmedStrings(YarnConfiguration.NM_LOCAL_DIRS)));
      logDirs = new DirectoryCollection(
          validatePaths(conf.getTrimmedStrings(YarnConfiguration.NM_LOG_DIRS)));
```
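Here localDirs is obtained by parsing the value of the yarn.nodemanager.local-dirs configuration property. Because the configured value is fixed, the resulting localDirs is always the same List in the same order, so get(0) always returns the same entry and the token is always written to the same directory. This is in fact a known bug: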
https://issues.apache.org/jira/browse/YARN-2566
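The reason the first entry never changes can be shown with a small sketch of parsing a comma-separated setting. The class LocalDirsParseSketch and its parseDirs helper are hypothetical and only approximate what conf.getTrimmedStrings does (split on commas, trim whitespace); the configured value is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class LocalDirsParseSketch {
    // Hypothetical stand-in for reading a comma-separated property:
    // split on commas, trim each entry, skip empty entries.
    static List<String> parseDirs(String configValue) {
        List<String> dirs = new ArrayList<>();
        for (String part : configValue.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(trimmed);
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Made-up value for yarn.nodemanager.local-dirs.
        String configured = "/data1/yarn/local, /data2/yarn/local, /data3/yarn/local";
        // Parsing the same value always yields the same order,
        // so index 0 is the same directory for every container.
        for (int i = 0; i < 3; i++) {
            System.out.println(parseDirs(configured).get(0));
        }
    }
}
```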
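As a result, the token files of all containers are written to the same directory. The fix picks the directory using a random number; see the patch attached to the issue for the details.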
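The idea of picking the directory at random can be sketched as follows. This is a minimal illustration of uniform random selection, not the code from the patch: the class RandomAppDirSketch and the helper pickLocalDir are hypothetical, and the real patch may weigh other factors when choosing.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class RandomAppDirSketch {
    // Hypothetical helper: choose one configured local dir uniformly at random
    // instead of always taking index 0.
    static String pickLocalDir(List<String> localDirs, Random random) {
        if (localDirs.isEmpty()) {
            throw new IllegalArgumentException("no local dirs configured");
        }
        return localDirs.get(random.nextInt(localDirs.size()));
    }

    public static void main(String[] args) {
        // Example directory names, made up for illustration.
        List<String> localDirs =
            Arrays.asList("/data1/yarn/local", "/data2/yarn/local", "/data3/yarn/local");
        Random random = new Random();
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < 3000; i++) {
            counts.merge(pickLocalDir(localDirs, random), 1, Integer::sum);
        }
        // Picks are spread across the configured directories rather than
        // concentrated on the first one.
        System.out.println(counts);
    }
}
```

Passing the Random in as a parameter keeps the helper easy to test with a fixed seed.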
This article is reposted from 菜菜光's 51CTO blog. Original link: http://blog.51cto.com/caiguangguang/1585277. To repost it, please contact the original author.