Hadoop job count: Hadoop job task allocation
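1. Necessity
Hadoop provides a number of configuration parameters that let admins and users set memory limits flexibly. Some parameters have default values; others are cluster-specific options intended to support memory-intensive jobs.
When building a cluster, the admin can first set appropriate default values. The remaining parameters can be chosen based on the cluster's hardware configuration (such as the total physical and virtual memory available to tasks, the number of slots configured on each slave, and the requirements of other processes running on the slave) and on the job type (for example, memory-intensive tasks).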
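2. Memory monitoring
(1) Purpose of monitoring task memory
Prevent a MapReduce task from consuming memory beyond a limit and thereby harming other processes, other tasks, or daemons (for example DataNode or TaskTracker) running on the same slave.
(2) Virtual memory and physical memory
Hadoop can monitor both the virtual memory and the physical memory of a node, and the two are monitored independently. In streaming applications, however, the program has to load libraries to carry out its work, so virtual memory usage is high. In that case, monitoring physical memory is more accurate.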
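(3) Hadoop lets a job specify the maximum amount of memory it expects to need. Through resource aware scheduling and monitoring, Hadoop tries to ensure that the number of tasks running at any time respects the limits imposed by:
(a) an individual job's memory requirement
(b) the total amount of memory available for all MapReduce tasks
(4) TaskTracker monitoring of tasks
(a) Periodic monitoring
Step 1: Check that the cumulative virtual memory and physical memory used by a task and its child processes do not exceed the specified amounts. Virtual memory is checked first, then physical memory. If a limit is exceeded, kill the task and its child processes and mark the task as failed.
Step 2: Check the cumulative virtual memory and physical memory used by all running tasks on the node and their child processes. If a limit is exceeded, kill enough tasks to bring cumulative memory usage below the limit. (If the virtual memory limit is exceeded, kill the tasks that have made the least progress; if the physical memory limit is exceeded, kill the tasks using the most physical memory.) Tasks killed this way are marked as killed.
(5) Resource aware scheduling
Resource aware scheduling ensures that, before a task is scheduled onto a slave, that slave can satisfy the task's memory requirement.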
The Capacity Scheduler takes virtual memory requirements into account when scheduling jobs.
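The two-step periodic check described in (4) can be sketched roughly as follows. This is only an illustration of the logic, not TaskTracker code: the `Task` class, its fields, and `monitor_once` are hypothetical names, and the sketch assumes the step 2 limits are node-wide totals for all running tasks.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical snapshot of one task plus its child processes."""
    task_id: str
    vmem_mb: int         # cumulative virtual memory in use
    pmem_mb: int         # cumulative physical memory in use
    vmem_limit_mb: int   # per-task virtual memory limit
    pmem_limit_mb: int   # per-task physical memory limit
    progress: float      # 0.0 .. 1.0
    state: str = "running"


def monitor_once(tasks, total_vmem_limit_mb, total_pmem_limit_mb):
    """Run one monitoring cycle and update each task's state in place."""
    # Step 1: per-task limits, virtual memory first, then physical memory.
    for t in tasks:
        if t.state != "running":
            continue
        if t.vmem_mb > t.vmem_limit_mb or t.pmem_mb > t.pmem_limit_mb:
            t.state = "failed"

    def running():
        return [t for t in tasks if t.state == "running"]

    # Step 2a: total virtual memory over the limit, kill least-progress tasks.
    while running() and sum(t.vmem_mb for t in running()) > total_vmem_limit_mb:
        min(running(), key=lambda t: t.progress).state = "killed"

    # Step 2b: total physical memory over the limit, kill the biggest users.
    while running() and sum(t.pmem_mb for t in running()) > total_pmem_limit_mb:
        max(running(), key=lambda t: t.pmem_mb).state = "killed"

    return tasks
```

Note the different outcomes: a task that breaks its own limit ends up failed, while a task sacrificed to bring the totals back under the limit ends up killed.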
(7) Cluster-related memory configuration
These settings belong to the JobTracker and TaskTracker, and no job can modify them. In addition, the configuration parameters are the same on every slave.
mapreduce.cluster.{map|reduce}memory.mb: These options define the default amount of virtual memory that should be allocated for MapReduce tasks running in the cluster. They typically match the default values set for the options mapreduce.{map|reduce}.memory.mb. They help in the calculation of the total amount of virtual memory available for MapReduce tasks on a slave, using the following equation:
Total virtual memory for all MapReduce tasks = (mapreduce.cluster.mapmemory.mb * mapreduce.tasktracker.map.tasks.maximum) + (mapreduce.cluster.reducememory.mb * mapreduce.tasktracker.reduce.tasks.maximum)
Typically, reduce tasks require more memory than map tasks. Hence a higher value is recommended for mapreduce.cluster.reducememory.mb. The value is specified in MB. To set a value of 2GB for reduce tasks, set mapreduce.cluster.reducememory.mb to 2048.
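As a quick worked example of the equation above, the following sketch plugs in sample numbers. The function name and the slot counts and map memory value are illustrative assumptions; only the 2048 MB reduce value comes from the text.

```python
def total_virtual_memory_mb(map_mem_mb, map_slots, reduce_mem_mb, reduce_slots):
    """Total virtual memory for all MapReduce tasks on one slave, in MB.

    map_mem_mb    ~ mapreduce.cluster.mapmemory.mb
    map_slots     ~ mapreduce.tasktracker.map.tasks.maximum
    reduce_mem_mb ~ mapreduce.cluster.reducememory.mb
    reduce_slots  ~ mapreduce.tasktracker.reduce.tasks.maximum
    """
    return map_mem_mb * map_slots + reduce_mem_mb * reduce_slots


# Assumed sample values: 4 map slots at 1024 MB, 2 reduce slots at 2048 MB.
total = total_virtual_memory_mb(1024, 4, 2048, 2)
print(total)  # 8192
```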
mapreduce.jobtracker.max{map|reduce}memory.mb: These options define the maximum amount of virtual memory that can be requested by jobs using the parameters mapreduce.{map|reduce}.memory.mb. The system will reject any submitted job that requests more memory than these limits. Typically, the values for these options should be set to satisfy the following constraint:
mapreduce.jobtracker.maxmapmemory.mb = mapreduce.cluster.mapmemory.mb * mapreduce.tasktracker.map.tasks.maximum
mapreduce.jobtracker.maxreducememory.mb = mapreduce.cluster.reducememory.mb * mapreduce.tasktracker.reduce.tasks.maximum
The value is specified in MB. If mapreduce.cluster.reducememory.mb is set to 2GB and there are 2 reduce slots configured in the slaves, the value for mapreduce.jobtracker.maxreducememory.mb should be set to 4096.
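The constraint and the rejection rule can be illustrated with a small sketch. Both helper names are hypothetical, and the admission check is a simplification of what the JobTracker does; it only mirrors the rule that a job requesting more than the maximum is rejected.

```python
def max_request_mb(cluster_mem_mb, slots):
    """Recommended maximum per the constraint: per-slot memory times slot count."""
    return cluster_mem_mb * slots


def accept_job(requested_mb, max_mb):
    """A job is rejected when it requests more memory than the maximum."""
    return requested_mb <= max_mb


# Values from the text: 2 GB per reduce slot, 2 reduce slots.
max_reduce_mb = max_request_mb(2048, 2)
print(max_reduce_mb)                    # 4096
print(accept_job(4096, max_reduce_mb))  # True
print(accept_job(6144, max_reduce_mb))  # False
```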
mapreduce.tasktracker.reserved.physicalmemory.mb: This option defines the amount of physical memory that is marked for system and daemon processes. Using this, the amount of physical memory available for MapReduce tasks is calculated using the following equation:
Total physical memory for all MapReduce tasks = Total physical memory available on the system - mapreduce.tasktracker.reserved.physicalmemory.mb
The value is specified in MB. To set this value to 2GB, specify the value as 2048.
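A short worked example of the physical memory equation, assuming a slave with 16 GB of RAM (an illustrative figure, not from the original) and the 2 GB reservation mentioned above. The function name is hypothetical.

```python
def physical_memory_for_tasks_mb(total_physical_mb, reserved_mb):
    """Physical memory left for MapReduce tasks after the system reservation.

    reserved_mb ~ mapreduce.tasktracker.reserved.physicalmemory.mb
    """
    return total_physical_mb - reserved_mb


# Assumed 16 GB node (16384 MB) with 2048 MB reserved for system and daemons.
available = physical_memory_for_tasks_mb(16384, 2048)
print(available)  # 14336
```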
mapreduce.tasktracker.taskmemorymanager.monitoringinterval: This option defines the time the TaskTracker waits between two cycles of memory monitoring. The value is specified in milliseconds.
Note: The virtual memory monitoring function is only enabled if the variables mapreduce.cluster.{map|reduce}memory.mb and mapreduce.jobtracker.max{map|reduce}memory.mb are set to values greater than zero. Likewise, the physical memory monitoring function is only enabled if the variable mapreduce.tasktracker.reserved.physicalmemory.mb is set to a value greater than zero.
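The enabling conditions in the note can be expressed as two small predicates over a configuration dictionary. This is a sketch only: the helper names are hypothetical, the dictionary stands in for the real configuration object, and treating an unset property as 0 is an assumption made for the example.

```python
VMEM_KEYS = (
    "mapreduce.cluster.mapmemory.mb",
    "mapreduce.cluster.reducememory.mb",
    "mapreduce.jobtracker.maxmapmemory.mb",
    "mapreduce.jobtracker.maxreducememory.mb",
)
PMEM_KEY = "mapreduce.tasktracker.reserved.physicalmemory.mb"


def virtual_monitoring_enabled(conf):
    """True only if all four virtual memory options are greater than zero."""
    # Assumption: a missing key counts as 0, i.e. not configured.
    return all(conf.get(key, 0) > 0 for key in VMEM_KEYS)


def physical_monitoring_enabled(conf):
    """True only if the reserved physical memory option is greater than zero."""
    return conf.get(PMEM_KEY, 0) > 0


conf = {
    "mapreduce.cluster.mapmemory.mb": 1024,
    "mapreduce.cluster.reducememory.mb": 2048,
    "mapreduce.jobtracker.maxmapmemory.mb": 4096,
    "mapreduce.jobtracker.maxreducememory.mb": 4096,
}
print(virtual_monitoring_enabled(conf))   # True
print(physical_monitoring_enabled(conf))  # False, reservation not set
```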
Reposted from http://blog.csdn.net/amaowolf/article/details/7188504