Why a Spark-on-YARN executor times out for no apparent reason: a root-cause analysis
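Problem:
When the job was submitted with
spark-submit --master yarn --deploy-mode cluster --driver-memory 2G --num-executors 6 --executor-memory 2G ~~~
the last executor went more than 160 s without sending a heartbeat, exceeded the 120 s timeout, and was removed. Its task then had to be re-run, which made the job take far longer than it should. The log shows the details: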
17/01/13 09:13:08 WARN spark.HeartbeatReceiver: Removing executor 5 with no recent heartbeats: 161684 ms exceeds timeout 120000 ms
17/01/13 09:13:08 ERROR cluster.YarnClusterScheduler: Lost executor 5 on slave10: Executor heartbeat timed out after 161684 ms
17/01/13 09:13:08 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, slave10): ExecutorLostFailure (executor 5 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 161684 ms
17/01/13 09:13:08 INFO scheduler.DAGScheduler: Executor lost: 5 (epoch 0)
17/01/13 09:13:08 INFO cluster.YarnClusterSchedulerBackend: Requesting to kill executor(s) 5
17/01/13 09:13:08 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 0.0 (TID 5, slave06, partition 0,RACK_LOCAL, 8029 bytes)
17/01/13 09:13:08 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 5 from BlockManagerMaster.
17/01/13 09:13:08 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(5, slave10, 34439)
17/01/13 09:13:08 INFO storage.BlockManagerMaster: Removed 5 successfully in removeExecutor
17/01/13 09:13:08 INFO scheduler.DAGScheduler: Host added was in lost list earlier: slave10
17/01/13 09:13:08 INFO yarn.ApplicationMaster$AMEndpoint: Driver requested to kill executor(s) 5.
17/01/13 09:13:08 INFO scheduler.TaskSetManager: Finished task 0.1 in stage 0.0 (TID 5) in 367 ms on slave06 (5/5)
17/01/13 09:13:08 INFO scheduler.DAGScheduler: ResultStage 0 (saveAsNewAPIHadoopFile at DataFrameFunctions.scala:55) finished in 162.495 s
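To spot this pattern quickly in a long driver log, the HeartbeatReceiver warning can be picked out with a regular expression. The following is only a minimal sketch: the function name parse_heartbeat_timeout and the returned dictionary keys are made up for illustration, and the pattern is written against the exact wording of the warning in the log above.

```python
import re

# Pattern based on the HeartbeatReceiver warning shown in the log above.
HEARTBEAT_RE = re.compile(
    r"Removing executor (\d+) with no recent heartbeats: "
    r"(\d+) ms exceeds timeout (\d+) ms"
)


def parse_heartbeat_timeout(line):
    """Return executor id, elapsed and timeout in ms, or None if no match.

    Hypothetical helper for illustration; not part of Spark.
    """
    match = HEARTBEAT_RE.search(line)
    if match is None:
        return None
    executor_id, elapsed_ms, timeout_ms = match.groups()
    return {
        "executor": executor_id,
        "elapsed_ms": int(elapsed_ms),
        "timeout_ms": int(timeout_ms),
        # How far past the timeout the executor was when it was removed.
        "over_by_ms": int(elapsed_ms) - int(timeout_ms),
    }


sample = (
    "17/01/13 09:13:08 WARN spark.HeartbeatReceiver: Removing executor 5 "
    "with no recent heartbeats: 161684 ms exceeds timeout 120000 ms"
)
info = parse_heartbeat_timeout(sample)
print(info)
```

For the warning above, the executor was about 41.7 s past the 120 s timeout when the driver removed it.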
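My initial guess was that the last step is computation-heavy while Spark's off-heap memory (the YARN memory overhead) is configured too low. The default is: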
| spark.yarn.executor.memoryOverhead | executorMemory * 0.10, with minimum of 384 |
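To see what that default means for this job, the formula can be worked through for a 2G executor. This is a rough sketch that assumes the values are in MiB and simply applies the rule from the table row above; the function name default_memory_overhead_mib is hypothetical, not a Spark API.

```python
def default_memory_overhead_mib(executor_memory_mib, factor=0.10, minimum_mib=384):
    """Apply 'executorMemory * 0.10, with minimum of 384' (assumed MiB)."""
    return max(int(executor_memory_mib * factor), minimum_mib)


executor_memory_mib = 2 * 1024  # --executor-memory 2G

# 10% of 2048 MiB is about 204 MiB, so the 384 MiB minimum applies.
default_overhead = default_memory_overhead_mib(executor_memory_mib)

# Memory requested per executor: heap plus overhead.
default_total = executor_memory_mib + default_overhead
tuned_total = executor_memory_mib + 512  # with memoryOverhead raised to 512

print(default_overhead, default_total, tuned_total)
```

So with 2G executors the default overhead is the 384 minimum, and raising it to 512 adds 128 more per executor.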
So I raised the overhead for both the executors and the driver:
spark-submit --master yarn --deploy-mode cluster --driver-memory 2G --num-executors 6 --executor-memory 2G --conf spark.yarn.executor.memoryOverhead=512 --conf spark.yarn.driver.memoryOverhead=512
After testing with these settings, the problem no longer occurs!
Reposted from: https://www.cnblogs.com/RichardYD/p/6281745.html