Flink on YARN: troubleshooting and fixing "The main method caused an error: Could not deploy Yarn job cluster"
Reproducing the error:
flink run -m yarn-cluster -p 2 -yjm 700m -ytm 1024m -c WordCount target/bbb-1.0-SNAPSHOT.jar
The full error output:
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Could not deploy Yarn job cluster.
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:335)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)
    at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:138)
    at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:662)
    at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:210)
    at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:893)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:966)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:966)
Caused by: org.apache.flink.client.deployment.ClusterDeploymentException: Could not deploy Yarn job cluster.
    at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:398)
    at org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:70)
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1733)
    at org.apache.flink.streaming.api.environment.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:94)
    at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:63)
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1620)
    at WordCount.main(WordCount.java:47)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:321)
    ... 11 more
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1591614969089_0002 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1591614969089_0002_000001 exited with exitCode: 1
Failing this attempt.Diagnostics: [2020-06-08 19:18:12.457]Exception from container-launch.
Container id: container_1591614969089_0002_01_000001
Exit code: 1
[2020-06-08 19:18:12.466]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
[2020-06-08 19:18:12.467]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
For more detailed output, check the application tracking page: http://Desktop:8188/applicationhistory/app/application_1591614969089_0002 Then click on links to logs of each attempt.
. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1591614969089_0002
    at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:999)
    at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:488)
    at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:391)
    ... 22 more
2020-06-08 19:18:12,659 INFO org.apache.flink.yarn.YarnClusterDescriptor - Cancelling deployment from Deployment Failure Hook
2020-06-08 19:18:12,660 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at Desktop/192.168.0.103:8032
2020-06-08 19:18:12,661 INFO org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at Desktop/192.168.0.103:10201
2020-06-08 19:18:12,661 INFO org.apache.flink.yarn.YarnClusterDescriptor - Killing YARN application
2020-06-08 19:18:12,668 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Killed application application_1591614969089_0002
2020-06-08 19:18:12,769 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deleting files in hdfs://Desktop:9000/user/appleyuchi/.flink/application_1591614969089_0002.
This error is fairly hard to track down, because the client-side output above never shows the real cause (prelaunch.err is empty). First make sure Hadoop's log server is running, i.e. that the jps output includes JobHistoryServer. The command to start it is:
$HADOOP_HOME/bin/mapred --daemon start historyserver
Then start the timeline server:
yarn timelineserver
After these steps, all the ports of the YARN web UI should be reachable.
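To confirm that both daemons are actually up before retrying, the jps output can be checked programmatically. The following is only a rough sketch: the function names are made up for illustration, and it assumes that the timeline server started by yarn timelineserver appears in jps as ApplicationHistoryServer, which may differ on your Hadoop version.

```python
import subprocess

# Daemon names expected in the jps output.
# "ApplicationHistoryServer" is an assumption about how the timeline
# server is listed; adjust it if your jps output shows another name.
REQUIRED_DAEMONS = ("JobHistoryServer", "ApplicationHistoryServer")


def missing_daemons(jps_output, required=REQUIRED_DAEMONS):
    """Return the required daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # A jps line normally looks like "<pid> <main class name>".
        if len(parts) >= 2:
            running.add(parts[1])
    return [name for name in required if name not in running]


def run_jps():
    """Run jps and return its standard output (jps must be on PATH)."""
    return subprocess.run(
        ["jps"], capture_output=True, text=True, check=True
    ).stdout
```

Calling missing_daemons(run_jps()) on the Hadoop node returns an empty list when both daemons are running.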
#######################################################################################
The log in the YARN web UI then shows the following error:
2020-06-08 19:21:02,071 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Shutting YarnJobClusterEntrypoint down with application status FAILED. Diagnostics org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
    at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:261)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:215)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:518)
    at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:119)
Caused by: java.net.BindException: Could not start rest endpoint on any port in port range 8082
    at org.apache.flink.runtime.rest.RestServerEndpoint.start(RestServerEndpoint.java:228)
    at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:165)
    ... 9 more
.
2020-06-08 19:21:02,076 INFO org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at 0.0.0.0:37633
2020-06-08 19:21:02,077 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopping Akka RPC service.
2020-06-08 19:21:02,082 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopping Akka RPC service.
2020-06-08 19:21:02,087 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
2020-06-08 19:21:02,088 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
2020-06-08 19:21:02,095 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
2020-06-08 19:21:02,095 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
2020-06-08 19:21:02,110 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
2020-06-08 19:21:02,110 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
2020-06-08 19:21:02,130 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopped Akka RPC service.
2020-06-08 19:21:02,131 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopped Akka RPC service.
2020-06-08 19:21:02,132 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Could not start cluster entrypoint YarnJobClusterEntrypoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint YarnJobClusterEntrypoint.
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:187)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:518)
    at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:119)
Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
    at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:261)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:215)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
    at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
    ... 2 more
Caused by: java.net.BindException: Could not start rest endpoint on any port in port range 8082
    at org.apache.flink.runtime.rest.RestServerEndpoint.start(RestServerEndpoint.java:228)
    at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:165)
    ... 9 more
##############################################################
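So it is a port problem (a BindException on 8082), yet nothing was actually occupying that port, which left me confused for a while.
Cause of the mistake:
The port must be the same in two files, flink-conf.yaml and masters. I had forgotten to update the masters file, and that is what produced the convoluted error above.
The default 8081 had to be changed to 8082 because 8081 was already taken by Spark. After editing flink-conf.yaml I got carried away and forgot about the other file.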
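One way to double-check whether a port really is free is to try binding to it. This is a minimal sketch, not part of Flink or Hadoop; the helper name is_port_free is made up, and it only covers TCP over IPv4 on the machine where it runs.

```python
import socket


def is_port_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission error.
            return False
    return True


if __name__ == "__main__":
    # 8081 is the Flink default, 8082 is the port used in this post.
    for candidate in (8081, 8082):
        state = "free" if is_port_free(candidate) else "in use"
        print(candidate, state)
```

Run it on the node where the YARN container was launched, since a port can be free on one host and busy on another.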
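Final solution:
flink-conf.yaml: rest.port: 8082
masters: Desktop:8082
Then remember to sync both files to the other nodes in the cluster.
I also closed every open terminal and resubmitted the job from a fresh one; in my case the new configuration only seemed to take effect after doing so.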
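Since the whole problem came down to two files disagreeing, a small consistency check can catch it before submitting a job. The sketch below is an illustration under simplifying assumptions: the function names are hypothetical, rest.port is expected to be a plain integer on its own line, masters entries are expected as host:port, and entries without a port are skipped.

```python
def rest_port_from_conf(conf_text):
    """Return the integer value of rest.port in flink-conf.yaml text, or None."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if line.startswith("rest.port:"):
            return int(line.split(":", 1)[1].strip())
    return None


def masters_entries(masters_text):
    """Return (host, port) pairs for masters lines of the form host:port."""
    entries = []
    for raw in masters_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        host, sep, port = line.rpartition(":")
        if sep and port.isdigit():
            entries.append((host, int(port)))
    return entries


def find_mismatches(conf_text, masters_text):
    """Return masters entries whose port differs from rest.port."""
    rest_port = rest_port_from_conf(conf_text)
    return [
        f"{host}:{port}"
        for host, port in masters_entries(masters_text)
        if port != rest_port
    ]


if __name__ == "__main__":
    conf = "rest.port: 8082\n"
    masters = "Desktop:8081\n"  # the stale value that caused the error here
    print(find_mismatches(conf, masters))
```

In practice the two strings would be read from conf/flink-conf.yaml and conf/masters under the Flink installation directory on each node.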