Spark on YARN: configuration and error resolution
If you repost this article, please be sure to credit the original source: https://dongkelun.com/2018/04/16/sparkOnYarnConf/
Preface
YARN is the cluster manager introduced in Hadoop 2.0. It lets multiple data-processing frameworks run on a shared pool of resources, and it is usually installed on the same physical nodes as the Hadoop Distributed File System (HDFS). Running Spark on a YARN cluster configured this way makes a lot of sense: Spark runs on the physical nodes that store the data, giving it fast access to the data in HDFS.
1. Configuration
1.1 Configure HADOOP_CONF_DIR
vim /etc/profile
export HADOOP_CONF_DIR=/opt/hadoop-2.7.5/etc/hadoop
source /etc/profile

1.2 Launch from the command line
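Before launching spark-shell, it can be worth a quick sanity check that the variable is visible in the current shell and that YARN itself is reachable. A minimal sketch, assuming the Hadoop command-line tools are already on the PATH:

# Confirm the variable is set in the current shell
echo $HADOOP_CONF_DIR

# List the NodeManagers registered with the ResourceManager;
# if this fails, spark-shell --master yarn will fail as well
yarn node -list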
spark-shell --master yarn

However, in Spark 2.x this reports an error:
18/04/16 07:59:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/04/16 07:59:27 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/04/16 07:59:54 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:173)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2516)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:918)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:910)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:910)
    at org.apache.spark.repl.Main$.createSparkSession(Main.scala:101)
    at $line3.$read$$iw$$iw.<init>(<console>:15)
    at $line3.$read$$iw.<init>(<console>:42)
    at $line3.$read.<init>(<console>:44)
    at $line3.$read$.<init>(<console>:48)
    at $line3.$read$.<clinit>(<console>)
    at $line3.$eval$.$print$lzycompute(<console>:7)
    at $line3.$eval$.$print(<console>:6)
    at $line3.$eval.$print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
    at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
    at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
    at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
    at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
    at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
    at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply$mcV$sp(SparkILoop.scala:38)
    at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
    at org.apache.spark.repl.SparkILoop$$anonfun$initializeSpark$1.apply(SparkILoop.scala:37)
    at scala.tools.nsc.interpreter.IMain.beQuietDuring(IMain.scala:214)
    at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:37)
    at org.apache.spark.repl.SparkILoop.loadFiles(SparkILoop.scala:98)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:920)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
    at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
    at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
    at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
    at org.apache.spark.repl.Main$.doMain(Main.scala:74)
    at org.apache.spark.repl.Main$.main(Main.scala:54)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/04/16 07:59:54 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
18/04/16 07:59:54 WARN MetricsSystem: Stopping a MetricsSystem that is not running
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:85)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:62)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:173)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2516)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:918)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:910)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:910)
    at org.apache.spark.repl.Main$.createSparkSession(Main.scala:101)
    ... 47 elided
<console>:14: error: not found: value spark
       import spark.implicits._
              ^
<console>:14: error: not found: value spark
       import spark.sql
              ^
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.1
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

2. Fixing the errors
2.1 Add spark.yarn.jars
First, look at the second warning:
Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.

I suspected this warning might be the cause, so I searched the web for it, and also searched for the error message itself:
Yarn application has already ended! It might have been killed or unable to ...

The results all said to configure spark.yarn.jars, so I set it up with the following commands:
hdfs dfs -mkdir /hadoop
hdfs dfs -mkdir /hadoop/spark_jars
hdfs dfs -put /opt/spark-2.2.1-bin-hadoop2.7/jars/* /hadoop/spark_jars
cd /opt/spark-2.2.1-bin-hadoop2.7/conf/
cp spark-defaults.conf.template spark-defaults.conf
vim spark-defaults.conf

Add the following at the bottom of the file:
spark.yarn.jars hdfs://192.168.44.128:8888/hadoop/spark_jars/*

(Note: the trailing * must not be removed.) Then start spark-shell again. A similar error is still reported, although the warning is gone.
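As an aside, the same warning also names spark.yarn.archive as an alternative to spark.yarn.jars. A rough sketch of that variant, assuming the same paths used above, that zip is installed, and that the archive name spark-libs.zip is just an example (set only one of the two properties):

cd /opt/spark-2.2.1-bin-hadoop2.7/jars
# Pack all of Spark's jars into a single archive; the jars must sit at the archive root
zip -q spark-libs.zip *.jar
# Upload the archive to the same HDFS directory as before
hdfs dfs -put spark-libs.zip /hadoop/spark_jars/
# Point Spark at the archive; drop the spark.yarn.jars line if switching to this
echo "spark.yarn.archive hdfs://192.168.44.128:8888/hadoop/spark_jars/spark-libs.zip" >> /opt/spark-2.2.1-bin-hadoop2.7/conf/spark-defaults.conf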
2.2 Configure Hadoop's yarn-site.xml
The remaining error is caused by Java 8: the JVM uses more virtual memory than YARN's default limits allow, so YARN kills the container running the ApplicationMaster.
vim /opt/hadoop-2.7.5/etc/hadoop/yarn-site.xml

Add:
<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>

Start spark-shell again, and it succeeds!
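To double-check that YARN mode now works end to end, one option is to submit the SparkPi example bundled with Spark in cluster mode. A sketch, assuming the examples jar shipped with this Spark 2.2.1 build (the exact jar name can vary between builds):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  /opt/spark-2.2.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.2.1.jar 100
# In cluster mode the "Pi is roughly ..." line ends up in the driver container's log,
# which can be fetched afterwards with: yarn logs -applicationId <application id>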
3. An unexpected bonus
Since I needed to reproduce the error in order to write this post, I first only commented out the spark.yarn.jars setting (keeping the yarn-site.xml change) and started spark-shell. It still worked, just with the extra warning. In other words, the root cause of the error was the Java 8 issue, i.e. the missing yarn-site.xml configuration from section 2.2, not the spark.yarn.jars setting at all!
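For anyone who wants to confirm this root cause rather than disable the checks blindly, the diagnostics of a container killed by YARN typically contain a "running beyond virtual memory limits" message. A sketch for digging it out of the YARN logs (the application id is a placeholder, and yarn logs requires log aggregation to be enabled):

# Find the failed spark-shell attempt among recent applications
yarn application -list -appStates ALL | head -n 20
# Pull its aggregated logs and search for the memory-limit diagnostic
yarn logs -applicationId application_XXXXXXXXXXXXX_XXXX | grep -i "virtual memory"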
References
blog.csdn.net/lxhandlbb/a… blog.csdn.net/gg584741/ar…
Reposted from: https://juejin.im/post/5b0368d1518825429c5988fd