Installing spark-on-yarn on CentOS
Table of Contents

- 1. Overview
- 2. Installation steps
  - 1. Download the desired version
  - 2. Configure Spark
    - 1. Current system environment
    - 2. Add Spark settings
    - 3. Configure spark-env.sh
  - 3. Test with spark-shell
  - 4. Fix the problem
  - 5. Run spark-shell again
  - 6. Submit a bundled Spark example job
- 3. Summary
- 4. Detailed troubleshooting
  - 1. Option 1: change the YARN configuration
  - 2. Option 2: change the physical-memory settings of the ApplicationMaster and executors
1. Overview

First, to be clear: this post describes installing Spark in spark-on-yarn mode, not the spark-standalone setup.
In fact, for spark-on-yarn you only need to install and configure Spark on any node that can act as a Hadoop client. Spark itself runs inside YARN, so all that is needed is a client-like component that submits Spark's dependencies and the user's job to YARN.
Many guides found online still install Spark on several nodes in a master-slave layout and only then submit to YARN. None of that is necessary. Below we walk through how spark-on-yarn is actually installed.
2. Installation steps

1. Download the desired version

Find the version you want in the Apache Spark archive, then download it:

```bash
wget https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.6.tgz
tar -xzf spark-2.3.0-bin-hadoop2.6.tgz -C /usr/local/
cd /usr/local/
mv spark-2.3.0-bin-hadoop2.6/ spark
```

2. Configure Spark
1. Current system environment

Before installing Spark, the relevant environment settings were only these:

```bash
cat /etc/profile
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
```

In other words, only the Hadoop-related settings are present.
2. Add Spark settings

Add the Spark-related configuration to /etc/profile:

```bash
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
```

Make it take effect immediately:

```bash
source /etc/profile
```

3. Configure spark-env.sh
```bash
cp conf/spark-env.sh.template conf/spark-env.sh
```

Add the following to it:

```bash
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
```

If you want to use your own Scala installation you can also set SCALA_HOME; if you leave it unset, Spark uses the Scala libraries it ships with:

```bash
export SCALA_HOME=/usr/share/scala
```

3. Test with spark-shell
With the simple steps above, the spark-on-yarn installation is complete. Note that the machine must have a working Hadoop client: Spark reads its configuration from there in order to submit jobs to YARN.
Let's test it with spark-shell.
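The shell is launched in yarn-client mode; the invocation used here (the same one shown in the successful run later) is simply:

```bash
spark-shell --master yarn --deploy-mode client
```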
As you can see, spark-shell fails to start, and the error message is very vague: just a "channel closed" error.
The detailed troubleshooting of this error is left for later so as not to get lost here; the fix is given directly below.
4. Fix the problem

The problem is caused by some YARN settings that lead to the Spark containers being killed.

Log in to the machine running the Hadoop master and stop YARN:

```bash
cd /usr/local/hadoop/sbin/
./stop-yarn.sh
```

Then edit /usr/local/hadoop/etc/hadoop/yarn-site.xml and add the properties below.
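The exact XML was lost from the original post; a sketch of the likely change, assuming the virtual-memory check is disabled and the vmem/pmem ratio raised to 4 (matching the "4 GB virtual memory" limit visible in the NodeManager log further down):

```xml
<!-- Assumed reconstruction: disable the virtual-memory check and raise the
     virtual-to-physical memory ratio. -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>
```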
Note that this configuration change must be applied on every node of the Hadoop cluster.
然后重啟yarn
cd /usr/local/hadoop/sbin/./start-yarn.sh5. 再次使用spark-shell
```
[root@dev-03 spark]# spark-shell --master yarn --deploy-mode client
2020-08-10 17:29:12 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2020-08-10 17:29:16 WARN Utils:66 - Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
2020-08-10 17:29:16 WARN Utils:66 - Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
2020-08-10 17:29:16 WARN Utils:66 - Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
2020-08-10 17:29:17 WARN Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Spark context Web UI available at http://dev-03.com:4043
Spark context available as 'sc' (master = yarn, app id = application_1597051689954_0001).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
```

Success.
The corresponding ApplicationMaster log tells the same story: the job occupies 3 YARN containers, one running the ApplicationMaster and the other two running the executors.
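If you prefer the command line over the web UI, the ApplicationMaster log can also be pulled with the standard YARN CLI (assuming log aggregation is enabled; the application id is the one from the spark-shell output above):

```bash
yarn logs -applicationId application_1597051689954_0001
```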
6. Submit a bundled Spark example job
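A minimal sketch of submitting the bundled SparkPi example, assuming the standard layout of the spark-2.3.0-bin-hadoop2.6 distribution installed above:

```bash
# SparkPi from the examples shipped with the distribution; the jar path/name is
# assumed from the standard spark-2.3.0-bin-hadoop2.6 layout.
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  /usr/local/spark/examples/jars/spark-examples_2.11-2.3.0.jar 100
```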
3. Summary

Installing Spark in spark-on-yarn mode is very simple and requires no cluster setup. Setting the SCALA_HOME variable is not required either, because Spark ships with the Scala libraries it needs:

```
[root@dev-03 spark]# ll /usr/local/spark/jars/ |grep scala
-rw-r--r-- 1 1311767953 1876110778   515645 Feb 23 2018 jackson-module-scala_2.11-2.6.7.1.jar
-rw-r--r-- 1 1311767953 1876110778 15487351 Feb 23 2018 scala-compiler-2.11.8.jar
-rw-r--r-- 1 1311767953 1876110778  5744974 Feb 23 2018 scala-library-2.11.8.jar
-rw-r--r-- 1 1311767953 1876110778   802818 Feb 23 2018 scalap-2.11.8.jar
-rw-r--r-- 1 1311767953 1876110778   423753 Feb 23 2018 scala-parser-combinators_2.11-1.0.4.jar
-rw-r--r-- 1 1311767953 1876110778  4573750 Feb 23 2018 scala-reflect-2.11.8.jar
-rw-r--r-- 1 1311767953 1876110778   671138 Feb 23 2018 scala-xml_2.11-1.0.5.jar
```

4. Detailed troubleshooting
Tracking down the error above took quite a while, mainly because at first I only looked at the console output, which does not actually provide any useful information.
Only later did it occur to me to look at the logs of the corresponding NodeManager.
These can be viewed in the YARN web UI.
There you can see the application that was just submitted. The job has already failed and finished, so the only way to inspect it is to click History in the Tracking UI column on the far right.
That shows the ApplicationMaster log for the application, but there is nothing abnormal there either, so the error must have happened before that point. The only option left is to go through all the logs of that NodeManager.
Click the link in the Node column to open the node's information page, then open the Tools menu on the left and click the Local logs submenu.
That lists all the logs on the NodeManager; open yarn-root-nodemanager-bj3-stag-search-03.com.log.
There is a lot of output, so filter for WARN-level entries.
這里說的是給了當前container 1GB 內(nèi)存,使用了33.4M,給了當前container 2.1G虛擬內(nèi)存,但是使用了2.4G虛擬內(nèi)存,所以會kill當前container。
 也就是在為Application-Master分配container的時候就失敗了。
 yarn的默認分配邏輯是每分配1G memory,就會分配2.1G virtual-memory。(由yarn.nodemanager.vmem-pmem-ratio控制)
1. Option 1: change the YARN configuration

The approach taken here is to remove the check on virtual-memory usage and at the same time raise the virtual-memory ratio.

Edit /usr/local/hadoop/etc/hadoop/yarn-site.xml and add the properties below.
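A sketch of the properties, assuming the same change as in the "Fix the problem" section above (a ratio of 4, matching the "4 GB virtual memory" figure in the log that follows):

```xml
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>
```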
修改之后的日志是這樣的
```
2020-08-10 17:29:26,892 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1597051689954_0001_01_000001
2020-08-10 17:29:26,914 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 2091 for container-id container_1597051689954_0001_01_000001: 66.6 MB of 1 GB physical memory used; 2.2 GB of 4 GB virtual memory used
2020-08-10 17:29:29,926 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 2091 for container-id container_1597051689954_0001_01_000001: 330.6 MB of 1 GB physical memory used; 2.4 GB of 4 GB virtual memory used
```

2. Option 2: change the physical-memory settings of the ApplicationMaster and executors
In fact, the virtual memory a NodeManager requires when launching a container is not strictly proportional to the physical memory; it just needs to be somewhat larger than what the process actually uses. Since yarn.nodemanager.vmem-pmem-ratio defaults to 2.1, we can in principle raise the virtual-memory ceiling by increasing the physical memory of the ApplicationMaster and the executors, and then the job can run.
This time yarn-site.xml is left without the Option 1 changes; only the submit command is changed, roughly as sketched below.
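A sketch of such a submit command, with values assumed from the ApplicationMaster log below (one executor with 1500 MB of executor memory plus the default 384 MB overhead = 1884 MB, and an ApplicationMaster that lands in a 2 GB container):

```bash
# Assumed values, inferred from the logs that follow; not the exact original command.
spark-shell --master yarn --deploy-mode client \
  --num-executors 1 \
  --executor-memory 1500m \
  --conf spark.yarn.am.memory=1500m
```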
The log of the container running the ApplicationMaster, viewed through YARN (on bj3-stag-search-03.com:8042), is:
```
2020-08-11 09:48:54 INFO RMProxy:98 - Connecting to ResourceManager at dev-01.com/10.76.0.98:8030
2020-08-11 09:48:54 INFO YarnRMClient:54 - Registering the ApplicationMaster
2020-08-11 09:48:54 INFO YarnAllocator:54 - Will request 1 executor container(s), each with 1 core(s) and 1884 MB memory (including 384 MB of overhead)
2020-08-11 09:48:54 INFO YarnAllocator:54 - Submitted 1 unlocalized container requests.
2020-08-11 09:48:54 INFO ApplicationMaster:54 - Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
2020-08-11 09:48:55 INFO AMRMClientImpl:361 - Received new token for : dev-03.com:16935
2020-08-11 09:48:55 INFO YarnAllocator:54 - Launching container container_1597065725323_0003_01_000002 on host dev-03.com for executor with ID 1
2020-08-11 09:48:55 INFO YarnAllocator:54 - Received 1 containers from YARN, launching executors on 1 of them.
2020-08-11 09:48:55 INFO ContainerManagementProtocolProxy:81 - yarn.client.max-cached-nodemanagers-proxies : 0
2020-08-11 09:48:55 INFO ContainerManagementProtocolProxy:260 - Opening proxy : dev-03.com:16935
```

Container container_1597065725323_0003_01_000002 was launched on dev-03.com.
That is, the job started two containers in total: one runs the ApplicationMaster on bj3-stag-search-03.com (container_1597065725323_0003_01_000001),
and the other runs the executor on dev-03.com (container_1597065725323_0003_01_000002).
The NodeManager log on the ApplicationMaster's node shows how the ApplicationMaster's container was allocated:
```
2020-08-11 09:48:51,792 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1597065725323_0003_01_000001 transitioned from LOCALIZING to LOCALIZED
2020-08-11 09:48:51,814 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1597065725323_0003_01_000001 transitioned from LOCALIZED to RUNNING
2020-08-11 09:48:51,832 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /usr/local/hadoop/tmp/nm-local-dir/usercache/root/appcache/application_1597065725323_0003/container_1597065725323_0003_01_000001/default_container_executor.sh]
2020-08-11 09:48:53,448 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1597065725323_0003_01_000001
2020-08-11 09:48:53,457 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 19654 for container-id container_1597065725323_0003_01_000001: 227.5 MB of 2 GB physical memory used; 3.3 GB of 4.2 GB virtual memory used
2020-08-11 09:48:56,468 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 19654 for container-id container_1597065725323_0003_01_000001: 357.4 MB of 2 GB physical memory used; 3.4 GB of 4.2 GB virtual memory used
2020-08-11 09:48:59,477 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 19654 for container-id container_1597065725323_0003_01_000001: 342.0 MB of 2 GB physical memory used; 3.4 GB of 4.2 GB virtual memory used
```

The ApplicationMaster was allocated 2 GB of physical memory and used only 342 MB of it, and 4.2 GB of virtual memory of which only 3.4 GB was used,
so the ApplicationMaster's container also started successfully.
On dev-03.com, look at how the executor's container was allocated:
```
2020-08-11 09:48:58,869 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1597065725323_0003_01_000002 transitioned from LOCALIZING to LOCALIZED
2020-08-11 09:48:58,886 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1597065725323_0003_01_000002 transitioned from LOCALIZED to RUNNING
2020-08-11 09:48:58,901 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [bash, /usr/local/hadoop/tmp/nm-local-dir/usercache/root/appcache/application_1597065725323_0003/container_1597065725323_0003_01_000002/default_container_executor.sh]
2020-08-11 09:49:00,126 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1597065725323_0003_01_000002
2020-08-11 09:49:00,134 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 11023 for container-id container_1597065725323_0003_01_000002: 223.8 MB of 2 GB physical memory used; 3.3 GB of 4.2 GB virtual memory used
2020-08-11 09:49:03,143 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 11023 for container-id container_1597065725323_0003_01_000002: 327.3 MB of 2 GB physical memory used; 3.4 GB of 4.2 GB virtual memory used
```

The executor's allocation is fine as well.
Memory is allocated in increments of 1 GB; requests that are not a whole multiple of 1 GB are rounded up.
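This rounding granularity comes from the scheduler's minimum container allocation. For reference, the relevant yarn-site.xml property, shown here with its default value:

```xml
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
```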
References
 https://www.cnblogs.com/freeweb/p/5898850.html
 https://cloud.tencent.com/developer/article/1010903