Installing a Spark Cluster with Docker (with HDFS)

This walkthrough was done on CentOS 7.

Part 1: Installing Docker

This part installs Docker. If Docker is already installed on the machine, skip ahead.
[root@VM-48-22-centos ~]# systemctl stop firewalld
[root@VM-48-22-centos ~]# systemctl disable firewalld
[root@VM-48-22-centos ~]# systemctl status firewalld
[root@VM-48-22-centos ~]# setenforce 0
[root@VM-48-22-centos ~]# getenforce
[root@VM-48-22-centos ~]# yum -y update
[root@VM-48-22-centos ~]# mkdir /etc/yum.repos.d/oldrepo
[root@VM-48-22-centos ~]# mv /etc/yum.repos.d/*.repo /etc/yum.repos.d/oldrepo/
[root@VM-48-22-centos ~]# wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
[root@VM-48-22-centos ~]# yum install -y yum-utils device-mapper-persistent-data lvm2
[root@VM-48-6-centos ~]# yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
[root@VM-48-22-centos ~]# yum clean all
[root@VM-48-22-centos ~]# yum makecache fast
[root@VM-48-22-centos ~]# yum list docker-ce --showduplicates | sort -r
[root@VM-48-22-centos ~]# yum -y install docker-ce
[root@VM-48-22-centos ~]# systemctl start docker
[root@VM-48-22-centos ~]# systemctl enable docker
[root@VM-48-22-centos ~]# ps -ef | grep docker
[root@VM-48-22-centos ~]# docker version

If `docker version` prints version information at this point, Docker has been installed successfully.
To speed up later image pulls, configure a registry mirror:

[root@VM-48-22-centos ~]# vi /etc/docker/daemon.json

{"registry-mirrors": ["https://x3n9jrcg.mirror.aliyuncs.com"]}

Then restart Docker (systemctl restart docker) for the mirror to take effect.
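One thing to watch: daemon.json must be valid JSON, or the Docker daemon will refuse to start after the restart. A quick sanity check of the content above (plain Python, just for illustration):

```python
import json

# The registry-mirror configuration written to /etc/docker/daemon.json above.
daemon_config = '{"registry-mirrors": ["https://x3n9jrcg.mirror.aliyuncs.com"]}'

# json.loads raises an error on malformed JSON, so a clean parse
# confirms the file is syntactically valid before restarting Docker.
parsed = json.loads(daemon_config)
print(parsed["registry-mirrors"])
```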
Later steps pull some resources with git, so install it:

[root@VM-48-22-centos ~]# yum -y install git

Likewise, to be able to reach GitHub reliably, edit the hosts file (/etc/hosts) and add the following:
192.30.255.112 github.com git

Part 2: Installing Docker-Compose
This part installs Docker-Compose. If it is already installed, skip ahead.
[root@VM-48-6-centos ~]# curl -L https://get.daocloud.io/docker/compose/releases/download/1.27.4/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose
[root@VM-48-6-centos ~]# chmod +x /usr/local/bin/docker-compose
[root@VM-48-6-centos ~]# docker-compose --version
docker-compose version 1.27.4, build 40524192

If the docker-compose version number prints correctly, the installation succeeded.
Part 3: Installing docker-spark

Now we can set up the Spark environment in Docker proper.
[root@VM-48-6-centos ~]# wget https://raw.githubusercontent.com/zq2599/blog_demos/master/sparkdockercomposefiles/docker-compose.yml
[root@VM-48-6-centos ~]# wget https://raw.githubusercontent.com/zq2599/blog_demos/master/sparkdockercomposefiles/hadoop.env
[root@VM-48-6-centos ~]# docker-compose up -d

This starts pulling the images.
Wait until all images have been pulled. You can then see the running containers (e.g. with docker ps).
View HDFS in a browser:

http://101.35.55.236:50070/ (replace 101.35.55.236 with your machine's IP)

View Spark in a browser:

http://101.35.55.236:8080/ (replace 101.35.55.236 with your machine's IP)
Part 4: Running a WordCount Program

Prepare a txt file (mine is Book7.txt).
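If you don't have a large text file at hand, a few lines of Python can generate a small placeholder to stand in for Book7.txt (the content here is arbitrary; any plain text works):

```python
# Generate a small sample text file to use as word-count input.
# The filename Book7.txt just matches the one used in this walkthrough.
sample = ("the quick brown fox jumps over the lazy dog\n" * 3
          + "the dog sleeps\n")
with open("Book7.txt", "w") as f:
    f.write(sample)
```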
Create an /input directory in HDFS, copy the txt file into the local input_files directory (input_files is mounted into the namenode container, so it shows up inside the container automatically), then put the file from the container into HDFS. For example (the in-container mount path /input_files is assumed here):

[root@VM-48-6-centos ~]# docker exec namenode hdfs dfs -mkdir /input
[root@VM-48-6-centos ~]# cp Book7.txt input_files/
[root@VM-48-6-centos ~]# docker exec namenode hdfs dfs -put /input_files/Book7.txt /input

Check the HDFS web UI to confirm the file has been uploaded:
Method 1: run in spark-shell
[root@VM-48-6-centos ~]# docker exec -it master spark-shell --executor-memory 512M --total-executor-cores 2
2021-12-16 05:19:59 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = spark://master:7077, app id = app-20211216052007-0000).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sc.textFile("hdfs://namenode:8020/input/Book7.txt").flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).sortBy(_._2,false).take(10).foreach(println)
(,12132)
(the,10349)
(and,5955)
(to,4871)
(of,4138)
(a,3442)
(he,2844)
(Harry,2735)
(was,2674)
(his,2492)

When the job finishes, press Ctrl+C to exit.
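The Spark one-liner above is the classic word count: split each line on spaces, map every word to a (word, 1) pair, sum the counts per word, sort by count descending, and take the top 10. The same logic in plain Python, for reference (not Spark, just the algorithm):

```python
from collections import Counter

def top_words(text, n=10):
    # flatMap + split(" "): splitting on single spaces keeps empty strings,
    # which is why the Spark output above shows ("", 12132) in first place.
    words = [w for line in text.splitlines() for w in line.split(" ")]
    # reduceByKey(_ + _) over (word, 1) pairs is just a frequency count.
    counts = Counter(words)
    # sortBy(_._2, false).take(n): top n entries by descending count.
    return counts.most_common(n)

print(top_words("a b a\na a b c"))  # → [('a', 4), ('b', 2), ('c', 1)]
```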
Method 2: run with spark-submit

This requires a prepared jar package.
[root@VM-48-6-centos ~]# ls
conf  data  docker-compose.yml  hadoop.env  input_files  jars
[root@VM-48-6-centos ~]# cd jars
[root@VM-48-6-centos jars]# wget https://raw.githubusercontent.com/zq2599/blog_demos/master/sparkdockercomposefiles/sparkwordcount-1.0-SNAPSHOT.jar
--2021-12-16 13:35:52--  https://raw.githubusercontent.com/zq2599/blog_demos/master/sparkdockercomposefiles/sparkwordcount-1.0-SNAPSHOT.jar
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5915 (5.8K) [application/octet-stream]
Saving to: ‘sparkwordcount-1.0-SNAPSHOT.jar’
2021-12-16 13:35:54 (90.7 MB/s) - ‘sparkwordcount-1.0-SNAPSHOT.jar’ saved [5915/5915]
[root@VM-48-6-centos jars]# docker exec -it master spark-submit \
> --class com.bolingcavalry.sparkwordcount.WordCount \
> --executor-memory 512m \
> --total-executor-cores 2 \
> /root/jars/sparkwordcount-1.0-SNAPSHOT.jar \
> namenode \
> 8020 \
> Book7.txt

When it finishes, a lot of output is produced, including:
2021-12-16 05:36:25 INFO  DAGScheduler:54 - ResultStage 4 (take at WordCount.java:78) finished in 0.157 s
2021-12-16 05:36:25 INFO  DAGScheduler:54 - Job 1 finished: take at WordCount.java:78, took 0.442847 s
2021-12-16 05:36:25 INFO  WordCount:90 - top 10 word :
 12132
the 10349
and 5955
to 4871
of 4138
a 3442
he 2844
Harry 2735
was 2674
his 2492

References
Installing docker-compose:
https://www.cnblogs.com/xiao987334176/p/12377113.html
Setting up a Spark cluster in Docker:
https://blog.csdn.net/boling_cavalry/article/details/86851069
https://blog.csdn.net/weixin_42588332/article/details/119515003