Building a Hadoop Cluster Across Huawei Cloud and Alibaba Cloud Servers
Table of Contents
- Building a Hadoop Cluster Across Huawei Cloud and Alibaba Cloud Servers
- Overview
- Problems Encountered
- Switching CentOS 6/7/8 to the Aliyun yum Mirror
- Changing the Server Hostname
- Installing the JDK
- Installing Hadoop
- Writing the Cluster Distribution Script xsync
- scp (secure copy)
- rsync Remote Sync Tool
- xsync Cluster Distribution Script
- Passwordless Access
- Cluster Configuration (skip straight here if in a hurry)
- Configuring hosts
- hadoop102
- hadoop103
- hadoop104
- Core Configuration File
- HDFS Configuration File
- YARN Configuration File
- MapReduce Configuration File
- Distributing the Configuration
- Bringing Up the Cluster
- Configuring **workers**
- Starting the Cluster
- Stopping the Cluster
- Starting Individual Processes
- Starting HDFS Processes Individually
- Starting YARN Processes Individually
Building a Hadoop Cluster Across Huawei Cloud and Alibaba Cloud Servers

Overview

I have three servers: one on Huawei Cloud (102) and two on Alibaba Cloud (103 and 104). The goal is to build a single Hadoop cluster across servers sitting in different providers' data centers.
Problems Encountered

Switching CentOS 6/7/8 to the Aliyun yum Mirror

Aliyun mirror for Linux installs: http://mirrors.aliyun.com

Steps:
1. Back up the existing repo file:

```bash
mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.backup
```

2. Download the new CentOS-Base.repo into /etc/yum.repos.d:

```bash
# CentOS 6
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-6.repo
# CentOS 7
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
# CentOS 8
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-8.repo
```

Step 2 may fail with a DNS error.
On CentOS 7, downloading the repo file can fail with: `Resolving mirrors.aliyun.com (mirrors.aliyun.com)... failed: Name or service not known.`

Fix: log in as root, open the resolver config with `vim /etc/resolv.conf`, and add a DNS server (either `nameserver 223.5.5.5` or `nameserver 223.6.6.6` is enough).

If that does not solve it, check the network configuration: find the NIC name with `ifconfig` or `ip addr`, then open `/etc/sysconfig/network-scripts/ifcfg-<NIC name>` with vim and verify the network parameters are sane.
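For reference, a minimal ifcfg file looks roughly like this (the values are illustrative, and `eth0` is an assumed NIC name; yours may differ):

```bash
TYPE=Ethernet
BOOTPROTO=dhcp     # or "static", with IPADDR/NETMASK/GATEWAY set
NAME=eth0
DEVICE=eth0
ONBOOT=yes         # without this the NIC does not come up at boot
DNS1=223.5.5.5     # per-interface DNS, propagated into resolv.conf
```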
3. Refresh the cache:

```bash
yum clean all && yum makecache
yum update -y
```

The mirror switch is now complete.
Make sure you can ping Baidu:

```bash
ping www.baidu.com.cn
```

At this point my server could not ping Baidu: IP addresses worked, but domain names did not. I went through a pile of tutorials online and none of them were right.

My own fix: edit the resolver with `vim /etc/resolv.conf` and change it to

```
nameserver 8.8.8.8
```

After that, www.baidu.com pings fine.
Changing the Server Hostname

Do this if you need it. I am on a single server for now, but more servers will definitely be added later, and consistent hostnames make configuration management much easier down the road.

```bash
vim /etc/hostname
```

Quick tip: the Tab key auto-completes commands.
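On CentOS 7 and later there is also a one-step alternative (this assumes a systemd-based system, where hostnamectl is available):

```bash
hostnamectl set-hostname hadoop102   # takes effect immediately and persists across reboots
```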
Installing the JDK

For the tarball, follow the 后端碼匠 account and reply 電腦環境 to get it, or download it yourself from the Oracle website.

Configure the environment variables:

```bash
vim /etc/profile       # global profile; we put our additions under /etc/profile.d instead
cd /etc/profile.d
vim my_env.sh
```

```bash
# JAVA_HOME
export JAVA_HOME=/usr/java/jdk1.8.0_221
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
```

```bash
source /etc/profile
```
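After sourcing the profile, a quick sanity check that the JDK is on the PATH (the exact version string depends on your build):

```bash
java -version    # should report 1.8.0_221 with the paths above
javac -version
```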
Installing Hadoop

Unpack the tarball:

```bash
tar -zxvf hadoop-3.1.1.tar.gz -C /opt/module/
pwd
# /opt/module/hadoop-3.1.1
cd /etc/profile.d
vim my_env.sh
```

```bash
# HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
```

```bash
source /etc/profile
```
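Same idea here — confirm the Hadoop binaries resolve (output abbreviated):

```bash
hadoop version
# Hadoop 3.1.1
```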
Commands for deleting a directory:

```bash
rmdir dir      # only works on empty directories
rm -rf dir/    # removes the directory and everything in it
```
Testing the wordcount example
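The run below assumes an input directory already exists. A minimal way to create one (the file name and words are arbitrary; these happen to reproduce the output shown):

```bash
mkdir wcinput
echo "cls cls bobo mike abnzhang s sss" > wcinput/word.txt
```

Then run the example and inspect the output: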
```bash
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar wordcount wcinput/ ./wcoutput
[root@linuxmxz hadoop-3.1.1]# cd wcoutput
[root@linuxmxz wcoutput]# ll
總用量 4
-rw-r--r-- 1 root root 41 4月  1 11:24 part-r-00000
-rw-r--r-- 1 root root  0 4月  1 11:24 _SUCCESS
[root@linuxmxz wcoutput]# cat part-r-00000
abnzhang 1
bobo 1
cls 2
mike 1
s 1
sss 1
```
Writing the Cluster Distribution Script xsync

scp (secure copy)

scp copies data between servers (from server1 to server2). Basic syntax:

```bash
scp -r $pdir/$fname $user@$host:$pdir/$fname
```

That is: command, recursive flag, source path/name, then destination user@host:destination path/name.
Examples.

Push from the local machine to a remote one:

```bash
scp -r jdk1.8.0_212/ root@ip:/opt/module/
```

Pull from a remote machine to the local one:

```bash
scp -r root@ip:/opt/module/hadoop-3.1.1 ./
```

From the local machine, copy between two other machines:

```bash
scp -r root@ip:/opt/module/* root@ip:/opt/module/
```

Cleaning up: `rm -rf wcinput/ wcoutput/` deletes both directories at once.
rsync Remote Sync Tool

rsync only transfers files that differ, whereas scp copies everything over. Basic syntax:

```bash
rsync -av $pdir/$fname $user@$host:$pdir/$fname
```

That is: command, option flags, source path/name, then destination user@host:destination path/name. For example:

```bash
rsync -av hadoop-3.1.1/ root@ip:/opt/module/hadoop-3.1.1/
```

xsync Cluster Distribution Script
The script itself:

```bash
#!/bin/bash
# 1. Check the argument count; exit if nothing was passed in
pcount=$#
if [ $pcount -lt 1 ]
then
    echo Not Enough Arguments!
    exit
fi
# 2. Loop over every machine in the cluster
for host in hadoop102 hadoop103 hadoop104
do
    echo ==================== $host ====================
    # 3. Loop over every file/directory passed in
    for file in $@
    do
        # 4. Check that the file exists
        if [ -e $file ]
        then
            # 5. Resolve the absolute parent directory
            pdir=$(cd -P $(dirname $file); pwd)
            echo pdir=$pdir
            # 6. Get the file name
            fname=$(basename $file)
            echo fname=$fname
            # 7. Via ssh, create the directory on $host (no-op if it already exists)
            ssh $host "source /etc/profile; mkdir -p $pdir"
            # 8. Sync the file into $pdir on $host as user $USER
            rsync -av $pdir/$fname $USER@$host:$pdir
        else
            echo $file Does Not Exist!
        fi
    done
done
```
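A sketch of installing and using the script (the path is my choice, not mandated; on CentOS, $HOME/bin is on the PATH by default for login shells):

```bash
mkdir -p ~/bin
cp xsync ~/bin/
chmod +x ~/bin/xsync
# push a whole directory to hadoop102/103/104 in one go
xsync /opt/module/hadoop-3.1.1
```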
Passwordless Access

```bash
adduser codingce
passwd codingce
chown -R codingce:codingce hadoop-3.1.1/
chmod 770 hadoop-3.1.1/
ls -al              # lists everything, including hidden files such as .ssh
ssh-keygen -t rsa
cat id_rsa          # private key
cat id_rsa.pub      # public key
```

Put the public key into the target machine's .ssh folder:

```bash
[codingce@linuxmxz .ssh]# ssh-copy-id 66.108.177.66
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
codingce@39.108.177.66's password:
Number of key(s) added: 1
```

After this you can reach that server directly with `ssh ip`. Remember to set up passwordless login to the machine itself as well: `ssh ip`.
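To avoid doing this once per machine, a small loop sketch (assuming the codingce user exists on all three nodes and the hadoop102-104 names resolve, as configured in the next section):

```bash
for host in hadoop102 hadoop103 hadoop104
do
    ssh-copy-id $host   # include the local machine itself
done
```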
Cluster Configuration (skip straight here if in a hurry)

Cluster deployment plan, as reflected in the configuration files below:

|      | hadoop102          | hadoop103                    | hadoop104                     |
| ---- | ------------------ | ---------------------------- | ----------------------------- |
| HDFS | NameNode, DataNode | DataNode                     | SecondaryNameNode, DataNode   |
| YARN | NodeManager        | ResourceManager, NodeManager | NodeManager                   |

- Note: do not install NameNode and SecondaryNameNode on the same server.
- ResourceManager is also memory-hungry; keep it off the servers running NameNode or SecondaryNameNode.
Configuring hosts

hadoop102

This is where I lost an entire afternoon. The trick for a cross-datacenter cluster: each node maps its own hostname to its internal IP (so its daemons can bind to it), and the other nodes' hostnames to their public IPs (so they are reachable across machine rooms).

```bash
[root@linuxmxz hadoop-3.1.1]# vim /etc/hosts
```

```
# on 102, its own entry uses the internal IP; the other two use public IPs
internal-ip  hadoop102
public-ip    hadoop103
public-ip    hadoop104
```

hadoop103
```bash
[root@linuxmxz hadoop-3.1.1]# vim /etc/hosts
```

```
public-ip    hadoop102
internal-ip  hadoop103
public-ip    hadoop104
```

hadoop104
```bash
[root@linuxmxz hadoop-3.1.1]# vim /etc/hosts
```

```
public-ip    hadoop102
public-ip    hadoop103
internal-ip  hadoop104
```
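A quick sanity check on each node once all three files are in place (every hostname should resolve and respond; note the cloud security groups must allow ICMP for ping to succeed):

```bash
ping -c 2 hadoop102
ping -c 2 hadoop103
ping -c 2 hadoop104
```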
Core Configuration File

The core configuration file is core-site.xml:

```bash
[root@linuxmxz hadoop]# cd $HADOOP_HOME/etc/hadoop
[codingce@linuxmxz hadoop]$ vim core-site.xml
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- NameNode address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop102:8020</value>
    </property>
    <!-- Hadoop data storage directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-3.1.1/data</value>
    </property>
    <!-- Static user for the HDFS web UI: codingce -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>codingce</value>
    </property>
</configuration>
```
HDFS Configuration File

Configure hdfs-site.xml:

```bash
[codingce@linuxmxz hadoop]$ vim hdfs-site.xml
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- NameNode web UI address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop102:9870</value>
    </property>
    <!-- SecondaryNameNode (2nn) web UI address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop104:9868</value>
    </property>
</configuration>
```
YARN Configuration File

Configure yarn-site.xml:

```bash
[codingce@linuxmxz hadoop]$ vim yarn-site.xml
```

```xml
<configuration>
    <!-- Use the shuffle auxiliary service for MapReduce -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- ResourceManager address -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop103</value>
    </property>
    <!-- Environment variables inherited by containers -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
```
MapReduce Configuration File

Configure mapred-site.xml:

```bash
[codingce@linuxmxz hadoop]$ vim mapred-site.xml
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Run MapReduce programs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.1</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.1</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.1</value>
    </property>
</configuration>
```

Distributing the Configuration
```bash
# 1
[codingce@linuxmxz hadoop]$ rsync -av core-site.xml codingce@66.108.177.66:/opt/module/hadoop-3.1.1/etc/hadoop/
sending incremental file list
core-site.xml
sent 599 bytes  received 47 bytes  1,292.00 bytes/sec
total size is 1,176  speedup is 1.82
[codingce@linuxmxz hadoop]$ rsync -av core-site.xml codingce@119.23.69.66:/opt/module/hadoop-3.1.1/etc/hadoop/
sending incremental file list
core-site.xml
sent 599 bytes  received 47 bytes  1,292.00 bytes/sec
total size is 1,176  speedup is 1.82
# 2
[codingce@linuxmxz hadoop]$ rsync -av hdfs-site.xml codingce@119.23.69.66:/opt/module/hadoop-3.1.1/etc/hadoop/
sending incremental file list
hdfs-site.xml
sent 511 bytes  received 47 bytes  1,116.00 bytes/sec
total size is 1,088  speedup is 1.95
[codingce@linuxmxz hadoop]$ rsync -av hdfs-site.xml codingce@66.108.177.66:/opt/module/hadoop-3.1.1/etc/hadoop/
sending incremental file list
hdfs-site.xml
sent 511 bytes  received 47 bytes  1,116.00 bytes/sec
total size is 1,088  speedup is 1.95
# 3
[codingce@linuxmxz hadoop]$ rsync -av yarn-site.xml codingce@66.108.177.66:/opt/module/hadoop-3.1.1/etc/hadoop/
sending incremental file list
yarn-site.xml
sent 651 bytes  received 47 bytes  1,396.00 bytes/sec
total size is 1,228  speedup is 1.76
[codingce@linuxmxz hadoop]$ rsync -av yarn-site.xml codingce@119.23.69.66:/opt/module/hadoop-3.1.1/etc/hadoop/
sending incremental file list
yarn-site.xml
sent 651 bytes  received 47 bytes  1,396.00 bytes/sec
total size is 1,228  speedup is 1.76
# 4
[codingce@linuxmxz hadoop]$ rsync -av mapred-site.xml codingce@119.23.69.66:/opt/module/hadoop-3.1.1/etc/hadoop/
sending incremental file list
sent 73 bytes  received 12 bytes  170.00 bytes/sec
total size is 1,340  speedup is 15.76
[codingce@linuxmxz hadoop]$ rsync -av mapred-site.xml codingce@66.108.177.66:/opt/module/hadoop-3.1.1/etc/hadoop/
sending incremental file list
sent 73 bytes  received 12 bytes  170.00 bytes/sec
total size is 1,340  speedup is 15.76
```
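Alternatively, the xsync script from earlier pushes to both other nodes in one shot; a sketch, assuming it is installed on the PATH as above:

```bash
xsync /opt/module/hadoop-3.1.1/etc/hadoop/
```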
Bringing Up the Cluster

Configure **workers**:

```bash
[codingce@linuxmxz hadoop]$ vim workers
```

```
hadoop102
hadoop103
hadoop104
```

Note: lines in this file must not end with trailing spaces, and the file must not contain blank lines.
Sync the file to all nodes:

```bash
[codingce@linuxmxz hadoop]$ rsync -av workers codingce@39.108.177.65:/opt/module/hadoop-3.1.1/etc/hadoop/
sending incremental file list
workers
sent 143 bytes  received 41 bytes  368.00 bytes/sec
total size is 30  speedup is 0.16
[codingce@linuxmxz hadoop]$ rsync -av workers codingce@119.23.69.213:/opt/module/hadoop-3.1.1/etc/hadoop/
sending incremental file list
workers
sent 143 bytes  received 41 bytes  122.67 bytes/sec
total size is 30  speedup is 0.16
```

Starting the Cluster
(1) If the cluster is starting for the first time, format the NameNode on hadoop102. (Careful: formatting the NameNode generates a new cluster ID; if it no longer matches the DataNodes' cluster ID, the cluster cannot find its old data. If the cluster errors out while running and the NameNode must be re-formatted, first stop the namenode and datanode processes and delete the data and logs directories on every machine, then format.)

```bash
[codingce@linuxmxz hadoop-3.1.1]$ hdfs namenode -format
```

(2) Start HDFS:

```bash
[codingce@linuxmxz hadoop-3.1.1]$ sbin/start-dfs.sh
Starting namenodes on [39.108.177.65]
Starting datanodes
39.108.177.65: datanode is running as process 17487. Stop it first.
119.23.69.213: datanode is running as process 7274. Stop it first.
Starting secondary namenodes [119.23.69.213]
[codingce@linuxmxz hadoop-3.1.1]$
[codingce@linuxmxz ~]$ jps
23621 NodeManager
23766 Jps
23339 DataNode
[codingce@linuxmxz hadoop-3.1.1]$ ssh 66.108.177.66
[codingce@hyf hadoop-3.1.1]$ sbin/start-yarn.sh
Starting resourcemanager
Starting nodemanagers
[codingce@hyf ~]$ jps
19204 Jps
18533 NodeManager
17487 DataNode
[codingce@hyf ~]$ ssh 119.23.69.66
[codingce@zjx ~]$ jps
7824 NodeManager
7274 DataNode
7965 Jps
```

(3) On the node where ResourceManager is configured (hadoop103), start YARN:
```bash
sbin/start-yarn.sh
```

Related commands:

```bash
sbin/start-dfs.sh
stop-dfs.sh
stop-yarn.sh
sbin/start-yarn.sh
netstat -tlpn    # list all listening ports
```
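Once everything is up, the web UIs make a quick end-to-end check (9870 comes from hdfs-site.xml above; 8088 is the YARN ResourceManager default port; both assume the cloud security groups have these ports open):

```bash
curl -s http://hadoop102:9870 >/dev/null && echo "NameNode UI up"
curl -s http://hadoop103:8088 >/dev/null && echo "ResourceManager UI up"
```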
Stopping the Cluster

Stop HDFS from any node:

```bash
stop-dfs.sh
```

Stop YARN on the YARN master node:

```bash
stop-yarn.sh
```

In a pseudo-distributed environment you can also simply run:

```bash
stop-all.sh
```

Starting Individual Processes
If some processes fail to come up while starting the cluster, you can try starting just those processes.

Starting HDFS Processes Individually

```bash
hdfs --daemon start <hdfs-process>
```

```bash
hdfs --daemon start namenode
hdfs --daemon start datanode
hdfs --daemon start secondarynamenode
```

Starting YARN Processes Individually
```bash
yarn --daemon start <yarn-process>
```

```bash
yarn --daemon start resourcemanager
yarn --daemon start nodemanager
```
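The same --daemon syntax also stops a single process, which helps when only one node is misbehaving:

```bash
hdfs --daemon stop datanode
yarn --daemon stop nodemanager
```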