A Hands-on Example: Benchmarking a Hadoop Cluster with HiBench
HiBench
1. Introduction
HiBench is a big-data benchmark suite that helps evaluate different big-data frameworks in terms of speed, throughput, and system resource utilization.
It contains a set of Hadoop, Spark, and streaming workloads, including
Sort, WordCount, TeraSort, Repartition, Sleep, SQL, PageRank, Nutch indexing, Bayes, Kmeans, NWeight, and enhanced DFSIO.
It also contains several streaming workloads for Spark Streaming, Flink, Storm, and Gearpump.
Workload categories: micro, ml (machine learning), sql, graph, websearch, and streaming
Supported frameworks: Hadoop, Spark, Flink, Storm, Gearpump
2. Checking the Environment
Hadoop is already installed on my cluster (Ubuntu 16), so this article only tests hadoopbench.
Check whether your environment is ready. If anything is missing, refer to the "Prerequisites" section below to install it; if everything is already in place, skip straight to "Installing HiBench".
My environment (for reference):
| hadoop | 2.10 (officially supported: Apache Hadoop 3.0.x, 3.1.x, 3.2.x, 2.x, CDH5, HDP) |
| maven | 3.3.9 |
| java | 8 |
| python | 2.7 |
3. Prerequisites
(Note: I tested these installation steps on CentOS. If they do not apply to your machine, please refer to other tutorials.)
Installing Hadoop
You can follow this article to install it: https://www.jianshu.com/p/4e0dc91ad86e
Installing Java
Download the Java 8 RPM:

```shell
wget https://mirrors.huaweicloud.com/java/jdk/8u181-b13/jdk-8u181-linux-x64.rpm
```

Install the RPM:

```shell
rpm -ivh jdk-8u181-linux-x64.rpm
```

Configure the Java environment:

```shell
vim /etc/profile
```

Append the following:

```shell
JAVA_HOME=/usr/java/jdk1.8.0_181-amd64
CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib
PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
export PATH CLASSPATH JAVA_HOME
```
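As a quick sanity check, you can replay the profile fragment in a shell and confirm `$JAVA_HOME/bin` actually landed on `PATH`; a minimal sketch (the JDK path is the one used above and depends on your install):

```shell
# Replay the /etc/profile fragment, then verify PATH contains $JAVA_HOME/bin.
JAVA_HOME=/usr/java/jdk1.8.0_181-amd64
CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib
PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
export PATH CLASSPATH JAVA_HOME

case ":$PATH:" in
  *":$JAVA_HOME/bin:"*) echo "JAVA_HOME is on PATH" ;;
  *)                    echo "JAVA_HOME missing from PATH" ;;
esac
# prints: JAVA_HOME is on PATH
```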
Make the environment variables take effect:

```shell
source /etc/profile
```

Installing Maven

```shell
wget https://dlcdn.apache.org/maven/maven-3/3.8.5/binaries/apache-maven-3.8.5-bin.zip --no-check-certificate
unzip apache-maven-3.8.5-bin.zip -d /usr/local/
cd
vim .bashrc
source .bashrc
mvn -v
```

Add to `.bashrc`:

```shell
# set maven environment
export M3_HOME=/usr/local/apache-maven-3.8.5
export PATH=$M3_HOME/bin:$PATH
```

Switch to the Aliyun mirror to speed up downloads:

```shell
vi /usr/local/apache-maven-3.8.5/conf/settings.xml
```

```xml
<mirrors>
  <mirror>
    <id>alimaven</id>
    <name>aliyun maven</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    <mirrorOf>central</mirrorOf>
  </mirror>
</mirrors>
```

Installing Python
I originally had Python 3.7.3 and wanted to switch to 2.7, so I installed pyenv to manage multiple Python versions.
If you already have 2.7, there is no need to switch.

```shell
yum -y install git
git clone https://gitee.com/krypln/pyenv.git ~/.pyenv
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n  eval "$(pyenv init -)"\nfi' >> ~/.bashrc
exec $SHELL
mkdir $PYENV_ROOT/cache && cd $PYENV_ROOT/cache
sudo yum install zlib-devel bzip2 bzip2-devel readline-devel sqlite sqlite-devel openssl-devel xz xz-devel libffi-devel git wget
wget https://mirrors.huaweicloud.com/python/2.7.2/Python-2.7.2.tar.xz
cd /root/.pyenv/plugins/python-build/share/python-build
vim 2.7.2
pyenv install 2.7.2
```

The content of the `2.7.2` build definition (I point it at the local file to speed up installation; the download is otherwise very slow):

```shell
#install_package "Python-2.7.2" "https://www.python.org/ftp/python/2.7.2/Python-2.7.2.tgz#1d54b7096c17902c3f40ffce7e5b84e0072d0144024184fff184a84d563abbb3" ldflags_dirs standard verify_py27 copy_python_gdb ensurepip
install_package "Python-2.7.2" /root/.pyenv/cache/Python-2.7.2.tar.xz ldflags_dirs standard verify_py27 copy_python_gdb ensurepip
```

Check and switch the Python version:

```shell
pyenv versions
pyenv global 2.7.2
```

Installing bc
```shell
# install bc, used to generate the report
yum install bc
```

4. Installing HiBench
Download HiBench:

```shell
git clone https://github.com/Intel-bigdata/HiBench.git
```

Build only the modules you need:

```shell
mvn -Phadoopbench -Dmodules -Psql -Dscala=2.11 clean package
```

Or build all modules (this takes much longer; it took me over an hour):

```shell
mvn -Dspark=2.4 -Dscala=2.11 clean package
```

5. Configuring HiBench
There are several configuration files to edit under the HiBench/conf folder:
- hibench.conf
- hadoop.conf
- frameworks.lst
- benchmarks.lst
Let's configure them one by one.
First, edit the hadoop.conf configuration file:

```shell
vi hadoop.conf
```

Fill in the following (adjust for your own machine):
```properties
# Hadoop home (the Hadoop installation directory)
hibench.hadoop.home           /usr/local/hadoop

# The path of hadoop executable
hibench.hadoop.executable     ${hibench.hadoop.home}/bin/hadoop

# Hadoop configuration directory
hibench.hadoop.configure.dir  ${hibench.hadoop.home}/etc/hadoop

# The root HDFS path to store HiBench data
hibench.hdfs.master           hdfs://master:9000

# Hadoop release provider. Supported value: apache
hibench.hadoop.release        apache
```

Where does the HDFS path above come from? Open etc/hadoop/core-site.xml under the Hadoop installation directory, where you can see the HDFS namespace:
```shell
amax@master:/usr/local/hadoop/etc/hadoop$ vi core-site.xml
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <!-- HDFS filesystem namespace -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <!-- buffer size for HDFS operations -->
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
  <!-- temporary data storage directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
```
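The same value can be pulled out of core-site.xml from the command line; a rough sketch against an inlined sample file (the sample path and the grep/sed approach are my own, assuming `<name>` and `<value>` sit on separate lines as above):

```shell
# Write a minimal sample core-site.xml, then extract the fs.defaultFS value:
# grep the <name> line plus the line after it, and strip the <value> tags.
cat > /tmp/core-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
EOF

grep -A1 '<name>fs.defaultFS</name>' /tmp/core-site-sample.xml \
  | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p'
# prints: hdfs://master:9000
```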
I use Hadoop, so edit frameworks.lst accordingly:

```shell
amax@master:~/Hibench/Hibench-master/conf$ vi frameworks.lst
```

```text
hadoop
# spark
```

Test wordcount first and comment out everything else:
```shell
amax@master:~/Hibench/Hibench-master/conf$ vi benchmarks.lst
```

```text
#micro.sleep
#micro.sort
#micro.terasort
micro.wordcount
#micro.repartition
#micro.dfsioe

#sql.aggregation
#sql.join
#sql.scan

#websearch.nutchindexing
#websearch.pagerank

#ml.bayes
#ml.kmeans
#ml.lr
#ml.als
#ml.pca
#ml.gbt
#ml.rf
#ml.svd
#ml.linear
#ml.lda
#ml.svm
#ml.gmm
#ml.correlation
#ml.summarizer

#graph.nweight
```
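Lines starting with `#` in the .lst files are treated as comments, so with the file above only micro.wordcount is active. You can preview which workloads will run with a quick filter (a sketch; the grep expression is my own, not HiBench's actual parser):

```shell
# Inline a small benchmarks.lst sample, then keep only non-comment,
# non-blank lines — these are the workloads that would run.
cat > /tmp/benchmarks-sample.lst <<'EOF'
#micro.sort
#micro.terasort
micro.wordcount
#micro.repartition
EOF

grep -v '^#' /tmp/benchmarks-sample.lst | grep -v '^$'
# prints: micro.wordcount
```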
6. Running HiBench
Start Hadoop from the Hadoop installation directory:

```shell
./start-all.sh
```

Add execute permissions:

```shell
amax@master:~/Hibench/Hibench-master/bin$ chmod +x -R functions/
amax@master:~/Hibench/Hibench-master/bin$ chmod +x -R workloads/
amax@master:~/Hibench/Hibench-master/bin$ chmod +x run_all.sh
```

Start the run from HiBench's bin directory:
```shell
amax@master:~/Hibench/Hibench-master/bin$ ./run_all.sh
```

```text
Prepare micro.wordcount ...
Exec script: /home/amax/Hibench/Hibench-master/bin/workloads/micro/wordcount/prepare/prepare.sh
patching args=
Parsing conf: /home/amax/Hibench/Hibench-master/conf/hadoop.conf
Parsing conf: /home/amax/Hibench/Hibench-master/conf/hibench.conf
Parsing conf: /home/amax/Hibench/Hibench-master/conf/workloads/micro/wordcount.conf
probe sleep jar: /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.10.1-tests.jar
start HadoopPrepareWordcount bench
hdfs rm -r: /usr/local/hadoop/bin/hadoop --config /usr/local/hadoop/etc/hadoop fs -rm -r -skipTrash hdfs://master:9000/HiBench/Wordcount/Input
Deleted hdfs://master:9000/HiBench/Wordcount/Input
Submit MapReduce Job: /usr/local/hadoop/bin/hadoop --config /usr/local/hadoop/etc/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.1.jar randomtextwriter -D mapreduce.randomtextwriter.totalbytes=32000 -D mapreduce.randomtextwriter.bytespermap=4000 -D mapreduce.job.maps=8 -D mapreduce.job.reduces=8 hdfs://master:9000/HiBench/Wordcount/Input
The job took 14 seconds.
finish HadoopPrepareWordcount bench
Run micro/wordcount/hadoop
Exec script: /home/amax/Hibench/Hibench-master/bin/workloads/micro/wordcount/hadoop/run.sh
patching args=
Parsing conf: /home/amax/Hibench/Hibench-master/conf/hadoop.conf
Parsing conf: /home/amax/Hibench/Hibench-master/conf/hibench.conf
Parsing conf: /home/amax/Hibench/Hibench-master/conf/workloads/micro/wordcount.conf
probe sleep jar: /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.10.1-tests.jar
start HadoopWordcount bench
hdfs rm -r: /usr/local/hadoop/bin/hadoop --config /usr/local/hadoop/etc/hadoop fs -rm -r -skipTrash hdfs://master:9000/HiBench/Wordcount/Output
rm: `hdfs://master:9000/HiBench/Wordcount/Output': No such file or directory
hdfs du -s: /usr/local/hadoop/bin/hadoop --config /usr/local/hadoop/etc/hadoop fs -du -s hdfs://master:9000/HiBench/Wordcount/Input
Submit MapReduce Job: /usr/local/hadoop/bin/hadoop --config /usr/local/hadoop/etc/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.1.jar wordcount -D mapreduce.job.maps=8 -D mapreduce.job.reduces=8 -D mapreduce.inputformat.class=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat -D mapreduce.outputformat.class=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat -D mapreduce.job.inputformat.class=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat -D mapreduce.job.outputformat.class=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat hdfs://master:9000/HiBench/Wordcount/Input hdfs://master:9000/HiBench/Wordcount/Output
Bytes Written=22168
finish HadoopWordcount bench
Run all done!
```

That means the run succeeded. You can try other benchmarks yourself.
7. Viewing the Report
After a run completes, the reports are under HiBench's report folder:

```shell
amax@master:~/Hibench/Hibench-master/report$ vi hibench.report
```

```text
Type             Date        Time      Input_data_size  Duration(s)  Throughput(bytes/s)  Throughput/node
HadoopWordcount  2022-03-27  15:17:33  35706            23.176       1540                 256
```
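The Throughput(bytes/s) column is simply Input_data_size divided by Duration(s), so you can recompute it from the report line as a sanity check; a sketch assuming the whitespace-separated layout shown above:

```shell
# One data row from hibench.report: field 4 is Input_data_size (bytes),
# field 5 is Duration (s); their integer-truncated ratio is the
# reported Throughput(bytes/s).
line="HadoopWordcount 2022-03-27 15:17:33 35706 23.176 1540 256"
echo "$line" | awk '{ printf "%d\n", $4 / $5 }'
# prints: 1540
```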
Check the log under report/wordcount/prepare:

```shell
amax@master:~/Hibench/Hibench-master/report/wordcount/prepare$ vi bench.log
```

```text
2022-03-27 15:16:48 INFO Connecting to ResourceManager at master/172.31.58.2:8032
Running 8 maps.
Job started: Sun Mar 27 15:16:49 CST 2022
2022-03-27 15:16:49 INFO Connecting to ResourceManager at master/172.31.58.2:8032
2022-03-27 15:16:49 INFO number of splits:8
2022-03-27 15:16:50 INFO Submitting tokens for job: job_1641806957654_0004
2022-03-27 15:16:50 INFO resource-types.xml not found
2022-03-27 15:16:50 INFO Unable to find 'resource-types.xml'.
2022-03-27 15:16:50 INFO Adding resource type - name = memory-mb, units = Mi, type = COUNTABLE
2022-03-27 15:16:50 INFO Adding resource type - name = vcores, units = , type = COUNTABLE
2022-03-27 15:16:50 INFO Submitted application application_1641806957654_0004
2022-03-27 15:16:50 INFO The url to track the job: http://master:8088/proxy/application_1641806957654_0004/
2022-03-27 15:16:50 INFO Running job: job_1641806957654_0004
2022-03-27 15:16:57 INFO Job job_1641806957654_0004 running in uber mode : false
2022-03-27 15:16:57 INFO map 0% reduce 0%
2022-03-27 15:17:02 INFO map 100% reduce 0%
2022-03-27 15:17:03 INFO Job job_1641806957654_0004 completed successfully
2022-03-27 15:17:03 INFO Counters: 33
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=1675976
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=968
		HDFS: Number of bytes written=35706
		HDFS: Number of read operations=32
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=16
	Job Counters
		Killed map tasks=1
		Launched map tasks=8
		Other local map tasks=8
		Total time spent by all maps in occupied slots (ms)=237250
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=23725
		Total vcore-milliseconds taken by all map tasks=23725
		Total megabyte-milliseconds taken by all map tasks=242944000
	Map-Reduce Framework
		Map input records=8
		Map output records=48
		Input split bytes=968
```

Check the log under report/wordcount/hadoop:
```shell
amax@master:~/Hibench/Hibench-master/report/wordcount/hadoop$ vi bench.log
```

```text
2022-03-27 15:17:12 INFO Connecting to ResourceManager at master/172.31.58.2:8032
2022-03-27 15:17:13 INFO Total input files to process : 8
2022-03-27 15:17:13 INFO number of splits:8
2022-03-27 15:17:13 INFO mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
2022-03-27 15:17:13 INFO mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
2022-03-27 15:17:13 INFO Submitting tokens for job: job_1641806957654_0005
2022-03-27 15:17:13 INFO resource-types.xml not found
2022-03-27 15:17:13 INFO Unable to find 'resource-types.xml'.
2022-03-27 15:17:13 INFO Adding resource type - name = memory-mb, units = Mi, type = COUNTABLE
2022-03-27 15:17:13 INFO Adding resource type - name = vcores, units = , type = COUNTABLE
2022-03-27 15:17:13 INFO Submitted application application_1641806957654_0005
2022-03-27 15:17:13 INFO The url to track the job: http://master:8088/proxy/application_1641806957654_0005/
2022-03-27 15:17:13 INFO Running job: job_1641806957654_0005
2022-03-27 15:17:20 INFO Job job_1641806957654_0005 running in uber mode : false
2022-03-27 15:17:20 INFO map 0% reduce 0%
2022-03-27 15:17:26 INFO map 100% reduce 0%
2022-03-27 15:17:32 INFO map 100% reduce 88%
2022-03-27 15:17:33 INFO map 100% reduce 100%
2022-03-27 15:17:33 INFO Job job_1641806957654_0005 completed successfully
2022-03-27 15:17:33 INFO Counters: 51
	File System Counters
		FILE: Number of bytes read=40236
		FILE: Number of bytes written=3443888
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=36666
		HDFS: Number of bytes written=22168
		HDFS: Number of read operations=56
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=16
	Job Counters
		Killed reduce tasks=1
		Launched map tasks=8
		Launched reduce tasks=8
		Data-local map tasks=7
		Rack-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=239320
		Total time spent by all reduces in occupied slots (ms)=481640
		Total time spent by all map tasks (ms)=23932
		Total time spent by all reduce tasks (ms)=24082
		Total vcore-milliseconds taken by all map tasks=23932
```

For the fields in this log, consult the relevant documentation:
http://hadoopmania.blogspot.com/2015/10/performance-monitoring-testing-and.html
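For a quick look at a specific counter without reading the whole log, you can grep its value out; a sketch against an inlined sample (the counter names match the log excerpt above, the sample path is mine):

```shell
# Inline a few counter lines from bench.log, then pull the value after '='
# for a given counter name.
cat > /tmp/bench-log-sample.txt <<'EOF'
HDFS: Number of bytes read=36666
HDFS: Number of bytes written=22168
Launched map tasks=8
Launched reduce tasks=8
EOF

awk -F'=' '/HDFS: Number of bytes written/ { print $2 }' /tmp/bench-log-sample.txt
# prints: 22168
```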
Summary
That completes a basic HiBench benchmark of a Hadoop cluster: install HiBench, point it at the existing Hadoop installation, run the micro.wordcount workload, and read the results in the report folder.