[Original] Big Data Basics: Benchmark (1) HiBench
HiBench 7
Official: https://github.com/intel-hadoop/HiBench
I. Introduction
HiBench is a big data benchmark suite that helps evaluate different big data frameworks in terms of speed, throughput and system resource utilizations. It contains a set of Hadoop, Spark and streaming workloads, including Sort, WordCount, TeraSort, Sleep, SQL, PageRank, Nutch indexing, Bayes, Kmeans, NWeight and enhanced DFSIO, etc. It also contains several streaming workloads for Spark Streaming, Flink, Storm and Gearpump.
There are 19 workloads in HiBench in total.
Supported Hadoop/Spark/Flink/Storm/Gearpump releases:
Hadoop: Apache Hadoop 2.x, CDH5, HDP
Spark: Spark 1.6.x, Spark 2.0.x, Spark 2.1.x, Spark 2.2.x
Flink: 1.0.3
Storm: 1.0.1
Gearpump: 0.8.1
Kafka: 0.8.2.2
II. Spark SQL Test
1 download
$ wget https://github.com/intel-hadoop/HiBench/archive/HiBench-7.0.tar.gz
$ tar xvf HiBench-7.0.tar.gz
$ cd HiBench-HiBench-7.0
2 build
1)build all
$ mvn -Dspark=2.1 -Dscala=2.11 clean package
2)build hadoopbench and sparkbench
$ mvn -Phadoopbench -Psparkbench -Dspark=2.1 -Dscala=2.11 clean package
3)only build spark sql
$ mvn -Psparkbench -Dmodules -Psql -Dspark=2.1 -Dscala=2.11 clean package
3 prepare
$ cp conf/hadoop.conf.template conf/hadoop.conf
$ vi conf/hadoop.conf
$ cp conf/spark.conf.template conf/spark.conf
$ vi conf/spark.conf
$ vi conf/hibench.conf
# Data scale profile. Available value is tiny, small, large, huge, gigantic and bigdata.
# The definition of these profiles can be found in the workload's conf file i.e. conf/workloads/micro/wordcount.conf
hibench.scale.profile bigdata
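As a rough guide to what needs editing in hadoop.conf and spark.conf, the entries below are the ones that usually have to be set. The key names follow the shipped templates, while all paths, addresses and sizes are placeholders for your own environment, so treat this only as a sketch:
# conf/hadoop.conf (placeholder values)
hibench.hadoop.home            /opt/hadoop
hibench.hadoop.executable      ${hibench.hadoop.home}/bin/hadoop
hibench.hadoop.configure.dir   ${hibench.hadoop.home}/etc/hadoop
hibench.hdfs.master            hdfs://namenode:8020
hibench.hadoop.release         apache
# conf/spark.conf (placeholder values)
hibench.spark.home             /opt/spark
hibench.spark.master           yarn-client
hibench.yarn.executor.num      4
hibench.yarn.executor.cores    4
spark.executor.memory          4g
spark.driver.memory            4g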
4 run
The SQL tests fall into three kinds: scan, aggregation, and join.
$ bin/workloads/sql/scan/prepare/prepare.sh
$ bin/workloads/sql/scan/spark/run.sh
The detailed configuration for this workload is in conf/workloads/sql/scan.conf (an illustrative excerpt is sketched after this paragraph).
After prepare, the test data is generated under /HiBench/Scan/Input on HDFS, and a report is written to report/scan/prepare/.
After run, reports such as monitor.html are written to report/scan/spark/, and the test data tables can be found in Hive's default database.
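For orientation, a workload conf of this kind mainly ties the data scale profile to concrete table sizes (numbers of pages and uservisits for the Hive data generator) and fixes the HDFS input/output paths. The key names and sizes below are only an illustrative sketch modelled on the shipped file; check the scan.conf in your HiBench release for the authoritative contents:
# conf/workloads/sql/scan.conf (illustrative sketch, not the shipped values)
hibench.scan.tiny.pages            120
hibench.scan.tiny.uservisits       1000
hibench.scan.huge.pages            1000000
hibench.scan.huge.uservisits       100000000
# the profile chosen by hibench.scale.profile selects one of the size pairs above
hibench.workload.input             ${hibench.hdfs.data.dir}/Scan/Input
hibench.workload.output            ${hibench.hdfs.data.dir}/Scan/Output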
$ bin/workloads/sql/join/prepare/prepare.sh
$ bin/workloads/sql/join/spark/run.sh
$ bin/workloads/sql/aggregation/prepare/prepare.sh
$ bin/workloads/sql/aggregation/spark/run.sh
The remaining workloads follow the same pattern.
If prepare fails with an out-of-memory error,
try modifying:
$ vi bin/functions/workload_functions.sh
local CMD="${HADOOP_EXECUTABLE} --config ${HADOOP_CONF_DIR} jar $job_jar $job_name $tail_arguments"
Format: hadoop jar -D mapreduce.reduce.memory.mb=5120 -D mapreduce.reduce.java.opts=-Xmx4608m
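As a concrete illustration of that format (a sketch of the attempted change only; as noted next, it did not take effect in practice), the idea is to splice the two -D options into the command assembled in workload_functions.sh so that ToolRunner-based jobs receive them as generic Hadoop options:
# bin/functions/workload_functions.sh (illustrative modification of the line above)
# insert the generic -D options right after the main class name
local CMD="${HADOOP_EXECUTABLE} --config ${HADOOP_CONF_DIR} jar $job_jar $job_name -D mapreduce.reduce.memory.mb=5120 -D mapreduce.reduce.java.opts=-Xmx4608m $tail_arguments"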
This was found not to take effect, so try increasing the number of map tasks instead:
$ vi bin/functions/hibench_prop_env_mapping.py
NUM_MAPS="hibench.default.map.parallelism",
$ vi conf/hibench.conf
hibench.default.map.parallelism 5000
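The line shown in hibench_prop_env_mapping.py only maps the hibench.default.map.parallelism property onto the NUM_MAPS environment variable consumed by the workload scripts, so the value itself is tuned in conf/hibench.conf. A sketch of the relevant entries follows; 5000 is simply the value used above, not a recommendation, and the shuffle counterpart is listed only for completeness:
# conf/hibench.conf
# Mapper number in Hadoop, partition number in Spark
hibench.default.map.parallelism         5000
# Reducer number in Hadoop, shuffle partition number in Spark
hibench.default.shuffle.parallelism     5000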
References:
https://github.com/intel-hadoop/HiBench/blob/master/docs/build-hibench.md
https://github.com/intel-hadoop/HiBench/blob/master/docs/run-sparkbench.md