clickhouse SSB 性能测试
SSB(Star Schema Benchmark)的介紹論文地址:
https://www.cs.umb.edu/~poneil/StarSchemaB.PDF
官網(wǎng)鏈接 https://clickhouse.com/docs/en/getting-started/example-datasets/star-schema/
如果安裝系統(tǒng)時,時最小化mini安裝,經(jīng)常會提示很多命令不存在
- 提示 git 不存在,使用 yum install git 安裝即可
- 提示 make: command not found,使用以下命令安裝 make yum install -y gcc gcc-c++ automake autoconf libtool make
ssb-dbgen 測試工具 GitHub 地址 https://github.com/vadimtk/ssb-dbgen
下載并編譯測試工具
git clone https://github.com/vadimtk/ssb-dbgen.git cd ssb-dbgen make之后就會在當前目錄生成 dbgen 和 qgen 這兩個可執(zhí)行文件
|
| ./dbgen -s 1 -T p | part.tbl |
| ./dbgen -s 1 -T s | suppliers.tbl |
| ./dbgen -s 1 -T c | customers.tbl |
| ./dbgen -s 1 -T d | date.tbl |
| ./dbgen -s 1 -T l | lineorder.tbl |
| ./dbgen -s 1 -T a | 一次性生成以上所有表 |
添加 -h 符號,會將文件大小進行格式化,并顯示單位
使用 head -n 10 customer.tbl 命令打印前10行可以看到,tbl 文件是用逗號分隔列,然后使用換行符分割行的
這里數(shù)據(jù)量業(yè)界有一個統(tǒng)稱叫做SF 1SF == 1G
裝載數(shù)據(jù) 1G -s 1 == 1G #:但這樣有一個弊端 那就是如果裝載的數(shù)據(jù)量特別大的時候例如 1T 這樣基本需要花費一天的時間 所以我來用多線程的dbgen方法
方法如下:
./dbgen -vfF -s 1000 -S 1 -C 6 & ./dbgen -vfF -s 1000 -S 2 -C 6 & ./dbgen -vfF -s 1000 -S 3 -C 6 & ./dbgen -vfF -s 1000 -S 4 -C 6 & ./dbgen -vfF -s 1000 -S 5 -C 6 & ./dbgen -vfF -s 1000 -S 6 -C 6參數(shù)詳解
- -v 詳細信息
- -s 表示生成數(shù)據(jù)的規(guī)模
- -S 切分數(shù)據(jù)
- -f 覆蓋之前的文件
dss.ddl 這個文件存儲的是建表的語句
cat dss.ddl 逐一執(zhí)行里面的建表語句
建表語句可以在官網(wǎng)拿到,使用外部工具,比如 DBeaver 建表
CREATE DATABASE IF NOT EXISTS ssb USE ssb CREATE TABLE IF NOT EXISTS customer (C_CUSTKEY UInt32,C_NAME String,C_ADDRESS String,C_CITY LowCardinality(String),C_NATION LowCardinality(String),C_REGION LowCardinality(String),C_PHONE String,C_MKTSEGMENT LowCardinality(String) ) ENGINE = MergeTree ORDER BY (C_CUSTKEY);CREATE TABLE IF NOT EXISTS lineorder (LO_ORDERKEY UInt32,LO_LINENUMBER UInt8,LO_CUSTKEY UInt32,LO_PARTKEY UInt32,LO_SUPPKEY UInt32,LO_ORDERDATE Date,LO_ORDERPRIORITY LowCardinality(String),LO_SHIPPRIORITY UInt8,LO_QUANTITY UInt8,LO_EXTENDEDPRICE UInt32,LO_ORDTOTALPRICE UInt32,LO_DISCOUNT UInt8,LO_REVENUE UInt32,LO_SUPPLYCOST UInt32,LO_TAX UInt8,LO_COMMITDATE Date,LO_SHIPMODE LowCardinality(String) ) ENGINE = MergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY);CREATE TABLE IF NOT EXISTS part (P_PARTKEY UInt32,P_NAME String,P_MFGR LowCardinality(String),P_CATEGORY LowCardinality(String),P_BRAND LowCardinality(String),P_COLOR LowCardinality(String),P_TYPE LowCardinality(String),P_SIZE UInt8,P_CONTAINER LowCardinality(String) ) ENGINE = MergeTree ORDER BY P_PARTKEY;CREATE TABLE IF NOT EXISTS supplier (S_SUPPKEY UInt32,S_NAME String,S_ADDRESS String,S_CITY LowCardinality(String),S_NATION LowCardinality(String),S_REGION LowCardinality(String),S_PHONE String ) ENGINE = MergeTree ORDER BY S_SUPPKEY;之后定位到 dbgen 生成的目錄,使用 pwd 命令,可以打印出當前目錄
接下來,使用 clickhouse-client 工具,將 tbl 中的數(shù)據(jù)導(dǎo)入數(shù)據(jù)庫
注意:如果你使用的不是 default 數(shù)據(jù)庫(默認數(shù)據(jù)庫),請在表前面加上數(shù)據(jù)庫前綴,否則會報如下錯誤
這是維度為1,即 -s 1 情況下的數(shù)據(jù)量
SELECT (SELECT COUNT(1) FROM customer) AS customer, (SELECT COUNT(1) FROM lineorder) AS lineorder, (SELECT COUNT(1) FROM part) AS part,(SELECT COUNT(1) FROM supplier) AS supplier
下面這段SQL,會將星型模式(star schema)轉(zhuǎn)化為 非標準化的(denormalized)平面模型(flat schema)。也就是說,將原本相關(guān)聯(lián)的表結(jié)構(gòu),通過某種關(guān)系,整合到一張表里去。
clickhouse 默認內(nèi)存大小是1G。在做 SSB 的星型轉(zhuǎn)平面模型的時,如果沒有增加最大內(nèi)存限制,就會報如下錯誤(DB::Exception: Memory limit (total) exceeded)
注意:set_memory_usage = 20000000000 這個函數(shù)在外部的數(shù)據(jù)庫管理工具,比如DBeaver中是無法設(shè)置的。但在 clickhouse-client 中設(shè)置卻有效(本地或遠程均可)。這是因為,該函數(shù)僅支持在 TCP 模式下被調(diào)用,即 port=9000
另外,set 函數(shù)是針對當前會話的,只要一退出,立馬又會還原成之前的樣子。
查看變量值的命令為:SELECT name,value FROM system.settings WHERE name = 'max_memory_usage'
由于0太多,不好數(shù)。我們可以使用 formatReadableSize 函數(shù)將其格式化。但由于 system.settings 這個表中的 value 是字符串類型的,因此我們必須將起轉(zhuǎn)為 Int 類型,這就要用到 toInt64 函數(shù)了(Int32長度不夠,會導(dǎo)致數(shù)值溢出)
完整的命令如下:SELECT name,formatReadableSize(toInt64(value)) FROM system.settings WHERE name = 'max_memory_usage';
接著先將可用內(nèi)存增大后,再刪除之前創(chuàng)建后轉(zhuǎn)化失敗的表,然后進行測試
如果機子內(nèi)存沒有這么大,上述語句無效。但可以通過 SET min_insert_block_size_rows=8192; 減小單次插入塊的行大小(默認值為 1048545)這樣就能安全通過了。
可以看到在虛擬機3G內(nèi)存下的速度為:每秒71萬行,30MB/S
在執(zhí)行時,打開另一個終端,在clickhouse-client 中執(zhí)行 SHOW PROCESSLIST 會打印當前執(zhí)行的進度
將字符串復(fù)制出來,把單引號替換成雙引號。放在在線JSON格式化頁面進行轉(zhuǎn)義
運行下面查詢語句(使用 USE ssb; 設(shè)置默認數(shù)據(jù)庫)
Q1.1
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toYear(LO_ORDERDATE) = 1993 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25;Q1.2
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toYYYYMM(LO_ORDERDATE) = 199401 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35;Q1.3
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toISOWeek(LO_ORDERDATE) = 6 AND toYear(LO_ORDERDATE) = 1994AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35;Q2.1
SELECTsum(LO_REVENUE),toYear(LO_ORDERDATE) AS year,P_BRAND FROM lineorder_flat WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA' GROUP BYyear,P_BRAND ORDER BYyear,P_BRAND;
Q2.2
SELECTsum(LO_REVENUE),toYear(LO_ORDERDATE) AS year,P_BRAND FROM lineorder_flat WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA' GROUP BYyear,P_BRAND ORDER BYyear,P_BRAND;
Q2.3
SELECTsum(LO_REVENUE),toYear(LO_ORDERDATE) AS year,P_BRAND FROM lineorder_flat WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE' GROUP BYyear,P_BRAND ORDER BYyear,P_BRAND;Q3.1
SELECTC_NATION,S_NATION,toYear(LO_ORDERDATE) AS year,sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year >= 1992 AND year <= 1997 GROUP BYC_NATION,S_NATION,year ORDER BYyear ASC,revenue DESC;
Q3.2
SELECTC_CITY,S_CITY,toYear(LO_ORDERDATE) AS year,sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year >= 1992 AND year <= 1997 GROUP BYC_CITY,S_CITY,year ORDER BYyear ASC,revenue DESC;Q3.3
SELECTC_CITY,S_CITY,toYear(LO_ORDERDATE) AS year,sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year >= 1992 AND year <= 1997 GROUP BYC_CITY,S_CITY,year ORDER BYyear ASC,revenue DESC;Q3.4
SELECTC_CITY,S_CITY,toYear(LO_ORDERDATE) AS year,sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND toYYYYMM(LO_ORDERDATE) = 199712 GROUP BYC_CITY,S_CITY,year ORDER BYyear ASC,revenue DESC;Q4.1
SELECTtoYear(LO_ORDERDATE) AS year,C_NATION,sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BYyear,C_NATION ORDER BYyear ASC,C_NATION ASC;Q4.2
SELECTtoYear(LO_ORDERDATE) AS year,S_NATION,P_CATEGORY,sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year = 1997 OR year = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2') GROUP BYyear,S_NATION,P_CATEGORY ORDER BYyear ASC,S_NATION ASC,P_CATEGORY ASC;
Q4.3
SELECTtoYear(LO_ORDERDATE) AS year,S_CITY,P_BRAND,sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE S_NATION = 'UNITED STATES' AND (year = 1997 OR year = 1998) AND P_CATEGORY = 'MFGR#14' GROUP BYyear,S_CITY,P_BRAND ORDER BYyear ASC,S_CITY ASC,P_BRAND ASC;總結(jié)
以上是生活随笔為你收集整理的clickhouse SSB 性能测试的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: [转载]使用 Apache Geroni
- 下一篇: DDNS请求到底请求了什么