Optimizing a 100-Million-Row MySQL Table
Not long after joining a new company, I ran into exactly this problem.
Background
The XX instance (one primary, one replica) fired an SLA alarm in the early hours every day. The alarm means there is significant replication lag: if a primary-replica failover happened at that moment, it would take a long time to complete, because the replica would first have to catch up to keep its data consistent with the primary.
The XX instance also had the highest slow-query count (any SQL statement running longer than 1s is logged), and the XX application runs a job every night that deletes data older than one month.
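The SLA alarm boils down to Seconds_Behind_Master growing on the replica. The original post doesn't show this step, but as a sketch, the lag can be watched like this (socket path taken from the commands later in the post):

```bash
# Hypothetical check, not shown in the original: watch replication lag on the
# replica; a growing Seconds_Behind_Master means it is falling behind the primary.
mysql -uroot -p --socket=/datas/mysql/data/3316/mysqld.sock \
      -e "show slave status\G" | grep -E 'Seconds_Behind_Master|Slave_SQL_Running'
```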
Analysis
I used pt-query-digest to analyze the last week of mysql-slow.log.
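The exact invocation isn't shown in the original; assuming the slow log lives next to the data directory used elsewhere in this post, a typical run looks something like this:

```bash
# Hypothetical invocation (the original doesn't show the exact command):
# summarize the last week of slow queries into a report.
pt-query-digest --since '2019-03-20 00:00:00' --until '2019-03-27 00:00:00' \
    /datas/mysql/data/3316/mysql-slow.log > /tmp/slow_report_last_week.txt
```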
Results, part 1
Over the last week, the logged slow queries consumed 25,403s of execution time in total; the slowest single statement took 266s, the average was 5s per statement, and the average number of rows scanned was 17.66 million.
Results, part 2
select arrival_record accounted for the largest number of slow queries, more than 40,000, with an average response time of 4s; delete arrival_record was logged 6 times, with an average response time of 258s.
The select xxx_record statements
The slow select arrival_record statements all look like the one below; the columns in the where clause are always the same, only the parameter values differ:
```sql
select count(*) from arrival_record where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0\G
```
At worst, a select arrival_record statement scanned 56 million rows; the average was 1.72 million. The long execution times can be attributed to the sheer number of rows scanned.
Check the execution plan:
```
explain select count(*) from arrival_record where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: arrival_record
   partitions: NULL
         type: ref
possible_keys: IXFK_arrival_record
          key: IXFK_arrival_record
      key_len: 8
          ref: const
         rows: 32261320
     filtered: 3.70
        Extra: Using index condition; Using where
1 row in set, 1 warning (0.00 sec)
```

The query does use the IXFK_arrival_record index, but the estimated row count is enormous: more than 32 million rows.
```
show index from arrival_record;
+----------------+------------+---------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table          | Non_unique | Key_name            | Seq_in_index | Column_name  | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------+------------+---------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| arrival_record |          0 | PRIMARY             |            1 | id           | A         |   107990720 |     NULL | NULL   |      | BTREE      |         |               |
| arrival_record |          1 | IXFK_arrival_record |            1 | product_id   | A         |        1344 |     NULL | NULL   |      | BTREE      |         |               |
| arrival_record |          1 | IXFK_arrival_record |            2 | station_no   | A         |       22161 |     NULL | NULL   | YES  | BTREE      |         |               |
| arrival_record |          1 | IXFK_arrival_record |            3 | sequence     | A         |    77233384 |     NULL | NULL   |      | BTREE      |         |               |
| arrival_record |          1 | IXFK_arrival_record |            4 | receive_time | A         |    65854652 |     NULL | NULL   | YES  | BTREE      |         |               |
| arrival_record |          1 | IXFK_arrival_record |            5 | arrival_time | A         |    73861904 |     NULL | NULL   | YES  | BTREE      |         |               |
+----------------+------------+---------------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

show create table arrival_record;
..........
  arrival_spend_ms bigint(20) DEFAULT NULL,
  total_spend_ms bigint(20) DEFAULT NULL,
  PRIMARY KEY (id),
  KEY IXFK_arrival_record (product_id,station_no,sequence,receive_time,arrival_time) USING BTREE,
  CONSTRAINT FK_arrival_record_product FOREIGN KEY (product_id) REFERENCES product (id) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=614538979 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
```

The table holds a bit over 100 million rows and has only one secondary index, the composite IXFK_arrival_record. The cardinality of product_id is tiny (1,344), so its selectivity is poor.
The filter condition passed in, `where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0`, contains no station_no, so it cannot make use of the station_no, sequence, or receive_time columns of the composite index IXFK_arrival_record.
By the leftmost-prefix rule, select arrival_record uses only the first column of that index, product_id, and since that column's selectivity is so poor, the query scans a huge number of rows and runs for a long time.
The receive_time column, by contrast, has high cardinality and good selectivity, so a standalone index on it would serve these select arrival_record statements well.
At this point we know the slow-logged select arrival_record statements filter on product_id, receive_time, and receive_spend_ms. But does anything else query this table through other columns?
Time to bring out tcpdump.
Capture the select statements hitting this table for a while:
```bash
tcpdump -i bond0 -s 0 -l -w - dst port 3316 | strings | grep select | egrep -i 'arrival_record' > /tmp/select_arri.log
```

Then extract the where clause following from in each captured select:
```bash
IFS_OLD=$IFS
IFS=$'\n'
for i in `cat /tmp/select_arri.log`; do echo ${i#*'from'}; done | less
IFS=$IFS_OLD
```

```
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=17 and arrivalrec0_.station_no='56742'
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=22 and arrivalrec0_.station_no='S7100'
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=24 and arrivalrec0_.station_no='V4631'
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=22 and arrivalrec0_.station_no='S9466'
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=24 and arrivalrec0_.station_no='V4205'
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=24 and arrivalrec0_.station_no='V4105'
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=24 and arrivalrec0_.station_no='V4506'
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=24 and arrivalrec0_.station_no='V4617'
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=22 and arrivalrec0_.station_no='S8356'
arrival_record arrivalrec0_ where arrivalrec0_.sequence='2019-03-27 08:40' and arrivalrec0_.product_id=22 and arrivalrec0_.station_no='S8356'
```

These selects filter on product_id, station_no, and sequence, which can all be served by the first three columns of a composite index. Putting it together, the optimization is: drop the composite index IXFK_arrival_record, create a new composite index idx_sequence_station_no_product_id, and create a standalone index idx_receive_time, as sketched below.
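A sketch of the proposed index changes; the production script in the Implementation section below is the authoritative version (it also handles the foreign key, and ends up ordering the composite index as product_id, sequence, station_no):

```sql
-- Sketch only; see the Implementation section for the real DDL.
-- Note: the foreign key on product_id must be dropped before
-- IXFK_arrival_record can be dropped (explained further down).
alter table arrival_record drop index IXFK_arrival_record;
alter table arrival_record add index idx_sequence_station_no_product_id (sequence, station_no, product_id);
alter table arrival_record add index idx_receive_time (receive_time);
```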
The delete xxx_record statement
The delete scanned 110 million rows on average and took 262s on average.
The delete statement looks like the following; only the parameter value differs between slow-log entries:
```sql
delete from arrival_record where receive_time < STR_TO_DATE('2019-02-23', '%Y-%m-%d')\G
```

Execution plan:
```
explain select * from arrival_record where receive_time < STR_TO_DATE('2019-02-23', '%Y-%m-%d')\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: arrival_record
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 109501508
     filtered: 33.33
        Extra: Using where
1 row in set, 1 warning (0.00 sec)
```

The delete uses no index (there is no suitable one to use), so it performs a full table scan, which explains the long execution time.
The fix is the same: create a standalone index idx_receive_time(receive_time).
Testing
Copy arrival_record to a test instance and rehearse the index drop-and-rebuild there.
arrival_record table details on the XX instance
The table occupies 48G on disk, while MySQL reports its size as 31G, leaving roughly 17G of fragmentation, mostly caused by the delete job (rows are deleted but the space is never reclaimed).
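The original doesn't show how these numbers were obtained; a hedged sketch of the usual way to read them from information_schema:

```sql
-- Hypothetical check, not shown in the original: data_length + index_length
-- is the logical table size, while data_free approximates the reclaimable
-- space left behind by deletes.
select round((data_length + index_length) / 1024 / 1024 / 1024, 1) as size_gb,
       round(data_free / 1024 / 1024 / 1024, 1)                    as free_gb
from   information_schema.tables
where  table_schema = 'cq_new_cimiss' and table_name = 'arrival_record';
```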
Back up the table, restore it onto a new instance, drop the original composite index, and add the new indexes for the test.
Parallel compressed backup with mydumper
```bash
user=root
passwd=xxxx
socket=/datas/mysql/data/3316/mysqld.sock
db=cq_new_cimiss
table_name=arrival_record
backupdir=/datas/dump_$table_name

mkdir -p $backupdir

nohup echo `date +%T` && mydumper -u $user -p $passwd -S $socket -B $db -c -T $table_name -o $backupdir -t 32 -r 2000000 && echo `date +%T` &
```

The parallel compressed backup took 52s and produced 1.2G of files; against the 48G the table occupies on disk, mydumper's compression ratio is remarkably high.
```
Started dump at: 2019-03-26 12:46:04
........
Finished dump at: 2019-03-26 12:46:56

du -sh /datas/dump_arrival_record/
1.2G  /datas/dump_arrival_record/
```

Copy the dump to the test node.
Multi-threaded import
```bash
time myloader -u root -S /datas/mysql/data/3308/mysqld.sock -P 3308 -p root -B test -d /datas/dump_arrival_record -t 32
```

```
real    126m42.885s
user    1m4.543s
sys     0m4.267s
```
Disk usage after the logical import:
```
du -h -d 1 /datas/mysql/data/3308/test/arrival_record.*
12K  /datas/mysql/data/3308/test/arrival_record.frm
30G  /datas/mysql/data/3308/test/arrival_record.ibd
```

No fragmentation; this matches the table size MySQL reports. Take a file-level copy of the freshly imported data:

```bash
cp -rp /datas/mysql/data/3308 /datas
```

Then test the index drop-and-rebuild with both online DDL and the pt-osc tool.
The foreign key has to be dropped first: its column, product_id, is the first column of the composite index, and MySQL refuses to drop an index that a foreign key constraint still needs.
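For illustration (this exact exchange isn't in the original post), attempting the drop with the constraint still in place fails:

```sql
-- Hypothetical illustration: dropping the index while the foreign key still
-- depends on it is rejected.
alter table arrival_record drop index IXFK_arrival_record;
-- ERROR 1553 (HY000): Cannot drop index 'IXFK_arrival_record':
-- needed in a foreign key constraint

-- Drop the constraint first, then the index can go.
alter table arrival_record drop foreign key FK_arrival_record_product;
alter table arrival_record drop index IXFK_arrival_record;
```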
For reference on choosing between the two DDL approaches:
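The original doesn't show its pt-osc command line; assuming the foreign key has already been dropped as above, an equivalent pt-osc run might look roughly like the sketch below. pt-osc builds a shadow copy of the table, applies the alteration there, backfills it in chunks, and swaps it in, which trades extra disk and copy time for less blocking on a busy primary.

```bash
# Hypothetical pt-osc invocation (not shown in the original post); assumes the
# foreign key on product_id was already dropped.
pt-online-schema-change \
    --alter "DROP INDEX IXFK_arrival_record, ADD INDEX idx_product_id_sequence_station_no (product_id, sequence, station_no), ADD INDEX idx_receive_time (receive_time)" \
    D=cq_new_cimiss,t=arrival_record \
    --user=root --ask-pass --socket=/datas/mysql/data/3316/mysqld.sock \
    --execute
```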
Implementation
Because this is a one-primary, one-replica setup and the application connects through a VIP, the drop-and-rebuild was done with online DDL: stop replication, run the DDL on the replica first (without writing binlog), switch roles, then run it on the newly demoted replica (again without binlog).
```bash
function red_echo () {
    local what="$*"
    echo -e "$(date +%F-%T) ${what}"
}

function check_las_comm() {
    if [ "$1" != "0" ]; then
        red_echo "$2"
        echo "exit 1"
        exit 1
    fi
}

red_echo "stop slave"
mysql -uroot -p$passwd --socket=/datas/mysql/data/${port}/mysqld.sock -e "stop slave"
check_las_comm "$?" "stop slave failed"

red_echo "online ddl begin"
mysql -uroot -p$passwd --socket=/datas/mysql/data/${port}/mysqld.sock -e "set sql_log_bin=0;select now() as ddl_start;ALTER TABLE $db_.\`${table_name}\` DROP FOREIGN KEY FK_arrival_record_product, drop index IXFK_arrival_record, add index idx_product_id_sequence_station_no(product_id,sequence,station_no), add index idx_receive_time(receive_time);select now() as ddl_stop" >> ${log_file} 2>&1
red_echo "online ddl stop"

red_echo "add foreign key"
mysql -uroot -p$passwd --socket=/datas/mysql/data/${port}/mysqld.sock -e "set sql_log_bin=0;ALTER TABLE $db_.${table_name} ADD CONSTRAINT _FK_${table_name}_product FOREIGN KEY (product_id) REFERENCES cq_new_cimiss.product (id) ON DELETE NO ACTION ON UPDATE NO ACTION;" >> ${log_file} 2>&1
check_las_comm "$?" "add foreign key error"
red_echo "add foreign key stop"

red_echo "start slave"
mysql -uroot -p$passwd --socket=/datas/mysql/data/${port}/mysqld.sock -e "start slave"
check_las_comm "$?" "start slave failed"
```

Execution time:
```
2019-04-08-11:17:36 stop slave
mysql: [Warning] Using a password on the command line interface can be insecure.
ddl_start
2019-04-08 11:17:36
ddl_stop
2019-04-08 11:45:13
2019-04-08-11:45:13 online ddl stop
2019-04-08-11:45:13 add foreign key
mysql: [Warning] Using a password on the command line interface can be insecure.
2019-04-08-12:33:48 add foreign key stop
2019-04-08-12:33:48 start slave
```

The DDL itself took about 28 minutes; re-adding the foreign key took another 48 minutes.
Check the execution plans of the delete and select statements again:
```
explain select count(*) from arrival_record where receive_time < STR_TO_DATE('2019-03-10', '%Y-%m-%d')\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: arrival_record
   partitions: NULL
         type: range
possible_keys: idx_receive_time
          key: idx_receive_time
      key_len: 6
          ref: NULL
         rows: 7540948
     filtered: 100.00
        Extra: Using where; Using index

explain select count(*) from arrival_record where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: arrival_record
   partitions: NULL
         type: range
possible_keys: idx_product_id_sequence_station_no,idx_receive_time
          key: idx_receive_time
      key_len: 6
          ref: NULL
         rows: 291448
     filtered: 16.66
        Extra: Using index condition; Using where
```

Both statements now use the idx_receive_time index, and the number of rows scanned drops dramatically.

After the index optimization

The delete still took 77s:

```sql
delete from arrival_record where receive_time < STR_TO_DATE('2019-03-10', '%Y-%m-%d')\G
```

Even going through the receive_time index, deleting 3+ million rows took 77s.
Optimizing the big delete into small batches
The application side was changed to delete only 10 minutes' worth of data per run (each run now takes about 1s), and the SLA replication-lag alerts have not reappeared; a sketch of that windowed delete follows.
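The application's real statement isn't shown in the original; assuming the job walks forward through 10-minute windows of month-old data, each run might look like this:

```sql
-- Hypothetical sketch of the windowed delete: each run removes one 10-minute
-- slice, keeping every transaction small and replica-friendly.
delete from arrival_record
where  receive_time >= '2019-02-23 00:00:00'
  and  receive_time <  '2019-02-23 00:10:00';
```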
Another approach is to delete 20,000 rows at a time in primary-key order, as shown below (a driver-loop sketch follows the SQL).
```sql
-- Get the largest primary key that satisfies the time condition, then delete
-- in small batches by scanning forward in primary-key order.

-- Run this once first:
SELECT MAX(id) INTO @need_delete_max_id FROM `arrival_record` WHERE receive_time < '2019-03-01';

DELETE FROM arrival_record WHERE id < @need_delete_max_id LIMIT 20000;
select ROW_COUNT();  -- returns 20000

-- Each small-batch delete returns ROW_COUNT(), the number of rows it removed.
-- The program repeats the two statements below while ROW_COUNT() is non-zero;
-- once it returns 0, the purge is complete. Sleep 0.5s between batches.
DELETE FROM arrival_record WHERE id < @need_delete_max_id LIMIT 20000;
select ROW_COUNT();
```
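A minimal driver loop for the batching logic just described might look like this (hypothetical; the socket path and credentials are placeholders reused from the scripts above):

```bash
#!/bin/bash
# Hypothetical driver for the batch delete above; stops when a batch removes
# zero rows. $passwd and the socket path are placeholders.
SOCK=/datas/mysql/data/3316/mysqld.sock
MYSQL="mysql -uroot -p$passwd --socket=$SOCK -N cq_new_cimiss"

# Largest id covered by the time condition.
max_id=$($MYSQL -e "SELECT MAX(id) FROM arrival_record WHERE receive_time < '2019-03-01';")

while true; do
    # Same connection runs the DELETE and reads ROW_COUNT() for it.
    rows=$($MYSQL -e "DELETE FROM arrival_record WHERE id < $max_id LIMIT 20000; SELECT ROW_COUNT();")
    [ "$rows" -eq 0 ] && break
    sleep 0.5   # give the replica time to keep up
done
```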
Summary

When a table grows this large, watch not only the response time of queries against it but also its maintenance cost (DDL changes take a long time, and so does deleting historical data).
When running DDL on a big table, weigh the table's actual situation (such as the concurrency it serves and whether it carries foreign keys) to choose an appropriate DDL method.
When deleting large volumes of data from a big table, delete in small batches to reduce the load on the primary and the replication lag.
Author: jia-xin
Original post: https://www.cnblogs.com/YangJiaXin/p/10828244.html