Moving Basic Services to Kubernetes: How to Get Past Three Kinds of Pitfalls
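At work I needed to move some basic services, previously deployed on physical or virtual machines, into Kubernetes. I ran into quite a few pitfalls along the way, so here I share the problems I hit and how I solved each one.
1. Problems caused by a faulty network
I had used redis-operator to deploy a Redis cluster in Kubernetes, but as soon as a tester colleague ran even a casual load test with redis-benchmark, the cluster broke. After a painstaking investigation, the cause turned out to be a faulty network between two of the virtual machines.
Lesson learned: before testing, use iperf3 to check the network between nodes and between pods, as follows: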
    # Start the iperf3 server on one node
    $ iperf3 --server
    # Start the iperf3 client on another node
    $ iperf3 --client ${node_ip} --length 150 --parallel 100 -t 60
    # Deploy the iperf3 server and clients in Kubernetes
    $ kubectl apply -f https://raw.githubusercontent.com/Pharb/kubernetes-iperf3/master/iperf3.yaml
    # Look up the pod IPs of the iperf3 pods
    $ kubectl get pod -o wide
    # Run iperf3 inside one of the iperf3 client pods to test the network to the iperf3 server pod
    $ kubectl exec -ti iperf3-clients-5b5ll -- iperf3 --client ${iperf3_server_pod_ip} --length 150 --parallel 100 -t 60
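To make such a check repeatable, the result can be evaluated by a script instead of by eye. The following is a minimal sketch, assuming the client is run with iperf3's `--json` flag over TCP and that the report carries the usual `end.sum_sent` and `end.sum_received` summaries. The function name, both thresholds, and the sample numbers are hypothetical and only illustrate the idea.

```python
import json


def summarize_iperf3(report_json: str,
                     min_mbps: float = 100.0,
                     max_retransmits: int = 100) -> dict:
    """Summarize a TCP `iperf3 --json` report and flag a suspicious link.

    min_mbps and max_retransmits are hypothetical thresholds; pick values
    that match what the network is expected to deliver.
    """
    report = json.loads(report_json)
    end = report["end"]
    # Throughput as seen by the receiver, converted to megabits per second.
    received_mbps = end["sum_received"]["bits_per_second"] / 1e6
    # Retransmits are reported on the sender side for TCP tests.
    retransmits = end["sum_sent"].get("retransmits", 0)
    return {
        "received_mbps": received_mbps,
        "retransmits": retransmits,
        "suspicious": received_mbps < min_mbps or retransmits > max_retransmits,
    }


if __name__ == "__main__":
    # Made-up sample report, reduced to the fields the function reads.
    sample = json.dumps({
        "end": {
            "sum_sent": {"bits_per_second": 5.0e7, "retransmits": 1200},
            "sum_received": {"bits_per_second": 5.0e7},
        }
    })
    print(summarize_iperf3(sample))
```

Running the same check for every pair of nodes, and again for pod-to-pod traffic, would point to a single bad link like the one between the two virtual machines above.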
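2. Cluster split-brain caused by an old MySQL version
I had used mysql-operator to deploy a 3-node MySQL InnoDB cluster in Kubernetes. Testers reported that after a period of load testing the cluster became unreachable. Checking the logs of the mysql container from the time of the failure showed the following: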
    $ kubectl logs mysql-0 -c mysql
    2018-04-22T15:24:36.984054Z 0 [ERROR] [MY-000000] [InnoDB] InnoDB: Assertion failure: log0write.cc:1799:time_elapsed >= 0
    InnoDB: thread 139746458191616
    InnoDB: We intentionally generate a memory trap.
    InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
    InnoDB: If you get repeated assertion failures or crashes, even
    InnoDB: immediately after the mysqld startup, there may be
    InnoDB: corruption in the InnoDB tablespace. Please refer to
    InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
    InnoDB: about forcing recovery.
    15:24:36 UTC - mysqld got signal 6 ;
    This could be because you hit a bug. It is also possible that this binary
    or one of the libraries it was linked against is corrupt, improperly built,
    or misconfigured. This error can also be caused by malfunctioning hardware.
    Attempting to collect some information that could help diagnose the problem.
    As this is a crash and something is definitely wrong, the information
    collection process might fail.

    key_buffer_size=8388608
    read_buffer_size=131072
    max_used_connections=1
    max_threads=151
    thread_count=2
    connection_count=1
    It is possible that mysqld could use up to
    key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 67841 K  bytes of memory
    Hope that's ok; if not, decrease some variables in the equation.

    Thread pointer: 0x0
    Attempting backtrace. You can use the following information to find out
    where mysqld died. If you see no messages after this, something went
    terribly wrong...
    stack_bottom = 0 thread_stack 0x46000
    /home/mdcallag/b/orig811/bin/mysqld(my_print_stacktrace(unsigned char*, unsigned long)+0x3d) [0x1b1461d]
    /home/mdcallag/b/orig811/bin/mysqld(handle_fatal_signal+0x4c1) [0xd58441]
    /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390) [0x7f1cae617390]
    /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38) [0x7f1cacb0a428]
    /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a) [0x7f1cacb0c02a]
    /home/mdcallag/b/orig811/bin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0xea) [0xb25e13]
    /home/mdcallag/b/orig811/bin/mysqld() [0x1ce5408]
    /home/mdcallag/b/orig811/bin/mysqld(log_flusher(log_t*)+0x2fb) [0x1ce5fab]
    /home/mdcallag/b/orig811/bin/mysqld(std::thread::_Impl<:_bind_simple> >::_M_run()+0x68) [0x1ccbe18]
    /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80) [0x7f1cad476c80]
    /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba) [0x7f1cae60d6ba]
    /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f1cacbdc41d]
    The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
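A search of the MySQL bug tracker did turn up this bug (https://bugs.mysql.com/bug.php?id=90670). According to the official note, the bug is present in versions before 8.0.12, and upgrading to 8.0.13 or later is recommended. Fortunately, mysql-operator supports installing a specified MySQL version, so setting the version to 8.0.16, the latest stable release at the time, solved the problem.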
    apiVersion: mysql.oracle.com/v1alpha1
    kind: Cluster
    metadata:
      name: mysql
    spec:
      members: 3
      version: "8.0.16"
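3. Cluster failure caused by exceeding ephemeral-storage
The MySQL InnoDB cluster solution relies on MySQL Group Replication to synchronize data between primary and secondary members, and that synchronization is essentially driven by the MySQL binlog. A load test therefore generates a large volume of binlog in a short time, and these logs take up a lot of storage. When a MySQL cluster is created with mysql-operator and no volumeClaimTemplate is declared in the YAML file, the pod uses ephemeral-storage. Kubernetes does provide a way to set ephemeral-storage quotas (https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/#requests-and-limits-setting-for-local-ephemeral-storage), but mysql-operator itself exposes no parameter for specifying one. As a result, after a long load test the accumulated binlog exceeds the available ephemeral-storage, and Kubernetes, to keep the platform stable, kills the pod. Once 2 of the 3 pods in the MySQL cluster have been killed, the whole cluster can no longer recover automatically.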
    Events:
      Type     Reason   Age   From                 Message
      ----     ------   ----  ----                 -------
      Warning  Evicted  39m   kubelet, 9.77.34.64  The node was low on resource: ephemeral-storage. Container mysql was using 256Ki, which exceeds its request of 0. Container mysql-agent was using 11572Ki, which exceeds its request of 0.
      Normal   Killing  39m   kubelet, 9.77.34.64  Killing container with id docker://mysql-agent:Need to kill Pod
      Normal   Killing  39m   kubelet, 9.77.34.64  Killing container with id docker://mysql:Need to kill Pod
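The "exceeds its request of 0" wording in the eviction message means the containers declared no ephemeral-storage request at all. A small check like the one below can spot such pods before a load test. It is a sketch only, assuming the pod manifest has been fetched with `kubectl get pod <name> -o json` and parsed into a dictionary. The function name and the sample manifest are hypothetical; the container names are taken from the events above.

```python
def containers_missing_ephemeral_storage(pod: dict) -> list:
    """Return the names of containers that set neither a request nor a
    limit for ephemeral-storage in the given pod manifest."""
    missing = []
    for container in pod.get("spec", {}).get("containers", []):
        resources = container.get("resources") or {}
        requests = resources.get("requests") or {}
        limits = resources.get("limits") or {}
        # A limit without a request still counts as declared, because
        # Kubernetes then defaults the request to the limit.
        if "ephemeral-storage" not in requests and "ephemeral-storage" not in limits:
            missing.append(container["name"])
    return missing


if __name__ == "__main__":
    # Made-up manifest, reduced to the fields the function reads.
    sample_pod = {
        "spec": {
            "containers": [
                {"name": "mysql", "resources": {}},
                {"name": "mysql-agent",
                 "resources": {"requests": {"ephemeral-storage": "64Mi"}}},
            ]
        }
    }
    print(containers_missing_ephemeral_storage(sample_pod))
```

A container listed by this check is a candidate for eviction under storage pressure on the node, which is what happened to both mysql and mysql-agent here.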
The fix is simple. First, declare a volumeClaimTemplate in the YAML file, following this example: (https://github.com/oracle/mysql-operator/blob/master/examples/cluster/cluster-with-data-volume-and-backup-volume.yaml). Second, set the binlog_expire_logs_seconds parameter (https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#sysvar_binlog_expire_logs_seconds) in the MySQL configuration file so that binlog is purged quickly during load tests, as follows:
    apiVersion: v1
    data:
      my.cnf: |
        [mysqld]
        default_authentication_plugin=mysql_native_password
        skip-name-resolve
        binlog_expire_logs_seconds=300
    kind: ConfigMap
    metadata:
      name: mycnf
    ---
    apiVersion: mysql.oracle.com/v1alpha1
    kind: Cluster
    metadata:
      name: mysql
    spec:
      members: 3
      version: "8.0.16"
      config:
        name: mycnf
      volumeClaimTemplate:
        metadata:
          name: data
        spec:
          storageClassName: default
          accessModes:
            - ReadWriteMany
          resources:
            requests:
              storage: 1Gi
      backupVolumeClaimTemplate:
        metadata:
          name: backup-data
        spec:
          storageClassName: default
          accessModes:
            - ReadWriteMany
          resources:
            requests:
              storage: 1Gi
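How much space the binlog still needs with a short expiry can be estimated roughly before sizing the volume. The sketch below assumes a steady write rate and assumes that expired files are only removed when the binlog rotates or is flushed, so up to one extra file of `max_binlog_size` (1 GiB by default in MySQL) may linger. The function name and the 5 MiB/s write rate are hypothetical, and the result is an approximation, not a guaranteed upper bound.

```python
def estimate_binlog_bytes(write_rate_bytes_per_s: float,
                          expire_seconds: int,
                          max_binlog_size: int = 1024 ** 3) -> float:
    """Rough estimate of disk space held by binlog files.

    Data written within the expiry window is retained, plus up to one
    binlog file that has expired but has not been purged yet.
    """
    return write_rate_bytes_per_s * expire_seconds + max_binlog_size


if __name__ == "__main__":
    # Hypothetical load: 5 MiB/s of binlog with binlog_expire_logs_seconds=300.
    estimate = estimate_binlog_bytes(5 * 1024 ** 2, 300)
    print(f"estimated binlog footprint: {estimate / 1024 ** 2:.0f} MiB")
```

Comparing the estimate with the storage requested in volumeClaimTemplate shows whether the expiry time, `max_binlog_size`, or the volume size needs adjusting for the expected load.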
With that, the Redis cluster and the MySQL cluster finally ran stably in Kubernetes.
總結
以上是生活随笔為你收集整理的es用canals怎么和mysql同步_搬运基础服务到kubernetes,遇这3类大坑怎么破?的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: java系统教程_Java 教程(开发环
- 下一篇: 精通java图片_面试必备:详解Java