Monitoring ZooKeeper Service Health with Zabbix
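1. Use case
ZooKeeper is not used much as a coordination service in our company's current business. However, we are going to deploy Redis clusters with Codis, and Codis depends on ZooKeeper to store its configuration, so monitoring ZooKeeper properly matters as well.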
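2. Key points for monitoring ZooKeeper
System monitoring
Memory usage: ZooKeeper should run entirely in memory and must not touch swap. The Java heap size must not exceed the available memory.
Swap usage: swapping degrades ZooKeeper performance; set vm.swappiness = 0.
Network bandwidth: if ZooKeeper performance drops, check bandwidth usage and packet loss. A typical ZooKeeper workload is about 20% writes and 80% reads.
Disk usage: keep an eye on the usage of the ZooKeeper data directory.
Disk I/O: ZooKeeper writes to disk asynchronously, so it does not generate heavy I/O requests. If ZooKeeper shares disks with other I/O-intensive services, watch the disk I/O.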
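As a rough illustration of the swap check above, the sketch below parses text in the format of /proc/meminfo and reports how much swap is in use. The function name swap_used_kb and the sample text are made up for this example; on a Linux host the real input would be the contents of /proc/meminfo.

```python
def swap_used_kb(meminfo_text):
    """Return the amount of swap in use (kB) from /proc/meminfo-style text."""
    values = {}
    for line in meminfo_text.splitlines():
        # Lines look like "SwapTotal:       2097148 kB"
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values.get("SwapTotal", 0) - values.get("SwapFree", 0)


# Hypothetical sample; on a real host use open("/proc/meminfo").read()
SAMPLE = """MemTotal:        8010408 kB
MemFree:          512000 kB
SwapTotal:       2097148 kB
SwapFree:        2097148 kB
"""

if __name__ == "__main__":
    used = swap_used_kb(SAMPLE)
    print("swap used: %d kB" % used)
    if used > 0:
        print("WARNING: the host is swapping, ZooKeeper latency may suffer")
```

Any value above zero on a ZooKeeper host is worth investigating, since the process is expected to stay entirely in memory.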
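ZooKeeper monitoring
zk_avg/min/max_latency: time taken to respond to a client request. Alert when it exceeds 10 ticks.
zk_outstanding_requests: number of queued requests. It grows when ZooKeeper is pushed beyond its processing capacity; a suggested alert threshold is 10.
zk_packets_received: number of packets received from clients.
zk_packets_sent: number of packets sent to clients, mostly responses and notifications.
zk_max_file_descriptor_count: maximum number of open file descriptors allowed, controlled by ulimit.
zk_open_file_descriptor_count: number of open file descriptors. Alert when it exceeds 85% of the allowed maximum.
Mode: role of the server. It is standalone if the server is not part of an ensemble, otherwise follower or leader.
zk_followers: reported only by the leader; number of followers in the ensemble. The normal value is the number of ensemble members minus 1.
zk_pending_syncs: reported only by the leader; number of pending syncs.
zk_znode_count: number of znodes.
zk_watch_count: number of watches.
Java Heap Size: heap size of the ZooKeeper Java process.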
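The thresholds suggested in the list above can be expressed as a small check over one server's mntr values. This is only a sketch: the function name check_zookeeper, the tick_ms default of 2000 (the tickTime setting, in milliseconds), the ensemble_size parameter, and the choice of zk_avg_latency (taken to be in milliseconds) as the latency to compare are all assumptions made for illustration. In practice these rules would live in Zabbix triggers.

```python
def check_zookeeper(stats, tick_ms=2000, ensemble_size=3):
    """Return a list of alert messages for one server's mntr stats."""
    alerts = []

    # Latency above 10 ticks (assumes latency and tick_ms are both in ms)
    if stats.get("zk_avg_latency", 0) > 10 * tick_ms:
        alerts.append("latency above 10 ticks")

    # More than 10 queued requests
    if stats.get("zk_outstanding_requests", 0) > 10:
        alerts.append("too many outstanding requests")

    # Open file descriptors above 85% of the allowed maximum
    max_fd = stats.get("zk_max_file_descriptor_count", 0)
    open_fd = stats.get("zk_open_file_descriptor_count", 0)
    if max_fd and open_fd > 0.85 * max_fd:
        alerts.append("open file descriptors above 85% of the limit")

    # Only the leader reports zk_followers; expect ensemble size minus 1
    if stats.get("zk_server_state") == "leader":
        if stats.get("zk_followers") != ensemble_size - 1:
            alerts.append("unexpected number of followers")

    return alerts


if __name__ == "__main__":
    leader = {
        "zk_avg_latency": 0,
        "zk_outstanding_requests": 0,
        "zk_open_file_descriptor_count": 30,
        "zk_max_file_descriptor_count": 102400,
        "zk_server_state": "leader",
        "zk_followers": 2,
    }
    print(check_zookeeper(leader))
```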
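3. Writing the script and configuration file for Zabbix
There are two ways to get these metrics into Zabbix. One is to fetch each item separately through the Zabbix agent, using either active or passive checks. The other is to push all the metrics to Zabbix in one go with zabbix_sender. Here we choose the second approach. A script that pushes everything with zabbix_sender cannot be written like an agent script, which fetches one item per call.
First, collect the items into a dictionary. Then iterate over the dictionary and send each key:value pair with zabbix_sender's -k and -o options.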
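The loop described above can be sketched as follows. As a variation, zabbix_sender also accepts an input file through -i (with - meaning standard input), where each line holds `<hostname> <key> <value>` and a hostname of - means "take the hostname from the agent configuration file", so all items can go out in a single invocation. The helper names, the sample dictionary, and the way values are quoted are assumptions for this sketch; the two paths are the ones used by the full script in this article.

```python
import subprocess

ZABBIX_SENDER = "/opt/app/zabbix/sbin/zabbix_sender"
ZABBIX_CONF = "/opt/app/zabbix/conf/zabbix_agentd.conf"


def item_key(metric):
    """Build the Zabbix item key for one mntr metric."""
    return "zookeeper.status[%s]" % metric


def per_item_commands(stats):
    """One zabbix_sender argument list per metric, using -k and -o."""
    return [
        [ZABBIX_SENDER, "-c", ZABBIX_CONF, "-k", item_key(name), "-o", str(value)]
        for name, value in sorted(stats.items())
    ]


def batch_input(stats):
    """Input lines for `zabbix_sender -i -`, one metric per line."""
    lines = []
    for name, value in sorted(stats.items()):
        # Assumption: values are double-quoted, with \ and " escaped
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
        lines.append('- %s "%s"' % (item_key(name), text))
    return "\n".join(lines) + "\n"


def send_batch(stats):
    """Push every metric with a single zabbix_sender process."""
    proc = subprocess.run(
        [ZABBIX_SENDER, "-c", ZABBIX_CONF, "-i", "-"],
        input=batch_input(stats).encode("utf-8"),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode


if __name__ == "__main__":
    sample = {"zk_avg_latency": 0, "zk_server_state": "follower"}
    for command in per_item_commands(sample):
        print(" ".join(command))
    print(batch_input(sample), end="")
```

Sending one item per process, as the script in this article does, is simpler to debug. The batch form starts a single process per collection run, which may matter when the script is called often.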
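The metrics come from ZooKeeper's mntr four-letter command:
    echo mntr | nc 127.0.0.1 2181
This command can be run from Python through the subprocess module, or the script can use the socket module to connect to port 2181 and send the command itself. Either way, the mntr output then has to be converted into a dictionary.
That is, data in this form: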
    zk_version 3.4.6-1569965, built on 02/20/2014 09:09 GMT
    zk_avg_latency 0
    zk_max_latency 0
    zk_min_latency 0
    zk_packets_received 91
    zk_packets_sent 90
    zk_num_alive_connections 1
    zk_outstanding_requests 0
    zk_server_state follower
    zk_znode_count 17159
    zk_watch_count 0
    zk_ephemerals_count 1
    zk_approximate_data_size 6666471
    zk_open_file_descriptor_count 27
    zk_max_file_descriptor_count 102400
has to be converted into data like this:
    {'zk_followers': 2,
     'zk_outstanding_requests': 0,
     'zk_approximate_data_size': 6666471,
     'zk_packets_sent': 2089,
     'zk_pending_syncs': 0,
     'zk_avg_latency': 0,
     'zk_version': '3.4.6-1569965, built on 02/20/2014 09:09 GMT',
     'zk_watch_count': 2,
     'zk_packets_received': 2090,
     'zk_open_file_descriptor_count': 30,
     'zk_server_ruok': 'imok',
     'zk_server_state': 'leader',
     'zk_synced_followers': 2,
     'zk_max_latency': 28,
     'zk_num_alive_connections': 2,
     'zk_min_latency': 0,
     'zk_ephemerals_count': 1,
     'zk_znode_count': 17159,
     'zk_max_file_descriptor_count': 102400}
In the end, the data sent with zabbix_sender has the following format.
Here zookeeper.status[zk_version] is the name of the key:
    zookeeper.status[zk_outstanding_requests]:0
    zookeeper.status[zk_approximate_data_size]:6666471
    zookeeper.status[zk_packets_sent]:48
    zookeeper.status[zk_avg_latency]:0
    zookeeper.status[zk_version]:3.4.6-1569965, built on 02/20/2014 09:09 GMT
    zookeeper.status[zk_watch_count]:0
    zookeeper.status[zk_packets_received]:49
    zookeeper.status[zk_open_file_descriptor_count]:27
    zookeeper.status[zk_server_ruok]:imok
    zookeeper.status[zk_server_state]:follower
    zookeeper.status[zk_max_latency]:0
    zookeeper.status[zk_num_alive_connections]:1
    zookeeper.status[zk_min_latency]:0
    zookeeper.status[zk_ephemerals_count]:1
    zookeeper.status[zk_znode_count]:17159
    zookeeper.status[zk_max_file_descriptor_count]:102400
The condensed code is as follows:
    #!/usr/bin/python
    import socket

    s = socket.socket()
    s.connect(('localhost', 2181))
    s.sendall(b'mntr')
    # A single recv() is enough for this short reply; a longer reply may need several reads
    data_mntr = s.recv(2048).decode('utf-8')
    s.close()
    # print(data_mntr)

    result = {}
    zresult = {}
    for line in data_mntr.splitlines():
        # Each line of the mntr output is "<key>\t<value>"
        key, value = [part.strip() for part in line.split('\t')]
        zkey = 'zookeeper.status' + '[' + key + ']'
        zvalue = value
        result[key] = value
        zresult[zkey] = zvalue

    print(result)
    print('\n\n')
    print(zresult)

    # python test.py
    {'zk_outstanding_requests': '0', 'zk_approximate_data_size': '6666471', 'zk_max_latency': '0', 'zk_avg_latency': '0', 'zk_version': '3.4.6-1569965, built on 02/20/2014 09:09 GMT', 'zk_watch_count': '0', 'zk_num_alive_connections': '1', 'zk_open_file_descriptor_count': '27', 'zk_server_state': 'follower', 'zk_packets_sent': '542', 'zk_packets_received': '543', 'zk_min_latency': '0', 'zk_ephemerals_count': '1', 'zk_znode_count': '17159', 'zk_max_file_descriptor_count': '102400'}

    {'zookeeper.status[zk_watch_count]': '0', 'zookeeper.status[zk_avg_latency]': '0', 'zookeeper.status[zk_max_latency]': '0', 'zookeeper.status[zk_approximate_data_size]': '6666471', 'zookeeper.status[zk_server_state]': 'follower', 'zookeeper.status[zk_num_alive_connections]': '1', 'zookeeper.status[zk_min_latency]': '0', 'zookeeper.status[zk_outstanding_requests]': '0', 'zookeeper.status[zk_packets_received]': '543', 'zookeeper.status[zk_ephemerals_count]': '1', 'zookeeper.status[zk_znode_count]': '17159', 'zookeeper.status[zk_packets_sent]': '542', 'zookeeper.status[zk_open_file_descriptor_count]': '27', 'zookeeper.status[zk_max_file_descriptor_count]': '102400', 'zookeeper.status[zk_version]': '3.4.6-1569965, built on 02/20/2014 09:09 GMT'}

The full code is as follows:
    #!/usr/bin/python
    """ Check Zookeeper Cluster

    zookeeper version should be newer than 3.4.x

    # echo mntr|nc 127.0.0.1 2181
    zk_version 3.4.6-1569965, built on 02/20/2014 09:09 GMT
    zk_avg_latency 0
    zk_max_latency 4
    zk_min_latency 0
    zk_packets_received 84467
    zk_packets_sent 84466
    zk_num_alive_connections 3
    zk_outstanding_requests 0
    zk_server_state follower
    zk_znode_count 17159
    zk_watch_count 2
    zk_ephemerals_count 1
    zk_approximate_data_size 6666471
    zk_open_file_descriptor_count 29
    zk_max_file_descriptor_count 102400

    # echo ruok|nc 127.0.0.1 2181
    imok
    """

    import os
    import socket
    import subprocess
    import sys

    zabbix_sender = '/opt/app/zabbix/sbin/zabbix_sender'
    zabbix_conf = '/opt/app/zabbix/conf/zabbix_agentd.conf'
    # 1: really call zabbix_sender, 0: only print the command that would be run
    send_to_zabbix = 1


    ############# get zookeeper server status
    class ZooKeeperServer(object):

        def __init__(self, host='localhost', port='2181', timeout=1):
            self._address = (host, int(port))
            self._timeout = timeout
            self._result = {}

        def _create_socket(self):
            return socket.socket()

        def _send_cmd(self, cmd):
            """ Send a 4letter word command to the server """
            s = self._create_socket()
            s.settimeout(self._timeout)
            s.connect(self._address)
            s.sendall(cmd.encode('ascii'))
            chunks = []
            while True:
                # the server closes the connection once the reply is complete
                chunk = s.recv(2048)
                if not chunk:
                    break
                chunks.append(chunk)
            s.close()
            return b''.join(chunks).decode('utf-8')

        def get_stats(self):
            """ Get ZooKeeper server stats as a map """
            data_mntr = self._send_cmd('mntr')
            data_ruok = self._send_cmd('ruok')
            result = {}
            if data_mntr:
                result.update(self._parse(data_mntr))
            if data_ruok:
                result.update(self._parse_ruok(data_ruok))
            # these three metrics are only exposed by the leader,
            # so on the other servers we just set them to 0
            leader_only = {'zk_followers': 0, 'zk_synced_followers': 0, 'zk_pending_syncs': 0}
            if not any(key in result for key in leader_only):
                result.update(leader_only)
            self._result = result
            return self._result

        def _parse(self, data):
            """ Parse the output from the 'mntr' 4letter word command """
            result = {}
            for line in data.splitlines():
                try:
                    key, value = self._parse_line(line)
                    result[key] = value
                except ValueError:
                    pass  # ignore broken lines
            return result

        def _parse_ruok(self, data):
            """ Parse the output from the 'ruok' 4letter word command """
            result = {}
            ruok = data.strip()
            if ruok:
                result['zk_server_ruok'] = ruok
            return result

        def _parse_line(self, line):
            try:
                key, value = [part.strip() for part in line.split('\t')]
            except ValueError:
                raise ValueError('Found invalid line: %s' % line)
            if not key:
                raise ValueError('The key is mandatory and should not be empty')
            try:
                value = int(value)
            except (TypeError, ValueError):
                pass  # keep non-numeric values such as zk_version as strings
            return key, value

        def get_pid(self):
            """ Return the pid of the running ZooKeeper process, or '' if none """
            pidarg = '''ps -ef|grep java|grep zookeeper|grep -v grep|awk '{print $2}' '''
            pidout = subprocess.Popen(pidarg, shell=True, stdout=subprocess.PIPE)
            lines = pidout.communicate()[0].decode('utf-8').splitlines()
            return lines[0].strip() if lines else ''

        def send_to_zabbix(self, metric):
            key = "zookeeper.status[" + metric + "]"
            value = str(self._result[metric])
            if send_to_zabbix > 0:
                # print(key + ":" + value)
                try:
                    with open(os.devnull, 'w') as fnull:
                        subprocess.call([zabbix_sender, "-c", zabbix_conf, "-k", key, "-o", value],
                                        stdout=fnull, stderr=fnull, shell=False)
                except OSError as detail:
                    print("Something went wrong while executing zabbix_sender: %s" % detail)
            else:
                print("Simulation: the following command would be executed:\n%s -c %s -k %s -o %s\n"
                      % (zabbix_sender, zabbix_conf, key, value))


    def usage():
        """Display program usage"""
        print("\nUsage : %s alive|all" % sys.argv[0])
        print("Modes : \n\talive : Return pid of running zookeeper\n\tall : Send zookeeper stats as well")
        sys.exit(1)


    accepted_modes = ['alive', 'all']
    if len(sys.argv) == 2 and sys.argv[1] in accepted_modes:
        mode = sys.argv[1]
    else:
        usage()

    zk = ZooKeeperServer()
    # print(zk.get_stats())
    pid = zk.get_pid()

    if pid != "" and mode == 'all':
        zk.get_stats()
        # print(zk._result)
        for key in zk._result:
            zk.send_to_zabbix(key)
        print(pid)
    elif pid != "" and mode == "alive":
        print(pid)
    else:
        print(0)

The Zabbix configuration file check_zookeeper.conf:
    UserParameter=zookeeper.status[*],/usr/bin/python /opt/app/zabbix/sbin/check_zookeeper.py $1
Then restart the Zabbix agent service.
4. Creating the Zabbix template for ZooKeeper and setting alert thresholds
For the template, see the attachment.
References:
https://blog.serverdensity.com/how-to-monitor-zookeeper/
https://github.com/apache/zookeeper/tree/trunk/src/contrib/monitoring
http://john88wang.blog.51cto.com/2165294/1708302
Reposted from: https://blog.51cto.com/john88wang/1745339