Advanced Redis - Handling a Redis Cluster Fail State Caused by an Unexpected Power Outage
Contents
- Pre
- Symptoms
- Finding the unassigned slots
- Method 1: cluster slots
- Method 2: cluster nodes
- Computing the unassigned slots and re-adding them
- Redisson initialization failure (Not all slots are covered! Only 10923 slots are avaliable + Failed to add master: redis://172.168.15.101:7002 for slot ranges: [[10923-16383]]. Reason - cluster_state:fail)
Pre
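This is a test environment running a pseudo-cluster on two hosts:
101 : 7001 7002 7003 (three nodes)
102 : 7004 7005 7006 (three nodes)
The machine room lost power unexpectedly and the hosts went down.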
Symptoms
The Redis Cluster is unavailable and the application cannot start.
The cluster information looks like this:
172.168.15.101:7001> CLUSTER INFO
cluster_state:fail
cluster_slots_assigned:16354
cluster_slots_ok:16354
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:7
cluster_my_epoch:1
cluster_stats_messages_ping_sent:1666
cluster_stats_messages_pong_sent:1063
cluster_stats_messages_sent:2729
cluster_stats_messages_ping_received:1063
cluster_stats_messages_pong_received:1026
cluster_stats_messages_received:2089
The key lines are cluster_state:fail and cluster_slots_assigned:16354. The cluster state is fail, and only 16354 of the 16384 slots are assigned. 30 slots are missing, so the cluster is unavailable.
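To protect cluster integrity, by default the whole cluster becomes unavailable as soon as any one of the 16384 slots is not assigned to a node (this is the cluster-require-full-coverage setting, which defaults to yes). The safeguard ensures that every slot is assigned to an online node.
Some slots are clearly unassigned here, so reassigning them is the key to fixing the problem.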
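The check described above can be sketched in a few lines of Python. This is only an illustration: the function names `parse_cluster_info` and `unassigned_slot_count` are made up for this example, and the sample text is an excerpt of the CLUSTER INFO output shown earlier.

```python
TOTAL_SLOTS = 16384  # fixed number of hash slots in a Redis Cluster


def parse_cluster_info(text: str) -> dict[str, str]:
    """Turn the 'key:value' lines of CLUSTER INFO into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:  # skip blank or malformed lines
            info[key] = value
    return info


def unassigned_slot_count(info: dict[str, str]) -> int:
    """Number of slots that no node currently claims."""
    return TOTAL_SLOTS - int(info["cluster_slots_assigned"])


# Excerpt of the CLUSTER INFO output from this incident.
SAMPLE = """\
cluster_state:fail
cluster_slots_assigned:16354
cluster_slots_ok:16354
"""

info = parse_cluster_info(SAMPLE)
print(info["cluster_state"], unassigned_slot_count(info))  # prints: fail 30
```

The count only says how many slots are missing, not which ones; that is what the next section works out.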
Finding the unassigned slots
Method 1: cluster slots
172.168.15.101:7001> CLUSTER SLOTS
1) 1) (integer) 5461
   2) (integer) 5591
   3) 1) "172.168.15.101"
...........
33) 1) (integer) 0
    2) (integer) 5460
    3) 1) "172.168.15.101"
       2) (integer) 7001
       3) "40b3ab3eb00e0107ea702e96231694016fb5c25f"
    4) 1) "172.168.15.102"
       2) (integer) 7006
       3) "b2392a54bc1ed255d9f86ce5315b3c66177bc54c"
172.168.15.101:7001>
The output is too long and awkward to tally this way, so the second method is recommended.
Method 2: cluster nodes
172.168.15.101:7001> cluster nodes
f434df4b2a8e8262e91b192fdd4329ac7eaba257 172.168.15.101:7003@17003 master - 0 1589854185127 7 connected 5461-5591 5593-5783 5785-5913 5915-6157 6159-6264 6266-6290 6292-6311 6313-6401 6403-6963 6965-7228 7230-7566 7568-7647 7649-7862 7864-8199 8201-8693 8695-8805 8807-8832 8834-9229 9231-9305 9307-9353 9355-9477 9479-9696 9698-9761 9763-9855 9857-10241 10243-10265 10267-10310 10312-10348 10350-10529 10531-10669 10671-10922
8c27d256907bd17ceed4b0bfc8474eb90e7cf71e 172.168.15.102:7004@17004 slave f434df4b2a8e8262e91b192fdd4329ac7eaba257 0 1589854187127 7 connected
8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d 172.168.15.101:7002@17002 master - 0 1589854186127 2 connected 10923-16383
b2392a54bc1ed255d9f86ce5315b3c66177bc54c 172.168.15.102:7006@17006 slave 40b3ab3eb00e0107ea702e96231694016fb5c25f 0 1589854185000 6 connected
40b3ab3eb00e0107ea702e96231694016fb5c25f 172.168.15.101:7001@17001 myself,master - 0 1589854184000 1 connected 0-5460 [5592-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [5784-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [5914-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [6158-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [6265-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [6291-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [6312-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [6402-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [6964-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [7229-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [7567-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [7648-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [7863-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [8200-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [8694-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [8806-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [8833-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [9230-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [9306-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [9354-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [9478-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [9697-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [9762-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [9856-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [10242-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [10266-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [10311-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [10349-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [10530-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [10670-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e] [10973-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [11020-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [11140-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [11144-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [11200-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [11624-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [11802-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [12201-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [12301-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [12681-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [12685-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [13365-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [13676-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [13969-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [13989-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [14395-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [14412-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [15149-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [15611-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [15654-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [15758-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [15778-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [15899-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [16100-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [16105-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d] [16147-<-8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d]
6d8f2f251fa2d881cae91012088e1d5eb653ebb4 172.168.15.102:7005@17005 slave 8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d 0 1589854186000 5 connected
The slot ranges held by each master:
7002 : 10923-16383
7001: 0-5460
7003 : 5461-5591 5593-5783 5785-5913 5915-6157 6159-6264 6266-6290 6292-6311 6313-6401 6403-6963 6965-7228 7230-7566 7568-7647 7649-7862 7864-8199 8201-8693 8695-8805 8807-8832 8834-9229 9231-9305 9307-9353 9355-9477 9479-9696 9698-9761 9763-9855 9857-10241 10243-10265 10267-10310 10312-10348 10350-10529 10531-10669 10671-10922
From this listing it is clear which slots are missing.
The format of the cluster nodes output will be analyzed in a later post.
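Reading the ranges off by eye is error-prone, so here is a minimal sketch of pulling the per-master slot ranges out of the cluster nodes text. The function name `parse_master_slots` is an illustrative assumption, and the sample is a trimmed excerpt of the output above (the 7001 line keeps only its first bracketed marker). Bracketed tokens such as `[5592-<-...]` are treated as migration or import markers and are not counted as owned slots.

```python
def parse_master_slots(nodes_text: str) -> dict[str, list[tuple[int, int]]]:
    """Map each master's ip:port to the (start, end) slot ranges it owns."""
    owned = {}
    for line in nodes_text.splitlines():
        fields = line.split()
        # Fields: id, ip:port@cport, flags, master-id, ping, pong, epoch, link, slots...
        if len(fields) < 8 or "master" not in fields[2].split(","):
            continue  # skip blank lines and replicas
        address = fields[1].split("@")[0]  # drop the cluster bus port
        ranges = []
        for token in fields[8:]:
            if token.startswith("["):
                continue  # importing/migrating marker, not an owned slot
            start, _, end = token.partition("-")
            ranges.append((int(start), int(end or start)))  # single slot: end == start
        owned[address] = ranges
    return owned


# Trimmed excerpt of the CLUSTER NODES output from this incident.
NODES = """\
8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d 172.168.15.101:7002@17002 master - 0 1589854186127 2 connected 10923-16383
6d8f2f251fa2d881cae91012088e1d5eb653ebb4 172.168.15.102:7005@17005 slave 8dff9fa8b74dd6cdf90a706c3945fbe2025cb57d 0 1589854186000 5 connected
40b3ab3eb00e0107ea702e96231694016fb5c25f 172.168.15.101:7001@17001 myself,master - 0 1589854184000 1 connected 0-5460 [5592-<-8c27d256907bd17ceed4b0bfc8474eb90e7cf71e]
"""

for address, ranges in parse_master_slots(NODES).items():
    print(address, ranges)
```

Feeding it the full output instead of the excerpt would yield the same three per-master lists shown above.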
Computing the unassigned slots and re-adding them
Look at the slot ranges listed after the 7003 master:
5461-5591 5593-5783 5785-5913 5915-6157 6159-6264 6266-6290 6292-6311 6313-6401 6403-6963 6965-7228 7230-7566 7568-7647 7649-7862 7864-8199 8201-8693 8695-8805 8807-8832 8834-9229 9231-9305 9307-9353 9355-9477 9479-9696 9698-9761 9763-9855 9857-10241 10243-10265 10267-10310 10312-10348 10350-10529 10531-10669 10671-10922
Each gap between two consecutive ranges is one missing slot, 30 in total:
5592 5784 5914 6158 6265 6291 6312 6402 6964 7229 7567 7648 7863 8200 8694 8806 8833 9230 9306 9354 9478 9697 9762 9856 10242 10266 10311 10349 10530 10670
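The gap calculation can also be done mechanically. The sketch below takes the ranges held by the three masters, finds every slot in 0..16383 that none of them covers, and formats the matching CLUSTER ADDSLOTS command. The name `missing_slots` is an assumption made for this example; the ranges are copied from the cluster nodes output.

```python
TOTAL_SLOTS = 16384  # slots are numbered 0..16383


def missing_slots(assigned: str) -> list[int]:
    """Return every slot not covered by the space-separated ranges."""
    covered = set()
    for token in assigned.split():
        start, _, end = token.partition("-")
        covered.update(range(int(start), int(end or start) + 1))
    return [slot for slot in range(TOTAL_SLOTS) if slot not in covered]


ASSIGNED = (
    "0-5460 "  # 7001
    # 7003
    "5461-5591 5593-5783 5785-5913 5915-6157 6159-6264 6266-6290 "
    "6292-6311 6313-6401 6403-6963 6965-7228 7230-7566 7568-7647 "
    "7649-7862 7864-8199 8201-8693 8695-8805 8807-8832 8834-9229 "
    "9231-9305 9307-9353 9355-9477 9479-9696 9698-9761 9763-9855 "
    "9857-10241 10243-10265 10267-10310 10312-10348 10350-10529 "
    "10531-10669 10671-10922 "
    "10923-16383"  # 7002
)

missing = missing_slots(ASSIGNED)
command = "CLUSTER ADDSLOTS " + " ".join(str(slot) for slot in missing)
print(len(missing))  # prints: 30
print(command)
```

The count agrees with the 30 slots that CLUSTER INFO reported as unassigned, which is a useful sanity check before running the command.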
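Reassign them. The command was run on 7001, and CLUSTER ADDSLOTS assigns the given slots to the node that receives it:
172.168.15.101:7001> CLUSTER ADDSLOTS 5592 5784 5914 6158 6265 6291 6312 6402 6964 7229 7567 7648 7863 8200 8694 8806 8833 9230 9306 9354 9478 9697 9762 9856 10242 10266 10311 10349 10530 10670
OK
172.168.15.101:7001>
After a short while, check the cluster again: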
172.168.15.101:7001> CLUSTER INFO
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:7
cluster_my_epoch:1
cluster_stats_messages_ping_sent:2108
cluster_stats_messages_pong_sent:1508
cluster_stats_messages_sent:3616
cluster_stats_messages_ping_received:1508
cluster_stats_messages_pong_received:1468
cluster_stats_messages_update_received:19
cluster_stats_messages_received:2995
172.168.15.101:7001>
The cluster state is ok again and all 16384 slots are assigned.
Redisson initialization failure (Not all slots are covered! Only 10923 slots are avaliable + Failed to add master: redis://172.168.15.101:7002 for slot ranges: [[10923-16383]]. Reason - cluster_state:fail)
Redisson is configured with the cluster addresses, and it failed on startup:
[2020-05-19 10:44:33,539] INFO [localhost-startStop-1] RedissonManager.<clinit>(27) | redisson client begin to init....
[2020-05-19 10:44:36,365] ERROR [localhost-startStop-1] RedissonManager.<clinit>(52) | org.redisson.client.RedisConnectionException: Not all slots are covered! Only 10923 slots are avaliable
    at org.redisson.cluster.ClusterConnectionManager.<init>(ClusterConnectionManager.java:167)
    at org.redisson.config.ConfigSupport.createConnectionManager(ConfigSupport.java:198)
    at org.redisson.Redisson.<init>(Redisson.java:122)
    at org.redisson.Redisson.create(Redisson.java:159)
.....................
Caused by: org.redisson.client.RedisException: Failed to add master: redis://172.168.15.101:7002 for slot ranges: [[10923-16383]]. Reason - cluster_state:fail
    at org.redisson.cluster.ClusterConnectionManager$1$1.operationComplete(ClusterConnectionManager.java:223)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
The cause is stated plainly in the log: redis://172.168.15.101:7002 for slot ranges: [[10923-16383]]. Reason - cluster_state:fail
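Connect to port 7002 (be sure to inspect the node information on 7002 itself, not through another port) and repeat the steps above.
After restarting the nodes a few times along the way, the fault was resolved.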
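The reason for checking 7002 directly is that CLUSTER INFO reports the cluster state as seen by the node answering the command, so one node can report ok while another still reports fail. A minimal sketch of comparing those views follows. It assumes the CLUSTER INFO text of each node has already been collected (for example with `redis-cli -h <host> -p <port> cluster info`), and the function name `nodes_not_ok` is made up for this example.

```python
def nodes_not_ok(info_by_node: dict[str, str]) -> list[str]:
    """Return the addresses whose own CLUSTER INFO does not say cluster_state:ok."""
    failing = []
    for address, info_text in info_by_node.items():
        state = None
        for line in info_text.splitlines():
            key, _, value = line.strip().partition(":")
            if key == "cluster_state":
                state = value
        if state != "ok":  # also flags nodes whose output lacks the field
            failing.append(address)
    return failing
```

Passing a dict keyed by node address, with each value holding that node's CLUSTER INFO output, returns the nodes that still need the slot check repeated on them.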