Understanding ZooKeeper's Consistency and Its Shortcomings

Published: 2024/2/28

- ZooKeeper guarantees
- - Understanding ZooKeeper's sequential consistency
- ZooKeeper's shortcomings
- References

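ZooKeeper uses the ZAB protocol to provide strong consistency guarantees, so in internet services it is often chosen as a service registry, a configuration center, a distributed lock service, and so on.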

ZooKeeper guarantees

According to the official ZooKeeper documentation, ZooKeeper provides the following guarantees:

Sequential Consistency - Updates from a client will be applied in the order that they were sent.
Atomicity - Updates either succeed or fail. No partial results.
Single System Image - A client will see the same view of the service regardless of the server that it connects to. i.e., a client will never see an older view of the system even if the client fails over to a different server with the same session. If a client has already seen newer data and then tries to reconnect to a follower that still holds older data, that follower rejects the connection (the client's zxid is higher than the follower's).
Reliability - Once an update has been applied, it will persist from that time forward until a client overwrites the update.
Timeliness - The clients view of the system is guaranteed to be up-to-date within a certain time bound.

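In my experience, ZooKeeper behaves in practice as only an eventually consistent distributed system. Historically it has also had a number of bugs that violate distributed consensus, for example "expired ephemeral node reappears after ZK leader change", where an ephemeral node still exists after its session has expired.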

Understanding ZooKeeper's sequential consistency

The ZooKeeper Programmer's Guide says:

Sometimes developers mistakenly assume one other guarantee that ZooKeeper does not in fact make. This is:
Simultaneously Consistent Cross-Client Views
ZooKeeper does not guarantee that at every instance in time, two different clients will have identical views of ZooKeeper data. Due to factors like network delays, one client may perform an update before another client gets notified of the change. Consider the scenario of two clients, A and B. If client A sets the value of a znode /a from 0 to 1, then tells client B to read /a, client B may read the old value of 0, depending on which server it is connected to. If it is important that Client A and Client B read the same value, Client B should call the sync() method from the ZooKeeper API method before it performs its read.
So, ZooKeeper by itself doesn’t guarantee that changes occur synchronously across all servers, but ZooKeeper primitives can be used to construct higher level functions that provide useful client synchronization.

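In other words, ZooKeeper does not guarantee that a value read from any particular server is the latest one. It only guarantees that the values on that server are updated in order. A client connected to a different server that wants to read the latest value must call sync() (zoo_async in the C client) before its get.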
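The stale-read scenario from the quoted guide can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions: the `Leader` and `Follower` classes and their methods are hypothetical and are not the ZooKeeper API, and `Follower.sync()` here simply catches up with the leader's log, which is a simplification of what the real sync() does.

```python
class Leader:
    """Toy leader: an ordered log of (zxid, path, value) updates."""

    def __init__(self):
        self.log = []

    def write(self, path, value):
        zxid = len(self.log) + 1          # monotonically increasing id
        self.log.append((zxid, path, value))
        return zxid


class Follower:
    """Toy follower: applies the leader's log in order, possibly lagging."""

    def __init__(self, leader):
        self.leader = leader
        self.applied = 0                  # zxid of the last applied update
        self.data = {}

    def replicate(self):
        # Apply every update not yet applied, in log order.
        for zxid, path, value in self.leader.log[self.applied:]:
            self.data[path] = value
            self.applied = zxid

    def sync(self):
        # Hypothetical stand-in for sync(): catch up before the next read.
        self.replicate()

    def read(self, path):
        return self.data.get(path)


leader = Leader()
server_a, server_b = Follower(leader), Follower(leader)

leader.write("/a", 0)
server_a.replicate()
server_b.replicate()

leader.write("/a", 1)                     # client A's update
server_a.replicate()                      # A's server has applied it, B's has not

stale = server_b.read("/a")               # client B reads without sync
server_b.sync()
fresh = server_b.read("/a")               # client B reads after sync
print(stale, fresh)
```

Both followers apply updates in the same order, so neither ever sees values out of sequence; the only difference is how far along the log each one is when the read arrives.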

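ZooKeeper's shortcomings

The limitations of the Zab protocol itself are behind many of ZooKeeper's bottlenecks: the single leader is a bottleneck, the service is unavailable during leader failover, the amount of data the system can store is limited, scalability is poor, and so on.

In addition, the consistency model of a ZooKeeper cluster is not as perfect as one might imagine. Setting aside consistency-violating bugs such as ZOOKEEPER-2919, its basic mechanism, in which every update is forwarded to the leader while followers serve reads independently, means that the only consistency guarantee ZooKeeper can make is "Updates from a client will be applied in the order that they were sent".

Any distributed system inevitably has many bugs. Many papers survey and study the bugs that have appeared in distributed systems over the years; I list a few of them in the references.

ZooKeeper's own limitations also constrain clients at every turn, in how they access it, how they handle events, and so on. Whatever business model sits on top, the client has to work through ZooKeeper's filesystem/trigger API.

Curator, the well-known ZooKeeper client library, maintains a set of Tech Notes on using ZooKeeper. I summarize some of the important ones below: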

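All watcher events should be handled on a single thread, and the resources being accessed should be locked within that thread (the zk library should do this itself, on the zk thread).

Take the session lifecycle seriously. If the session has expired, you need to reconnect, and every operation associated with that session should fail. Sessions and ephemeral nodes are bound together: when the session expires, its ephemeral nodes are gone too.

ZooKeeper is not suitable for use as a message queue, because:

ZooKeeper has a 1 MB limit on message size.
Too many children under one znode severely hurts performance.
Large znodes also hurt performance.
Large znodes can make restarting a zk server take 10 to 15 minutes.
ZooKeeper keeps its entire data set in memory (the snapshots and transaction logs on disk exist for recovery), so it cannot store much data.

It is best to drive the zk client from a single thread rather than concurrently; there are too many critical-section and race-condition problems otherwise.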

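Curator session lifecycle management:

CONNECTED: received when the connection is established successfully for the first time.
READONLY: indicates that the current connection is in read-only mode.
SUSPENDED: the connection is currently down (a KeeperState.Disconnected event was received, meaning Curator is not connected to any zk server). Leader election, distributed locks and similar recipes should pause the business logic above them on a SUSPENDED event until the connection is re-established. Curator officially recommends treating SUSPENDED as a complete connection loss. That is, when you receive SUSPENDED you should assume that every ephemeral node you registered is already gone.
LOST: raised in the following cases:
- Curator receives an EXPIRED event from the zk server.
- Curator itself closes the current ZooKeeper session.
- Curator concludes that the zk server already considers the current session expired. Since Curator 3.x, Curator runs its own timer: if a SUSPENDED event is not followed by a successful reconnect before the timer expires, Curator assumes the session has timed out on the server side and enters the LOST state. The timer length is derived from the negotiated session timeout.
RECONNECTED: the connection was re-established successfully.

On when to enter the LOST state, Curator advises: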
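A minimal sketch of the recommendation to treat SUSPENDED as a complete connection loss. The `ConnectionState` enum mirrors the state names listed above, and `LeaderRole` with its methods is a hypothetical application-side class; none of this is Curator's actual API.

```python
from enum import Enum


class ConnectionState(Enum):
    CONNECTED = "CONNECTED"
    READONLY = "READONLY"
    SUSPENDED = "SUSPENDED"
    LOST = "LOST"
    RECONNECTED = "RECONNECTED"


class LeaderRole:
    """Hypothetical holder of a leadership that depends on an ephemeral node."""

    def __init__(self):
        self.is_leader = False
        self.must_reelect = False

    def acquire(self):
        # Called after winning an election (the election itself is not modelled).
        self.is_leader = True
        self.must_reelect = False

    def on_state_change(self, state):
        if state in (ConnectionState.SUSPENDED, ConnectionState.LOST):
            # Assume the ephemeral node is already gone: step down immediately.
            self.is_leader = False
            self.must_reelect = True
        # RECONNECTED deliberately does not restore leadership by itself;
        # the application has to go through the election again.


role = LeaderRole()
role.acquire()
role.on_state_change(ConnectionState.SUSPENDED)
stepped_down = not role.is_leader
role.on_state_change(ConnectionState.RECONNECTED)
still_not_leader = not role.is_leader
```

Stepping down on SUSPENDED costs some availability, but it avoids the case where two processes both believe they hold the lock or the leadership.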

When Curator receives a KeeperState.Disconnected message it changes its state to SUSPENDED (see TN12, errors, etc.). As always, our recommendation is to treat SUSPENDED as a complete connection loss. Exit all locks, leaders, etc. That said, since 3.x, Curator tries to simulate session expiration by starting an internal timer when KeeperState.Disconnected is received. If the timer expires before the connection is repaired, Curator changes its state to LOST and injects a session end into the managed ZooKeeper client connection. The duration of the timer is set to the value of the “negotiated session timeout” by calling ZooKeeper#getSessionTimeout().
The astute reader will realize that setting the timer to the full value of the session timeout may not be the correct value. This is due to the fact that the server closes the connection when 2/3 of a session have already elapsed. Thus, the server may close a session well before Curator’s timer elapses. This is further complicated by the fact that the client has no way of knowing why the connection was closed. There are at least three possible reasons for a client connection to close:

The server has not received a heartbeat within 2/3 of a session
The server crashed
Some kind of general TCP error which causes a connection to fail

In situation 1, the correct value for Curator’s timer is 1/3 of a session - i.e. Curator should switch to LOST if the connection is not repaired within 1/3 of a session as 2/3 of the session has already lapsed from the server’s point of view. In situations 2 and 3 however, Curator’s timer should be the full value of the session (possibly plus some “slop” value). In truth, there is no way to completely emulate in the client the session timing as managed by the ZooKeeper server. So, again, our recommendation is to treat SUSPENDED as complete connection loss.

By default Curator uses 100% of the session timeout as the time to move from SUSPENDED to LOST, but users can configure it to 33% of the session timeout to cover the first situation described above.
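The arithmetic behind the two settings can be sketched as follows. The function names, the 30-second session timeout and the 12-second outage are assumptions chosen for illustration; this is not Curator code.

```python
def lost_deadline_ms(session_timeout_ms, percent=100):
    """Time after SUSPENDED at which the client declares the session LOST."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return session_timeout_ms * percent // 100


def state_after(elapsed_ms, session_timeout_ms, percent=100):
    """State of a client that has been disconnected for elapsed_ms."""
    deadline = lost_deadline_ms(session_timeout_ms, percent)
    return "LOST" if elapsed_ms >= deadline else "SUSPENDED"


SESSION_TIMEOUT_MS = 30_000   # assumed negotiated session timeout
OUTAGE_MS = 12_000            # assumed time since the disconnect

full = lost_deadline_ms(SESSION_TIMEOUT_MS)        # 100% setting
third = lost_deadline_ms(SESSION_TIMEOUT_MS, 33)   # 33% setting

state_full = state_after(OUTAGE_MS, SESSION_TIMEOUT_MS)
state_third = state_after(OUTAGE_MS, SESSION_TIMEOUT_MS, 33)
print(full, third, state_full, state_third)
```

With the same 12-second outage, the 100% setting still reports SUSPENDED while the 33% setting has already moved to LOST, which is the conservative choice when the server may have stopped hearing heartbeats well before the client noticed the disconnect.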

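As you can see, using ZooKeeper well is not easy. I have run into the following problems myself while using it:

zk session handling

The connecting event was ignored: after the heartbeat between client and server timed out, the leader-election service was not taken offline in time, which led to a dual-leader split brain.
Multiple threads handled the zk connection state, so several sets of zk threads ended up connected to the zk server.
An unreasonable zk timeout made reconnects too frequent and overwhelmed the zk server.
All zk servers were reset (the entire zk server state was wiped). In this case clients do not receive an expired event, and the client I had implemented did not establish a new zk session either. Every session previously established by the zk clients became unusable, and the clients were stuck reconnecting forever without ever succeeding.

Multithreading races

The zk client's own thread, do_completion, invokes the watcher callbacks and raced with the business threads, causing a core dump.

Client synchronous API

The synchronous API has no timeout. If the zk server is in a bad state and an RPC sent to it never gets a response, the thread calling the synchronous zk API blocks and hangs.
A badly designed API exposed to business code caused a deadlock when the synchronous version was called during initialization.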
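One common way to avoid this kind of race is to have the callback thread do nothing but enqueue the event, and let a single worker thread own all shared state. The sketch below uses only the Python standard library; `EventLoop` and the producer threads standing in for the client library's callback thread are hypothetical.

```python
import queue
import threading


class EventLoop:
    """Single worker thread that owns `state`; other threads only enqueue."""

    def __init__(self):
        self._events = queue.Queue()
        self.state = {}
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def post(self, path, value):
        # Safe to call from any thread, e.g. from a watcher callback.
        self._events.put((path, value))

    def stop(self):
        # The sentinel is queued after all earlier events, so they are drained first.
        self._events.put(None)
        self._thread.join()

    def _run(self):
        while True:
            item = self._events.get()
            if item is None:
                break
            path, value = item
            self.state[path] = value      # only this thread mutates state


loop = EventLoop()
producers = [
    threading.Thread(target=loop.post, args=(f"/n{i}", i)) for i in range(4)
]
for t in producers:
    t.start()
for t in producers:
    t.join()
loop.stop()
final_state = dict(loop.state)
```

Because only the worker thread touches `state`, no lock is needed around it, and the callback thread returns as soon as the event is queued.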
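One way to keep a caller from hanging on a blocking call is to run the call on a worker thread and wait for the result with a deadline. This is a generic sketch using Python's standard library, not a feature of any ZooKeeper client; `call_with_timeout`, `fast_get` and `hung_get` are hypothetical names, and `hung_get` only sleeps briefly to imitate a server that does not answer.

```python
import concurrent.futures
import time

# Small shared pool; a call that never returns would occupy one worker for good.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)


def call_with_timeout(fn, timeout_s, *args):
    """Run a blocking call on a worker thread and stop waiting after timeout_s."""
    future = _pool.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"no response within {timeout_s}s") from None


def fast_get(path):
    # Stand-in for a synchronous read that returns promptly.
    return b"value"


def hung_get(path):
    # Stand-in for a synchronous read whose reply is badly delayed.
    time.sleep(0.3)
    return b"late"


ok = call_with_timeout(fast_get, 1.0, "/a")
try:
    call_with_timeout(hung_get, 0.05, "/a")
    timed_out = False
except TimeoutError:
    timed_out = True
```

The wrapper only unblocks the caller. The underlying call keeps running on its worker thread until it returns, so this bounds how long business code waits but does not cancel the request itself.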

References

What Bugs Live in the Cloud? A Study of 3000+ Issues in Cloud Systems

TaxDC: A Taxonomy of Non-Deterministic Concurrency Bugs in Datacenter Distributed Systems

An Analysis of Network-Partitioning Failures in Cloud Systems

可能是全網把 ZooKeeper 概念講的最清楚的一篇文章 (title in English: "Possibly the clearest article on ZooKeeper concepts on the whole web")

Zookeeper ZNodes – Characteristics & Example

ZooKeeper Recipes and Solutions

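How to do distributed locking (a long article with good diagrams)

ZAB協議簡介 (title in English: "A brief introduction to the ZAB protocol"; braft's introduction to ZAB)

Lease與最長宕機時間分析 (title in English: "Lease and maximum downtime analysis"; covers the unavailability window that results when zk provides leader election)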

Zab: High-performance broadcast for primary-backup systems

zookeeper Consistency Guarantees

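深入淺析zookeeper的一致性模型及其實現 (title in English: "An in-depth look at ZooKeeper's consistency model and its implementation"; explains why ZooKeeper's consistency differs from that of other consistency protocols)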

How ZooKeeper guarantees “Single System Image”?

ZooKeeper: Wait-free coordination for Internet-scale systems (the Yahoo paper)

ZooKeeper FAQ (the official documentation on how to handle CONNECTION_LOSS, SESSION_EXPIRED and so on; questions that anyone who really knows zk ends up asking)
