Strong Consistency, 强一致性技术概述
http://horicky.blogspot.com/2009/11/nosql-patterns.html
A brief history of Consensus_ 2PC and Transaction Commit (譯) 對于一致性問題很好的綜述
2 Phase Commit(譯)
?
Master Slave (or Single Master)Model
Under this model, each data partition has a single master and multiple slaves.
All update requests has to go to the master where update is applied and then asynchronously propagated to the slaves. Notice that there is a time window of data lost if the master crashes before it propagate its update to any slaves, so some system will wait synchronously for the update to be propagated to at least one slave.
Read requests can go to any replicas if the client can tolerate some degree of data staleness. This is where the read workload is distributed among many replicas. If the client cannot tolerate staleness for certain data, it also need to go to the master.
Master Slave model works very well in general when the application has a high read/write ratio. It also works very well when the update happens evenly in the key range. So it is the predominant model of data replication.
主從模式, 最傳統和簡單的模式
寫操作, 所有寫操作通過master, master寫成功即返回, 然后master負責異步propagate到各個slave節點. 為了增強可靠性, 也可以等master至少propagate一個slave后再返回.
讀操作, 如果可以容忍舊數據, 從任一節點讀. 如果不能容忍, 所有讀操作也要通過master
缺點, 單點問題, 以及master負載過重
解決辦法, 參考Google的設計, GFS, Bigtable
?
去中心化的方案, Quorum Based 2PC
由于主從模式比較成熟和簡單
對于分布式的場景, 去中心化的設計(無固定master), 如何保證一致性? 這才是近年來, 研究的難點和熱點
JimGray在“Notes on Database Operating Systems” (1979)中描述了兩階段提交(2PC)
二階段提交(2PC)協議
傳統的2PC協議用于保證分布式事務的原子性, 分布式存放的數據, 必須要保證同時更新成功或失敗.
所以coordinator必須在第一階段, 發送prepare請求保證所有的數據復本當前都是ready for update, 在得到所有復本回應后再開始第二階段, 正真的commit
這里就比基于master復雜, 不是僅僅master同意, 而是要所有的node都同意, 才能commit
To provide "strict consistency", we can use a traditional 2PC protocol to bring all replicas to the same state at every update.
Lets say there is N replicas for a data.
When the data is update, there is a "prepare" phase where the coordinator ask every replica to confirm whether each of them is ready to perform the update.
Each of the replica will then write the data to a log file and when success, respond to the coordinator.
After gathering all replicas responses positively, the coordinator will initiate the second "commit" phase and then ask every replicas to commit.
Each replica then write another log entry to confirm the update.
Notice that there are some scalability issue as the coordinator need to "synchronously" wait for quite a lot of back and forth network roundtrip and disk I/O to complete.
On the other hand, if any one of the replica crashes, the update will be unsuccessful. As there are more replicas, chance of having one of them increases. Therefore, replication is hurting the availability rather than helping. This make traditional 2PC not a popular choice for high throughput transactional system.
2PC協議的最大的問題是沒有考慮節點fail的case, 任意的節點的fail都會導致block.
Dale Skeen在“NonBlocking Commit Protocols” (1981)中指出,對于一個分布式系統, 需要3階段的提交算法來避免2PC中的阻塞(block)問題, 但問題關鍵很難找到一個好的3PC算法
對于阻塞問題,? 其實想當然的是可用通過timeout來解決, 當然問題沒有那么簡單,
問題的核心在于,你無法區分一個進程到底是終止了還是正在以極低的速度執行,這使得在異步系統中的錯誤處理幾乎是不可能的
Fischer, Lynch 和 Paterson在"Impossibility of distributed consensus with one faulty process” (1985) 中證明了這一點
對于一個異步系統來說即使只有一個進程出錯,分布式一致性也是不可能達到的,這就是著名的FLP結論
人們意識到一個分布式算法具有兩個屬性: 安全性(safety)和活性(liveness), 2PC極具安全性,卻缺乏活性
在1986年的會議上, 分布式事務被認為是一個新的一致性問題,稱為uniform consensus (參見“Uniform consensus is harder than consensus” (2000))
With uniform consensus all processes must agree on a value, even the faulty ones - a transaction should only commit if all RMs are prepared to commit. Most forms of consensus are only concerned with having the non-faulty processes agree. Uniform consensus is more difficult than general consensus.
個人理解, 在節點或進程失效的時候, 仍然可以達成一致性, 而不會存在2PC的block的情況
Paxos, quorum based 2PC
最終Lamport在“The Part-Time Parliament” (submitted in 1990, published 1998)中提出了Paxos一致性算法, 后來Lamport又發表了“Paxos Made Simple (2001).
用于解決uniform consensus的問題.
Paxos的核心, 在于quorum based 2PC, 在分布式環境既然無法要求所有節點能夠正常響應
那么Paxos只需要majority(多數派)正常響應, 就可以達成一致性決議, 從而避免任一節點fail導致的block
但問題在于, 那些沒有響應的節點(因為fail或網絡等原因)怎樣保證其一致性?
答案是, 任何一致性決議的達成都需要majority的accept, 任意兩個majority集合都一定有交集(至少一個節點)
而任一節點都只能accept一次proposal(除非具有相同的value), 所以當一個一致性決議達成的情況下, 不可能有不同value新決議被達成(即使在部分節點fail的情況下)
從而即使fail的節點wake-up后, 仍然可以簡單的從其他majority節點learn并保證一致性
這就是為什么叫quorum based 2PC, 其實本質就是 R +W > N
并且在一段時間內無法獲得majority的響應時, 可以隨時主動放棄現有提案, 并提出更高編號的提案, 進一步避免block
傳統2PC只是Paxos的一種特殊case (當W = N and R = 1)
A more efficient way is to use the quorum based 2PC (e.g. PAXOS).
In this model, the coordinator only need to update W replicas (rather than all N replicas) synchronously. The coordinator still write to all the N replicas but only wait for positive acknowledgment for any W of the N to confirm. This is much more efficient from a probabilistic standpoint.
As you can see, the quorum based 2PC can be considered as a general 2PC protocol where the traditional 2PC is a special case where W = N and R = 1. The general quorum-based model allow us to pick W and R according to our tradeoff decisions between read and write workload ratio.
轉載于:https://www.cnblogs.com/fxjwind/archive/2013/04/03/2998396.html
總結
以上是生活随笔為你收集整理的Strong Consistency, 强一致性技术概述的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: c#设置软件开机自动运行,修改注册表
- 下一篇: 12 Useful Tips for M