Why Vector Clock are Easy or Hard?
通過實(shí)際例子來闡述vector clock其實(shí)是容易理解的, easy
同樣通過實(shí)際例子來描述在使用vector clock時(shí)會(huì)遇到哪些難以解決的問題, hard
Why Vector Clocks are Easy
http://basho.com/blog/technical/2010/01/29/why-vector-clocks-are-easy/
Vector Clocks by Example
通過如下日常的例子來幫助理解vector clock, 你會(huì)發(fā)現(xiàn)其實(shí)這個(gè)算法并不復(fù)雜.
大家想計(jì)劃一下, 下周哪天一起吃飯? 大家如何達(dá)成最終的一致性?
之間看下面的圖吧, 過程很清晰
Alice, Ben, Cathy, and Dave are planning to meet next week for dinner.
The planning starts with Alice suggesting they meet on Wednesday.
Later, Dave discuss alternatives with Cathy, and they decide on Thursday instead.
Dave also exchanges email with Ben, and they decide on Tuesday.
When Alice pings everyone again to find out whether they still agree with her Wednesday suggestion, she gets mixed messages:
Cathy claims to have settled on Thursday with Dave, and Ben claims to have settled on Tuesday with Dave.
Dave can't be reached, and so no one is able to determine the order in which these communications happened, and so none of Alice, Ben, and Cathy know whether Tuesday or Thursday is the correct choice.
The story changes, but the end result is always the same: you ask two people for the latest version of a piece of information, and they reply with two different answers, and there's no way to tell which one is really the most recent.
?
Start with Alice's initial message: "Let's meet Wednesday,"
date = Wednesday vclock = Alice:1Ben suggests Tuesday:
date = Tuesday vclock = Alice:1, Ben:1Dave replies, confirming Tuesday:
date = Tuesday vclock = Alice:1, Ben:1, Dave:1
Now Cathy gets into the act, suggesting Thursday:
date = Thursday vclock = Alice:1, Cathy:1Dave has two conflicting objects:
date = Tuesday vclock = Alice:1, Ben:1, Dave:1and
date = Thursday vclock = Alice:1, Cathy:1Dave can tell that these versions are in conflict, because neither vclock "descends" from the other. Luckily, Dave's a reasonable guy, and chooses Thursday. Dave also created a vector clock that is a successor to all previously-seen vector clocks. He emails this value back to Cathy.
date = Thursday vclock = Alice:1, Ben:1, Cathy:1, Dave:2
So now when Alice asks Ben and Cathy for the latest decision, the replies she receives are, from Ben:
date = Tuesday vclock = Alice:1, Ben:1, Dave:1and from Cathy:
date = Thursday vclock = Alice:1, Ben:1, Cathy:1, Dave:2?
From this, she can tell that Dave intended his correspondence with Cathy to override the decision he made with Ben. All Alice has to do is show Ben the vector clock from Cathy's message, and Ben will know that he has been overruled.
That worked out pretty well.
通過上面的圖示, 確實(shí)體現(xiàn)出'Why Vector Clocks are Easy’
?
Why Vector Clocks Are Hard
http://basho.com/blog/technical/2010/04/05/why-vector-clocks-are-hard/
但是在實(shí)際實(shí)現(xiàn)該系統(tǒng)時(shí), 還是有很多問題需要解決的, 從實(shí)現(xiàn)的角度體現(xiàn)'Why Vector Clocks Are Hard’
最難的問題是, 以什么作為actor? 以及當(dāng)vclocks隨著時(shí)間不斷變大的時(shí)候, 怎樣保存它?
1. What an actor is (i.e. where the incrementing and resolution is, and what parties get their own field in the vector)
2. How to keep vclocks from growing without bound over time.
In this example the parties that actually proposed changes ("clients") were the actors in the vector clocks. This is the model that vector clocks are designed for and work well with, but it has a drawback. The width of the vectors will grow proportionally with the number of clients. In a group of friends deciding when to have dinner this isn't a problem, but in a distributed storage system the number of clients over time can be large. Once the vector clocks get that large, they not only take up more space in disk and RAM but also take longer to compute comparisons over.
這個(gè)就談到選取什么作為actor的問題, 上面的例子用client作為actor是很清晰, 但問題是在分布式環(huán)境中, client的數(shù)量可能是非常多的, 這個(gè)就有嚴(yán)重的效率問題.
One straightforward idea is to make the servers handling client requests be the "actors", instead of representing the clients directly. Since any given system usually has a known bounded number of servers over time and also usually has less servers than clients, this serves to reduce and cap the size of the vclocks
Let's think through the same example, but with that difference, to see how it goes. We'll assume that a 2-server distributed system is coordinating the communication, with clients evenly distributed among them.
下面的例子就嘗試使用server作為actor來避免client過多的問題
Alice and Dave happen to get server X, and Ben and Cathy get server Y.
We're fine through the first few steps:
The only real difference so far is that each update increments a vector clock field named for the client's chosen server instead of the client itself. This will mean that the number of fields needed won't grow without bound; it will be the same as the number of servers. This is the desired effect of the change.
We run into trouble, though, when Cathy sends her update:
?
In the original example, this is where a conflict was created. Dave sorted out the conflict, and everything was fine.
With our new strategy, though, something else happened. Ben and Cathy were both modifying from the same original object. Since we used their server id instead of their own name to identify the change, Cathy's message has the same vector clock as Ben's! This means that Dave's message (responding to Ben) appears to be a simple successor to Cathy's... and we lose her data silently!
用server作為actor的問題就是, 會(huì)丟失一些數(shù)據(jù), 因?yàn)镃athy和Ben都操作Y, 所以對(duì)于系統(tǒng)而已, Cathy和Ben沒有區(qū)別都是Y, 所以當(dāng)Dave confirm Ben后, vector為X2, Y1, 這個(gè)時(shí)候再收到Cathy的X1, Y1時(shí), 就會(huì)認(rèn)為是過期數(shù)據(jù), 自動(dòng)丟掉. 這樣就無法解決上面的協(xié)調(diào)這個(gè)問題, 所以這個(gè)簡單solution是不可行的.
發(fā)生這個(gè)問題的原因如下,
For vector clocks to have their desired effect without causing accidents such as this, the elements represented by the fields in the vclock must be the real units of concurrency. In a case like this little example or a distributed storage system, that means client identifiers, not server-based ones.
server不是真正的并發(fā)單元, 而你把他作為actor, 其實(shí)是不合理的, 所以就會(huì)有現(xiàn)在這個(gè)問題
?
Just Lose a Little Information and Everything Will Be Fine
If we use client identifiers, we're back in the situation where vector clocks will grow and grow as more clients use a system over time.
The solution most people end up with is to "prune" their vector clocks as they grow.
This is done by adding a timestamp to each field, and updating it to the current local time whenever that field is incremented. This timestamp is never used for vclock comparison -- that is purely a matter of logical time -- but is only for pruning purposes.
This way, when a given vclock gets too big, you can remove fields, starting at the one that was updated longest ago, until you hit a size/age threshold that makes sense for your application.
But, you ask, doesn't this lose information?
Yes, it does -- but it won't make you lose your data.
The only case where this kind of pruning will matter at all is when a client holds a very old copy of the unpruned vclock and submits data descended from that. This will create a sibling (conflict) even though you might have been able to resolve it automatically if you had the complete unpruned vclock at the server. That is the tradeoff with pruning: in exchange for keeping growth under control, you run the chance of occasionally having to do a "false merge"... but you never lose data quietly, which makes this approach unequivocally better than moving the field identifiers off of the real client and onto the server.
這個(gè)方法思路, 就是既然我們必須用client作為actor, 問題就是vector會(huì)越來越大, 那么可不可以解決這個(gè)問題...
那就是prune,? 根據(jù)什么去prune, timestamp, 這樣定期把比較老的client數(shù)據(jù)清掉, 就可以比較合理的解決這個(gè)問題.
這個(gè)方案是否有風(fēng)險(xiǎn)? 有一些, 不過可以接受, 所以是有價(jià)值的tradeoff, 至少我們現(xiàn)在不會(huì)lose data quietly.
轉(zhuǎn)載于:https://www.cnblogs.com/fxjwind/archive/2012/06/06/2537963.html
總結(jié)
以上是生活随笔為你收集整理的Why Vector Clock are Easy or Hard?的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 单身女性有必要购房吗?
- 下一篇: 用iptables来防止web服务器被C