[Original] Big Data Basics: ZooKeeper (1) Introduction, Installation, and Usage
ZooKeeper 3.4.11
http://zookeeper.apache.org/
I. Introduction
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which makes them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.
In short, ZooKeeper is a centralized service that can be used for: 1) maintaining configuration information; 2) naming; 3) distributed synchronization; 4) group services.
ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace which is organized similarly to a standard file system. The name space consists of data registers - called znodes, in ZooKeeper parlance - and these are similar to files and directories. Unlike a typical file system, which is designed for storage, ZooKeeper data is kept in-memory, which means ZooKeeper can achieve high throughput and low latency numbers.
ZooKeeper provides a shared namespace that looks very much like a file system (made up of files and directories), and the namespace is composed of znodes. Unlike a file system, ZooKeeper keeps its data in memory, and a ZooKeeper "directory" (znode) can also hold data.
1 Deployment view
- The servers that make up the ZooKeeper service must all know about each other. They maintain an in-memory image of state, along with transaction logs and snapshots in a persistent store. As long as a majority of the servers are available, the ZooKeeper service will be available.
- Clients connect to a single ZooKeeper server. The client maintains a TCP connection through which it sends requests, gets responses, gets watch events, and sends heartbeats. If the TCP connection to the server breaks, the client will connect to a different server.
Every server in a ZooKeeper service knows the addresses of the other servers, and any two servers are connected to each other. Besides the in-memory data, each server also keeps transaction logs and snapshots in persistent storage. As long as a majority of the servers in the ensemble are alive, that is, at least (n+1)/2 of n servers (for odd n), the ensemble can serve requests.
A ZooKeeper client connects to only one of the ZooKeeper servers and keeps a long-lived TCP connection to it, over which it sends requests, receives responses, receives watch events, sends heartbeats, and so on. If the current connection breaks, the client tries to connect to another ZooKeeper server.
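The majority rule can be written down in a few lines. The following is only a sketch of the arithmetic, not ZooKeeper code; the function names `quorum_size` and `is_available` are made up for illustration. For an odd ensemble size n, `n // 2 + 1` equals (n+1)/2.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that is a strict majority of the ensemble."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def is_available(alive: int, ensemble_size: int) -> bool:
    """True when the servers that are still alive form a majority."""
    if not 0 <= alive <= ensemble_size:
        raise ValueError("alive must be between 0 and ensemble_size")
    return alive >= quorum_size(ensemble_size)
```

With three servers, `is_available(2, 3)` holds while `is_available(1, 3)` does not; with four servers, two alive servers are not enough because exactly half is not a majority.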
2 Data view
- The name space provided by ZooKeeper is much like that of a standard file system. A name is a sequence of path elements separated by a slash (/). Every node in ZooKeeper's name space is identified by a path.
- Unlike in standard file systems, each node in a ZooKeeper namespace can have data associated with it as well as children. It is like having a file system that allows a file to also be a directory.
- ZooKeeper also has the notion of ephemeral nodes. These znodes exist as long as the session that created the znode is active. When the session ends the znode is deleted.
- ZooKeeper supports the concept of watches. Clients can set a watch on a znode. A watch will be triggered and removed when the znode changes. When a watch is triggered the client receives a packet saying that the znode has changed. And if the connection between the client and one of the ZooKeeper servers is broken, the client will receive a local notification.
The namespace ZooKeeper provides looks very much like a file system, with a few differences: 1) a "directory" can also hold data (ZooKeeper has no notion of directories and files; every node is a znode, which can both hold data and have children); 2) it supports ephemeral nodes, i.e. temporary nodes whose lifetime matches the client's session; 3) it supports watches, i.e. subscriptions: when a znode changes, the corresponding watch event is triggered.
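To make these three differences concrete, here is a toy in-memory model. It is a sketch only: `Znode` and `ToyTree` are hypothetical names, nothing here talks to a real server, and the real client API differs. It models znodes that carry both data and children, ephemeral znodes that vanish when their session closes, and watches that fire once and are then removed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Znode:
    data: bytes = b""
    ephemeral_owner: int = 0  # owning session id; 0 means a persistent node
    children: Dict[str, "Znode"] = field(default_factory=dict)
    watchers: List[Callable[[str], None]] = field(default_factory=list)


class ToyTree:
    """Toy in-memory znode tree. Hypothetical model, not the ZooKeeper API."""

    def __init__(self) -> None:
        self.root = Znode()

    def _node(self, path: str) -> Znode:
        node = self.root
        for part in path.split("/"):
            if part:
                node = node.children[part]  # KeyError if the path is missing
        return node

    def create(self, path: str, data: bytes = b"", session: int = 0) -> None:
        parent_path, _, name = path.rpartition("/")
        parent = self._node(parent_path)
        if not name or name in parent.children:
            raise ValueError(f"cannot create {path!r}")
        if parent.ephemeral_owner:
            raise ValueError("ephemeral znodes cannot have children")
        parent.children[name] = Znode(data=data, ephemeral_owner=session)

    def get(self, path: str,
            watcher: Optional[Callable[[str], None]] = None) -> bytes:
        node = self._node(path)
        if watcher is not None:
            node.watchers.append(watcher)  # registered for one event only
        return node.data

    def set(self, path: str, data: bytes) -> None:
        node = self._node(path)
        node.data = data
        self._fire(node, path)

    def children(self, path: str) -> List[str]:
        return sorted(self._node(path).children)

    def close_session(self, session: int) -> None:
        if session == 0:
            raise ValueError("0 is reserved for persistent znodes")
        self._expire(self.root, "", session)

    def _expire(self, node: Znode, prefix: str, session: int) -> None:
        for name in list(node.children):
            child = node.children[name]
            child_path = f"{prefix}/{name}"
            if child.ephemeral_owner == session:
                del node.children[name]  # ephemeral node dies with its session
                self._fire(child, child_path)
            else:
                self._expire(child, child_path, session)

    @staticmethod
    def _fire(node: Znode, path: str) -> None:
        # Watches are one-shot: clear the list before notifying.
        pending, node.watchers = node.watchers, []
        for watcher in pending:
            watcher(path)
```

Setting a watch on `/app` and then calling `set` twice delivers a single event, which mirrors the "triggered and removed" behaviour described above.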
3 Data update process
- Every ZooKeeper server services clients. Clients connect to exactly one server to submit requests. Read requests are serviced from the local replica of each server database. Requests that change the state of the service, write requests, are processed by an agreement protocol.
- As part of the agreement protocol all write requests from clients are forwarded to a single server, called the leader. The rest of the ZooKeeper servers, called followers, receive message proposals from the leader and agree upon message delivery. The messaging layer takes care of replacing leaders on failures and syncing followers with leaders.
Every ZooKeeper server accepts client connections, and each client connects to exactly one server. A client's read requests are answered directly by the server it is connected to, from that server's local data. A client's write requests are all forwarded to the server acting as leader. Servers other than the leader are called followers (there is also a third type, the observer, covered later).
4 Session state machine
- A ZooKeeper client establishes a session with the ZooKeeper service by creating a handle to the service using a language binding. Once created, the handle starts off in the CONNECTING state and the client library tries to connect to one of the servers that make up the ZooKeeper service, at which point it switches to the CONNECTED state. During normal operation the handle will be in one of these two states. If an unrecoverable error occurs, such as session expiration or authentication failure, or if the application explicitly closes the handle, the handle will move to the CLOSED state.
When a client opens a connection it first enters the CONNECTING state; once the connection succeeds it moves to CONNECTED; after the connection is closed it moves to CLOSED.
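The three states and the moves between them can be sketched as a small state machine. This is an illustration under assumptions: the class `SessionHandle` and its method names are hypothetical, and the real client library tracks more states than the three mentioned here.

```python
from enum import Enum


class State(Enum):
    CONNECTING = "CONNECTING"
    CONNECTED = "CONNECTED"
    CLOSED = "CLOSED"


# Allowed transitions, following the description above.
TRANSITIONS = {
    State.CONNECTING: {State.CONNECTED, State.CLOSED},
    State.CONNECTED: {State.CONNECTING, State.CLOSED},
    State.CLOSED: set(),  # terminal: a new handle is needed
}


class SessionHandle:
    """Hypothetical handle that only tracks the session state."""

    def __init__(self) -> None:
        self.state = State.CONNECTING

    def _move(self, new_state: State) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise RuntimeError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state

    def on_connected(self) -> None:
        self._move(State.CONNECTED)

    def on_disconnected(self) -> None:
        # The client library goes back to looking for a server.
        self._move(State.CONNECTING)

    def on_session_expired(self) -> None:
        self._move(State.CLOSED)

    def close(self) -> None:
        if self.state is not State.CLOSED:
            self._move(State.CLOSED)
```

Once the handle reaches CLOSED, every further transition raises, which matches the point that an expired session cannot be reused.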
5 Guarantees
- Sequential Consistency: Updates from a client will be applied in the order that they were sent.
- Atomicity: Updates either succeed or fail. No partial results.
- Single System Image: A client will see the same view of the service regardless of the server that it connects to.
- Reliability: Once an update has been applied, it will persist from that time forward until a client overwrites the update.
- Timeliness: The client's view of the system is guaranteed to be up-to-date within a certain time bound.
In short, the guarantees are: sequential consistency, atomicity, single system image, reliability (durability), and timeliness.
II. Installation
1 Configuration
$ZOOKEEPER_HOME/conf/zoo.cfg
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
tickTime: the basic time unit in milliseconds used by ZooKeeper. It is used to do heartbeats, and the minimum session timeout will be twice the tickTime.
dataDir: the location to store the in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database.
clientPort: the port to listen on for client connections.
initLimit: the timeout, in ticks, that ZooKeeper uses to limit the length of time the ZooKeeper servers in quorum have to connect to a leader.
syncLimit: limits, in ticks, how far out of date a server can be from a leader.
The five settings above are ZooKeeper's core configuration.
The entries of the form server.X list the servers that make up the ZooKeeper service. When the server starts up, it knows which server it is by looking for the file myid in the data directory. That file contains the server number, in ASCII.
Each server entry has two port numbers after the host name, 2888 and 3888 in the example above. Peers use the former port to connect to other peers. Such a connection is necessary so that peers can communicate, for example, to agree upon the order of updates. More specifically, a ZooKeeper server uses this port to connect followers to the leader. When a new leader arises, a follower opens a TCP connection to the leader using this port. Because the default leader election also uses TCP, we currently require another port for leader election. This is the second port in the server entry.
All servers of the ensemble are listed in the server.X entries. When a server starts, it first looks for the myid file in the data directory to find its own id, and then connects to the other servers in the ensemble. There are two kinds of connection: one for leader election and one for communication.
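As an illustration of how these settings fit together, the sketch below parses a zoo.cfg-style text and converts the tick-based limits to milliseconds. The helper names `parse_zoo_cfg` and `timeouts_ms` are hypothetical, and the parser only handles the simple `key=value` and `server.X=host:port:port` forms shown above.

```python
from typing import Dict


def parse_zoo_cfg(text: str) -> dict:
    """Split a zoo.cfg text into plain settings and server.X entries."""
    settings: Dict[str, str] = {}
    servers: Dict[int, dict] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key.startswith("server."):
            server_id = int(key[len("server."):])
            host, quorum_port, election_port = value.split(":")[:3]
            servers[server_id] = {
                "host": host,
                "quorum_port": int(quorum_port),      # follower-to-leader traffic
                "election_port": int(election_port),  # leader election
            }
        else:
            settings[key] = value
    return {"settings": settings, "servers": servers}


def timeouts_ms(settings: Dict[str, str]) -> Dict[str, int]:
    """Convert tick-based limits to milliseconds."""
    tick = int(settings["tickTime"])
    return {
        "min_session_timeout": 2 * tick,
        "init_limit": int(settings["initLimit"]) * tick,
        "sync_limit": int(settings["syncLimit"]) * tick,
    }
```

For the sample configuration above this gives a minimum session timeout of 4000 ms, an initLimit of 10000 ms, and a syncLimit of 4000 ms.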
File details:
The snapshot files stored in the data directory are fuzzy snapshots in the sense that during the time the ZooKeeper server is taking the snapshot, updates are occurring to the data tree. The suffix of the snapshot file names is the zxid, the ZooKeeper transaction id, of the last committed transaction at the start of the snapshot. Thus, the snapshot includes a subset of the updates to the data tree that occurred while the snapshot was in process. The snapshot, then, may not correspond to any data tree that actually existed, and for this reason we refer to it as a fuzzy snapshot. Still, ZooKeeper can recover using this snapshot because it takes advantage of the idempotent nature of its updates. By replaying the transaction log against fuzzy snapshots ZooKeeper gets the state of the system at the end of the log.
The Log Directory contains the ZooKeeper transaction logs. Before any update takes place, ZooKeeper ensures that the transaction that represents the update is written to non-volatile storage. A new log file is started when the number of transactions written to the current log file reaches a (variable) threshold. The threshold is computed using the same parameter which influences the frequency of snapshotting (see snapCount above). The log file's suffix is the first zxid written to that log.
The ZooKeeper data directory holds two kinds of files: snapshots and transaction logs. ZooKeeper records every write operation in the transaction log and periodically writes the in-memory data to a snapshot file. With these two kinds of files ZooKeeper can guarantee that no data is lost and can restore its data quickly after a restart.
Snapshot files are named snapshot.$last_zxid; the suffix is the zxid of the last transaction committed when the snapshot started, for example snapshot.3e0d57d547
Transaction log files are named log.$first_zxid; the suffix is the zxid of the first transaction in the log file, for example log.3e0d4bffd7
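The hexadecimal suffixes can be decoded directly. The sketch below uses hypothetical helper names. The split of a zxid into an epoch (high 32 bits) and a counter (low 32 bits) follows the ZooKeeper internals documentation rather than anything stated in this article, so treat it as an assumption here.

```python
from typing import Iterable, Optional, Tuple


def zxid_from_filename(name: str) -> int:
    """Return the zxid encoded in a snapshot.<hex> or log.<hex> file name."""
    prefix, sep, suffix = name.partition(".")
    if prefix not in ("snapshot", "log") or not sep or not suffix:
        raise ValueError(f"not a snapshot or log file name: {name!r}")
    return int(suffix, 16)


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a 64-bit zxid into (epoch, counter)."""
    return zxid >> 32, zxid & 0xFFFFFFFF


def latest_snapshot(names: Iterable[str]) -> Optional[str]:
    """Pick the snapshot file with the highest zxid, or None if there is none."""
    snapshots = [n for n in names if n.startswith("snapshot.")]
    if not snapshots:
        return None
    return max(snapshots, key=zxid_from_filename)
```

Applied to `snapshot.3e0d57d547`, this yields epoch 0x3e and counter 0x0d57d547.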
2 Startup
bin/zkServer.sh start
Process:
hadoop 150952 4.9 1.6 22934232 1103420 ? Sl Mar14 11954:12 /$JAVA_HOME/bin/java -Dzookeeper.log.dir=/$ZOOKEEPER_HOME -Dzookeeper.root.logger=INFO,CONSOLE -cp /$ZOOKEEPER_HOME/bin/../build/classes:/$ZOOKEEPER_HOME/bin/../build/lib/*.jar:/$ZOOKEEPER_HOME/bin/../lib/slf4j-log4j12-1.6.1.jar:/$ZOOKEEPER_HOME/bin/../lib/slf4j-api-1.6.1.jar:/$ZOOKEEPER_HOME/bin/../lib/netty-3.7.0.Final.jar:/$ZOOKEEPER_HOME/bin/../lib/log4j-1.2.16.jar:/$ZOOKEEPER_HOME/bin/../lib/jline-0.9.94.jar:/$ZOOKEEPER_HOME/bin/../zookeeper-3.4.6.jar:/$ZOOKEEPER_HOME/bin/../src/java/lib/*.jar:/$ZOOKEEPER_HOME/bin/../conf: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /$ZOOKEEPER_HOME/bin/../conf/zoo.cfg
Data directory layout
[user@zk_server_1 version-2]$ ls /$ZOOKEEPER_DATA_DIR/
myid version-2 zookeeper_server.pid
[user@zk_server_1 version-2]$ ls /$ZOOKEEPER_DATA_DIR/version-2/
acceptedEpoch log.3e0d4bffd7 log.3e0d57d549 log.3e0d6255f7 log.3e0d6df7e9 log.3e0d790349 snapshot.3e0d4bffd5 snapshot.3e0d57d547 snapshot.3e0d6255f2 snapshot.3e0d6df7e7 snapshot.3e0d790347
currentEpoch log.3e0d4d1a04 log.3e0d58a803 log.3e0d63629b log.3e0d6f0d5c log.3e0d7a4523 snapshot.3e0d4d1a02 snapshot.3e0d58a801 snapshot.3e0d636299 snapshot.3e0d6f0d5a snapshot.3e0d7a4521
log.3e0d430aed log.3e0d4e724a log.3e0d59a04c log.3e0d64cc1f log.3e0d6ff7ba log.3e0d7bc17d snapshot.3e0d4e7248 snapshot.3e0d59a04a snapshot.3e0d64cc1d snapshot.3e0d6ff7b8 snapshot.3e0d7bc17b
log.3e0d43dadd log.3e0d4ff8ce log.3e0d5ad0a0 log.3e0d65fd11 log.3e0d714d93 snapshot.3e0d43dadb snapshot.3e0d4ff8cc snapshot.3e0d5ad09d snapshot.3e0d65fd0f snapshot.3e0d714d91
log.3e0d454827 log.3e0d5150a7 log.3e0d5bb95f log.3e0d67015a log.3e0d724f8d snapshot.3e0d454825 snapshot.3e0d5150a5 snapshot.3e0d5bb95b snapshot.3e0d670158 snapshot.3e0d724f8b
log.3e0d465a2e log.3e0d52156c log.3e0d5ce712 log.3e0d67fa44 log.3e0d731ca5 snapshot.3e0d465a2c snapshot.3e0d52156a snapshot.3e0d5ce710 snapshot.3e0d67fa42 snapshot.3e0d731ca3
log.3e0d47ac00 log.3e0d531018 log.3e0d5dae9f log.3e0d6932ab log.3e0d748b2d snapshot.3e0d47abfe snapshot.3e0d531016 snapshot.3e0d5dae9d snapshot.3e0d6932ad snapshot.3e0d748b2b
log.3e0d48746a log.3e0d548417 log.3e0d5f18d5 log.3e0d6a0150 log.3e0d75de64 snapshot.3e0d487468 snapshot.3e0d548415 snapshot.3e0d5f18d8 snapshot.3e0d6a014e snapshot.3e0d75de62
log.3e0d49eff7 log.3e0d55620a log.3e0d5fdcc3 log.3e0d6b5212 log.3e0d76e90e snapshot.3e0d49eff5 snapshot.3e0d556208 snapshot.3e0d5fdcc1 snapshot.3e0d6b5210 snapshot.3e0d76e90c
log.3e0d4af170 log.3e0d56bd2c log.3e0d60d48a log.3e0d6cc57c log.3e0d780e44 snapshot.3e0d4af16e snapshot.3e0d56bd2a snapshot.3e0d60d485 snapshot.3e0d6cc57a snapshot.3e0d780e42
III. Usage
1 ZooKeeper Commands: The Four Letter Words
$ echo mntr | nc localhost 2185
- conf: Print details about serving configuration.
- cons: List full connection/session details for all clients connected to this server. Includes information on numbers of packets received/sent, session id, operation latencies, last operation performed, etc.
- crst: Reset connection/session statistics for all connections.
- dump: Lists the outstanding sessions and ephemeral nodes. This only works on the leader.
- envi: Print details about serving environment.
- ruok: Tests if server is running in a non-error state. The server will respond with imok if it is running. Otherwise it will not respond at all.
- srst: Reset server statistics.
- srvr: Lists full details for the server.
- stat: Lists brief details for the server and connected clients.
- wchs: Lists brief information on watches for the server.
- wchc: Lists detailed information on watches for the server, by session. This outputs a list of sessions (connections) with associated watches (paths). Note, depending on the number of watches this operation may be expensive (i.e. impact server performance), use it carefully.
- wchp: Lists detailed information on watches for the server, by path. This outputs a list of paths (znodes) with associated sessions. Note, depending on the number of watches this operation may be expensive (i.e. impact server performance), use it carefully.
- mntr: Outputs a list of variables that could be used for monitoring the health of the cluster.
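The same commands can be sent without nc. The sketch below is a minimal stand-in for `echo <cmd> | nc host port`, assuming the server writes its reply and then closes the connection. The function names are hypothetical, and `parse_mntr` assumes each mntr line is a name followed by whitespace and a value.

```python
import socket
from typing import Dict


def four_letter_word(host: str, port: int, command: str,
                     timeout: float = 5.0) -> str:
    """Send a four-letter command and return the server's full reply."""
    if len(command) != 4:
        raise ValueError("command must be exactly four letters")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection: reply is complete
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text: str) -> Dict[str, str]:
    """Turn mntr output into a dict of variable name to value."""
    result: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            result[parts[0]] = parts[1].strip()
    return result
```

A health check could then be `four_letter_word("localhost", 2181, "ruok") == "imok"`, and `parse_mntr(four_letter_word("localhost", 2181, "mntr"))` gives the monitoring variables as a dictionary; host and port here are placeholders.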
Example:
-bash-4.1$ echo stat|nc $zk_server_1 2181
Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
Clients:
/zk_client_1:59407[1](queued=0,recved=55,sent=55)
/zk_client_2:40094[0](queued=0,recved=1,sent=0)
/zk_client_3:60926[1](queued=0,recved=115,sent=115)
/zk_client_4:59288[1](queued=0,recved=56,sent=56)
/zk_client_5:14155[1](queued=0,recved=115,sent=115)
/zk_client_6:18602[1](queued=0,recved=115,sent=115)
Latency min/avg/max: 0/0/146
Received: 6294
Sent: 6376
Connections: 6
Outstanding: 0
Zxid: 0x3e0d1cccd4
Mode: follower
Node count: 26394
Zxid: Every change to the ZooKeeper state receives a stamp in the form of a zxid (ZooKeeper Transaction Id). This exposes the total ordering of all changes to ZooKeeper. Each change will have a unique zxid and if zxid1 is smaller than zxid2 then zxid1 happened before zxid2.
The ZooKeeper service can be monitored in one of two primary ways: 1) the command port through the use of 4 letter words and 2) JMX.
Every change to data in ZooKeeper has a unique zxid, and zxids are ordered: an earlier change always has a smaller zxid than a later one.
ZooKeeper can be monitored in two ways: four-letter words and JMX.
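For monitoring scripts it helps to turn the `stat` text into structured data. The sketch below is one way to do that, with hypothetical function names, assuming the output has the same shape as the example above: `Name: value` lines plus client lines that start with `/`.

```python
from typing import Dict, List


def parse_stat(text: str) -> dict:
    """Parse `stat` output into named fields and a list of client lines."""
    fields: Dict[str, str] = {}
    clients: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            clients.append(line)
            continue
        key, sep, value = line.partition(":")
        if sep and value.strip():  # skips the bare "Clients:" header
            fields[key.strip()] = value.strip()
    return {"fields": fields, "clients": clients}


def zxid_of(stat: dict) -> int:
    """Return the Zxid field as an integer (it is printed in hex)."""
    return int(stat["fields"]["Zxid"], 16)


def happened_before(zxid1: int, zxid2: int) -> bool:
    """A smaller zxid means the change happened earlier."""
    return zxid1 < zxid2
```

Comparing `zxid_of` results from two servers gives a rough idea of how far a follower lags behind.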
2 Command-line client
$ bin/zkCli.sh -server 127.0.0.1:2181
[zkshell: 0] help
ZooKeeper host:port cmd args
        get path [watch]
        ls path [watch]
        set path data [version]
        delquota [-n|-b] path
        quit
        printwatches on|off
        create path data acl
        stat path [watch]
        listquota path
        history
        setAcl path acl
        getAcl path
        sync path
        redo cmdno
        addauth scheme auth
        delete path [version]
        setquota -n|-b val path
Example:
[zkshell: 12] get /zk_test
my_data
cZxid = 5
ctime = Fri Jun 05 13:57:06 PDT 2009
mZxid = 5
mtime = Fri Jun 05 13:57:06 PDT 2009
pZxid = 5
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0
dataLength = 7
numChildren = 0
- czxid: The zxid of the change that caused this znode to be created.
- mzxid: The zxid of the change that last modified this znode.
- pzxid: The zxid of the change that last modified children of this znode.
- ctime: The time in milliseconds from epoch when this znode was created.
- mtime: The time in milliseconds from epoch when this znode was last modified.
- version (shown as dataVersion in the output above): The number of changes to the data of this znode.
- cversion: The number of changes to the children of this znode.
- aversion (shown as aclVersion in the output above): The number of changes to the ACL of this znode.
- ephemeralOwner: The session id of the owner of this znode if the znode is an ephemeral node. If it is not an ephemeral node, it will be zero.
- dataLength: The length of the data field of this znode.
- numChildren: The number of children of this znode.
The attributes of a znode are listed above.
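The `name = value` lines printed by `get` are easy to pick apart. The sketch below uses hypothetical helper names and assumes the znode data itself contains no " = " sequence; it collects the stat fields and uses ephemeralOwner to tell ephemeral znodes from persistent ones.

```python
from typing import Dict


def parse_znode_stat(text: str) -> Dict[str, str]:
    """Collect the `name = value` stat lines from zkCli `get` output."""
    stat: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:
            stat[key.strip()] = value.strip()
    return stat


def is_ephemeral(stat: Dict[str, str]) -> bool:
    """A non-zero ephemeralOwner means a session owns this znode."""
    # Base 16 accepts both "0" and "0x..." forms; only zero vs non-zero matters.
    return int(stat["ephemeralOwner"], 16) != 0
```

For the `/zk_test` example above, `is_ephemeral` returns False because ephemeralOwner is 0.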
3 Java client
org.apache.zookeeper.ZooKeeper
- This is the main class of the ZooKeeper client library. To use a ZooKeeper service, an application must first instantiate an object of the ZooKeeper class. All the interactions will be done by calling the methods of the ZooKeeper class. The methods of this class are thread-safe unless otherwise noted.
- Once a connection to a server is established, a session ID is assigned to the client. The client will send heartbeats to the server periodically to keep the session valid.
- The application can call ZooKeeper APIs through a client as long as the session ID of the client remains valid.
- If for some reason, the client fails to send heartbeats to the server for a prolonged period of time (exceeding the sessionTimeout value, for instance), the server will expire the session, and the session ID will become invalid. The client object will no longer be usable. To make ZooKeeper API calls, the application must create a new client object.
- If the ZooKeeper server the client currently connects to fails or otherwise does not respond, the client will automatically try to connect to another server before its session ID expires. If successful, the application can continue to use the client.
- The ZooKeeper API methods are either synchronous or asynchronous. Synchronous methods block until the server has responded. Asynchronous methods just queue the request for sending and return immediately. They take a callback object that will be executed either on successful execution of the request or on error with an appropriate return code (rc) indicating the error.
- Some successful ZooKeeper API calls can leave watches on the "data nodes" in the ZooKeeper server. Other successful ZooKeeper API calls can trigger those watches. Once a watch is triggered, an event will be delivered to the client which left the watch in the first place. Each watch can be triggered only once. Thus, up to one event will be delivered to a client for every watch it leaves.
- A client needs an object of a class implementing the Watcher interface for processing the events delivered to the client. When a client drops the current connection and re-connects to a server, all the existing watches are considered as being triggered but the undelivered events are lost. To emulate this, the client will generate a special event to tell the event handler a connection has been dropped. This special event has EventType None and KeeperState Disconnected.
IV. Miscellaneous
The format of snapshot and log files does not change between standalone ZooKeeper servers and different configurations of replicated ZooKeeper servers. Therefore, you can pull these files from a running replicated ZooKeeper server to a development machine with a stand-alone ZooKeeper server for troubleshooting.
Using older log and snapshot files, you can look at the previous state of ZooKeeper servers and even restore that state. The LogFormatter class allows an administrator to look at the transactions in a log.
The snapshot and log file formats are the same in standalone and replicated deployments, so it is easy to copy data files from a production ensemble to a local machine to track down problems or reproduce errors.
A server might not be able to read its database and fail to come up because of some file corruption in the transaction logs of the ZooKeeper server. You will see some IOException on loading ZooKeeper database. In such a case, make sure all the other servers in your ensemble are up and working. Use "stat" command on the command port to see if they are in good health. After you have verified that all the other servers of the ensemble are up, you can go ahead and clean the database of the corrupt server. Delete all the files in datadir/version-2 and datalogdir/version-2/. Restart the server.
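If some files in the ZooKeeper data directory are corrupted, ZooKeeper may fail to start because it cannot read its data. In that case, first check that the other servers (a majority of the ensemble) are healthy. If they are, clear the files under version-2 in the data directories of the server with the corrupted files and restart ZooKeeper; after the restart it automatically syncs the latest data.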
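The cleanup step can be sketched as a small script. This is a cautious illustration, not an official tool: the function name `clean_corrupt_server` is hypothetical, it defaults to a dry run that only lists the files, and it should only be run for real on the corrupt server after the rest of the ensemble has been checked. It touches only `version-2`, so the `myid` file in the data directory is left in place.

```python
from pathlib import Path
from typing import List, Optional, Union

PathLike = Union[str, Path]


def clean_corrupt_server(data_dir: PathLike,
                         data_log_dir: Optional[PathLike] = None,
                         dry_run: bool = True) -> List[Path]:
    """List (and optionally delete) the files under version-2.

    data_dir and data_log_dir stand for the datadir and datalogdir
    mentioned above; data_log_dir may be omitted when they are the same.
    """
    roots = [Path(data_dir) / "version-2"]
    if data_log_dir is not None and Path(data_log_dir) != Path(data_dir):
        roots.append(Path(data_log_dir) / "version-2")

    targets: List[Path] = []
    for root in roots:
        if root.is_dir():
            targets.extend(sorted(p for p in root.iterdir() if p.is_file()))

    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

Calling it first with the default `dry_run=True` shows exactly which snapshot, log, and epoch files would be removed before anything is deleted.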
Although ZooKeeper performs very well by having clients connect directly to voting members of the ensemble, this architecture makes it hard to scale out to huge numbers of clients. The problem is that as we add more voting members, the write performance drops. This is due to the fact that a write operation requires the agreement of (in general) at least half the nodes in an ensemble and therefore the cost of a vote can increase significantly as more voters are added.
We have introduced a new type of ZooKeeper node called an?Observer?which helps address this problem and further improves ZooKeeper's scalability. Observers are non-voting members of an ensemble which only hear the results of votes, not the agreement protocol that leads up to them. Other than this simple distinction, Observers function exactly the same as Followers - clients may connect to them and send read and write requests to them. Observers forward these requests to the Leader like Followers do, but they then simply wait to hear the result of the vote. Because of this, we can increase the number of Observers as much as we like without harming the performance of votes.
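Because writes in ZooKeeper go through a vote, the cost of voting rises as the ensemble keeps growing. To address this, ZooKeeper introduced a new node type, the observer. Observers do not take part in voting; they only sync data and respond to client requests.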
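V. More
1 Why is a ZooKeeper ensemble made up of an odd number (2n+1) of nodes?
There are two reasons. The first is to prevent split-brain: the ensemble only works while n+1 nodes are alive, so among 2n+1 nodes at most one group can operate as the ensemble. The second is data reliability: once a piece of data has been saved successfully on n+1 nodes, it will not be lost. This can be proven rigorously, and the simplest way is proof by contradiction. Suppose a piece of data was saved on n+1 nodes and was then lost. That means all n+1 nodes that saved it are currently unavailable. But then only n nodes are alive, and with only n nodes alive the ensemble cannot work at all, so the assumption is false. An even-sized ensemble of 2n+2 nodes needs n+2 nodes for a majority and therefore still tolerates only n failures, so the extra node adds cost without adding fault tolerance.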
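The split-brain argument can be checked by brute force on a small ensemble. The sketch below is an illustration only, with hypothetical function names: it enumerates every group of at least a given size and tests whether two such groups can be disjoint. With a strict majority threshold no two groups are disjoint, so two independent "ensembles" can never both be active.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List


def groups_of_at_least(nodes: Iterable[str], size: int) -> List[FrozenSet[str]]:
    """All subsets of nodes that contain at least `size` members."""
    members = list(nodes)
    return [
        frozenset(group)
        for k in range(size, len(members) + 1)
        for group in combinations(members, k)
    ]


def can_split(nodes: Iterable[str], size: int) -> bool:
    """True if two disjoint groups of at least `size` nodes exist."""
    groups = groups_of_at_least(nodes, size)
    return any(a.isdisjoint(b) for a in groups for b in groups)


def tolerated_failures(total: int) -> int:
    """Nodes that may fail while a strict majority is still alive."""
    return total - (total // 2 + 1)
```

With five nodes and a threshold of three, `can_split` is False. With four nodes and a threshold of two (exactly half), it is True, which is the split-brain case. `tolerated_failures(5)` and `tolerated_failures(6)` are both 2, matching the point that an even ensemble gains nothing.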
Reposted from: https://www.cnblogs.com/barneywill/p/9882116.html