02. Elasticsearch node discovery and cluster formation

Published: 2024/2/28


Contents

- 1. Introduction
- 2. Node discovery
- 3. Quorum-based decision making
- 4. Voting configurations
  - 1. Nodes in the voting configuration
  - 2. Why the voting configuration keeps an odd number of master-eligible nodes
  - 3. How the voting configuration changes when master-eligible nodes are added or removed
    - 1. Voting configuration management policy
    - 2. Adding master-eligible nodes
    - 3. Removing master-eligible nodes
  - 4. How the voting configuration is initialized when the cluster first starts
    - Development mode
- 7. Publishing the cluster state
  - 1. A successful two-phase commit
    - 1. Phase one
    - 2. Phase two
  - 2. Possible errors
    - 1. Timeout in phase one
    - 2. Timeout in phase two
- 8. Cluster fault detection

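1. Introduction

The node discovery and cluster formation module in Elasticsearch is mainly responsible for the following:

Node discovery

Master election

Publishing the cluster state to the whole cluster whenever it changes
This module also relies on other modules; for example, nodes communicate with each other through the transport module.
This article covers the following topics: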

Node discovery: how nodes find each other when there is no master, typically when the cluster is starting up or the previous master has failed.

Quorum-based decision making: the quorum-based voting mechanism that lets Elasticsearch keep making decisions even when a minority of nodes is down.

Voting configurations: the set of nodes that are allowed to vote. Elasticsearch updates this set automatically when nodes join or leave.

Cluster bootstrapping: how a cluster starts up and which settings it needs.

Things to watch out for when adding or removing a master-eligible node.

Publishing the cluster state: only the master can do this.

Cluster fault detection.

Related settings.

2. Node discovery

Node discovery is the process by which the cluster formation module finds the other nodes with which to form a cluster. It runs when you start an Elasticsearch node, or when a node believes the master has failed, and it continues until the master is found or a new master is elected.
elasticsearch.yml usually contains settings like the following:

discovery.seed_hosts: ["10.66.0.189:9300","10.66.4.45:9300","10.66.0.239:9300","10.66.2.173:9300"]
cluster.initial_master_nodes: ["ES01", "ES02","ES03"]

Discovery starts from a set of seed addresses: the hosts configured in discovery.seed_hosts in elasticsearch.yml, plus the master-eligible nodes recorded in the last cluster state stored on the node.
The process has two phases:

Each node connects to its seed addresses and verifies that the node it reached is master-eligible.

If the connection succeeds, the node and the peers it reached share the lists of master-eligible nodes they have discovered. If the node learns about master-eligible nodes it did not know before, it probes those new nodes as well, and the process repeats.

Notes

If the current node is not master-eligible, it keeps running discovery until it finds an elected master. If it has not found one, it retries after 1s (discovery.find_peers_interval).

If the current node is master-eligible, it keeps running discovery until it either finds an elected master or finds enough master-eligible nodes to hold an election. If neither condition is met, it retries after 1s (discovery.find_peers_interval).
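The two phases can be modelled as a breadth-first walk over peer lists. The sketch below is only an illustration of that idea, not Elasticsearch code: the function name, the `known_peers` map and the host names are all hypothetical, and every node in the map is assumed to be master-eligible.

```python
from collections import deque


def discover_master_eligible(seed_hosts, known_peers):
    """Return every master-eligible address reachable from the seed hosts.

    known_peers maps an address to the master-eligible peers that node
    reports when probed; an address missing from the map is unreachable.
    """
    discovered = set()
    probed = set()
    to_probe = deque(seed_hosts)
    while to_probe:
        address = to_probe.popleft()
        if address in probed:
            continue
        probed.add(address)
        if address not in known_peers:
            continue  # connection failed; a real node would retry later
        discovered.add(address)
        # Phase two: learn the peer's list and probe anything new.
        for peer in known_peers[address]:
            if peer not in probed:
                to_probe.append(peer)
    return discovered


# Hypothetical three-node cluster; es99 stands for an unreachable seed host.
cluster = {
    "es01:9300": ["es02:9300"],
    "es02:9300": ["es01:9300", "es03:9300"],
    "es03:9300": ["es02:9300"],
}
found = discover_master_eligible(["es01:9300", "es99:9300"], cluster)
print(sorted(found))
```

Even though only es01 is a reachable seed, es03 is found through the list that es02 shares, which is why the seed list does not have to name every node.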

3. Quorum-based decision making

Quorum-based decision making
This section describes the quorum-based voting mechanism that lets Elasticsearch keep making decisions even when a minority of nodes is down.
Elasticsearch is a distributed cluster made up of many nodes. Some of them are master-eligible nodes and the rest are usually data nodes. At any time only one master-eligible node is the actual master, and the cluster state is always published to the cluster by that master.
The consistency of the cluster state has to be guaranteed. Suppose the current master goes down: which master-eligible node should take over? If every master-eligible node held identical cluster state, any of them could become the new master. But to guarantee that, the master would have to wait for every master-eligible node to confirm a cluster state before recording it. In that design, a single master-eligible node failing while the cluster is running would block all further cluster state publication, and the cluster would stop working.
Elasticsearch therefore works on a quorum basis. When the master decides whether a cluster state can be committed, it does not wait for all master-eligible nodes; it waits for a certain number of them to confirm. That required number is called the quorum. With this approach the cluster keeps running even if a minority of master-eligible nodes goes down, which makes it much more robust. The quorum has to be chosen carefully, though, to avoid problems such as split brain.

Quorum-based decision making is mainly used for committing cluster state updates and for electing the master.
Elasticsearch computes the quorum as follows: if the number of master-eligible nodes taking part in voting is n, then quorum = (n+1)/2. Note that n must be odd; if you have an even number of master-eligible nodes, Elasticsearch leaves one of them out.
There is more detail behind this, covered below. The real mechanism is more flexible and more complex: internally Elasticsearch maintains the quorum through the voting configuration.

So if you need to remove master-eligible nodes from the cluster, do not remove too many at once; the number removed should be less than half of the master-eligible nodes. Removing nodes that are not master-eligible does not affect cluster availability; there you mainly need to think about your data replicas.
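The arithmetic above is easy to check with a few lines of code. This is a minimal sketch of the rule as stated in this article (drop one node when the count is even, then require more than half); the function names are made up for the example.

```python
def voting_config_size(master_eligible: int) -> int:
    """Voters kept, per the text: an even count leaves one node out."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible if master_eligible % 2 == 1 else master_eligible - 1


def quorum(voters: int) -> int:
    """Smallest number of votes that is more than half of the voters."""
    return voters // 2 + 1


for n in (3, 4, 5):
    voters = voting_config_size(n)
    tolerated = voters - quorum(voters)
    print(f"{n} master-eligible: {voters} voters, "
          f"quorum {quorum(voters)}, tolerates {tolerated} failures")
```

For odd n, `voters // 2 + 1` equals (n+1)/2, so 3 and 4 master-eligible nodes both end up with a quorum of 2.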

4. Voting configurations

Voting configurations
The voting configuration is the set of nodes that are allowed to vote. Elasticsearch updates it automatically when nodes join or leave, and it is the basis on which the quorum is computed.
It holds the master-eligible nodes that take part in decisions, meaning their votes are required when electing a master or committing a cluster state.
An operation succeeds (a cluster state can be committed, or a master is elected) only after a majority of these nodes, more than half, has confirmed it. "More than half" here is the quorum described above.

1. Nodes in the voting configuration

Even number of master-eligible nodes: if the cluster currently has an even number n of master-eligible nodes, the voting configuration records only n-1 of them.

Otherwise, under normal conditions, the voting configuration contains all master-eligible nodes currently in the cluster.

How the voting configuration changes when master-eligible nodes are added or removed can be controlled by a setting.

When the cluster starts for the first time, the voting configuration is initialized from cluster.initial_master_nodes in elasticsearch.yml.

2. Why the voting configuration keeps an odd number of master-eligible nodes

Take cluster state updates as an example. Because the strategy is quorum-based, a commit needs agreement from more than half of the voting nodes. Suppose a cluster has 4 master-eligible nodes.
If all 4 are recorded in the voting configuration, the quorum must be 3. If a network partition splits the cluster into two halves of 2 nodes each, the cluster cannot work, because neither side has 3 nodes.
If the voting configuration records only 3 of them, the quorum is 2. With the same 2/2 partition, one side is guaranteed to hold 2 voting nodes and can keep working, while the other side cannot.
For plain node failures, both setups tolerate the loss of 1 node.
So keeping 3 nodes in the voting configuration gives better availability than storing all 4.
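The partition argument can be replayed in code. The following is a small illustrative sketch with hypothetical node names A to D; it only checks whether one side of a partition still holds more than half of the voters.

```python
def partition_can_commit(voters, partition):
    """True if the partition holds more than half of the voters."""
    votes = len(set(voters) & set(partition))
    return votes > len(voters) // 2


left, right = ["A", "B"], ["C", "D"]

four_voters = ["A", "B", "C", "D"]
three_voters = ["A", "B", "C"]  # D is master-eligible but holds no vote

print([partition_can_commit(four_voters, p) for p in (left, right)])
print([partition_can_commit(three_voters, p) for p in (left, right)])
```

With four voters neither half can commit; with three voters the half that holds two of them keeps working.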

3. How the voting configuration changes when master-eligible nodes are added or removed

The usual recommendation is to keep the set of master-eligible nodes fixed and to scale the cluster by adding or removing data nodes.

Sometimes, however, master-eligible nodes really do have to be added or removed. What should you watch out for then?

1. Voting configuration management policy

Elasticsearch can manage the voting configuration in different ways. By default it maintains the voting configuration automatically as nodes are added and removed. This is controlled by cluster.auto_shrink_voting_configuration.
The default is true, meaning Elasticsearch adjusts the configuration itself. If it is set to false, you have to manage the voting configuration manually through the API.
Even when it is set to true, you can still manage the voting configuration through the same API.
Note that cluster.auto_shrink_voting_configuration only takes effect while the cluster has at least 3 master-eligible nodes. If the cluster has 3 and one of them (for example the master) fails, the cluster keeps working, but the voting configuration is not shrunk. If a second master-eligible node then fails, the cluster becomes unavailable.
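One way to picture the shrinking rule is as a target size computed from the number of live master-eligible nodes. The function below is a simplified model written for this article, not Elasticsearch's implementation; the function name, the parameter names and the exact treatment of the `auto_shrink=False` case are assumptions.

```python
def target_voting_config_size(live_nodes: int, current_size: int,
                              auto_shrink: bool = True) -> int:
    """Simplified model of the shrinking rule described above."""
    largest_odd = live_nodes if live_nodes % 2 == 1 else live_nodes - 1
    if not auto_shrink:
        # Assumed: never shrink automatically, only grow with more live nodes.
        return max(largest_odd, current_size)
    # Shrink to the live nodes, but not below 3 once the config has 3 voters.
    floor = 3 if current_size >= 3 else 1
    return max(largest_odd, floor)


print(target_voting_config_size(live_nodes=4, current_size=5))  # shrinks
print(target_voting_config_size(live_nodes=2, current_size=3))  # stays at 3
```

The second call is the case from the text: with 3 voters and one failure the configuration stays at 3, so the quorum stays at 2 and a second failure cannot be tolerated.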

2. Adding master-eligible nodes

When one or more master-eligible nodes are added to a stable cluster, each new node sends a join request to the current master in order to formally join the cluster.
The default timeout for this request is 60s (cluster.join.timeout).

3. Removing master-eligible nodes

Normally, with cluster.auto_shrink_voting_configuration enabled, removing fewer than half of the cluster's master-eligible nodes is safe: Elasticsearch adjusts the voting configuration automatically and the cluster stays available throughout.
But if you remove half or more of the master-eligible nodes at the same time, you cannot simply shut them down, because Elasticsearch will consider the quorum lost and the cluster will stop working.
To remove half or more of the master-eligible nodes, you have to use the Voting Configuration Exclusions API.

# Add a node to the voting configuration exclusions list and wait up to 30s (the default) for auto-reconfiguration to complete
POST /_cluster/voting_config_exclusions/$node_name
# Add a node to the voting configuration exclusions list and wait up to 1 minute for auto-reconfiguration to complete
POST /_cluster/voting_config_exclusions/$node_name?timeout=1m

Again, the Exclusions API is only needed when removing half or more of the master-eligible nodes at the same time. The exclusions list holds at most 10 entries by default, which can be changed with cluster.max_voting_config_exclusions.
It is also very important to reset the exclusions list once you are done with it: a healthy, running cluster should have an empty exclusions list.

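Use the GET request below to view the current exclusions list, and one of the DELETE requests to clear it:

GET /_cluster/state?filter_path=metadata.cluster_coordination.voting_config_exclusions
# Wait until the excluded nodes have left the cluster, then clear the list
DELETE /_cluster/voting_config_exclusions
# Clear the list and return immediately
DELETE /_cluster/voting_config_exclusions?wait_for_removal=false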

4. How the voting configuration is initialized when the cluster first starts

Elasticsearch elects its master by quorum, but at this point the voting configuration has not been set up yet, so the quorum cannot be computed.
Relying only on discovery.seed_hosts to find nodes and initialize the voting configuration is not reliable either, because a node cannot know how many master-eligible nodes there are supposed to be.
So on first start, the set of master-eligible nodes has to be supplied through explicit configuration:

cluster.initial_master_nodes: ["ES01", "ES02","ES03"]

The values are the node.name of each master-eligible node.
Notes:

Every master-eligible node must be configured with the same cluster.initial_master_nodes.

Master-ineligible nodes do not need this setting.

Master-eligible nodes that join an existing cluster do not need this setting.

It does not need to be set again when the cluster is restarted.

In other words, the setting only matters the first time a cluster is created. In practice I have found that leaving it set in the last three cases has no visible effect.

Development mode

If none of the following settings is present in a node's elasticsearch.yml, Elasticsearch starts in development mode:

discovery.seed_providers
discovery.seed_hosts
cluster.initial_master_nodes

In development mode a node only tries to connect to other nodes on the same machine to form a cluster; it does not connect to nodes on other machines.
In production, make sure to configure:

discovery.seed_hosts
cluster.initial_master_nodes
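The rule above can be turned into a small pre-flight check over a parsed settings dictionary. This is a hypothetical helper for illustration only; it assumes the settings have already been loaded into a flat dict keyed by setting name.

```python
DISCOVERY_SETTINGS = (
    "discovery.seed_providers",
    "discovery.seed_hosts",
    "cluster.initial_master_nodes",
)
PRODUCTION_SETTINGS = ("discovery.seed_hosts", "cluster.initial_master_nodes")


def uses_development_mode(settings: dict) -> bool:
    """True when none of the three discovery settings is configured."""
    return not any(name in settings for name in DISCOVERY_SETTINGS)


def missing_production_settings(settings: dict) -> list:
    """Settings the text says a production node should define but lacks."""
    return [name for name in PRODUCTION_SETTINGS if name not in settings]


config = {"cluster.name": "demo", "discovery.seed_hosts": ["10.66.0.189:9300"]}
print(uses_development_mode(config))
print(missing_production_settings(config))
```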

7. Publishing the cluster state

Only the master can do this: the master node is the only node in the cluster that can change the cluster state.
The process resembles a two-phase commit. The master batches cluster state updates, which keeps them from being published too frequently.

1. A successful two-phase commit

1. Phase one

The master broadcasts the updated cluster state to all nodes in the cluster.

Each node that receives the broadcast pre-commits the state; if it can accept the state locally, it returns an ack to the master.

2. Phase two

Once the master has received acks from more than half (a quorum) of the master-eligible nodes, it considers the update committable. It commits the state and then sends a commit message to the other nodes in the cluster.

On receiving that message, the other nodes formally apply the cluster state they received in phase one and send an ack back to the master.
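The two phases can be sketched as a toy in-memory simulation. This is not how Elasticsearch is implemented; the `Node` class and `publish` function are invented for the example, the master is treated as just another node in the list, and every master-eligible node is assumed to be a voter.

```python
class Node:
    def __init__(self, name, master_eligible, reachable=True):
        self.name = name
        self.master_eligible = master_eligible
        self.reachable = reachable
        self.pending = None    # state received in phase one
        self.committed = None  # state applied in phase two

    def receive_publish(self, state):
        if not self.reachable:
            return False
        self.pending = state
        return True  # ack: the state was accepted

    def receive_commit(self):
        if not self.reachable or self.pending is None:
            return False
        self.committed = self.pending
        return True  # ack: the state was applied


def publish(state, nodes):
    """Return True if the state was committed, False if it was rejected."""
    # Phase one: broadcast the state and count acks from master-eligible nodes.
    acked = [n for n in nodes if n.receive_publish(state)]
    voters = [n for n in nodes if n.master_eligible]
    votes = sum(1 for n in acked if n.master_eligible)
    if votes <= len(voters) // 2:
        return False  # no quorum, nothing is committed
    # Phase two: tell every node that accepted the state to apply it.
    for n in acked:
        n.receive_commit()
    return True


es01, es02 = Node("es01", True), Node("es02", True)
es03 = Node("es03", True, reachable=False)
nodes = [es01, es02, es03, Node("data1", False)]

first = publish({"version": 2}, nodes)   # 2 of 3 voters ack
es02.reachable = False
second = publish({"version": 3}, nodes)  # only 1 of 3 voters acks
print(first, second, es01.committed)
```

The data node's ack does not count towards the quorum, which is why the second publication is rejected even though two nodes accepted it.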

2. Possible errors

The two-phase commit can also fail. The first kind of failure is a timeout: the process is time-limited, and what happens when it does not finish in time depends on where it got stuck.
Elasticsearch uses cluster.publish.timeout to bound the whole process; the default is 30s.
It reacts differently depending on whether the timeout occurs in phase one or phase two.

1. Timeout in phase one

This means that after broadcasting the cluster state, the master did not receive acks from a quorum of master-eligible nodes within cluster.publish.timeout.
The master then concludes that it no longer has enough support. It discards this cluster state, stands down, and starts a new master election.

2. Timeout in phase two

If phase one completes normally, the master treats the update as successful, commits the cluster state, and broadcasts the commit message to the other nodes.
It then waits for their acks. If cluster.publish.timeout is reached at this point, the master moves on to processing the next cluster state update.
At the same time it records the nodes that have not yet acked (they may have stalled in phase one or in phase two). These are considered lagging nodes, and the master gives them some extra time to catch up.
If a node still has not applied this cluster state update within cluster.follower_lag.timeout (90s by default), it is considered failed and is removed from the cluster.
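The timeout rules for both phases can be summarised as a small decision function. The sketch below is a simplified model for illustration: the function and argument names are hypothetical, times are seconds since the publication started, the 30s value is the cluster.publish.timeout default quoted above, and the 90s value is assumed to be the cluster.follower_lag.timeout default.

```python
def classify_publication(quorum_ack_time, apply_ack_times,
                         publish_timeout=30.0, follower_lag_timeout=90.0):
    """Classify a publication from ack times in seconds since it started.

    quorum_ack_time is when a quorum of phase-one acks arrived (None if never);
    apply_ack_times maps node name to its phase-two ack time (None if never).
    """
    if quorum_ack_time is None or quorum_ack_time > publish_timeout:
        # Phase-one timeout: the update is rejected and the master stands down.
        return {"committed": False, "lagging": [], "removed": []}
    deadline = publish_timeout + follower_lag_timeout
    lagging, removed = [], []
    for node, acked_at in sorted(apply_ack_times.items()):
        if acked_at is not None and acked_at <= publish_timeout:
            continue  # applied in time
        lagging.append(node)
        if acked_at is None or acked_at > deadline:
            removed.append(node)  # never caught up within the extra time
    return {"committed": True, "lagging": lagging, "removed": removed}


result = classify_publication(1.0, {"es02": 2.0, "data1": 45.0, "data2": None})
print(result)
```

Here data1 is lagging but catches up within the extra time, while data2 never acks and would be removed from the cluster.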

Cluster state updates are published as diffs (incremental updates), so less data is sent and the load on the cluster is lower.
For nodes that rejoin or newly join the cluster, the master first sends the full cluster state, and afterwards goes back to sending diffs.
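The difference between a diff and a full-state update can be shown with plain dictionaries. This is only a conceptual sketch: the update format and the function names are invented for the example and say nothing about the wire format Elasticsearch actually uses.

```python
def build_update(previous_state, new_state, node_has_previous):
    """Build a full update for new or rejoining nodes, a diff otherwise."""
    if not node_has_previous:
        return {"type": "full", "state": dict(new_state)}
    changed = {k: v for k, v in new_state.items()
               if k not in previous_state or previous_state[k] != v}
    removed = [k for k in previous_state if k not in new_state]
    return {"type": "diff", "changed": changed, "removed": removed}


def apply_update(local_state, update):
    """Apply an update produced by build_update to a node's local copy."""
    if update["type"] == "full":
        return dict(update["state"])
    merged = {k: v for k, v in local_state.items() if k not in update["removed"]}
    merged.update(update["changed"])
    return merged


old = {"version": 1, "nodes": ["es01", "es02"], "tmp_index": {"shards": 1}}
new = {"version": 2, "nodes": ["es01", "es02"]}

diff = build_update(old, new, node_has_previous=True)
full = build_update(old, new, node_has_previous=False)
print(diff)
print(apply_update(old, diff) == new, apply_update({}, full) == new)
```

The diff carries only the changed version and the removed key, while the full update carries the whole state for a node that has nothing to apply a diff to.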

Elasticsearch is a peer-to-peer system. Ordinary operations such as index, delete and search do not involve the master. The master is only responsible for cluster-wide information and for reassigning shards when nodes leave or join the cluster.

8. Cluster fault detection

The master periodically checks every node in the cluster to make sure it is still connected and healthy. These are called follower checks.
Every other node also periodically checks the health of the master. These are called leader checks.

Elasticsearch tolerates occasional failed or timed-out checks. It only considers the other side unavailable after several consecutive failures or timeouts. The behaviour is controlled by the cluster.fault_detection.* settings.

However, if the master detects that a node has disconnected, it skips the timeout and retry logic and immediately tries to remove that node. Likewise, when a node detects that the master has disconnected, it immediately restarts discovery,
which ends with it either finding the master or electing a new one.
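The "several consecutive failures" rule and the disconnect shortcut can be sketched with a small counter. This is an illustrative model only: the class name and the result strings are made up, and the threshold of 3 is assumed to mirror the default of cluster.fault_detection.follower_check.retry_count.

```python
class FollowerChecker:
    """Tracks check results for one node from the master's point of view."""

    def __init__(self, retry_count=3):
        self.retry_count = retry_count
        self.consecutive_failures = 0
        self.faulty = False

    def record(self, result):
        """result is 'ok', 'failed' (error or timeout) or 'disconnected'."""
        if result == "disconnected":
            self.faulty = True  # bypass the retry logic entirely
        elif result == "ok":
            self.consecutive_failures = 0  # an occasional failure is forgiven
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.retry_count:
                self.faulty = True
        return self.faulty


flaky = FollowerChecker()
flaky_results = [flaky.record(r) for r in
                 ["failed", "failed", "ok", "failed", "failed", "failed"]]
print(flaky_results)

dropped = FollowerChecker()
print(dropped.record("disconnected"))
```

The successful check in the middle resets the counter, so the node is only marked faulty after the third failure in a row, while a disconnect marks it faulty at once.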
