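Linux High-Availability (HA) Clusters: Pacemaker in Detail
Outline
Note: I originally did not plan to write this post, since several earlier posts already introduced Pacemaker. I decided to write it anyway because some readers will probably need it, especially since Pacemaker 1.1.8 (CentOS 6.4) introduced a few changes that deserve attention. Finally: a salute to open source and the open-source spirit. (Pacemaker official website: http://clusterlabs.org/)
I. What is Pacemaker
II. Pacemaker features
III. Pacemaker package providers
IV. Pacemaker version information
V. Pacemaker configuration examples
VI. Cluster architectures supported by Pacemaker
VII. Pacemaker internals
VIII. Composition of the Pacemaker source code
IX. CentOS 6.4 + Corosync + Pacemaker: a highly available web cluster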
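I. What is Pacemaker
1. A brief description of Pacemaker
Pacemaker (literally, a cardiac pacemaker) is a cluster resource manager. It achieves maximum availability for cluster services (also called resources) by detecting and recovering from node-level and resource-level failures, using the messaging and membership capabilities provided by your preferred cluster infrastructure (OpenAIS or Heartbeat).
It can handle clusters of almost any size and comes with a powerful dependency model that lets the administrator express the relationships between cluster resources (including ordering and location) precisely. Almost anything that can be scripted can be managed as part of a Pacemaker cluster.
To repeat: Pacemaker is a resource manager and does not provide the heartbeat itself. This is a common misunderstanding, so it is worth stating explicitly. Pacemaker is a continuation of the CRM (the Heartbeat V2 resource manager); it was originally written for Heartbeat but has since become an independent project.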
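2. Origin of Pacemaker
As is well known, Heartbeat was split into several projects as of version 3, and Pacemaker is the resource manager that came out of that split.
Components of Heartbeat 3.0 after the split:
Heartbeat: the original messaging layer became the standalone heartbeat project; the new heartbeat is only responsible for maintaining information about the cluster nodes and the communication between them;
Cluster Glue: an intermediate layer that ties heartbeat and pacemaker together; it consists mainly of two parts, the LRM and STONITH.
Resource Agent: the collection of scripts that start, stop, and monitor services; the LRM calls these scripts to start, stop, and monitor the various resources.
Pacemaker: the Cluster Resource Manager (CRM), the control center that manages the whole HA setup; clients configure, manage, and monitor the entire cluster through Pacemaker.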
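II. Pacemaker features
Detection of and recovery from host-level and application-level failures
Support for practically any redundancy configuration
Support for multiple cluster configuration modes at the same time
Configurable policies for dealing with loss of quorum (when multiple machines fail)
Support for application startup/shutdown ordering
Support for applications that must or must not run on the same machine
Support for applications with multiple modes (for example master/slave)
Ability to test any failure or cluster state
Note: in short, it is feature-rich and is currently the mainstream resource manager.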
III. Pacemaker package providers
Pacemaker packages are currently available for the mainstream distributions:
Fedora (12.0)
Red Hat Enterprise Linux (5.0, 6.0)
openSUSE (11.0)
Debian
Ubuntu LTS (10.04)
CentOS (5.0, 6.0)
IV. Pacemaker version information
At the time of writing, the latest release is Pacemaker 1.1.10, released in July 2013.
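V. Pacemaker configuration examples
1. Master/slave architecture
Note: for many high-availability scenarios, a two-node master/slave cluster built with Pacemaker and DRBD is a cost-effective solution.
2. Multi-node backup cluster
Note: by supporting many nodes, Pacemaker can significantly reduce hardware cost by letting several master/slave clusters be combined and share a common backup node.
3. Shared-storage cluster
Note: when shared storage is available, every node can potentially be used for failover. Pacemaker can even run multiple services.
4. Multi-site cluster
Note: Pacemaker 1.2 is planned to include enhancements that simplify setting up split-site clusters.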
VI. Cluster architectures supported by Pacemaker
1. Clusters based on OpenAIS
2. Traditional cluster architecture, based on Heartbeat
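VII. Pacemaker internals
1. Cluster components:
stonithd: the fencing daemon, which carries out STONITH (node fencing) requests.
lrmd: the Local Resource Management daemon. It provides a common interface to the supported resource types and calls the resource agents (scripts) directly.
pengine: the Policy Engine. It computes the next state of the cluster from the current state and the configuration, and produces a transition graph containing a list of actions and their dependencies.
CIB: the Cluster Information Base. It contains the definitions of all cluster options, nodes, and resources, their relationships to one another, and their current status. Updates are synchronized to all cluster nodes.
CRMD: the Cluster Resource Management daemon. It is mainly a message broker for the PEngine and the LRM, and it also elects a leader (the DC) to coordinate the activities of the cluster (including starting and stopping resources).
OpenAIS: the OpenAIS messaging and membership layer.
Heartbeat: the Heartbeat messaging layer, an alternative to OpenAIS.
CCM: Consensus Cluster Membership, the Heartbeat membership layer.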
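2. Functional overview
The CIB uses XML to represent both the configuration and the current state of all resources in the cluster. The contents of the CIB are automatically kept in sync across the entire cluster, and the PEngine uses them to compute the ideal state of the cluster and a list of instructions for reaching it. That list is handed to the DC (Designated Coordinator). The nodes of a Pacemaker cluster elect one node as the DC, which acts as the master decision-making node. If the elected DC goes down, a new DC is quickly established from the remaining nodes. The DC passes the instructions produced by the PEngine, through the cluster messaging infrastructure, either to the LRMd (Local Resource Management daemon) or to the CRMD peers on the other nodes. When a node in the cluster goes down, the PEngine recomputes the ideal state. In some cases it may be necessary to power off a node in order to protect shared data or to complete resource recovery. For this purpose Pacemaker comes with stonithd. STONITH ("Shoot The Other Node In The Head") fences another node and is usually implemented with a remote power switch. Pacemaker models STONITH devices as resources stored in the CIB, so that they can easily be monitored for failure.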
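VIII. Composition of the Pacemaker source code
Note: the language breakdown of the source code shows that Pacemaker is written mainly in C, followed by Python; being mostly C suggests that it runs efficiently. Finally, let us walk through a small example: a highly available web cluster.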
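IX. CentOS 6.4 + Corosync + Pacemaker: a highly available web cluster
1. Environment
(1). Operating system
CentOS 6.4, x86_64
(2). Software
Corosync 1.4.1
Pacemaker 1.1.8
crmsh 1.2.6
(3). Topology
2. Installing and configuring Corosync and Pacemaker
The installation and configuration of Corosync and Pacemaker are not repeated here; see this post: http://freeloda.blog.51cto.com/2033581/1272417 (Linux High-Availability (HA) Clusters: Corosync in Detail)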
3. Ways to configure resources in Pacemaker
(1). Command-line tools
crmsh
pcs
(2). Graphical tools
pygui
hawk
LCMC
pcs
Note: this article focuses on crmsh.
4. A brief note on crmsh
Note: below is the ChangeLog entry for Pacemaker 1.1.8. The most important item is the one about the crm shell: starting with Pacemaker 1.1.8, crmsh became an independent project and is no longer shipped with Pacemaker, so after installing Pacemaker there is no crm command-line resource manager.
[root@node1 ~]# cd /usr/share/doc/pacemaker-1.1.8/
[root@node1 pacemaker-1.1.8]# ll
total 132
-rw-r--r-- 1 root root   1102 Feb 22 13:05 AUTHORS
-rw-r--r-- 1 root root 109311 Feb 22 13:05 ChangeLog
-rw-r--r-- 1 root root  18046 Feb 22 13:05 COPYING
[root@node1 pacemaker-1.1.8]# vim ChangeLog
* Thu Sep 20 2012 Andrew Beekhof <andrew@beekhof.net> Pacemaker-1.1.8-1
- Update source tarball to revision: 1a5341f
- Statistics:
  Changesets: 1019
  Diff: 2107 files changed, 117258 insertions(+), 73606 deletions(-)
- All APIs have been cleaned up and reduced to essentials
- Pacemaker now includes a replacement lrmd that supports systemd and upstart agents
- Config and state files (cib.xml, PE inputs and core files) have moved to new locations
- The crm shell has become a separate project and no longer included with Pacemaker
- All daemons/tools now have a unified set of error codes based on errno.h (see crm_error)
[root@node1 ~]# crm        # type crm and press Tab twice
crmadmin       crm_diff       crm_failcount  crm_mon        crm_report     crm_shadow     crm_standby    crm_verify
crm_attribute  crm_error      crm_master     crm_node       crm_resource   crm_simulate   crm_ticket
Note: as you can see, after installing Pacemaker there is no crm shell command-line tool; it has to be installed separately. The next section shows how to install crmsh.
5. Installing the crmsh resource management tool
(1). crmsh official website
https://savannah.nongnu.org/forum/forum.php?forum_id=7672
(2). crmsh download location
http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/
(3). Installing crmsh
[root@node1 ~]# rpm -ivh crmsh-1.2.6-0.rc2.2.1.x86_64.rpm
warning: crmsh-1.2.6-0.rc2.2.1.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 7b709911: NOKEY
error: Failed dependencies:
    pssh is needed by crmsh-1.2.6-0.rc2.2.1.x86_64
    python-dateutil is needed by crmsh-1.2.6-0.rc2.2.1.x86_64
    python-lxml is needed by crmsh-1.2.6-0.rc2.2.1.x86_64
Note: some dependencies are missing, so we first install them with yum. The yum command below does not install pssh, which is why rpm is then run with --nodeps to skip the remaining dependency check.
[root@node1 ~]# yum install -y python-dateutil python-lxml
[root@node1 ~]# rpm -ivh crmsh-1.2.6-0.rc2.2.1.x86_64.rpm --nodeps
warning: crmsh-1.2.6-0.rc2.2.1.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 7b709911: NOKEY
Preparing...                ########################################### [100%]
   1:crmsh                  ########################################### [100%]
[root@node1 ~]# crm        # after installation a crm command appears in the Tab completion list, so the installation succeeded
crm            crm_attribute  crm_error      crm_master     crm_node       crm_resource   crm_simulate   crm_ticket
crmadmin       crm_diff       crm_failcount  crm_mon        crm_report     crm_shadow     crm_standby    crm_verify
[root@node1 ~]# crm        # run crm to enter the resource configuration shell
Cannot change active directory to /var/lib/pacemaker/cores/root: No such file or directory (2)
crm(live)# help            # show the help
This is crm shell, a Pacemaker command line interface.
Available commands:
    cib              manage shadow CIBs
    resource         resources management
    configure        CRM cluster configuration
    node             nodes management
    options          user preferences
    history          CRM cluster history
    site             Geo-cluster support
    ra               resource agents information center
    status           show cluster status
    help,?           show help (help topics for list of topics)
    end,cd,up        go back one level
    quit,bye,exit    exit the program
crm(live)#
Note: the preparation is now complete. Before configuring the highly available web cluster, we first take a brief look at how to use crmsh.
6. Using crmsh
Note: the best way to approach any new command is to read its man page first, get a rough feel for it, and then experiment step by step.
[root@node1 ~]# crm        # run crm to enter the crm shell
Cannot change active directory to /var/lib/pacemaker/cores/root: No such file or directory (2)
crm(live)# help            # help lists the available subcommands
This is crm shell, a Pacemaker command line interface.
Available commands:
    cib              manage shadow CIBs
    resource         resources management
    configure        CRM cluster configuration
    node             nodes management
    options          user preferences
    history          CRM cluster history
    site             Geo-cluster support
    ra               resource agents information center
    status           show cluster status
    help,?           show help (help topics for list of topics)
    end,cd,up        go back one level
    quit,bye,exit    exit the program
crm(live)# configure       # enter the configure level
crm(live)configure#        # press Tab twice to list every command available under configure
?                  default-timeouts   group              node               rename             simulate
bye                delete             help               op_defaults        role               template
cd                 edit               history            order              rsc_defaults       up
cib                end                load               primitive          rsc_template       upgrade
cibstatus          erase              location           property           rsc_ticket         user
clone              exit               master             ptest              rsctest            verify
collocation        fencing_topology   modgroup           quit               save               xml
colocation         filter             monitor            ra                 schema
commit             graph              ms                 refresh            show
crm(live)configure# help node      # help followed by any command shows its usage and examples
The node command describes a cluster node. Nodes in the CIB are
commonly created automatically by the CRM. Hence, you should not
need to deal with nodes unless you also want to define node
attributes. Note that it is also possible to manage node
attributes at the `node` level.
Usage:
...............
        node <uname>[:<type>]
          [attributes <param>=<value> [<param>=<value>...]]
          [utilization <param>=<value> [<param>=<value>...]]
        type :: normal | member | ping
...............
Example:
...............
        node node1
        node big_node attributes memory=64
...............
Note: that is enough of an introduction. In one sentence: when you do not know a command, ask help. Now we start configuring the highly available web cluster.
7. Configuring a highly available web cluster with crmsh
(1). View the default configuration
[root@node1 ~]# crm
Cannot change active directory to /var/lib/pacemaker/cores/root: No such file or directory (2)
crm(live)# configure
crm(live)configure# show
node node1.test.com
node node2.test.com
property $id="cib-bootstrap-options" \
    dc-version="1.1.8-7.el6-394e906" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
(2). Check the configuration for errors
crm(live)# configure
crm(live)configure# verify
crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
crm_verify[5202]: 2011/06/14_19:10:38 ERROR: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Note: the errors say that no STONITH resources have been defined. We have no STONITH device in this setup, so we disable the property for now.
crm(live)configure# property stonith-enabled=false
crm(live)configure# show
node node1.test.com
node node2.test.com
property $id="cib-bootstrap-options" \
    dc-version="1.1.8-7.el6-394e906" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false"
crm(live)configure# verify     # no errors are reported now
(3). List the resource agent classes supported by the cluster
crm(live)# ra
crm(live)ra# classes
lsb
ocf / heartbeat pacemaker redhat
service
stonith
(4). List the resource agents available in a given class
crm(live)ra# list lsb
auditd blk-availability corosync corosync-notifyd crond halt htcacheclean httpd
ip6tables iptables killall lvm2-lvmetad lvm2-monitor messagebus netconsole netfs
network nfs nfslock ntpd ntpdate pacemaker postfix quota_nld
rdisc restorecond rpcbind rpcgssd rpcidmapd rpcsvcgssd rsyslog sandbox
saslauthd single sshd svnserve udev-post winbind
crm(live)ra# list ocf heartbeat
AoEtarget AudibleAlarm CTDB ClusterMon Delay Dummy EvmsSCC Evmsd
Filesystem ICP IPaddr IPaddr2 IPsrcaddr IPv6addr LVM LinuxSCSI
MailTo ManageRAID ManageVE Pure-FTPd Raid1 Route SAPDatabase SAPInstance
SendArp ServeRAID SphinxSearchDaemon Squid Stateful SysInfo VIPArip VirtualDomain
WAS WAS6 WinPopup Xen Xinetd anything apache conntrackd
db2 drbd eDir88 ethmonitor exportfs fio iSCSILogicalUnit iSCSITarget
ids iscsi jboss lxc mysql mysql-proxy nfsserver nginx
oracle oralsnr pgsql pingd portblock postfix proftpd rsyncd
scsi2reservation sfex symlink syslog-ng tomcat vmware
crm(live)ra# list ocf pacemaker
ClusterMon Dummy HealthCPU HealthSMART Stateful SysInfo SystemHealth controld
o2cb ping pingd
(5). Show how to configure a particular resource agent
crm(live)ra# info ocf:heartbeat:IPaddr
Manages virtual IPv4 addresses (portable version) (ocf:heartbeat:IPaddr)
This script manages IP alias IP addresses
It can add an IP alias, or remove one.
Parameters (* denotes required, [] the default):
ip* (string): IPv4 address
    The IPv4 address to be configured in dotted quad notation, for example
    "192.168.1.1".
nic (string, [eth0]): Network interface
    The base network interface on which the IP address will be brought
    online.
    If left empty, the script will try and determine this from the
    routing table.
    Do NOT specify an alias interface in the form eth0:1 or anything here;
    rather, specify the base interface only.
    Prerequisite:
    There must be at least one static IP address, which is not managed by
    the cluster, assigned to the network interface.
    If you can not assign any static IP address on the interface,
:
(6). Create an IP address resource for the web cluster (an IP resource is a primitive resource, so we first look at how a primitive is defined)
crm(live)# configure
crm(live)configure# primitive
usage: primitive <rsc> {[<class>:[<provider>:]]<type>|@<template>}
        [params <param>=<value> [<param>=<value>...]]
        [meta <attribute>=<value> [<attribute>=<value>...]]
        [utilization <attribute>=<value> [<attribute>=<value>...]]
        [operations id_spec
            [op op_type [<attribute>=<value>...] ...]]
crm(live)configure# primitive vip ocf:heartbeat:IPaddr params ip=192.168.18.200 nic=eth0 cidr_netmask=24     # add a VIP resource
crm(live)configure# show       # the new vip primitive now appears in the configuration
node node1.test.com
node node2.test.com
primitive vip ocf:heartbeat:IPaddr \
    params ip="192.168.18.200" nic="eth0" cidr_netmask="24"
property $id="cib-bootstrap-options" \
    dc-version="1.1.8-7.el6-394e906" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false"
crm(live)configure# verify     # check the configuration for errors
crm(live)configure# commit     # commit the resource; nothing configured in the shell takes effect until commit, which writes it to the cib.xml configuration file
crm(live)# status              # check the resource status: there is one resource, vip, running on node1
Last updated: Thu Aug 15 14:24:45 2013
Last change: Thu Aug 15 14:21:21 2013 via cibadmin on node1.test.com
Stack: classic openais (with plugin)
Current DC: node1.test.com - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
1 Resources configured.
Online: [ node1.test.com node2.test.com ]
 vip (ocf::heartbeat:IPaddr): Started node1.test.com
Checking the IP addresses on node1 confirms that the VIP is active. Next, from node2, we stop the corosync service on node1 with the command below and check the status again.
Test: stop corosync on node1; node1 then shows as offline.
[root@node2 ~]# ssh node1 "service corosync stop"
Signaling Corosync Cluster Engine (corosync) to terminate: [  OK  ]
Waiting for corosync services to unload:..[  OK  ]
[root@node2 ~]# crm status
Cannot change active directory to /var/lib/pacemaker/cores/root: No such file or directory (2)
Last updated: Thu Aug 15 14:29:04 2013
Last change: Thu Aug 15 14:21:21 2013 via cibadmin on node1.test.com
Stack: classic openais (with plugin)
Current DC: node2.test.com - partition WITHOUT quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
1 Resources configured.
Online: [ node2.test.com ]
OFFLINE: [ node1.test.com ]
Key point: the output above shows that node1.test.com is offline, yet the vip resource was not started on node2.test.com. The reason is that the cluster state is now "WITHOUT quorum": quorum has been lost, so the cluster no longer meets the conditions for running resources. For a cluster with only two nodes this behaviour is unreasonable, because a single surviving node can never hold a majority of the votes. We can therefore tell the cluster to ignore the loss of quorum with: property no-quorum-policy=ignore
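The quorum rule behind this behaviour can be sketched in a few lines of Python. This is only an illustrative model: it assumes one vote per node and covers just the `stop` (Pacemaker's default) and `ignore` values of no-quorum-policy. The function names are made up for this example and are not part of Pacemaker.

```python
def has_quorum(online_votes: int, expected_votes: int) -> bool:
    """A partition has quorum only when it holds more than half of the expected votes."""
    return 2 * online_votes > expected_votes


def may_run_resources(online_votes: int, expected_votes: int,
                      no_quorum_policy: str = "stop") -> bool:
    """Simplified model: only the 'stop' and 'ignore' policies are handled."""
    if no_quorum_policy not in ("stop", "ignore"):
        raise ValueError("this sketch only models 'stop' and 'ignore'")
    if has_quorum(online_votes, expected_votes):
        return True
    # Without quorum, resources may run only if the policy says to ignore the loss.
    return no_quorum_policy == "ignore"


if __name__ == "__main__":
    # Two-node cluster from this article with node1 down: 1 of 2 votes remain.
    print(has_quorum(1, 2))
    print(may_run_resources(1, 2))
    print(may_run_resources(1, 2, "ignore"))
```

With 1 of 2 votes the surviving node does not hold a majority, which is why vip stays stopped until the policy is set to ignore.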
crm(live)# configure
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# show
node node1.test.com
node node2.test.com
primitive vip ocf:heartbeat:IPaddr \
    params ip="192.168.18.200" nic="eth0" cidr_netmask="24"
property $id="cib-bootstrap-options" \
    dc-version="1.1.8-7.el6-394e906" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore"
crm(live)configure# verify
crm(live)configure# commit
After a moment the cluster starts the resource on node2, the node that is still running, as shown below:
[root@node2 ~]# crm status
Cannot change active directory to /var/lib/pacemaker/cores/root: No such file or directory (2)
Last updated: Thu Aug 15 14:38:23 2013
Last change: Thu Aug 15 14:37:08 2013 via cibadmin on node2.test.com
Stack: classic openais (with plugin)
Current DC: node2.test.com - partition WITHOUT quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
1 Resources configured.
Online: [ node2.test.com ]
OFFLINE: [ node1.test.com ]
 vip (ocf::heartbeat:IPaddr): Started node2.test.com
With that verified, we start node1.test.com normally again.
[root@node2 ~]# ssh node1 "service corosync start"
Starting Corosync Cluster Engine (corosync): [  OK  ]
[root@node2 ~]# crm status
Cannot change active directory to /var/lib/pacemaker/cores/root: No such file or directory (2)
Last updated: Thu Aug 15 14:39:45 2013
Last change: Thu Aug 15 14:37:08 2013 via cibadmin on node2.test.com
Stack: classic openais (with plugin)
Current DC: node2.test.com - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
1 Resources configured.
Online: [ node1.test.com node2.test.com ]
 vip (ocf::heartbeat:IPaddr): Started node2.test.com
[root@node2 ~]#
After node1.test.com is back, the vip resource may well move from node2.test.com back to node1.test.com, but it may also stay where it is. Every such move between nodes makes the resource unreachable for a short time, so once a resource has failed over to another node we sometimes want to prevent it from moving back, even when the original node has recovered. This is done by defining the resource's stickiness, which can be specified when the resource is created or afterwards. Let us briefly review resource stickiness.
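(7). Resource stickiness
Resource stickiness expresses how strongly a resource prefers to stay on the node where it is currently running.
Stickiness values and their effects:
0: the default. The resource is placed at the most suitable location in the system. This means that the resource is moved when a node with "better" or less load capacity becomes available. This option is almost equivalent to automatic failback, except that the resource may be moved to a node other than the one it was previously active on;
Greater than 0: the resource prefers to stay where it is, but is moved if a more suitable node becomes available. The higher the value, the stronger the preference to stay;
Less than 0: the resource prefers to move away from its current location. The higher the absolute value, the stronger the preference to leave;
INFINITY: the resource always stays where it is unless it is forced off because the node is no longer eligible to run it (node shutdown, node standby, migration-threshold reached, or a configuration change). This option is almost equivalent to disabling automatic failback completely;
-INFINITY: the resource always moves away from its current location;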
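A rough way to picture the effect of these values: the cluster scores every node for a resource and adds the stickiness to the node where the resource is currently running. The sketch below models only that one idea. The function `pick_node` and the plain integer scores are assumptions made for illustration; the real policy engine weighs many more factors.

```python
from typing import Dict, Optional


def pick_node(location_scores: Dict[str, int],
              current_node: Optional[str],
              stickiness: int) -> Optional[str]:
    """Return the node with the highest total score, or None if no node is eligible."""
    totals = dict(location_scores)
    if current_node in totals:
        totals[current_node] += stickiness   # staying put earns the stickiness bonus
    # In this model a node with a negative total cannot run the resource.
    eligible = {node: score for node, score in totals.items() if score >= 0}
    if not eligible:
        return None
    # Highest score wins; on a tie, prefer the node the resource is already on.
    return max(eligible, key=lambda node: (eligible[node], node == current_node))


if __name__ == "__main__":
    equal = {"node1.test.com": 0, "node2.test.com": 0}
    prefer_node1 = {"node1.test.com": 200, "node2.test.com": 0}
    print(pick_node(equal, "node2.test.com", 100))
    print(pick_node(prefer_node1, "node2.test.com", 100))
```

In this model, a stickiness of 100 keeps the resource on node2 when both nodes are otherwise equal, while a preference of 200 for node1 outweighs it and pulls the resource back.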
Here we set a default stickiness for all resources with: rsc_defaults resource-stickiness=100
crm(live)configure# rsc_defaults resource-stickiness=100
crm(live)configure# verify
crm(live)configure# show
node node1.test.com
node node2.test.com
primitive vip ocf:heartbeat:IPaddr \
    params ip="192.168.18.200" nic="eth0" cidr_netmask="24"
property $id="cib-bootstrap-options" \
    dc-version="1.1.8-7.el6-394e906" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"
crm(live)configure# commit
(8). Building on the IP address resource configured above, turn the cluster into an active/passive web (httpd) service cluster
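Node1: (screenshot not available in the source)
Node2: (screenshot not available in the source)
Test:
node1: (screenshot not available in the source)
node2: (screenshot not available in the source)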
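Then start the httpd service manually on each node and confirm that it serves pages correctly. Next, stop the httpd service with the commands below and make sure it does not start automatically at boot (run them once on each node):
node1: (screenshot not available in the source; the same two commands as on node2 apply)
node2:
[root@node2 ~]# /etc/init.d/httpd stop
[root@node2 ~]# chkconfig httpd off
Next we add this httpd service as a cluster resource. Two resource agents are available for httpd: lsb and ocf:heartbeat. For simplicity we use the lsb type here: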
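First, the following command shows the syntax of the lsb-type httpd resource:
crm(live)# ra
crm(live)ra# info lsb:httpd
start and stop Apache HTTP Server (lsb:httpd)
The Apache HTTP Server is an efficient and extensible \
server implementing the current HTTP standards.
Operations' defaults (advisory minimum):
    start timeout=15
    stop timeout=15
    status timeout=15
    restart timeout=15
    force-reload timeout=15
    monitor timeout=15 interval=15
Next, create the httpd resource. The screenshot of this step is not available in the source; judging from the configuration shown in the following steps, the command was presumably the one below, followed by verify and commit as before:
crm(live)configure# primitive httpd lsb:httpd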
Check the resource status:
[root@node1 ~]# crm status
Cannot change active directory to /var/lib/pacemaker/cores/root: No such file or directory (2)
Last updated: Thu Aug 15 14:55:04 2013
Last change: Thu Aug 15 14:54:14 2013 via cibadmin on node1.test.com
Stack: classic openais (with plugin)
Current DC: node2.test.com - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
2 Resources configured.
Online: [ node1.test.com node2.test.com ]
 vip (ocf::heartbeat:IPaddr): Started node2.test.com
 httpd (lsb:httpd): Started node1.test.com
The output above shows that vip and httpd may end up running on two different nodes. That does not work for an application that serves web content through this IP address: the two resources must run on the same node. There are two ways to solve this. One is to define a group and put vip and httpd into it, which keeps the resources on the same node; the other is to define resource constraints that achieve the same result. We start with the first method, defining a group.
(9). Defining a group resource
crm(live)# configure
crm(live)configure# group webservice vip httpd
crm(live)configure# show
node node1.test.com
node node2.test.com
primitive httpd lsb:httpd
primitive vip ocf:heartbeat:IPaddr \
    params ip="192.168.18.200" nic="eth0" cidr_netmask="24"
group webservice vip httpd
property $id="cib-bootstrap-options" \
    dc-version="1.1.8-7.el6-394e906" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"
crm(live)configure# verify
crm(live)configure# commit
Check the resource status again:
[root@node1 ~]# crm status
Cannot change active directory to /var/lib/pacemaker/cores/root: No such file or directory (2)
Last updated: Thu Aug 15 15:33:09 2013
Last change: Thu Aug 15 15:32:28 2013 via cibadmin on node1.test.com
Stack: classic openais (with plugin)
Current DC: node2.test.com - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
2 Resources configured.
Online: [ node1.test.com node2.test.com ]
 Resource Group: webservice
     vip (ocf::heartbeat:IPaddr): Started node2.test.com
     httpd (lsb:httpd): Started node2.test.com
As you can see, all resources now run on node2, and the web page can be reached through the VIP.
Next we simulate a failure and test again. The standby command is run on node2; without an argument it puts the local node into standby.
crm(live)# node
crm(live)node#             # press Tab twice to list the commands at the node level
?             cd            end           help          online        show          status-attr
attribute     clearstate    exit          list          quit          standby       up
bye           delete        fence         maintenance   ready         status        utilization
crm(live)node# standby
[root@node1 ~]# crm status
Cannot change active directory to /var/lib/pacemaker/cores/root: No such file or directory (2)
Last updated: Thu Aug 15 15:39:05 2013
Last change: Thu Aug 15 15:38:57 2013 via crm_attribute on node2.test.com
Stack: classic openais (with plugin)
Current DC: node2.test.com - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
2 Resources configured.
Node node2.test.com: standby
Online: [ node1.test.com ]
 Resource Group: webservice
     vip (ocf::heartbeat:IPaddr): Started node1.test.com
     httpd (lsb:httpd): Started node1.test.com
[root@node1 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:91:45:90
          inet addr:192.168.18.201  Bcast:192.168.18.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe91:4590/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:408780 errors:0 dropped:0 overruns:0 frame:0
          TX packets:323137 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:60432533 (57.6 MiB)  TX bytes:57541647 (54.8 MiB)
eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:91:45:90
          inet addr:192.168.18.200  Bcast:192.168.18.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:6525 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6525 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:882555 (861.8 KiB)  TX bytes:882555 (861.8 KiB)
[root@node1 ~]# netstat -ntulp | grep :80
tcp        0      0 :::80                       :::*                        LISTEN      16603/httpd
As you can see, when node2 is put into standby, all resources switch over to node1, and the web page remains reachable through the VIP.
That concludes the demonstration of group resources. Next we look at how to define resource constraints.
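A group is shorthand for a set of colocation and order constraints: its members are kept on the same node, started in the listed order, and stopped in reverse. The sketch below spells out those implied relationships. `expand_group` and `stop_sequence` are hypothetical helpers written for this illustration, not crmsh or Pacemaker functions.

```python
from typing import List, Tuple

Pair = Tuple[str, str]


def expand_group(members: List[str]) -> Tuple[List[Pair], List[Pair]]:
    """Return (colocations, orders) implied by a group.

    colocations: (resource, with_resource) pairs; resource runs where with_resource runs
    orders:      (first, then) pairs; first is started before then
    """
    colocations: List[Pair] = []
    orders: List[Pair] = []
    # Each member depends on the member listed immediately before it.
    for previous, current in zip(members, members[1:]):
        colocations.append((current, previous))
        orders.append((previous, current))
    return colocations, orders


def stop_sequence(members: List[str]) -> List[str]:
    """Members are stopped in the reverse of the start order."""
    return list(reversed(members))


if __name__ == "__main__":
    colocations, orders = expand_group(["vip", "httpd"])
    print(colocations)
    print(orders)
    print(stop_sequence(["vip", "httpd"]))
```

This is also why the order of the names in the group command matters: listing vip before httpd means the address is brought up before the web server.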
(10). Defining resource constraints
First we bring node2 back online, then delete the group resource.
crm(live)node# online
[root@node1 ~]# crm_mon
Last updated: Thu Aug 15 15:48:38 2013
Last change: Thu Aug 15 15:46:21 2013 via crm_attribute on node2.test.com
Stack: classic openais (with plugin)
Current DC: node2.test.com - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
2 Resources configured.
Online: [ node1.test.com node2.test.com ]
 Resource Group: webservice
     vip (ocf::heartbeat:IPaddr): Started node1.test.com
     httpd (lsb:httpd): Started node1.test.com
Deleting the group resource:
crm(live)# resource
crm(live)resource# show
 Resource Group: webservice
     vip (ocf::heartbeat:IPaddr): Started
     httpd (lsb:httpd): Started
crm(live)resource# stop webservice         # stop the resource
crm(live)resource# show
 Resource Group: webservice
     vip (ocf::heartbeat:IPaddr): Stopped
     httpd (lsb:httpd): Stopped
crm(live)resource# cleanup webservice      # clean up the resource
Cleaning up vip on node1.test.com
Cleaning up vip on node2.test.com
Cleaning up httpd on node1.test.com
Cleaning up httpd on node2.test.com
Waiting for 1 replies from the CRMd. OK
crm(live)# configure
crm(live)configure# delete                 # press Tab twice to list the objects that can be deleted
cib-bootstrap-options   node1.test.com          rsc-options             webservice
httpd                   node2.test.com          vip
crm(live)configure# delete webservice      # delete the group resource
crm(live)configure# show
node node1.test.com
node node2.test.com \
    attributes standby="off"
primitive httpd lsb:httpd
primitive vip ocf:heartbeat:IPaddr \
    params ip="192.168.18.200" nic="eth0" cidr_netmask="24"
property $id="cib-bootstrap-options" \
    dc-version="1.1.8-7.el6-394e906" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1376553277"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"
crm(live)configure# commit
[root@node1 ~]# crm_mon
Last updated: Thu Aug 15 15:56:59 2013
Last change: Thu Aug 15 15:56:12 2013 via cibadmin on node1.test.com
Stack: classic openais (with plugin)
Current DC: node2.test.com - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
2 Resources configured.
Online: [ node1.test.com node2.test.com ]
 vip (ocf::heartbeat:IPaddr): Started node1.test.com
 httpd (lsb:httpd): Started node2.test.com
As you can see, the resources are again running on two different nodes. Now we define constraints to keep them on the same node. First, a quick review of resource constraints: they specify on which cluster nodes resources run, in what order resources are loaded, and which other resources a particular resource depends on. Pacemaker provides three kinds of resource constraints:
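Resource Location: defines on which nodes a resource may, may not, or should preferably run;
Resource Colocation: colocation constraints define which cluster resources may or may not run together on the same node;
Resource Order: order constraints define the order in which cluster resources are started on a node;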
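For orientation, the three constraint types map onto three crmsh configure commands. The helpers below merely format those command lines as strings, following the crm shell syntax `location <id> <rsc> <score>: <node>`, `colocation <id> <score>: <rsc> <with-rsc>`, and `order <id> <kind>: <first> <then>`. The helper names are invented for this sketch, and nothing is sent to a cluster.

```python
from typing import Union

Score = Union[int, str]   # a number such as 200, or a keyword such as "inf"


def location(constraint_id: str, rsc: str, score: Score, node: str) -> str:
    """Prefer (positive score) or avoid (negative score) a node for a resource."""
    return f"location {constraint_id} {rsc} {score}: {node}"


def colocation(constraint_id: str, score: Score, rsc: str, with_rsc: str) -> str:
    """Place rsc on the node where with_rsc runs."""
    return f"colocation {constraint_id} {score}: {rsc} {with_rsc}"


def order(constraint_id: str, kind: str, first: str, then: str) -> str:
    """Start first before then."""
    return f"order {constraint_id} {kind}: {first} {then}"


if __name__ == "__main__":
    print(location("prefer-node1", "vip", 200, "node1.test.com"))
    print(colocation("httpd-with-ip", "inf", "httpd", "vip"))
    print(order("httpd-after-vip", "mandatory", "vip", "httpd"))
```

Note the argument order in a colocation: the resource named first follows the one named second, so `httpd vip` places httpd wherever vip is running.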
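When defining a constraint you also specify a score. Scores of all kinds are integral to how the cluster works. In practice, everything from migrating a resource to deciding which resources to stop in a degraded cluster is achieved by manipulating scores in some way. Scores are calculated per resource, and any node with a negative score for a resource cannot run that resource. After calculating the scores for a resource, the cluster chooses the node with the highest score. INFINITY is currently defined as 1,000,000. Adding to or subtracting from it follows these three basic rules:
Any value + INFINITY = INFINITY
Any value - INFINITY = -INFINITY
INFINITY - INFINITY = -INFINITY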
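The three rules can be written as a small saturating addition. This is a minimal sketch, assuming scores are plain integers and INFINITY is 1,000,000 as stated above; `add_scores` is a name chosen for this example and is not Pacemaker's actual implementation.

```python
INFINITY = 1_000_000


def add_scores(a: int, b: int) -> int:
    """Add two scores, following the three INFINITY rules."""
    # -INFINITY dominates everything, including +INFINITY.
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    # Otherwise +INFINITY absorbs any finite value.
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    # Finite values add normally, clamped to the valid range.
    return max(-INFINITY, min(INFINITY, a + b))


if __name__ == "__main__":
    print(add_scores(100, INFINITY))
    print(add_scores(100, -INFINITY))
    print(add_scores(INFINITY, -INFINITY))
```

The third rule is the one worth remembering: a -INFINITY ("never") always wins over a +INFINITY ("always"), so a prohibition cannot be overridden by a preference.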
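Each resource constraint can also be given its own score, which is the value assigned to that constraint. Constraints with higher scores are applied before those with lower scores. By creating additional location constraints with different scores for a given resource, you can specify the order of the nodes that the resource will fail over to. The problem described earlier, where vip and httpd might run on different nodes, can therefore be solved with the following command: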
crm(live)configure# colocation httpd-with-ip INFINITY: httpd vip
crm(live)configure# show
node node1.test.com
node node2.test.com \
    attributes standby="off"
primitive httpd lsb:httpd
primitive vip ocf:heartbeat:IPaddr \
    params ip="192.168.18.200" nic="eth0" cidr_netmask="24"
colocation httpd-with-ip inf: httpd vip
property $id="cib-bootstrap-options" \
    dc-version="1.1.8-7.el6-394e906" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1376553277"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"
crm(live)configure# show xml       # only the colocation element is shown here
<rsc_colocation id="httpd-with-ip" score="INFINITY" rsc="httpd" with-rsc="vip"/>
[root@node2 ~]# crm_mon
Last updated: Thu Aug 15 16:12:18 2013
Last change: Thu Aug 15 16:12:05 2013 via cibadmin on node1.test.com
Stack: classic openais (with plugin)
Current DC: node2.test.com - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
2 Resources configured.
Online: [ node1.test.com node2.test.com ]
 vip (ocf::heartbeat:IPaddr): Started node1.test.com
 httpd (lsb:httpd): Started node1.test.com
As you can see, all resources now run on node1, and the web page can be reached through the VIP.
Now simulate a failure and test again. The standby command is run on node1.
crm(live)# node
crm(live)node# standby
crm(live)node# show
node1.test.com: normal
    standby: on
node2.test.com: normal
    standby: off
[root@node2 ~]# crm_mon
Last updated: Thu Aug 15 16:14:33 2013
Last change: Thu Aug 15 16:14:23 2013 via crm_attribute on node1.test.com
Stack: classic openais (with plugin)
Current DC: node2.test.com - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
2 Resources configured.
Node node1.test.com: standby
Online: [ node2.test.com ]
 vip (ocf::heartbeat:IPaddr): Started node2.test.com
 httpd (lsb:httpd): Started node2.test.com
As you can see, all resources have moved to node2, and the web page is still reachable.
Next, we must also make sure that vip is started before httpd on a node. This is done with the following command:
crm(live)# configure
crm(live)configure# order httpd-after-vip mandatory: vip httpd
crm(live)configure# verify
crm(live)configure# show
node node1.test.com \
    attributes standby="on"
node node2.test.com \
    attributes standby="off"
primitive httpd lsb:httpd \
    meta target-role="Started"
primitive vip ocf:heartbeat:IPaddr \
    params ip="192.168.18.200" nic="eth0" cidr_netmask="24" \
    meta target-role="Started"
colocation httpd-with-ip inf: httpd vip
order httpd-after-vip inf: vip httpd
property $id="cib-bootstrap-options" \
    dc-version="1.1.8-7.el6-394e906" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1376554276"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"
crm(live)configure# show xml       # only the order element is shown here
<rsc_order id="httpd-after-vip" score="INFINITY" first="vip" then="httpd"/>
crm(live)configure# commit
In addition, an HA cluster does not require all nodes to have the same or similar performance. Sometimes we therefore want the service to run on a more powerful node whenever the cluster is healthy, which can be achieved with a location constraint. In the crmsh help the node preference is written as node_pref :: <score>: <node>, so a score of 200 for node1.test.com is entered as:
crm(live)configure# location prefer-node1 vip 200: node1.test.com
That completes the basic configuration of the highly available web cluster. Next we add an NFS resource.
8. Configuring an NFS resource with crmsh
(1). Configure the NFS server
[root@nfs ~]# mkdir -pv /web
mkdir: created directory `/web'
[root@nfs ~]# vim /etc/exports
/web 192.168.18.0/24(ro,async)
[root@nfs /]# echo '<h1>Cluster NFS Server</h1>' > /web/index.html
[root@nfs ~]# /etc/init.d/rpcbind start
Starting rpcbind:                                          [  OK  ]
[root@nfs /]# /etc/init.d/nfs start
Starting NFS services:                                     [  OK  ]
Starting NFS quotas:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]
[root@nfs /]# showmount -e 192.168.18.208
Export list for 192.168.18.208:
/web 192.168.18.0/24
(2). Test mounting from the nodes
node1:
[root@node1 ~]# mount -t nfs 192.168.18.208:/web /mnt
[root@node1 ~]# cd /mnt/
[root@node1 mnt]# ll
total 4
-rw-r--r-- 1 root root 28 08-07 17:41 index.html
[root@node1 mnt]# mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda3 on /data type ext3 (rw)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
192.168.18.208:/web on /mnt type nfs (rw,addr=192.168.18.208)
[root@node1 ~]# umount /mnt
[root@node1 ~]# mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda3 on /data type ext3 (rw)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
node2:
[root@node2 ~]# mount -t nfs 192.168.18.208:/web /mnt
[root@node2 ~]# cd /mnt
[root@node2 mnt]# ll
total 4
-rw-r--r-- 1 root root 28 08-07 17:41 index.html
[root@node2 mnt]# mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda3 on /data type ext3 (rw)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
192.168.18.208:/web on /mnt type nfs (rw,addr=192.168.18.208)
[root@node2 ~]# umount /mnt
[root@node2 ~]# mount
/dev/sda2 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda3 on /data type ext3 (rw)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
(3). Configure the resources vip, httpd, and nfs
crm(live)# configure
crm(live)configure# show
node node1.test.com
node node2.test.com
property $id="cib-bootstrap-options" \
    dc-version="1.1.8-7.el6-394e906" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1376555949"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"
crm(live)configure# primitive vip ocf:heartbeat:IPaddr params ip=192.168.18.200 nic=eth0 cidr_netmask=24
crm(live)configure# primitive httpd lsb:httpd
crm(live)configure# primitive nfs ocf:heartbeat:Filesystem params device=192.168.18.208:/web directory=/var/www/html fstype=nfs
crm(live)configure# verify
WARNING: nfs: default timeout 20s for start is smaller than the advised 60
WARNING: nfs: default timeout 20s for stop is smaller than the advised 60
crm(live)configure# show
node node1.test.com
node node2.test.com
primitive httpd lsb:httpd
primitive nfs ocf:heartbeat:Filesystem \
    params device="192.168.18.208:/web" directory="/var/www/html" fstype="nfs"
primitive vip ocf:heartbeat:IPaddr \
    params ip="192.168.18.200" nic="eth0" cidr_netmask="24"
property $id="cib-bootstrap-options" \
    dc-version="1.1.8-7.el6-394e906" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1376555949"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"
crm(live)configure# commit
WARNING: nfs: default timeout 20s for start is smaller than the advised 60
WARNING: nfs: default timeout 20s for stop is smaller than the advised 60
Look at the three resources just defined: they are not all on the same node. Next we define a group so that all three run on the same node.
[root@node2 ~]# crm_mon
Last updated: Thu Aug 15 17:00:17 2013
Last change: Thu Aug 15 16:58:44 2013 via cibadmin on node1.test.com
Stack: classic openais (with plugin)
Current DC: node2.test.com - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
3 Resources configured.
Online: [ node1.test.com node2.test.com ]
 vip (ocf::heartbeat:IPaddr): Started node1.test.com
 httpd (lsb:httpd): Started node2.test.com
 nfs (ocf::heartbeat:Filesystem): Started node1.test.com
(4). Define the group resource
crm(live)# configure
crm(live)configure# group webservice vip nfs httpd
crm(live)configure# verify
crm(live)configure# show
node node1.test.com
node node2.test.com
primitive httpd lsb:httpd
primitive nfs ocf:heartbeat:Filesystem \
    params device="192.168.18.208:/web" directory="/var/www/html" fstype="nfs"
primitive vip ocf:heartbeat:IPaddr \
    params ip="192.168.18.200" nic="eth0" cidr_netmask="24"
group webservice vip nfs httpd
property $id="cib-bootstrap-options" \
    dc-version="1.1.8-7.el6-394e906" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1376555949"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"
crm(live)configure# commit
Check the resource status: all resources are now on node1.
[root@node2 ~]# crm_mon
Last updated: Thu Aug 15 17:03:20 2013
Last change: Thu Aug 15 17:02:44 2013 via cibadmin on node1.test.com
Stack: classic openais (with plugin)
Current DC: node2.test.com - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
3 Resources configured.
Online: [ node1.test.com node2.test.com ]
 Resource Group: webservice
     vip (ocf::heartbeat:IPaddr): Started node1.test.com
     nfs (ocf::heartbeat:Filesystem): Started node1.test.com
     httpd (lsb:httpd): Started node1.test.com
(5). Finally, simulate a failure by putting node1 into standby
crm(live)# node
crm(live)node# standby
crm(live)node# show
node1.test.com: normal
    standby: on
node2.test.com: normal
[root@node2 ~]# crm_mon
Last updated: Thu Aug 15 17:05:52 2013
Last change: Thu Aug 15 17:05:42 2013 via crm_attribute on node1.test.com
Stack: classic openais (with plugin)
Current DC: node2.test.com - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
3 Resources configured.
Node node1.test.com: standby
Online: [ node2.test.com ]
 Resource Group: webservice
     vip (ocf::heartbeat:IPaddr): Started node2.test.com
     nfs (ocf::heartbeat:Filesystem): Started node2.test.com
     httpd (lsb:httpd): Started node2.test.com
When node1 fails, all resources move to node2.
The web page is still reachable through the VIP. That is all for today's post; the next post will focus on DRBD. ^_^
Reposted from: https://blog.51cto.com/freeloda/1274533