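Oracle Coherence Operations and Monitoring
1. Checking and Setting Environment Parameters
For details, see Chapter 6, "Performance Tuning", of the Oracle Coherence Administrator's Guide. For the AIX environment used in this project, we recommend adjusting the following parameters:
1.1. AIX Operating System Parameters
1.1.1. Socket Buffer Sizes
The default socket buffer sizes are usually small, in which case Coherence logs the following warning: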
UnicastUdpSocket failed to set receive buffer size to 1428 packets (2096304
bytes); actual size is 89 packets (131071 bytes). Consult your OS documentation
regarding increasing the maximum socket buffer size. Proceeding with the actual
value may cause sub-optimal performance.
Run the following commands as root to adjust them:
no -o rfc1323=1
no -o sb_max=4194304
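After raising sb_max, it can be useful to confirm from Java what buffer size the operating system actually grants. The following is a minimal sketch, not part of Coherence: the class name SocketBufferCheck and its method are hypothetical, and it only uses the standard java.net.DatagramSocket API to request the size quoted in the warning and read back the granted value.

```java
import java.net.DatagramSocket;
import java.net.SocketException;

// Hypothetical helper (not a Coherence class): asks the OS for a UDP receive
// buffer and reports the size that was actually granted.
class SocketBufferCheck {

    // Requests requestedBytes and returns the receive buffer size the OS reports back.
    static int grantedReceiveBuffer(int requestedBytes) throws SocketException {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setReceiveBufferSize(requestedBytes);
            return socket.getReceiveBufferSize();
        }
    }

    public static void main(String[] args) throws SocketException {
        int requested = 2096304; // the size quoted in the Coherence warning
        int granted = grantedReceiveBuffer(requested);
        System.out.println("requested=" + requested + " granted=" + granted);
        if (granted < requested) {
            System.out.println("The OS capped the buffer; the maximum socket buffer size may need raising.");
        }
    }
}
```

If the granted value is still below the requested one after the change, the new limit has probably not taken effect for the account or JVM that runs Coherence.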
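1.1.2. Multicast and IPv6 Options
On AIX 5.2 and later, multicast uses IPv6 by default. When starting Coherence services and applications, set the following JVM system property to force the use of IPv4:
-Djava.net.preferIPv4Stack=true
Also set the following in /etc/netsvc.conf:
hosts=local,bind4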
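1.2. IBM JVM-Specific Configuration
1.2.1. OutOfMemoryError
A node in an OutOfMemoryError state can adversely affect the whole cluster, so such a node should exit rather than attempt to recover. To achieve this, add the following to the IBM JVM startup options:
UNIX:
-Xdump:tool:events=throw,filter=java/lang/OutOfMemoryError,exec="kill -9 %pid"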
1.2.2. Heap Sizing
IBM does not recommend fixed-size heaps for its JVM, so set only -Xmx and leave -Xms at its default. For details, see: http://www.ibm.com/developerworks/java/jdk/diagnosis/
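2. Start and Stop Scripts
2.1. Startup Script
2.2. Data Loading Script
2.3. Shutdown Script
3. Coherence Log Management
3.1. Logging Overview
Coherence has its own logging framework and also supports log4j, SLF4J, and Java logging, giving applications a common logging environment. Coherence logging runs on a dedicated, low-priority thread to reduce the impact of logging on critical parts of the system. Logging is preconfigured, and the default settings can be changed as needed.
The Coherence log level determines which log messages are emitted. The default log level emits error, warning, and informational messages plus some debug messages. During development, raise the log level to its maximum so that all debug messages are recorded. In production, log level 3 is reasonable. The higher the log level, the more detailed the output; the default is 5. The log levels are: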
· 0 – This level includes messages that are not associated with a logging level.
· 1 – This level includes the previous level's messages plus error messages.
· 2 – This level includes the previous levels' messages plus warning messages.
· 3 – This level includes the previous levels' messages plus informational messages.
· 4-9 – These levels include the previous levels' messages plus internal debugging messages. More log messages are emitted as the log level is increased. The default log level is 5.
· -1 – No log messages are emitted.
3.2. Setting the Log Level
The Coherence log level can be configured in the tangosol-coherence-override.xml file, as shown below:
    <logging-config>
        <destination system-property="tangosol.coherence.log">log4j</destination>
        <severity-level system-property="tangosol.coherence.log.level">3</severity-level>
    </logging-config>
3.3. Log Monitoring
If the Coherence or application log files grow numerous or large, clean them up promptly so that they do not exhaust disk space. Check the Coherence logs regularly and pay attention to messages at warning level and above, especially the following problems:
1. Un-indexed data access. Log content to watch for:
1) at com.tangosol...readSerializable(ExternalizableHelper.java:2180)
2) YYYY-MM-DD HH:MM:SS.mmm/55.838 Oracle Coherence GE 12.1.2.0.0<…> . . .Timeout while delivering a packet; requesting the departure confirmation for Member(. . . ) by MemberSet(. . . )
2. Heap exhaustion. Log content to watch for:
java.lang.OutOfMemoryError: GC overhead limit exceeded Dumping heap to java_pid6199.hprof. . .
Heap dump file created [16864871 bytes in 1.921 secs]
3. Unresponsive service
(thread=Cluster, member=2): Detected soft timeout) of {WrapperGuardableGuard{Daemon=DistributedCache}
4. Swap-related messages
2013/09/17 10:20:26 | [GC 938176K->865107K(1021376K), 19.7179554 secs]
5. Potential bandwidth messages
a) Experienced a XXX ms communication delay (probable remote GC) with Member YYY
b) A potential communication problem has been detected.
c) This node appears to have become disconnected
6. Potential disconnect messages
a) (thread=Cluster, member=5): Failed to reach address /192.168.1.103 within the IpMonitor timeout. Members [Member(Id=3. . . )] are suspect.
b) (thread=Cluster, member=5): Timed-out members MemberSet(Size=4, BitSetCount=2 Member(Id=1, Timestamp=2011-02-05
7. Detecting split brain
a) 2013-01-25 08:16:59.555/638.831 Oracle Coherence GE 12.1.2.0.0/465p4 <D5> An existence of a cluster island
b) 2010-01-25 09:38:43.213/460.877 Oracle Coherence GE 12.1.2.0.0/465p4 Received panic from senior Member,. . .
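The regular log check described above can be partly automated. The sketch below is one possible approach, not a Coherence facility: the class CoherenceLogScanner, its category names, and the GC pause threshold are all hypothetical, and the patterns are simply fragments of the sample messages listed in this section. It labels a single log line with the problem category it appears to belong to.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical scanner: maps fragments of the sample messages in this section
// to illustrative category names. Not part of Coherence.
class CoherenceLogScanner {
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("HEAP_EXHAUSTION", Pattern.compile("java\\.lang\\.OutOfMemoryError"));
        RULES.put("UNRESPONSIVE_SERVICE", Pattern.compile("Detected soft timeout"));
        RULES.put("PACKET_TIMEOUT", Pattern.compile("Timeout while delivering a packet"));
        RULES.put("BANDWIDTH", Pattern.compile(
                "communication delay|potential communication problem|become disconnected"));
        RULES.put("DISCONNECT", Pattern.compile("Failed to reach address|Timed-out members"));
        RULES.put("SPLIT_BRAIN", Pattern.compile("cluster island|panic from senior"));
    }

    // Captures the pause length from a GC line such as "[GC 1K->1K(2K), 19.7 secs]".
    private static final Pattern GC_PAUSE = Pattern.compile("\\[GC .*?, (\\d+\\.\\d+) ?secs\\]");

    // Returns a category name for the line, or null if nothing of interest matched.
    static String classify(String line, double gcPauseThresholdSecs) {
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            if (rule.getValue().matcher(line).find()) {
                return rule.getKey();
            }
        }
        Matcher gc = GC_PAUSE.matcher(line);
        if (gc.find() && Double.parseDouble(gc.group(1)) > gcPauseThresholdSecs) {
            return "LONG_GC_PAUSE"; // a very long minor GC often points to swapping
        }
        return null;
    }

    public static void main(String[] args) {
        String sample = "2013/09/17 10:20:26 | [GC 938176K->865107K(1021376K), 19.7179554 secs]";
        System.out.println(classify(sample, 1.0));
    }
}
```

Feeding each line of a log file through classify and counting the non-null results per category would give a simple daily summary; the 1.0 second GC threshold is only a placeholder and should be tuned to the environment.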
4. Coherence Cluster Monitoring
4.1. Overview of Coherence Cluster Monitoring
Several tools can monitor a Coherence cluster. The main ones are:
1. Using JMX to Manage Oracle Coherence
JMX tools, chiefly JConsole or Java VisualVM.
2. Using Oracle Coherence Reporting
A built-in Coherence feature that generates statistics reports in text format.
3. Using Oracle WebLogic Server
The WebLogic Console can monitor the health of Coherence nodes and start and stop them.
4. Using Oracle Enterprise Manager
That is, the OEM Management Pack for Oracle Coherence. For details, see: https://docs.oracle.com/cd/E24628_01/install.121/e24215/coherence_getstarted.htm
To monitor with a JMX tool, modify the Coherence startup script to add the following options:
-Dcom.sun.management.jmxremote -Dtangosol.coherence.management=all -Dtangosol.coherence.management.remote=true
For remote monitoring, also add:
-Dcom.sun.management.jmxremote.host=10.46.158.140 -Dcom.sun.management.jmxremote.port=7091 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
If the connection still fails, also add:
-Dcom.sun.management.jmxremote.local.only=false
To limit the impact on cluster performance, only one node in the cluster needs the JMX options above; there is no need to configure every node.
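A remote JMX endpoint configured this way can also be reached from code using the standard javax.management.remote API. The following is a minimal sketch under stated assumptions: the class RemoteJmxConnect is hypothetical, the host and port are taken from the command line, and it assumes the usual RMI connector address form with authentication and SSL disabled as in the options above.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical client: connects to the JMX-enabled node and prints how many
// MBeans it exposes, as a basic connectivity check.
class RemoteJmxConnect {

    // Builds the standard RMI connector address for a JVM started with
    // -Dcom.sun.management.jmxremote.port=<port>.
    static JMXServiceURL serviceUrl(String host, int port) throws MalformedURLException {
        return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: RemoteJmxConnect <host> <port>");
            return;
        }
        JMXServiceURL url = serviceUrl(args[0], Integer.parseInt(args[1]));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection server = connector.getMBeanServerConnection();
            System.out.println("Connected, MBean count: " + server.getMBeanCount());
        }
    }
}
```

With the example options above, the arguments would be 10.46.158.140 and 7091. If authentication is enabled later, credentials would need to be passed in the environment map of JMXConnectorFactory.connect.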
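A JMX tool only shows the state of the Coherence cluster between the time the tool is started and the time it is stopped, whereas OEM stores the collected monitoring data in a database so that history can be reviewed.
Memory is the main thing to watch when monitoring Coherence. If memory is not being reclaimed in time and is about to run out, trigger a GC manually. Both JConsole and Java VisualVM can do this, as described below.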
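The manual GC that JConsole and VisualVM offer is the gc operation of the JVM's standard Memory MXBean, so the same check can be scripted. The sketch below is illustrative only: the class HeapWatch and the 85% threshold are assumptions, and it inspects the local JVM. For a remote node, a MemoryMXBean proxy obtained over a JMX connection could be used instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical watcher: reports heap usage and requests a GC when usage
// crosses an illustrative threshold.
class HeapWatch {
    static final double GC_THRESHOLD = 0.85; // assumed value, tune per environment

    // Fraction of the heap ceiling in use; falls back to the committed size
    // when the JVM reports no maximum (getMax() returns -1).
    static double usedFraction(MemoryUsage heap) {
        long ceiling = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        return (double) heap.getUsed() / ceiling;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        double used = usedFraction(memory.getHeapMemoryUsage());
        System.out.printf("heap used: %.1f%%%n", used * 100);
        if (used > GC_THRESHOLD) {
            memory.gc(); // same request as the manual GC button in JConsole or VisualVM
            double after = usedFraction(memory.getHeapMemoryUsage());
            System.out.printf("heap used after GC: %.1f%%%n", after * 100);
        }
    }
}
```

If usage stays high after a GC, the node is likely short of heap rather than merely late in collecting, which is the case that calls for attention.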
4.2. Monitoring with Java VisualVM
4.2.1. Installing the Coherence Plug-in
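4.2.2. Monitoring Machine Status in the Coherence Cluster
4.2.3. Monitoring Coherence Cluster Members
Watch metrics such as publisher success rate, receiver success rate, and send Q size, and check metrics such as free memory to confirm that each node has enough memory.
4.2.4. Monitoring Coherence Cluster Services
Check that every service is in a normal state, that task average duration and request average duration are normal, and that task backlog is 0.
For example, the service shown below is not healthy: it is in the ENDANGERED state, and its request average duration is also very high.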
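The service checks described in this subsection can be written down as a simple rule. The sketch below is a hypothetical helper, not a Coherence API: the class ServiceHealth and the request duration limit are assumptions, and the inputs are meant to be the status, task backlog, and request average duration values shown for a service in the monitoring tool.

```java
// Hypothetical rule: flags a service that is ENDANGERED, has queued tasks,
// or answers requests more slowly than a chosen limit.
class ServiceHealth {

    static boolean needsAttention(String statusHa, int taskBacklog,
                                  double requestAvgMillis, double requestAvgLimitMillis) {
        boolean endangered = "ENDANGERED".equalsIgnoreCase(statusHa);
        boolean backlogged = taskBacklog > 0;
        boolean slow = requestAvgMillis > requestAvgLimitMillis;
        return endangered || backlogged || slow;
    }

    public static void main(String[] args) {
        // Illustrative values only; the 100 ms limit is an assumption.
        System.out.println(needsAttention("ENDANGERED", 0, 850.0, 100.0));
        System.out.println(needsAttention("MACHINE-SAFE", 0, 3.2, 100.0));
    }
}
```

What counts as a high request average duration depends on the workload, so the limit is best derived from values observed while the cluster is known to be healthy.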
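4.2.5. Monitoring Coherence Cluster Caches
4.2.6. Monitoring CPU and Memory on Coherence Nodes
As shown in the figure below, VisualVM can monitor the CPU and memory usage of an individual node and can trigger a manual GC.
4.3. Monitoring with JConsole
JConsole can monitor the CPU, memory, and process status of an individual Coherence node and can trigger a manual GC.
In addition, JConsole's MBean view exposes finer-grained details, which is where JConsole is stronger than VisualVM.
4.4. Monitoring Programmatically with JMX
Managing Coherence through JMX gives access to MBean data that presents concise operational information about the Coherence cluster, enabling real-time monitoring and analysis. The Coherence-JVisualVM plug-in provides a great deal of Coherence information, such as the cluster's Machines, Members, Services, and Caches.
The Coherence MBeans are listed below: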
| CacheMBean | Represents a cache. A cluster member includes zero or more instances of this managed bean. |
| ClusterMBean | Represents a cluster. Each cluster member includes a single instance of this managed bean. |
| ClusterNodeMBean | Represents a cluster member. Each cluster member includes a single instance of this managed bean. |
| ConnectionManagerMBean | Represents an Oracle Coherence*Extend proxy. A cluster member includes zero or more instances of this managed bean. |
| ConnectionMBean | Represents a remote client connection through Oracle Coherence*Extend. A cluster member includes zero or more instances of this managed bean. |
| FlashJournalRM | Represents a flash journal resource manager. The managed bean is an instance of the JournalMBean interface. Each cluster member includes a single instance of this managed bean. |
| ManagementMBean | Represents the grid JMX infrastructure. Each cluster member includes a single instance of this managed bean. |
| PointToPointMBean | Represents the network status between two cluster members. Each cluster member includes a single instance of this managed bean. |
| RamJournalRM | Represents a RAM journal resource manager. The managed bean is an instance of the JournalMBean interface. Each cluster member includes a single instance of this managed bean. |
| ReporterMBean | Represents the Oracle Coherence reporter. Each cluster member includes a single instance of this managed bean. |
| ServiceMBean | Represents a clustered service. A cluster member includes zero or more instances of this managed bean. |
| StorageManagerMBean | Represents a storage instance for a storage-enabled distributed cache service. A cluster member includes zero or more instances of this managed bean. |
| TransactionManagerMBean | Represents a transaction manager. A cluster member includes zero or more instances of this managed bean. |
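As an illustration of programmatic monitoring, the sketch below reads a few member metrics from the ClusterNodeMBean instances. It rests on assumptions that should be checked against the Coherence version in use: that the node MBeans are registered under the object name pattern Coherence:type=Node,* and that they expose the attributes PublisherSuccessRate, ReceiverSuccessRate, SendQueueSize, and MemoryAvailableMB. The class NodeMBeanReport is hypothetical, and the example queries the local platform MBean server, so it finds nodes only when run inside a JVM that has Coherence management enabled.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical report: prints selected attributes of every Coherence node
// MBean visible through the given connection.
class NodeMBeanReport {
    // Attribute names assumed from the ClusterNodeMBean; verify for your release.
    static final String[] ATTRIBUTES = {
        "PublisherSuccessRate", "ReceiverSuccessRate", "SendQueueSize", "MemoryAvailableMB"
    };

    // Prints one line per node and returns the number of node MBeans found.
    static int report(MBeanServerConnection server) throws JMException, IOException {
        Set<ObjectName> nodes = server.queryNames(new ObjectName("Coherence:type=Node,*"), null);
        for (ObjectName node : nodes) {
            StringBuilder line = new StringBuilder("node ").append(node.getKeyProperty("nodeId"));
            for (String attribute : ATTRIBUTES) {
                line.append(' ').append(attribute).append('=')
                    .append(server.getAttribute(node, attribute));
            }
            System.out.println(line);
        }
        return nodes.size();
    }

    public static void main(String[] args) throws Exception {
        int found = report(ManagementFactory.getPlatformMBeanServer());
        System.out.println("Coherence node MBeans found: " + found);
    }
}
```

Because report accepts any MBeanServerConnection, the same method could be pointed at a remote connection to the JMX-enabled node instead of the local platform server.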