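ASM heartbeat timeout detection: Delayed ASM PST heart beats
Recently I have received a string of ASM disk group dismount cases, all with the error "Waited 15 secs for write IO to PST". This is ASM's own heartbeat timeout check: the ASM instance periodically verifies that each ASM disk responds normally. So I decided to write a short summary of the problem.
The document ASM diskgroup dismount with "Waited 15 secs for write IO to PST" (Doc ID 1581684.1) contains the following description: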
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generally this kind messages comes in ASM alertlog file on below situations,
Delayed ASM PST heart beats on ASM disks in normal or high redundancy diskgroup,
thus the ASM instance dismount the diskgroup. By default, it is 15 seconds.
By the way the heart beat delays are sort of ignored for external redundancy diskgroup.
ASM instance stop issuing more PST heart beat until it succeeds PST revalidation,
but the heart beat delays do not dismount external redundancy diskgroup directly.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
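The description above comes down to the following points:
1. The ASM instance periodically checks the disks of every disk group to confirm that communication is normal;
2. The check only triggers a dismount for normal and high redundancy disk groups; for external redundancy, heartbeat delays are largely ignored and do not dismount the disk group directly;
3. The default timeout is 15 s: if the disks still have not responded to the ASM instance after 15 s, the disk group is dismounted.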
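The three points above can be expressed as a small Python sketch. It is a simplified model of the behaviour described in the note, not ASM's actual implementation: the function name `check_pst_heartbeats`, the `HeartbeatResult` type, and the disk names are made up for illustration, and the 15-second default is the value quoted above. The model only flags that a dismount becomes possible; it does not try to reproduce how ASM makes the final decision.

```python
from dataclasses import dataclass

# Default wait in seconds, as quoted from Doc ID 1581684.1.
DEFAULT_HBEAT_IO_WAIT = 15


@dataclass
class HeartbeatResult:
    timed_out_disks: list   # disks whose PST write exceeded the timeout
    dismount_risk: bool     # True only for normal/high redundancy


def check_pst_heartbeats(redundancy, write_waits, timeout=DEFAULT_HBEAT_IO_WAIT):
    """Toy model of the PST heartbeat check (hypothetical helper).

    write_waits maps a disk name to the number of seconds its PST
    heartbeat write has been pending.
    """
    redundancy = redundancy.lower()
    if redundancy not in ("external", "normal", "high"):
        raise ValueError(f"unknown redundancy: {redundancy}")

    # A disk counts as timed out once the pending write reaches the timeout.
    timed_out = sorted(d for d, w in write_waits.items() if w >= timeout)

    # Per the note, delays on external redundancy do not dismount directly.
    at_risk = bool(timed_out) and redundancy in ("normal", "high")
    return HeartbeatResult(timed_out, at_risk)
```

For example, `check_pst_heartbeats("normal", {"DATA_0000": 2, "DATA_0001": 16})` reports `DATA_0001` as timed out and sets `dismount_risk`, while the same delays on an external redundancy disk group leave `dismount_risk` unset.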
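The customers who hit this problem were all using Fibre Channel network storage, and the error appeared when the storage network had problems. In other words, when ASM sends its periodic check and a disk does not answer within 15 s, ASM considers the disk inaccessible.
I tried to reproduce the error in a test environment. Because the test environment is a VMware virtual machine, removing the disk at the physical level did not trigger it. The reason is that once a disk on the same host is abruptly removed, ASM's read immediately returns an OS-level I/O error, so there is no waiting for the "Waited 15 secs for write IO to PST" timeout.
So my conclusion is that this error only occurs with shared ASM disks that are not local to the physical host but sit on a storage network, where the check ASM sends out is not answered in time. The cause may be the storage host, the storage network, or even the storage disks themselves. Either way, ASM has not received the acknowledgement it needs and treats the disk as faulty; if enough disks are affected to threaten data integrity, ASM dismounts the disk group.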
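To see how often an ASM instance has hit this message, its alert log can be scanned for the text. The following is a minimal Python sketch under some assumptions: the function name `count_pst_waits` is hypothetical, and the sample log lines are illustrative only. The pattern matches just the message text quoted in this article and ignores the rest of the line, whose exact layout is not given here.

```python
import re
from collections import Counter

# Match only the message text quoted in the article; the number of seconds
# varies with the configured timeout, so it is captured rather than fixed.
PST_WAIT = re.compile(r"Waited (\d+) secs for write IO to PST")


def count_pst_waits(lines):
    """Return a Counter mapping wait seconds -> number of matching lines."""
    counts = Counter()
    for line in lines:
        match = PST_WAIT.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Illustrative lines only; a real alert log line may carry extra fields.
sample_lines = [
    "WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.",
    "NOTE: some unrelated alert log entry",
    "WARNING: Waited 15 secs for write IO to PST disk 1 in group 1.",
]

for secs, hits in sorted(count_pst_waits(sample_lines).items()):
    print(f"{hits} line(s) report a {secs} second wait")
```

Because `count_pst_waits` accepts any iterable of strings, an open file handle on the ASM alert log can be passed to it directly.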
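According to Doc ID 1581684.1, the "Waited 15 secs for write IO to PST" message appears from 11.2.0.3.0 onward. The document also describes how to change this detection timeout manually through the parameter _asm_hbeatiowait:
alter system set "_asm_hbeatiowait"=<value> scope=spfile sid='*';
<ASM/CRS must be restarted for the change to take effect.>
To confirm that this parameter first appeared in 11.2.0.3, I queried every database version; the details are below: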
======================10.2=====================
SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - Prod
PL/SQL Release 10.2.0.5.0 - Production
CORE 10.2.0.5.0 Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production

SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm%' order by ksppinm;

hidden parameter                 value
-------------------------------- ----------
_asm_acd_chunks                  1
_asm_allow_only_raw_disks        TRUE
_asm_allow_resilver_corruption   FALSE
_asm_ausize                      1048576
_asm_blksize                     4096
_asm_direct_con_expire_time      120
_asm_disk_repair_time            14400
_asm_droptimeout                 60
_asm_emulmax                     10000
_asm_emultimeout                 0
_asm_fob_tac_frequency           3
_asm_instlock_quota              0
_asm_kfdpevent                   0
_asm_libraries                   ufs
_asm_maxio                       1048576
_asm_skip_resize_check           FALSE
_asm_stripesize                  131072
_asm_stripewidth                 8
_asm_wait_time                   18
_asmlib_test                     0
_asmsid                          asm

21 rows selected.

======================11.2.0.1=====================
$ sqlplus / as sysdba
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%' order by ksppinm;

hidden parameter                 value
-------------------------------- ----------
_asm_hbeatwaitquantum            2

======================11.2.0.2=====================
$ sqlplus / as sysdba
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining
and Real Application Testing options

SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%' order by ksppinm;

hidden parameter                 value
-------------------------------- ----------
_asm_hbeatwaitquantum            2

The parameter only appears from 11.2.0.3.0 onward, which means the ASM instance's disk timeout detection was introduced in 11.2.0.3.

======================11.2.0.3=====================
sys@R11203> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%' order by ksppinm;

hidden parameter                 value
-------------------------------- ----------
_asm_hbeatiowait                 15
_asm_hbeatwaitquantum            2

======================11.2.0.4=====================
SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - Production

SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%' order by ksppinm;

hidden parameter                 value
-------------------------------- ----------
_asm_hbeatiowait                 15 <<<<<<<<<<<<<<<<<<<<
_asm_hbeatwaitquantum            2

======================12.1.0.1=====================
$ sqlplus / as sysdba
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%' order by ksppinm;

hidden parameter                 value
-------------------------------- ----------
_asm_hbeatiowait                 15
_asm_hbeatwaitquantum            2

From 12.1.0.2 onward, the default value of this parameter was raised to 120 s.

======================12.1.0.2=====================
$ sqlplus / as sysdba
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> select ksppinm as "hidden parameter", ksppstvl as "value" from x$ksppi join x$ksppcv using (indx) where ksppinm like '\_%' escape '\' and ksppinm like '%asm_hb%' order by ksppinm;

hidden parameter                 value
-------------------------------- ----------
_asm_hbeatiowait                 120
_asm_hbeatwaitquantum            2

I hope this summary is helpful. In daily work I often find myself thinking a problem is simple yet not being sure of the answer, so after testing I write it down for future reference.