A long-lasting table-lock incident caused by the gc current request wait event
Incident description: a data warehouse system repeatedly hit table locks during its long batch runs. When the locking was severe it occasionally stalled the whole batch flow and, in turn, affected the next day's business. The business asked for tuning, with the goal of cutting lock time as much as possible (their minimum requirement: the overall batch flow must not be affected).
Business requirement analysis: find the cause of the severe locking; it is enough that it no longer disrupts the overall batch run.
DBA approach: symptoms => AWR/ASH/ADDM reports plus monitoring data (Foglight/BMC) => interpret the results with experience => combine the symptoms with the underlying mechanics => confirm the root cause and a fix => observe long-term to verify the fix.
1. System environment
SQL> select * from gv$version;
INST_ID BANNER
---------- --------------------------------------------------------------------------------
2 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
2 PL/SQL Release 11.2.0.4.0 - Production
2 CORE    11.2.0.4.0      Production
2 TNS for IBM/AIX RISC System/6000: Version 11.2.0.4.0 - Production
2 NLSRTL Version 11.2.0.4.0 - Production
1 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
1 PL/SQL Release 11.2.0.4.0 - Production
1 CORE    11.2.0.4.0      Production
1 TNS for IBM/AIX RISC System/6000: Version 11.2.0.4.0 - Production
1 NLSRTL Version 11.2.0.4.0 - Production
10 rows selected.
2. Confirming the incident window
Troubleshooting: let the data speak.
We used Foglight (a database monitoring tool) to capture the CPU/IO/memory peaks and pin down the incident window.
2.1 Wait events as the base data
Statistics for the past 7 days:
(screenshot omitted)
September 13, the window in which the business reported the failure:
(screenshot omitted)
== After review, the DBA decided to analyze 00:00:00-01:00:00 and 03:00:00-04:00:00 ==
3. Analysis based on AWR data
AWR report, 00:00:00-01:00:00
3.1 A table lock is itself a kind of wait event, so go straight to the wait events
==node A==
Top 10 Foreground Events by Total Wait Time
Event                          Waits    Total Wait Time (sec)  Wait Avg(ms)  % DB time  Wait Class
enq: CF - contention           3,651    13.7K                  3746          57.9       Other
enq: TX - row lock contention  51       3610.8                 70800         15.3       Application
direct path read               10,652   1705.3                 160           7.2        User I/O
db file scattered read         35,561   1056.8                 30            4.5        User I/O
direct path write              22,929   989.5                  43            4.2        User I/O
DB CPU                                  530.4                                2.2
direct path read temp          18,124   270.6                  15            1.1        User I/O
db file sequential read        46,447   264.3                  6             1.1        User I/O
enq: HW - contention           3,241    199.1                  61            .8         Configuration
gc current block busy          3,794    164.5                  43            .7         Cluster
==node B==
Top 10 Foreground Events by Total Wait Time
Event                        Waits    Total Wait Time (sec)  Wait Avg(ms)  % DB time  Wait Class
enq: CF - contention         7,114    6801.1                 956           30.9       Other
direct path write            102,606  4393.3                 43            19.9       User I/O
direct path read             28,670   4047.9                 141           18.4       User I/O
db file scattered read       91,564   2344.5                 26            10.6       User I/O
DB CPU                                1296                                 5.9
gc current request           1        383.1                  383095        1.7        Cluster
db file parallel read        3,286    284.7                  87            1.3        User I/O
db file sequential read      86,524   247                    3             1.1        User I/O
CSS initialization           298      222.5                  747           1.0        Other
control file parallel write  31,876   148.7                  5             .7         System I/O
Related MOS notes:
Performance Degradation as a Result of 'enq: CF - contention' (Doc ID 1072417.1)
Waits for 'Enq: TX - ...' Type Events - Transaction (TX) Lock Example Scenarios (Doc ID 62354.1)
Main problems:
1. enq: CF - contention — inference: redo is being generated in large volumes and very frequently
2. enq: TX - row lock contention — inference: application design flaw: long transactions with no batched commits
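One generic way to verify the "redo generated too large and too frequently" inference (this check is not from the original report) is to count hourly log switches in v$log_history:

```sql
-- Hourly log-switch counts for the past day. A sustained rate of many
-- switches per hour supports the "redo generated too fast" inference.
SELECT TO_CHAR(first_time, 'yyyy-mm-dd hh24') AS hour,
       COUNT(*)                               AS switches
  FROM v$log_history
 WHERE first_time > SYSDATE - 1
 GROUP BY TO_CHAR(first_time, 'yyyy-mm-dd hh24')
 ORDER BY 1;
```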
4. Verifying the inferences
Check the wait events.
EVENT TOP 10, 00:00:00-01:00:00
INSTANCE_NUMBER EVENT                                           CNT
--------------- ---------------------------------------- ----------
              1 enq: CF - contention                           1338
              2 enq: CF - contention                            690
              2 direct path write                               382
              1 enq: TX - row lock contention                   360
              2 direct path read                                321
              2 db file scattered read                          251
              1 db file scattered read                           94
              1 direct path read                                 65
              2 PX Deq: Table Q Get Keys                         64
== In this window transactions were still running; it is just that the redo-related waits were severe.
EVENT TOP 10, 00:00:00-08:00:00
(08:00:00 is when the batch flow resumed normally after the locking session was killed)
INSTANCE_NUMBER EVENT                                           CNT
--------------- ---------------------------------------- ----------
              1 enq: TX - row lock contention                  2878
              2 gc current request                             2171
              2 direct path read                               2005
              1 direct path read                               1827
              1 enq: CF - contention                           1338
              2 Disk file operations I/O                        900
              1 Disk file operations I/O                        888
              2 enq: CF - contention                            690
              1 db file sequential read                         472
9 rows selected.
== In this window there was essentially no transaction activity left; it was completely dominated by enq: TX and gc current request waits.
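The per-instance event counts above come from the ASH history; a query of roughly this shape reproduces them (a reconstruction — the exact statement used is not shown in the original):

```sql
-- Top 10 wait events per instance for the 00:00-08:00 window on 2017-09-13.
SELECT *
  FROM (SELECT instance_number, event, COUNT(1) AS cnt
          FROM dba_hist_active_sess_history
         WHERE sample_time >= TO_DATE('2017-09-13 00:00:00','yyyy-mm-dd hh24:mi:ss')
           AND sample_time <  TO_DATE('2017-09-13 08:00:00','yyyy-mm-dd hh24:mi:ss')
           AND event IS NOT NULL
         GROUP BY instance_number, event
         ORDER BY cnt DESC)
 WHERE ROWNUM <= 10;
```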
00:00:00--01:00:00
==node A==
Load Profile
                     Per Second  Per Transaction  Per Exec  Per Call
DB Time(s):                 6.6              3.2      0.40      0.17
DB CPU(s):                  0.2              0.1      0.01      0.00
Redo size (bytes):  1,669,685.0        814,982.5
==node B==
Load Profile
                     Per Second  Per Transaction  Per Exec  Per Call
DB Time(s):                 6.1              3.1      0.24      0.17
DB CPU(s):                  0.4              0.2      0.01      0.01
Redo size (bytes):  2,170,422.3      1,082,389.6
03:00:00--04:00:00
==node A==
Load Profile
                     Per Second  Per Transaction  Per Exec  Per Call
DB Time(s):                 1.1              0.6      0.06      0.02
DB CPU(s):                  0.0              0.0      0.00      0.00
Redo size (bytes):    329,441.6        183,969.1
==node B==
Load Profile
                     Per Second  Per Transaction  Per Exec  Per Call
DB Time(s):                 1.0              0.6      0.10      0.03
DB CPU(s):                  0.0              0.0      0.00      0.00
Redo size (bytes):      1,489.5            840.2
Ratio of redo size per transaction, 00:00:00-01:00:00 vs. 03:00:00-04:00:00 (using the per-transaction figures from the Load Profiles above):
node A: 814,982 / 183,969 ≈ 4.4
node B: 1,082,389 / 840 ≈ 1288.5
== From the redo generated per transaction we can infer that, compared with 00:00-01:00, transaction volume in the 03:00-04:00 window had collapsed; effectively no transactions were completing and the database was hung.
Next, find out who hung the database.
00:00:00--01:00:00
enq: CF - contention =>
SQL_ID          COUNT(1)
------------- ----------
b9zh67r40t24q        461
9tw8kwwj97zub        353
c404bhf4rjaum        204
66pb0fp8jt1c4        197
5dj472vrff0k6        127
SQL_TEXT
--------------------------------------------------------------------------------
INSERT /*+append parallel 32*/ INTO TRANSACTION_DETAIL(AGREEMENT_SEQ,CHARGE_CDE,
INSERT /*+append parallel 32*/ INTO SAP_INTERMEDIATE(CONTRACT_TYP,F_ACCOUNT_CD,F
INSERT /*+append parallel 32*/ INTO DC_FILE_DETAIL_RECORD(ACCOUNT_NBR_TRACE,ACCO
INSERT /*+append parallel 32*/ INTO RECEIVABLE_PAID(ACCOUNT_NBR,ACTUAL_REALIZATI
INSERT /*+append parallel 32*/ INTO INCOME_RECOGNITION(AGREEMENT_SEQ,CONTRACT_ID
enq: TX - row lock contention =>
SQL_TEXT
--------------------------------------------------------------------------------
DELETE FROM use_resource_state WHERE resource_addr=:1 AND resource_type=:2 AND r
00:00:00--08:00:00
gc current request =>
SQL_TEXT
--------------------------------------------------------------------------------
insert /*+append*/ into F_NS_PD_BMC_DETAIL(ADDRESS_ONE,ADDRESS_THREE,ADDRE
== According to the business staff, once the session locking F_NS_PD_BMC_DETAIL was killed that night, the database ran normally again.
== The DBA's own analysis also flagged the same problem table, F_NS_PD_BMC_DETAIL, and the object was accompanied by the gc current request wait event.
So this gc current request wait event deserves a closer look.
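One way to tie a wait event to the object involved is via the object id that ASH samples record (a generic sketch; the monitoring tool's own query is not shown in the original):

```sql
-- Which objects the 'gc current request' samples were touching on 2017-09-13.
SELECT o.owner, o.object_name, COUNT(1) AS samples
  FROM dba_hist_active_sess_history h
  JOIN dba_objects o
    ON o.object_id = h.current_obj#
 WHERE h.event = 'gc current request'
   AND h.sample_time >= TO_DATE('2017-09-13','yyyy-mm-dd')
   AND h.sample_time <  TO_DATE('2017-09-14','yyyy-mm-dd')
 GROUP BY o.owner, o.object_name
 ORDER BY samples DESC;
```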
Next, analyze why the gc current request waits occurred, along three questions:
A. When did the gc current request waits start?
B. When did the gc current request waits end?
C. Is this gc current request wait related to a known bug?
A. When did the gc current request waits start?
Start of gc current request:
SQL> select * from (
  2  select INSTANCE_NUMBER,event,sql_id,to_char(SAMPLE_TIME,'yyyy-mm-dd hh24:mi:ss')
  3  from dba_hist_active_sess_history
  4  where to_char(SAMPLE_TIME,'yyyy-mm-dd')='2017-09-13'
  5  and event like 'gc current request' order by SAMPLE_TIME) where rownum<20;
INSTANCE_NUMBER EVENT                                    SQL_ID        TO_CHAR(SAMPLE_TIME
--------------- ---------------------------------------- ------------- -------------------
              2 gc current request                       fz486aazdcx0h 2017-09-13 00:54:26
              2 gc current request                       fz486aazdcx0h 2017-09-13 00:54:36
              2 gc current request                       fz486aazdcx0h 2017-09-13 00:54:46
              2 gc current request                       fz486aazdcx0h 2017-09-13 00:54:56
              2 gc current request                       fz486aazdcx0h 2017-09-13 00:55:06
              2 gc current request                       fz486aazdcx0h 2017-09-13 00:55:16
              2 gc current request                       fz486aazdcx0h 2017-09-13 00:55:26
              2 gc current request                       fz486aazdcx0h 2017-09-13 00:55:36
              2 gc current request                       fz486aazdcx0h 2017-09-13 00:55:46
              2 gc current request                       fz486aazdcx0h 2017-09-13 00:55:56
              2 gc current request                       fz486aazdcx0h 2017-09-13 00:56:06
              2 gc current request                       fz486aazdcx0h 2017-09-13 00:56:16
              2 gc current request                       fz486aazdcx0h 2017-09-13 00:56:26
              2 gc current request                       fz486aazdcx0h 2017-09-13 00:56:36
              2 gc current request                       fz486aazdcx0h 2017-09-13 00:56:46
              2 gc current request                       fz486aazdcx0h 2017-09-13 00:56:56
              2 gc current request                       fz486aazdcx0h 2017-09-13 00:57:06
              2 gc current request                       fz486aazdcx0h 2017-09-13 00:57:16
              2 gc current request                       fz486aazdcx0h 2017-09-13 00:57:26
B. When did the gc current request waits end?
End of gc current request:
SQL> select * from (
  2  select INSTANCE_NUMBER,event,sql_id,to_char(SAMPLE_TIME,'yyyy-mm-dd hh24:mi:ss')
  3  from dba_hist_active_sess_history
  4  where to_char(SAMPLE_TIME,'yyyy-mm-dd')='2017-09-13'
  5  and event like 'gc current request' order by SAMPLE_TIME desc) where rownum<20;
INSTANCE_NUMBER EVENT                                    SQL_ID        TO_CHAR(SAMPLE_TIME
--------------- ---------------------------------------- ------------- -------------------
              2 gc current request                       fz486aazdcx0h 2017-09-13 06:56:22
              2 gc current request                       fz486aazdcx0h 2017-09-13 06:56:12
              2 gc current request                       fz486aazdcx0h 2017-09-13 06:56:02
              2 gc current request                       fz486aazdcx0h 2017-09-13 06:55:52
              2 gc current request                       fz486aazdcx0h 2017-09-13 06:55:42
              2 gc current request                       fz486aazdcx0h 2017-09-13 06:55:32
              2 gc current request                       fz486aazdcx0h 2017-09-13 06:55:22
              2 gc current request                       fz486aazdcx0h 2017-09-13 06:55:12
              2 gc current request                       fz486aazdcx0h 2017-09-13 06:55:02
              2 gc current request                       fz486aazdcx0h 2017-09-13 06:54:52
              2 gc current request                       fz486aazdcx0h 2017-09-13 06:54:42
              2 gc current request                       fz486aazdcx0h 2017-09-13 06:54:32
              2 gc current request                       fz486aazdcx0h 2017-09-13 06:54:22
              2 gc current request                       fz486aazdcx0h 2017-09-13 06:54:12
              2 gc current request                       fz486aazdcx0h 2017-09-13 06:54:02
              2 gc current request                       fz486aazdcx0h 2017-09-13 06:53:52
              2 gc current request                       fz486aazdcx0h 2017-09-13 06:53:42
              2 gc current request                       fz486aazdcx0h 2017-09-13 06:53:32
              2 gc current request                       fz486aazdcx0h 2017-09-13 06:53:22
== SQL_ID 'fz486aazdcx0h' waited on gc current request once every 10 seconds, from 2017-09-13 00:54:26 until 2017-09-13 06:56:22, when the locking session was killed and the wait ended.
Clearly the unfinished insert /*+append*/ into F_NS_PD_BMC_DETAIL (a key step of the batch run) is inseparably linked to gc current request.
Solve the gc current request problem and the batch problem is solved.
C. Is this gc current request wait related to a bug?
With the root problem pinned down to gc current request, a matching MOS note turns up:
ETL process hangs when using an Oracle 11g RAC database (Doc ID 1996100.1)
Workaround (from the note):
Option 1
Set the "_gc_read_mostly_locking" parameter to FALSE using the following steps:
a. Stop the STARETL process. (On Linux you can check for the process with "ps -ef | grep star"; any process found can be terminated with "kill".)
b. Connect to the Oracle database hosting the STARUSER schema as the SYS user.
c. Back up the spfile to a pfile, then set _gc_read_mostly_locking to FALSE:
SQL> create pfile='/oracle/product/base/oradb/admin/P6_test/adump/bakspfile.ora' from spfile;
SQL> alter system set "_gc_read_mostly_locking"=FALSE scope=spfile;
d. Restart the Oracle database hosting the STARUSER schema.
e. Rerun STARETL.
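After the restart you can confirm the hidden parameter took effect with a query like the following (run as SYS; the X$ views are undocumented, so this is a common sketch rather than a supported interface):

```sql
-- Current value of the hidden parameter _gc_read_mostly_locking (as SYS).
SELECT i.ksppinm AS name, v.ksppstvl AS value
  FROM x$ksppi  i
  JOIN x$ksppcv v ON i.indx = v.indx
 WHERE i.ksppinm = '_gc_read_mostly_locking';
```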
After seven days of observation the batch flow ran normally with no further stalls, which satisfies the business's baseline requirement.
A DBA's analysis should not stop at "the problem is fixed": there are also architectural and application-level issues here, so let us dig a little further.
Several other problems surfaced during the analysis:
1. Severe lock contention. RAC parallelism is misused and the sharding/partitioning design is flawed: the same business module (insert into) performs inserts on both nodes at the same time.
2. Severe logfile contention: bulk inserts without batching generate excessive redo and archive logs.
3. The long-running enq: TX - row lock contention turned out to come from a single hard-coded program module that never commits:
DELETE FROM use_resource_state WHERE resource_addr=:1 AND resource_type=:2 AND ...
A program that issues this DELETE and never commits is a strange piece of work indeed.
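For problem 3, a minimal sketch of how the module could delete in batches and commit periodically, so that row locks and undo are released as it goes (the batch size and the PL/SQL wrapper are illustrative assumptions, not the application's actual code):

```sql
-- Hypothetical batched rewrite of the non-committing DELETE.
DECLARE
  v_deleted PLS_INTEGER;
BEGIN
  LOOP
    DELETE FROM use_resource_state
     WHERE resource_addr = :addr
       AND resource_type = :rtype
       AND ROWNUM <= 10000;       -- illustrative batch size
    v_deleted := SQL%ROWCOUNT;
    COMMIT;                       -- release row locks after every batch
    EXIT WHEN v_deleted = 0;
  END LOOP;
END;
/
```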
########################################################################################
All rights reserved. This article may be reposted only with a link to the original source; violations will be pursued legally. [QQ group: 53993419]
QQ: 14040928  E-mail: dbadoudou@163.com
Original link: http://blog.itpub.net/26442936/viewspace-2145624/
########################################################################################