[DB] Oracle library cache lock: getting to the bottom of its causes and fixes (Part 1)
Problem description:
Application staff reported that any operation on table CSNOZ629926699966 hangs, even desc CSNOZ629926699966. For example:
> sqlplus
SQL*Plus: Release 9.2.0.4.0 - Production on Mon Jan 10 10:11:06 2005
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning and Real Application Clusters options
JServer Release 9.2.0.4.0 - Production
SQL> conn pubuser/pubuser
Connected.
SQL> desc CSNOZ629926699966
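...
(the session hangs here)
...
When asked whether anything unusual had been done earlier, the business user said that a long time ago he had run a script; it ran for a long time without returning a result, so he quit the session, and after that the situation above appeared. The script is as follows:
$ cat CSNOZ629926699966.sh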
#!/bin/sh
sqlplus??<< EOF? #use your username/password
create table CSNOZ629926699966 as select * from CSNOZ62992266cs
where mid not in ( select mid from??where servid='020999011964' and status in ('A','B','S'));
exit;
$
$
$
$
Resolution:
> sqlplus "/ as sysdba"
SQL*Plus: Release 9.2.0.4.0 - Production on Mon Jan 10 10:19:13 2005
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning and Real Application Clusters options
JServer Release 9.2.0.4.0 - Production
SQL> select * from v$lock where block=1;
no rows selected
SQL> /
no rows selected
SQL> /
no rows selected
SQL>
We can see that v$lock currently shows no blocking locks.
SQL> select xidusn, object_id, session_id, locked_mode from v$locked_object;
...
    XIDUSN  OBJECT_ID SESSION_ID LOCKED_MODE
---------- ---------- ---------- -----------
        14         18         37           3
...
SQL> /
...
    XIDUSN  OBJECT_ID SESSION_ID LOCKED_MODE
---------- ---------- ---------- -----------
        14         18         37           3
...
SQL> /
...
    XIDUSN  OBJECT_ID SESSION_ID LOCKED_MODE
---------- ---------- ---------- -----------
        14         18         37           3
...
SQL>
Querying v$locked_object turns up a suspicious session, SID 37:
SQL> select object_name,owner,object_type from dba_objects where object_id=18;
... ...
OBJECT_NAME                    OWNER                          OBJECT_TYPE
------------------------------ ------------------------------ ------------------
OBJ$                           SYS                            TABLE
... ...
SQL>
Strange: why is this lock on OBJ$ there every time we look?
The initial guess is that the session with SID 37 ran the DDL statement above and exited abnormally before the statement finished,
so that every process touching the object involved in that DDL statement now hangs.
Next, let's look at the wait events:
SQL> select event,sid,p1,p2,p3 from v$session_wait where event not like 'SQL*%' and event not like 'rdbms%';
EVENT                                                                    P1         P2        SID
---------------------------------------------------------------- ---------- ---------- ----------
pmon timer                                                              300          0          1
ges remote message                                                       32          0          4
gcs remote message                                                       64          0          5
gcs remote message                                                       64          0          7
smon timer                                                              300          0         19
library cache lock                                               1.3835E+19 1.3835E+19         30
wakeup time manager                                                       0          0         22
7 rows selected.
SQL> /
EVENT                                                                    P1         P2        SID
---------------------------------------------------------------- ---------- ---------- ----------
pmon timer                                                              300          0          1
ges remote message                                                       32          0          4
gcs remote message                                                       64          0          5
gcs remote message                                                       64          0          7
smon timer                                                              300          0         19
library cache lock                                               1.3835E+19 1.3835E+19         30
wakeup time manager                                                       0          0         22
7 rows selected.
SQL> /
EVENT                                                                    P1         P2        SID
---------------------------------------------------------------- ---------- ---------- ----------
pmon timer                                                              300          0          1
ges remote message                                                       32          0          4
gcs remote message                                                       64          0          5
gcs remote message                                                       64          0          7
smon timer                                                              300          0         19
library cache lock                                               1.3835E+19 1.3835E+19         30
wakeup time manager                                                       0          0         22
7 rows selected.
SQL> /
EVENT                                                                    P1         P2        SID
---------------------------------------------------------------- ---------- ---------- ----------
pmon timer                                                              300          0          1
ges remote message                                                       32          0          4
gcs remote message                                                       64          0          5
gcs remote message                                                       64          0          7
smon timer                                                              300          0         19
library cache lock                                               1.3835E+19 1.3835E+19         30
wakeup time manager                                                       0          0         22
7 rows selected.
SQL>
Note the following event:
EVENT                                                                    P1         P2        SID
---------------------------------------------------------------- ---------- ---------- ----------
...
library cache lock                                               1.3835E+19 1.3835E+19         30
...
P1 is the handle address, that is, the address of the library cache object handle on which the 'library cache lock' wait is occurring.
P2 is a state object; here it is the address of the lock being requested on that object (the lock address).
Both P1 and P2 are decimal numbers, displayed here in scientific notation.
This is consistent with the guess above that SID 37 is blocking SID 30.
Find the sid and serial# of these two suspect sessions, then set event 10046 on both:
SQL> select sid,serial# from v$session where sid in (30,37);
       SID    SERIAL#
---------- ----------
        30      24167
        37       2707
SQL> exec dbms_system.set_ev(30,24167,10046,12,'');
PL/SQL procedure successfully completed.
SQL> exec dbms_system.set_ev(37,2707,10046,12,'');
PL/SQL procedure successfully completed.
SQL>
While the trace is running, let's test once more to see whether any other clues turn up.
Open a new session and look up its sid, serial# and spid:
> sqlplus pubuser/pubuser
SQL*Plus: Release 9.2.0.4.0 - Production on Mon Jan 10 11:36:25 2005
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning and Real Application Clusters options
JServer Release 9.2.0.4.0 - Production
SQL> select distinct sid from v$mystat;
SID
----------
33
SQL> select sid,serial# from v$session where sid=33;
       SID    SERIAL#
---------- ----------
        33       6639
SQL> SELECT SPID,PID FROM V$PROCESS WHERE ADDR=(SELECT PADDR FROM V$SESSION WHERE SID=37);
SPID                PID
------------ ----------
20552                26
SQL> SELECT SPID,PID FROM V$PROCESS WHERE ADDR=(SELECT PADDR FROM V$SESSION WHERE SID=30);
SPID                PID
------------ ----------
22580                28
SQL> show parameter dump
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
background_core_dump                 string      partial
background_dump_dest                 string      /ora9i/app/oracle/admin/csmisc
                                                 /bdump
core_dump_dest                       string      /ora9i/app/oracle/admin/csmisc
                                                 /cdump
max_dump_file_size                   string      UNLIMITED
shadow_core_dump                     string      partial
user_dump_dest                       string      /ora9i/app/oracle/admin/csmisc
                                                 /udump
SQL>
Then we try another operation on table CSNOZ629926699966:
SQL> desc CSNOZ629926699966
...
It still hangs.
So we interrupt the operation (CTRL + C):
SQL> desc CSNOZ629926699966
ERROR:
ORA-01013: user requested cancel of current operation
SQL> select tname from tab where tname='CSNOZ629926699966';
no rows selected
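SQL>
Looking for the table under the PUBUSER schema, we find that it does not even exist!
This further supports the earlier guess: session 37 is blocking every other session that touches table CSNOZ629926699966 and making those processes hang, including the statements issued by SID 30 and SID 33.
Now we turn off the 10046 event tracing (the usual form passes the event number with level 0, for example set_ev(30,24167,10046,0,''); the calls below are shown as they were originally run):
SQL> exec dbms_system.set_ev(30,24167,0,0,'');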
PL/SQL procedure successfully completed.
SQL> exec dbms_system.set_ev(37,2707,0,0,'');
PL/SQL procedure successfully completed.
SQL>
From the information recorded above, we know the trace files produced by these two sessions are:
SID 30 writes to: /ora9i/app/oracle/admin/csmisc/udump/csmisc2_ora_22580.trc
SID 37 writes to: /ora9i/app/oracle/admin/csmisc/udump/csmisc2_ora_20552.trc
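The file names above follow the pattern user_dump_dest/<instance>_ora_<spid>.trc, using the SPID values returned from V$PROCESS. As a minimal sketch of that mapping (the function name trace_file_path is made up for illustration, and the pattern is only what this 9.2 system shows, not a rule for every release):

```python
import posixpath


def trace_file_path(user_dump_dest: str, instance_name: str, spid: int) -> str:
    """Build the user trace file path seen in this article: <instance>_ora_<spid>.trc."""
    return posixpath.join(user_dump_dest, f"{instance_name}_ora_{spid}.trc")


# Values taken from the transcript: user_dump_dest, instance csmisc2, SID -> SPID.
UDUMP = "/ora9i/app/oracle/admin/csmisc/udump"
for sid, spid in ((30, 22580), (37, 20552)):
    print(sid, trace_file_path(UDUMP, "csmisc2", spid))
```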
Let's look at the trace files:
> cd /ora9i/app/oracle/admin/csmisc/udump
> ll -tlc
total 4432
-rw-r-----   1 ora9i      dba         332995 Jan 10 12:00 csmisc2_ora_22580.trc
-rw-r-----   1 ora9i      dba           3168 Jan 10 11:59 csmisc2_ora_20552.trc
-rw-r-----   1 ora9i      dba         407133 Jan  7 15:10 csmisc2_ora_2708.trc
-rw-r-----   1 ora9i      dba            640 Jan  7 14:48 csmisc2_ora_835.trc
-rw-r-----   1 ora9i      dba           1590 Dec 30 22:50 csmisc2_ora_16244.trc
-rw-r-----   1 ora9i      dba        1308403 Dec 30 22:44 csmisc2_ora_16033.trc
-rw-r-----   1 ora9i      dba            616 Dec 28 14:16 csmisc2_ora_2176.trc
-rw-r-----   1 ora9i      dba            644 Dec  8 18:22 csmisc2_ora_21083.trc
> mailx -s "csmisc2_ora_22580.trc"  < csmisc2_ora_22580.trc
> mailx -s "csmisc2_ora_20552.trc"  < csmisc2_ora_20552.trc
> exit
SQL>
For the session with SID 30, the main content of its trace file (csmisc2_ora_22580.trc) is:
/ora9i/app/oracle/admin/csmisc/udump/csmisc2_ora_22580.trc
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning and Real Application Clusters options
JServer Release 9.2.0.4.0 - Production
ORACLE_HOME = /ora9i/app/oracle/product/920
System name: HP-UX
Node name: cs_dc02
Release: B.11.11
Version: U
Machine: 9000/800
Instance name: csmisc2
Redo thread mounted by this instance: 2
Oracle process number: 28
Unix process pid: 22580, image:  (TNS V1-V3)
*** 2005-01-10 11:31:49.416
*** SESSION ID:(30.24167) 2005-01-10 11:31:49.354
WAIT #0: nam='library cache lock' ela= 507258 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 505686 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507678 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507595 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507880 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507317 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507703 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507683 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 508265 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507100 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507684 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 505889 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507731 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507650 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507604 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
WAIT #0: nam='library cache lock' ela= 507698 p1=-4611686013547141416 p2=-4611686013691716064 p3=1301
... ...
The wait event in SID 30's trace file is the same 'library cache lock' that we saw in V$SESSION_WAIT.
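The p1 and p2 values in these WAIT lines are printed as signed 64-bit decimals, which makes them hard to compare with the hexadecimal addresses Oracle uses elsewhere. The sketch below parses one WAIT line copied from the trace and reinterprets p1 and p2 as unsigned hex addresses; the helper names parse_wait and to_hex_address are illustrative only.

```python
import re

# Matches the 10046 WAIT lines shown in the trace above.
WAIT_RE = re.compile(
    r"WAIT #(?P<cursor>\d+): nam='(?P<event>[^']+)'\s+ela=\s*(?P<ela>\d+)"
    r"\s+p1=(?P<p1>-?\d+)\s+p2=(?P<p2>-?\d+)\s+p3=(?P<p3>-?\d+)"
)


def to_hex_address(value: int) -> str:
    """Reinterpret a signed 64-bit decimal as an unsigned 16-digit hex address."""
    return format(value & 0xFFFFFFFFFFFFFFFF, "016X")


def parse_wait(line: str):
    """Return (event, ela, p1, p2, p3) for a WAIT line, or None for any other line."""
    m = WAIT_RE.search(line)
    if m is None:
        return None
    return (m["event"], int(m["ela"]), int(m["p1"]), int(m["p2"]), int(m["p3"]))


sample = ("WAIT #0: nam='library cache lock' ela= 507258 "
          "p1=-4611686013547141416 p2=-4611686013691716064 p3=1301")
event, ela, p1, p2, p3 = parse_wait(sample)
print(event, to_hex_address(p1), to_hex_address(p2))
```

For this line, p1 comes out as C000000122E2A6D8 and p2 as C00000011A449E20, the same handle and lock addresses that show up in the systemstate dump and in X$KGLLK later in the article.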
Now look at the session with SID 37. The main content of its trace file (csmisc2_ora_20552.trc) is:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning and Real Application Clusters options
JServer Release 9.2.0.4.0 - Production
ORACLE_HOME = /ora9i/app/oracle/product/920
System name: HP-UX
Node name: cs_dc02
Release: B.11.11
Version: U
Machine: 9000/800
Instance name: csmisc2
Redo thread mounted by this instance: 2
Oracle process number: 26
Unix process pid: 20552, image:  (TNS V1-V3)
*** 2005-01-10 11:33:22.702
*** SESSION ID:(37.2707) 2005-01-10 11:33:22.690
WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0
*** 2005-01-10 11:35:07.452
WAIT #1: nam='SQL*Net message from dblink' ela= 102293555 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message to dblink' ela= 3 p1=675562835 p2=1 p3=0
*** 2005-01-10 11:36:55.980
WAIT #1: nam='SQL*Net message from dblink' ela= 105969709 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0
*** 2005-01-10 11:39:05.416
WAIT #1: nam='SQL*Net message from dblink' ela= 126390826 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0
*** 2005-01-10 11:41:12.878
WAIT #1: nam='SQL*Net message from dblink' ela= 124461520 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0
*** 2005-01-10 11:43:01.285
WAIT #1: nam='SQL*Net message from dblink' ela= 105859385 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0
*** 2005-01-10 11:44:48.200
WAIT #1: nam='SQL*Net message from dblink' ela= 104397696 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0
... ...
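SID 37's trace shows the session spending essentially all of its time in 'SQL*Net message from dblink', which means it is waiting on a remote database. A rough way to quantify that is to total the ela values per event. The sketch below does so for the first four WAIT lines of the trace; it assumes ela is in microseconds, which matches the roughly 105-second gaps between the timestamps in the trace, and the function name summarize_waits is illustrative.

```python
import re
from collections import defaultdict

WAIT_RE = re.compile(r"WAIT #\d+: nam='([^']+)'\s+ela=\s*(\d+)")


def summarize_waits(trace_text: str) -> dict:
    """Total wait time in seconds per event, assuming ela is in microseconds."""
    totals = defaultdict(int)
    for event, ela in WAIT_RE.findall(trace_text):
        totals[event] += int(ela)
    return {event: micros / 1_000_000 for event, micros in totals.items()}


# First four WAIT lines from csmisc2_ora_20552.trc as shown above.
sample = """\
WAIT #1: nam='SQL*Net message to dblink' ela= 4 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message from dblink' ela= 102293555 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message to dblink' ela= 3 p1=675562835 p2=1 p3=0
WAIT #1: nam='SQL*Net message from dblink' ela= 105969709 p1=675562835 p2=1 p3=0
"""
summary = summarize_waits(sample)
for event, seconds in summary.items():
    print(f"{event}: {seconds:.3f} s")
```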
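Now let's dump the system state (systemstate) to get more detail.
First, a brief introduction to the systemstate dump. Many people take a systemstate dump to be a snapshot of every process in the system at the instant the dump is issued. That is a misconception. In fact,
the trace file produced by a systemstate dump describes all processes in the system over the interval from the moment the dump starts until the dump completes.
The trace file contains state information for every process in the system. Each process has its own section in the trace file, covering process information, session information, enqueue information (mainly locks), buffer information, and the state of the objects the process holds in the SGA.
So when is a systemstate dump appropriate? Oracle recommends it in the following situations:
The database is hung
The database is very slow
A process is hanging
The database is raising certain errors
Resource contention
The syntax for a systemstate dump is:
ALTER SESSION SET EVENTS 'immediate trace name systemstate level 10';
ORADEBUG can do the same thing:
ORADEBUG DUMP SYSTEMSTATE 10
To trigger a systemstate dump when a particular error occurs, set the event parameter in the parameter file (spfile or pfile).
For example, to dump the system state whenever a deadlock occurs (ORA-00060):
event = "60 trace name systemstate level 10"
Back to the problem at hand. We dump the system state:
SQL> ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 8';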
Session altered.
SQL> host
>cd /ora9i/app/oracle/admin/csmisc/udump
> ll -ctl
-rw-r-----   1 ora9i      dba        1070863 Jan 10 13:02 csmisc2_ora_22580.trc
-rw-r-----   1 ora9i      dba        1345368 Jan 10 13:01 csmisc2_ora_22568.trc
-rwxrwxrwx   1 ora9i      dba          44114 Jan 10 12:52 ass1015.awk
-rw-r-----   1 ora9i      dba         407133 Jan  7 15:10 csmisc2_ora_2708.trc
-rw-r-----   1 ora9i      dba            640 Jan  7 14:48 csmisc2_ora_835.trc
-rw-r-----   1 ora9i      dba           1590 Dec 30 22:50 csmisc2_ora_16244.trc
-rw-r-----   1 ora9i      dba        1308403 Dec 30 22:44 csmisc2_ora_16033.trc
-rw-r-----   1 ora9i      dba            616 Dec 28 14:16 csmisc2_ora_2176.trc
-rw-r-----   1 ora9i      dba            644 Dec  8 18:22 csmisc2_ora_21083.trc
>
> mailx -s "22568"  < csmisc2_ora_22568.trc
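This trace file is large (it covers every process), so where do we start reading?
First, searching the trace file for the string "waiting for 'library cache lock'" leads us to the blocked process:
PROCESS 28:  ----------------the blocked Oracle process; PROCESS 28 here is the PID value in V$PROCESS,
which means we can use it to look up the blocked session in V$PROCESS and V$SESSION
----------------------------------------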
SO: c000000109c83bf0, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
(process) Oracle pid=28, calls cur/top: c00000010b277890/c00000010b277890, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 17 24 6
last post received-location: ksusig
last process to post me: c000000109c840f8 25 0
last post sent: 0 0 15
last post sent-location: ksasnd
last process posted by me: c000000109c7ff90 1 6
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: c000000109eefda0
O/S info: user: ora9i, term: pts/th, ospid: 22580  ----------------the operating system process ID of this process, i.e. SPID in V$PROCESS
OSD pid info: Unix process pid: 22580, image:  (TNS V1-V3)
----------------------------------------
SO: c000000109f02c68, type: 4, owner: c000000109c83bf0, flag: INIT/-/-/0x00
(session) trans: 0000000000000000, creator: c000000109c83bf0, flag: (100041) USR/- BSY/-/-/-/-/-
DID: 0002-001C-00000192, short-term DID: 0000-0000-00000000
txn branch: 0000000000000000
oct: 0, prv: 0, sql: c00000011f8ea068, psql: c00000011f8ea068, user: 50/PUBUSER
O/S info: user: ora9i, term: , ospid: 22536, machine: cs_dc02
program:(TNS V1-V3)
application name: SQL*Plus, hash value=3669949024
waiting for 'library cache lock' blocking sess=0x0 seq=18589 wait_time=0
handle address=c000000122e2a6d8, lock address=c00000011a449e20, 100*mode+namespace=515
... ...
SO: c00000010b277890, type: 3, owner: c000000109c83bf0, flag: INIT/-/-/0x00
(call) sess: cur c000000109f02c68, rec 0, usr c000000109f02c68; depth: 0
----------------------------------------
SO: c00000011a449e20, type: 51, owner: c00000010b277890, flag: INIT/-/-/0x00
LIBRARY OBJECT LOCK: lock=c00000011a449e20 handle=c000000122e2a6d8 request=S
call pin=0000000000000000 session pin=0000000000000000
htl=c00000011a449e90[c00000011a4bc350,c00000011a4bc350] htb=c00000011a4bc350
user=c000000109f02c68 session=c000000109f02c68 count=0 flags=[00] savepoint=463
the rest of the object was already dumped
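... ...
Note the following information:
waiting for 'library cache lock' blocking sess=0x0 seq=18589 wait_time=0
handle address=c000000122e2a6d8, lock address=c00000011a449e20, 100*mode+namespace=515
This tells us that the process with Oracle PID 28 (PROCESS 28) is waiting on 'library cache lock'. Using 'handle address=c000000122e2a6d8' we can track down the Oracle PID of the session that is blocking it.
Also note this section:
LIBRARY OBJECT LOCK: lock=c00000011a449e20 handle=c000000122e2a6d8 request=S
call pin=0000000000000000 session pin=0000000000000000
htl=c00000011a449e90[c00000011a4bc350,c00000011a4bc350] htb=c00000011a4bc350
user=c000000109f02c68 session=c000000109f02c68 count=0 flags=[00] savepoint=463
This is the lock entry on the contended handle. The entry shown here carries request=S and session=c000000109f02c68, so it is PROCESS 28's own pending request; the blocking session is the one whose section of the dump contains a LIBRARY OBJECT LOCK entry on the same handle that holds a mode rather than a request.
The rule of thumb to remember is:
the waiting session's 'handle address' value matches the 'handle' value in the blocking session's lock entry.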
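The matching rule above can be applied mechanically to a systemstate dump. The following is a rough sketch, not a replacement for a proper analysis script: it remembers which PROCESS section each line belongs to, collects the handle addresses that processes are waiting on, and looks for LIBRARY OBJECT LOCK entries on the same handle that hold a mode. The holder line for PROCESS 26 in the sample is hypothetical: the article does not show that part of the dump, so it was reconstructed from the X$KGLLK row for SID 37, and the mode=X spelling is an assumption about the dump format.

```python
import re

PROCESS_RE = re.compile(r"PROCESS (\d+):")
WAIT_RE = re.compile(r"handle address=([0-9a-f]+), lock address=([0-9a-f]+)")
LOCK_RE = re.compile(
    r"LIBRARY OBJECT LOCK: lock=([0-9a-f]+)\s+handle=([0-9a-f]+)\s+(mode|request)=(\w+)"
)


def find_blockers(dump_text):
    """Return [(handle, waiting_pids, holding_pids)] using Oracle PIDs from PROCESS headers."""
    waiting, holding = {}, {}
    pid = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        m = PROCESS_RE.match(line)
        if m:
            pid = int(m.group(1))  # start of a new process section
            continue
        if pid is None:
            continue
        m = WAIT_RE.search(line)
        if m:
            waiting.setdefault(m.group(1), set()).add(pid)
        m = LOCK_RE.search(line)
        if m and m.group(3) == "mode":  # a held lock, not a pending request
            holding.setdefault(m.group(2), set()).add(pid)
    return [
        (handle, sorted(pids), sorted(holding.get(handle, set()) - pids))
        for handle, pids in sorted(waiting.items())
    ]


sample = """\
PROCESS 26:
LIBRARY OBJECT LOCK: lock=c00000011a44a150 handle=c000000122e2a6d8 mode=X
PROCESS 28:
waiting for 'library cache lock' blocking sess=0x0 seq=18589 wait_time=0
handle address=c000000122e2a6d8, lock address=c00000011a449e20, 100*mode+namespace=515
LIBRARY OBJECT LOCK: lock=c00000011a449e20 handle=c000000122e2a6d8 request=S
"""
print(find_blockers(sample))
```

Because the waiter's own entry uses request= rather than mode=, it is not counted as a holder, which is why PROCESS 28 does not appear as its own blocker.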
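Looking back at this handle address, it corresponds to the P1 value we saw earlier in V$SESSION_WAIT (P2, the lock address, can be converted the same way):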
SQL> select to_number('C000000122E2A6D8','XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX') from dual;
TO_NUMBER('C000000122E2A6D8','XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX')
----------------------------------------------------------------
1.3835E+19
SQL>
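The conversion also explains why P1 and P2 looked identical in V$SESSION_WAIT. As a small sketch, assuming the column was shown 10 characters wide (the SQL*Plus default NUMWIDTH) so that large values are rounded to four decimal places of scientific notation; the helper names are illustrative:

```python
def hex_to_decimal(address_hex: str) -> int:
    """Convert a hexadecimal address string to its unsigned decimal value."""
    return int(address_hex, 16)


def narrow_column_display(value: int) -> str:
    """Approximate how a 10-character numeric column displays a very large number."""
    return f"{value:.4E}"


handle = hex_to_decimal("C000000122E2A6D8")  # handle address (P1)
lock = hex_to_decimal("C00000011A449E20")    # lock address (P2)
print(handle, narrow_column_display(handle))
print(lock, narrow_column_display(lock))
```

The two addresses differ, but both round to 1.3835E+19, so widening the column (or selecting the values in hexadecimal) is needed to tell P1 and P2 apart.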
The cause of the problem is now essentially clear. Two ways of resolving it are recommended here.
Method 1: using the address c000000122e2a6d8, we can retrieve the corresponding lock information currently in the library cache:
SQL> l
  1  select INST_ID,USER_NAME,KGLNAOBJ,KGLLKSNM,KGLLKUSE,KGLLKSES,KGLLKMOD,KGLLKREQ,KGLLKPNS,KGLLKHDL
  2* from X$KGLLK where KGLLKHDL = 'C000000122E2A6D8' order by KGLLKSNM,KGLNAOBJ
SQL> /
   INST_ID USER_NAME     KGLNAOBJ                 KGLLKSNM KGLLKUSE         KGLLKSES           KGLLKMOD   KGLLKREQ KGLLKPNS         KGLLKHDL
---------- ------------- ---------------------- ---------- ---------------- ---------------- ---------- ---------- ---------------- ----------------
         2 PUBUSER       CSNOZ629926699966              30 C000000109F02C68 C000000109F02C68          0          2 00               C000000122E2A6D8
         2 PUBUSER       CSNOZ629926699966              37 C000000108C99E28 C000000108C99E28          3          0 00               C000000122E2A6D8
SQL>
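Reading the two rows together: SID 37 holds the lock (KGLLKMOD=3) while SID 30 is requesting it (KGLLKREQ=2) on the same handle. The sketch below pairs waiters with holders from rows like these. The mode numbering it uses (0 none, 1 null, 2 share, 3 exclusive) is the commonly documented library cache lock encoding and is an assumption here, as are the names LockRow, conflicts and blockers.

```python
from collections import namedtuple

# One row of the X$KGLLK output: KGLLKSNM, KGLLKHDL, KGLLKMOD, KGLLKREQ.
LockRow = namedtuple("LockRow", "sid handle mode request")

# Assumed encoding of library cache lock modes.
MODE_NAMES = {0: "none", 1: "null", 2: "share", 3: "exclusive"}


def conflicts(held: int, requested: int) -> bool:
    """Share is compatible with share; exclusive conflicts with share and exclusive."""
    return held >= 2 and requested >= 2 and (held == 3 or requested == 3)


def blockers(rows):
    """Return (waiting_sid, requested_mode, holding_sid, held_mode) for each conflict."""
    result = []
    for waiter in rows:
        if waiter.request == 0:
            continue
        for holder in rows:
            if (holder.sid != waiter.sid and holder.handle == waiter.handle
                    and conflicts(holder.mode, waiter.request)):
                result.append((waiter.sid, MODE_NAMES[waiter.request],
                               holder.sid, MODE_NAMES[holder.mode]))
    return result


rows = [
    LockRow(30, "C000000122E2A6D8", 0, 2),
    LockRow(37, "C000000122E2A6D8", 3, 0),
]
print(blockers(rows))
```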
Following Oracle's recommended approach, we should now kill SID 37 with 'alter system kill session', but the command returns ORA-00031:
SQL> alter system kill session '37,2707';
alter system kill session '37,2707'
*
ERROR at line 1:
ORA-00031: session marked for kill
SQL>
Check the status of SID 37:
SQL> set linesize 150
SQL> col program for a50
SQL> select sid,serial#,status,username,program from v$session where sid=37;
       SID    SERIAL# STATUS   USERNAME                       PROGRAM
---------- ---------- -------- ------------------------------ --------------------------------------------------
        37       2707 KILLED   PUBUSER                         (TNS V1-V3)
SQL>
This confirms our original suspicion once more: someone ran a DDL statement that needed a long time to complete (usually because the statement is inefficient, although a bug cannot be ruled out),
and then exited the session abnormally before the statement finished.
In this case we had already found the session's operating system process (SPID) from the trace files above. In other situations, how do we find the OS process ID (SPID) of a session whose status is 'KILLED'?
Here is one approach that can be used as a reference:
SQL> l
  1  SELECT s.username,s.status,
  2  x.ADDR,x.KSLLAPSC,x.KSLLAPSN,x.KSLLASPO,x.KSLLID1R,x.KSLLRTYP,
  3  decode(bitand (x.ksuprflg,2),0,null,1)
  4  FROM x$ksupr x,v$session s
  5  WHERE s.paddr(+)=x.addr
  6  and bitand(ksspaflg,1)!=0
  7* and s.sid=37
SQL> /
USERNAME                       STATUS   ADDR               KSLLAPSC   KSLLAPSN KSLLASPO       KSLLID1R KS D
------------------------------ -------- ---------------- ---------- ---------- ------------ ---------- -- -
PUBUSER                        KILLED   C000000109C831E0         41         15 16243                17
SQL>
The x$ksupr.ADDR column corresponds to ADDR in V$PROCESS. Once we know the process address, finding the operating system process (SPID) is easy. For example:
SQL> select spid,pid from v$process where addr='C000000109C831E0';
SPID                PID
------------ ----------
20552                26
SQL>
Now all we need to do is kill OS process 20552 at the operating system level:
> ps -ef | grep 20552
   ora9i 20552     1  0  Jan  8  ?         0:01 oraclecsmisc2 (LOCAL=NO)
   ora9i 14742 14740  0 17:19:02 pts/ti    0:00 grep 20552
> kill -9 20552
> ps -ef | grep 20552
   ora9i 14966 14964  0 17:40:01 pts/ti    0:00 grep 20552
>
Checking SID 37 again, we see that the session really has been killed:
> exit
SQL> select sid,serial#,status,username,program from v$session where sid=37;
no rows selected
SQL> l
  1  SELECT s.username,s.status,
  2  x.ADDR,x.KSLLAPSC,x.KSLLAPSN,x.KSLLASPO,x.KSLLID1R,x.KSLLRTYP,
  3  decode(bitand (x.ksuprflg,2),0,null,1)
  4  FROM x$ksupr x,v$session s
  5  WHERE s.paddr(+)=x.addr
  6  and bitand(ksspaflg,1)!=0
  7* and s.sid=37
SQL> /
no rows selected
SQL>
Back in the session that was hanging, normal operation has resumed,
and we now get the expected message 'ORA-04043: object CSNOZ629926699966 does not exist':
SQL> desc CSNOZ629926699966
ERROR:
ORA-04043: object CSNOZ629926699966 does not exist
SQL>
Open one more session and test again:
> sqlplus pubuser/pubuser
SQL*Plus: Release 9.2.0.4.0 - Production on Mon Jan 10 17:42:16 2005
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - 64bit Production
With the Partitioning and Real Application Clusters options
JServer Release 9.2.0.4.0 - Production
SQL> set timing on
SQL> desc CSNOZ629926699966
ERROR:
ORA-04043: object CSNOZ629926699966 does not exist
SQL>
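When we issue 'desc CSNOZ629926699966', the system now returns 'ORA-04043: object CSNOZ629926699966 does not exist' immediately, so the problem is solved.
Finally, a brief introduction to X$KGLLK. This fixed table holds information about locks on objects in the library cache and is particularly useful for this kind of problem. Its name breaks down as follows:
[K]ernel Layer
[G]eneric Layer
[L]ibrary Cache Manager  ( defined and mapped from kqlf )
Object Locks
X$KGLLK - Object [L]oc[K]s
The KGLNAOBJ column holds the first 80 characters of the name (or statement text) of the library cache object the lock is on; this alone narrows the search considerably.
X$KGLLK.KGLLKUSE and X$KGLLK.KGLLKSES correspond to the user and session values in the trace file (user=c000000109f02c68 session=c000000109f02c68)
X$KGLLK.KGLLKADR corresponds to the lock value in the trace file (lock=c00000011a449e20)
X$KGLLK.KGLLKHDL corresponds to the handle value in the trace file (handle=C000000122E2A6D8), that is, the handle address of the 'library cache lock'
X$KGLLK.KGLLKPNS corresponds to the session pin value in the trace file
X$KGLLK.KGLLKSPN corresponds to the savepoint value in the trace file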
Let's look at the complete rows:
SQL> set linesize 2000
SQL> select * from X$KGLLK where KGLLKHDL = 'C000000122E2A6D8' order by KGLLKSNM,KGLNAOBJ
  2  /
ADDR                   INDX    INST_ID KGLLKADR         KGLLKUSE         KGLLKSES           KGLLKSNM KGLLKHDL         KGLLKPNC         KGLLKPNS           KGLLKCNT   KGLLKMOD   KGLLKREQ   KGLLKFLG   KGLLKSPN KGLLKHTB           KGLNAHSH KGLHDPAR           KGLHDNSP USER_NAME                      KGLNAOBJ
---------------- ---------- ---------- ---------------- ---------------- ---------------- ---------- ---------------- ---------------- ---------------- ---------- ---------- ---------- ---------- ---------- ---------------- ---------- ---------------- ---------- ------------------------------ ------------------------------------------------------------
800003FB0007E4D0         33          2 C00000011A449E20 C000000109F02C68 C000000109F02C68         30 C000000122E2A6D8 00               00                        0          0          2          0        463 C00000011A4BC350 3990848181 C000000122E2A6D8          1 PUBUSER                        CSNOZ629926699966
800003FB0007E5B0         34          2 C00000011A44A150 C000000108C99E28 C000000108C99E28         37 C000000122E2A6D8 00               00                        1          3          0          0        179 C00000011A4BB328 3990848181 C000000122E2A6D8          1 PUBUSER                        CSNOZ629926699966
SQL> set linesize 100
SQL> l
  1* select * from X$KGLLK where KGLLKHDL = 'C000000122E2A6D8' order by KGLLKSNM,KGLNAOBJ
SQL> /
ADDR                   INDX    INST_ID KGLLKADR         KGLLKUSE         KGLLKSES           KGLLKSNM
---------------- ---------- ---------- ---------------- ---------------- ---------------- ----------
KGLLKHDL         KGLLKPNC         KGLLKPNS           KGLLKCNT   KGLLKMOD   KGLLKREQ   KGLLKFLG
---------------- ---------------- ---------------- ---------- ---------- ---------- ----------
  KGLLKSPN KGLLKHTB           KGLNAHSH KGLHDPAR           KGLHDNSP USER_NAME
---------- ---------------- ---------- ---------------- ---------- ------------------------------
KGLNAOBJ
------------------------------------------------------------
800003FB0007E4D0         33          2 C00000011A449E20 C000000109F02C68 C000000109F02C68         30
C000000122E2A6D8 00               00                        0          0          2          0
       463 C00000011A4BC350 3990848181 C000000122E2A6D8          1 PUBUSER
CSNOZ629926699966
800003FB0007E5B0         34          2 C00000011A44A150 C000000108C99E28 C000000108C99E28         37
C000000122E2A6D8 00               00                        1          3          0          0
       179 C00000011A4BB328 3990848181 C000000122E2A6D8          1 PUBUSER
CSNOZ629926699966
SQL>