Oracle:集群因子
什么是集群因子(Clustering Factor)呢?集群因子是通過一個索引掃描一張表時需要訪問的表的數據塊的數量。
集群因子的計算方法如下:
(1) 掃描一個索引;
(2) 比較某行的ROWID和前一行的ROWID,如果這兩個ROWID不屬于同一個數據塊,那么集群因子增加1;
(3) 整個索引掃描完畢后,就得到了該索引的集群因子。
以上面集群因子的計算方式可以看出,集群因子反映了索引范圍掃描可能帶來的對整個表訪問過程的IO開銷情況,如果集群因子接近于表存儲的塊數,說明這張表是按照索引字段的順序存儲的。如果集群因子接近于行的數量,那說明這張表不是按索引字段順序存儲的。在計算索引訪問成本時,集群因子十分有用。集群因子乘以選擇性參數就是訪問索引的開銷。(是優化器決定是否用索引的關鍵因素)
官方說明:
?----row存儲的越有序,clustering factor的值越低
----當clustering factor很高時,說明index entry (rowid) 是????隨機指向一些block的,在一個大的index range scan時,這樣為了讀取這些rowid指向的block,就需要一次又一次重復的去讀這些block。
? ? ? ?----當clustering factor值低時,說明index keys (rowid) 是指向的記錄是存儲在相同的block里,這樣去讀row時,只需要在同一個block里讀取就可以了,這樣減少重復讀取blocks的次數
Index Clustering Factor說明
??? 簡單的說,Index Clustering Factor是通過一個索引掃描一張表,需要訪問的表的數據塊的數量,即對I/O的影響,也代表索引鍵存儲位置是否有序。
????(1)、如果越有序,即相鄰的鍵值存儲在相同的block,那么這時候Clustering Factor的值就越低;
??? (2)、如果不是很有序,即鍵值是隨機的存儲在block上,這樣在讀取鍵值時,可能就需要一次又一次的去訪問相同的block,從而增加了I/O。
??? Clustering Factor的計算方式如下:
???? (1)、掃描一個索引(large index range scan);
???? (2)、比較某行的rowid和前一行的rowid,如果這兩個rowid不屬于同一個數據塊,那么cluster factor增加1;
???? (3)、整個索引掃描完畢后,就得到了該索引的clustering factor。
??????????? 如果clustering factor接近于表存儲的塊數,說明這張表是按照索引字段的順序存儲的。
??????????? 如果clustering factor接近于行的數量,那說明這張表不是按索引字段順序存儲的。
??????????? 在計算索引訪問成本的時候,這個值十分有用。Clustering Factor乘以選擇性參數(selectivity)就是訪問索引的開銷。
??????????? 如果這個統計數據不能真實反映出索引的真實情況,那么可能會造成優化器錯誤的選擇執行計劃。另外如果某張表上的大多數訪問是按照某個索引做索引掃描,那么將該表的數據按照索引字段的順序重新組織,可以提高該表的訪問性能。
測試
????產生問題
----查看一下數據庫的版本---- SQL>?select?*?from?v$version?where?rownum=1;BANNER -------------------------------------------------------------------------------- Oracle?Database?11g?Enterprise?Edition?Release?11.2.0.1.0?-?Production----創建一張測試表jack---- SQL>?create?table?jack?as?select?*?from?dba_objects?where?1=2;Table?created.----將數據無序的插入jack表中---- SQL>?begin2??????for?i?in?1..10?loop3????????insert?/*+?append?*/?into?jack?select?*?from?dba_objects?order?by?i;4??????commit;5????end?loop;6??end;7??/PL/SQL?procedure?successfully?completed.SQL>?select?count(*)?from?jack;COUNT(*) ----------725460----查看一下表的大小----- SQL>?set?wrap?off SQL>?col?owner?for?a10; SQL>?col?segment_name?for?a15; SQL>?select?segment_name,blocks,extents,bytes/1024/1024||'M'?"size"?from?user_segments?where?segment_name='JACK';SEGMENT_NAME?????BLOCKS????EXTENTS???size -------------?----------?----------?-------- JACK?????????????11264???????82??????88M----在object_id上創建索引---- SQL>?create?index?jack_ind?on?jack(object_id);Index?created.----查看一下索引的大小---- SQL>?select?segment_name,segment_type,blocks,extents,bytes/1024/1024||'M'?"size"?from?user_segments?where?segment_name='JACK_IND';SEGMENT_NAME????SEGMENT_TYPE??????BLOCKS?????EXTENTS?????size ------------?------------------?----------?----------?--------- JACK_IND???????????INDEX???????????1664?????????28????????13M----在沒有收集相關的統計信息之前,查看一下index?clustering?factor---- SQL>?select?index_name,clustering_factor,num_rows?from?user_indexes?where?index_name='JACK_IND';INDEX_NAME??????CLUSTERING_FACTOR???NUM_ROWS ---------------?-----------------?---------- JACK_IND??????????????725460?????????725460----簡單的收集一下統計信息---- SQL>?exec?dbms_stats.gather_table_stats(user,'jack',cascade=>true);PL/SQL?procedure?successfully?completed.----再次查看index?clustering?factor---- SQL>?select?index_name,clustering_factor,num_rows?from?user_indexes?where?index_name='JACK_IND';INDEX_NAME?????CLUSTERING_FACTOR???NUM_ROWS --------------?-----------------?---------- JACK_IND??????????????725460???????725460???????----顯然統計信息收集前和后,clustering?factor值不變,說在創建索引的時候,會收集表中的數據真正的行數。并且這里的clustering?factor等num_rows,也說明表的clustering?factor是無序的。----查看一個確定值,然后查看執行計劃---- SQL>?explain?plan?for?select?*?from?jack?where?object_id=1501;Explained.SQL>?select?*?from?table(dbms_xplan.display);PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan?hash?value:?2860868395-------------------------------------------------------------------------------- |?Id??|?Operation????????????|?Name?????|?Rows??|?Bytes?|?Cost?(%CPU)|?Ti -------------------------------------------------------------------------------- |???0?|?SELECT?STATEMENT????????|???????????|????10?|???970?|????13?????(0)|?00 |???1?|??TABLE?ACCESS?BY?INDEX?ROWID|?JACK?????|????10?|???970?|????13?????(0)|?00 |*??2?|???INDEX?RANGE?SCAN????????|?JACK_IND?|????10?|???????|?????3?????(0)|?00 --------------------------------------------------------------------------------Predicate?Information?(identified?by?operation?id):PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------2?-?access("OBJECT_ID"=1501)14?rows?selected.?????????????????----在這里走了索引,cost為13.SQL>?alter?system?flush?buffer_cache;System?altered.SQL>?set?autotrace?traceonly;----查詢一個范圍的執行計劃---- SQL>?select?*?from?jack?where?object_id>1000?and?object_id<2000;9880?rows?selected.Execution?Plan ---------------------------------------------------------- Plan?hash?value:?949574992-------------------------------------------------------------------------- |?Id??|?Operation??????|?Name?|?Rows??|?Bytes?|?Cost?(%CPU)|?Time?????| -------------------------------------------------------------------------- |???0?|?SELECT?STATEMENT??|?????|??9657?|???914K|??1824???(1)|?00:00:22?| |*??1?|??TABLE?ACCESS?FULL|?JACK?|??9657?|???914K|??1824???(1)|?00:00:22?| --------------------------------------------------------------------------Predicate?Information?(identified?by?operation?id): ---------------------------------------------------1?-?filter("OBJECT_ID"<2000?AND?"OBJECT_ID">1000)Statistics ----------------------------------------------------------0???recursive?calls0???db?block?gets10993?consistent?gets10340?physical?reads0???redo?size471945?bytes?sent?via?SQL*Net?to?client7657??bytes?received?via?SQL*Net?from?client660???SQL*Net?roundtrips?to/from?client0???sorts?(memory)0???sorts?(disk)9880??rows?processed???????----注意,object_id上是有索引的,但是這里并沒有使用索引,而是使用了全表掃描。SQL>?alter?system?flush?buffer_cache;System?altered.----強制走索引,查看執行計劃---- SQL>?select?/*+?index(jack?jack_ind)?*/?*?from?jack?where?object_id>1000?and?object_id<2000;9880?rows?selected.Execution?Plan ---------------------------------------------------------- Plan?hash?value:?2860868395---------------------------------------------------------------------------------------- |?Id??|?Operation????????????|?Name?????|?Rows??|?Bytes?|?Cost?(%CPU)|?Time?????| ---------------------------------------------------------------------------------------- |???0?|?SELECT?STATEMENT????????|???????????|??9657?|???914K|??9683?????(1)|?00:01:57?| |???1?|??TABLE?ACCESS?BY?INDEX?ROWID|?JACK?????|??9657?|???914K|??9683?????(1)|?00:01:57?| |*??2?|???INDEX?RANGE?SCAN????????|?JACK_IND?|??9657?|???????|????24?????(0)|?00:00:01?| ----------------------------------------------------------------------------------------Predicate?Information?(identified?by?operation?id): ---------------------------------------------------2?-?access("OBJECT_ID">1000?AND?"OBJECT_ID"<2000)Statistics ----------------------------------------------------------0????recursive?calls0????db?block?gets10561??consistent?gets164????physical?reads0????redo?size988947??bytes?sent?via?SQL*Net?to?client7657???bytes?received?via?SQL*Net?from?client660????SQL*Net?roundtrips?to/from?client0????sorts?(memory)0????sorts?(disk)9880???rows?processed???? ----強制走索引之后,使用了index?range?scan,但是cost變成了9683,而全表掃描時是1824. ----還有比較一下兩次查詢中物理讀的情況:全表掃描的物理讀明顯比索引的要高很多,但是Oracle卻沒有使用索引。 ----因此Oracle認為走索引的Cost比走全表掃描大,而是大N倍,CBO是基于Cost來決定執行計劃的。 ----由此得出,對于索引的Cost,Oracle是根據clustering?factor參數來計算的,而該實驗中的clustering?factor參數是很高的,數據存儲無序。這就造成了Oracle認為走索引的cost比全表掃描的大。解決問題:
| ----通過上面的分析,可以看出,要降低clustering?factor才能解決問題,而要解決clustering?factor,就需要重新對表的存儲位置進行排序。--------重建jakc表---- SQL>?create?table?echo?as?select?*?from?jack?where?1=0;Table?created.SQL>?insert?/*+?append?*/?into?echo?select?*?from?jack?order?by?object_id;725460?rows?created.SQL>?commit;Commit?complete.SQL>?truncate?table?jack;Table?truncated.SQL>?insert?/*+?append?*/?into?jack?select?*?from?echo;725460?rows?created.SQL>?commit;Commit?complete.----查看表和索引的信息---- SQL>?select?segment_name,blocks,extents,bytes/1024/1024||'M'?"size"?from?user_segments?where?segment_name='JACK';SEGMENT_NAME????BLOCKS????EXTENTS?????size -------------?----------?----------?----------- JACK?????????????11264???????82????????88MSQL>?select?segment_name,segment_type,blocks,extents,bytes/1024/1024||'M'?"size"?from?user_segments?where?segment_name='JACK_IND';SEGMENT_NAME????SEGMENT_TYPE??????BLOCKS?????EXTENTS????size ------------?------------------?----------?----------?------------- JACK_IND????????????INDEX??????????1536??????????27?????12MSQL>?select?index_name,clustering_factor,num_rows?from?user_indexes?where?index_name='JACK_IND';INDEX_NAME?????CLUSTERING_FACTOR?NUM_ROWS -------------?-----------------?---------- JACK_IND?????????????725460??????725460----對索引進行rebuild---- SQL>?alter?index?jack_ind?rebuild;Index?altered.----查看cluster?factor---- SQL>?select?index_name,clustering_factor,num_rows?from?user_indexes?where?index_name='JACK_IND';INDEX_NAME??????CLUSTERING_FACTOR??NUM_ROWS ---------------?-----------------?---------- JACK_IND???????????????10327???????725460????------注意這里的Factor,已經變成10327,我們收集一下表的統計信息,然后與表的block進行一次比較。SQL>?exec?dbms_stats.gather_table_stats(user,'jack',cascade=>true);PL/SQL?procedure?successfully?completed.SQL>?select?blocks?from?dba_tables?where?table_name='JACK';BLOCKS ----------10474???----表jack實際使用的block是10474,clustering?factor是10327基本還是比較接近了,這也說明相鄰的row是存儲在相同的block里。SQL>?select?index_name,clustering_factor,num_rows?from?user_indexes?where?index_name='JACK_IND';INDEX_NAME???????????????CLUSTERING_FACTOR???NUM_ROWS ------------------------------?-----------------?---------- JACK_IND???????????????????10327?????725460SQL>?alter?system?flush?buffer_cache;System?altered.SQL>?set?autotrace?traceonly;----再次查看之前sql的執行計劃---- SQL>?select?*?from?jack?where?object_id>1000?and?object_id<2000;9880?rows?selected.Execution?Plan ---------------------------------------------------------- Plan?hash?value:?2860868395---------------------------------------------------------------------------------------- |?Id??|?Operation????????????|?Name?????|?Rows??|?Bytes?|?Cost?(%CPU)|?Time?????| ---------------------------------------------------------------------------------------- |???0?|?SELECT?STATEMENT????????|???????????|??9657?|???914K|???162?????(0)|?00:00:02?| |???1?|??TABLE?ACCESS?BY?INDEX?ROWID|?JACK?????|??9657?|???914K|???162?????(0)|?00:00:02?| |*??2?|???INDEX?RANGE?SCAN????????|?JACK_IND?|??9657?|???????|????24?????(0)|?00:00:01?| ----------------------------------------------------------------------------------------Predicate?Information?(identified?by?operation?id): ---------------------------------------------------2?-?access("OBJECT_ID">1000?AND?"OBJECT_ID"<2000)Statistics ----------------------------------------------------------1???recursive?calls0???db?block?gets1457??consistent?gets151???physical?reads0???redo?size988947??bytes?sent?via?SQL*Net?to?client7657??bytes?received?via?SQL*Net?from?client660???SQL*Net?roundtrips?to/from?client0???sorts?(memory)0???sorts?(disk)9880??rows?processed----注意這里的cost已經降到了162,性能提升還是非常明顯。 |
小結
????通過以上說明和測試,可以看到clustering factor也是索引健康的一個重要判斷的標準。其值越低越好。它會影響CBO選擇正確的執行計劃。但是注意一點,clustering factor總是趨勢與不斷惡化的。
轉載于:https://blog.51cto.com/newcourage/1942656
總結
以上是生活随笔為你收集整理的Oracle:集群因子的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 吉利银河官宣发布:将推7款新品 L7二季
- 下一篇: Derby的jar说明