listagg 函数--oracle 11g release 2
http://xpchild.blog.163.com/blog/static/10180985920108485721969/
listagg 函數--oracle 11g release 2??
2010-09-04 21:45:10|??分類:默認分類 |??標簽:|字號大中小?訂閱
????本文描述了在oracle 11g release 2 版本中新增的listagg函數,listagg是一個實現字符串聚合的oracle內建函數。作為一種普遍的技術,網絡上也有多種實現字符串聚合的方法。本文會首先介紹listagg函數,最后會拿這些方法與listagg進行性能方面的對比。
樣例數據
??? 本文的例子將使用如下的樣例數據:
?DEPTNO ENAME????? HIREDATE
---------- ---------- ----------
??????? 10 CLARK?????? 09/06/1981
??????? 10 KING????????17/11/1981
??????? 10 MILLER????? 23/01/1982
??????? 20 ADAMS?????? 12/01/1983
??????? 20 FORD??????? 03/12/1981
??????? 20 JONES?????? 02/04/1981
??????? 20 SCOTT?????? 09/12/1982
??????? 20 SMITH?????? 17/12/1980
??????? 30 ALLEN?????? 20/02/1981
??????? 30 BLAKE?????? 01/05/1981
??????? 30 JAMES?????? 03/12/1981
??????? 30 MARTIN????? 28/09/1981
??????? 30 TURNER??????08/09/1981
??????? 30 WARD??????? 22/02/1981
字符串聚合
????字符串聚合就是按照分組把多行數據串聯成一行,以下面的結果集為例:
DEPTNO ENAME
--------- ----------
?????? 10 CLARK
?????? 10 KING
?????? 10 MILLER
?????? 20 ADAMS
?????? 20 FORD
?????? 20 JONES
???? 按照DEPTNO字段分組,對結果集進行字符串聚合,結果如下:
DEPTNO AGGREGATED_ENAMES
--------- -------------------------
?????? 10 CLARK,KING,MILLER
?????? 20 ADAMS,FORD,JONES
???? 可以看到,employee names基于deptno分組實現了串聯,如前所述,有很多種方法實現聚合功能(文章最后提供相關鏈接),但是listagg更為簡單,易用。
listagg 語法概述
?? listagg函數的語法結構如下:
LISTAGG( [,]) WITHIN GROUP (ORDER BY ) [OVER (PARTITION BY )]
??? listagg雖然是聚合函數,但可以提供分析功能(比如可選的OVER()子句)。使用listagg中,下列中的元素是必須的:
- 需要聚合的列或者表達式
- WITH GROUP 關鍵詞
- 分組中的ORDER BY子句
?? 下面將演示listagg函數使用的例子
listagg 作為聚合函數
?? 下面以EMP表為例,按照部門分組聚合employee name,并以,為分隔符。
SQL> SELECT deptno 2 , LISTAGG(ename, ',') WITHIN GROUP (ORDER BY ename) AS employees 3 FROM emp 4 GROUP BY 5 deptno; DEPTNO EMPLOYEES ---------- ------------------------------------------------------------ 10 CLARK,KING,MILLER 20 ADAMS,FORD,JONES,SCOTT,SMITH 30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD 3 rows selected.????注:在每個聚合元素中,本例選用empolyee name字段進行排序,不過需要說明的是,在其它實現字符串聚合方法中,排序可是重量級的任務。
??? 下面的例子中,empolyee name的聚合將按照hire date來排序。
SQL> SELECT deptno
? 2? ,????? LISTAGG(ename, ',') WITHIN GROUP (ORDER BY hiredate) AS employees
? 3? FROM?? emp
? 4? GROUP? BY
? 5???????? deptno;
與許多的聚合函數類似,listagg通過加上over()子句可以實現分析功能,下面的例子將展示分析功能: SQL> SELECT deptno 2 , ename 3 , hiredate 4 , LISTAGG(ename, ',') 5 WITHIN GROUP (ORDER BY hiredate) 6 OVER (PARTITION BY deptno) AS employees 7 FROM emp; DEPTNO ENAME HIREDATE EMPLOYEES ---------- ---------- ----------- ------------------------------------- 10 CLARK 09/06/1981 CLARK,KING,MILLER 10 KING 17/11/1981 CLARK,KING,MILLER 10 MILLER 23/01/1982 CLARK,KING,MILLER 20 SMITH 17/12/1980 SMITH,JONES,FORD,SCOTT,ADAMS 20 JONES 02/04/1981 SMITH,JONES,FORD,SCOTT,ADAMS 20 FORD 03/12/1981 SMITH,JONES,FORD,SCOTT,ADAMS 20 SCOTT 19/04/1987 SMITH,JONES,FORD,SCOTT,ADAMS 20 ADAMS 23/05/1987 SMITH,JONES,FORD,SCOTT,ADAMS 30 ALLEN 20/02/1981 ALLEN,WARD,BLAKE,TURNER,MARTIN,JAMES 30 WARD 22/02/1981 ALLEN,WARD,BLAKE,TURNER,MARTIN,JAMES 30 BLAKE 01/05/1981 ALLEN,WARD,BLAKE,TURNER,MARTIN,JAMES 30 TURNER 08/09/1981 ALLEN,WARD,BLAKE,TURNER,MARTIN,JAMES 30 MARTIN 28/09/1981 ALLEN,WARD,BLAKE,TURNER,MARTIN,JAMES 30 JAMES 03/12/1981 ALLEN,WARD,BLAKE,TURNER,MARTIN,JAMES 14 rows selected. 切記:分析函數不會丟失結果集的每一行,而字符串的聚合卻并非如此。排序
如前所述,ORDER BY 子句是必選項,如下例所示:
SQL> SELECT deptno 2 , LISTAGG(ename, ',') WITHIN GROUP () AS employees 3 FROM emp 4 GROUP BY 5 deptno; , LISTAGG(ename, ',') WITHIN GROUP () AS employees * ERROR at line 2: ORA-30491: missing ORDER BY clause?? 如果所要聚合字段的排序無關緊要,那么可以可以使用NULL代替:
SQL> SELECT deptno 2 , LISTAGG(ename, ',') WITHIN GROUP (ORDER BY NULL) AS employees 3 FROM emp 4 GROUP BY 5 deptno; DEPTNO EMPLOYEES ---------- ------------------------------------------------------------ 10 CLARK,KING,MILLER 20 ADAMS,FORD,JONES,SCOTT,SMITH 30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD 3 rows selected.??? 在這個例子當中,雖然使用的是NULL來進行排序,但結果集中聚合的元素還是按字母的順序排序的,這是因為listagg的默認排序行為。
分隔符
?????? 在字符串的聚合中,可以使用靜態變量或者表達式作為分隔符,事實上,分隔符是可選項,例如:
SQL> SELECT deptno 2 , LISTAGG(ename) WITHIN GROUP (ORDER BY ename) AS employees 3 FROM emp 4 GROUP BY 5 deptno; DEPTNO EMPLOYEES ---------- ------------------------------------------------------------ 10 CLARKKINGMILLER 20 ADAMSFORDJONESSCOTTSMITH 30 ALLENBLAKEJAMESMARTINTURNERWARD 3 rows selected.??????? 唯一的限制是,分隔符要么是靜態變量(如字母),要么是建立在分組字段上的確定性表達式,比如,不能使用ROWNUM作為分隔符,如下所示:
SQL> SELECT deptno 2 , LISTAGG(ename, '(' || ROWNUM || ')') 3 WITHIN GROUP (ORDER BY hiredate) AS employees 4 FROM emp 5 GROUP BY 6 deptno; , LISTAGG(ename, '(' || ROWNUM || ')') * ERROR at line 2: ORA-30497: Argument should be a constant or a function of expressions in GROUP BY. 錯誤信息非常清楚:ROWNUM既不是靜態變量,也不是建立在分組字段上的表達式,如果使用了分組字段,那就限制了表達式的類型,例如: SQL> SELECT deptno 2 , LISTAGG(ename, '(' || MAX(deptno) || ')') 3 WITHIN GROUP (ORDER BY hiredate) AS employees 4 FROM emp 5 GROUP BY 6 deptno; , LISTAGG(ename, '(' || MAX(deptno) || ')') * ERROR at line 2: ORA-30496: Argument should be a constant. 這個例子當中,oracle分析到分隔符試圖使用分組字段,但是是一個非法的表達式,下面的例子中,使用了oracle接受的確定性表達式: SQL> SELECT deptno 2 , LISTAGG(ename, '(' || CHR(deptno+55) || '); ') 3 WITHIN GROUP (ORDER BY hiredate) AS employees 4 FROM emp 5 GROUP BY 6 deptno; DEPTNO EMPLOYEES ---------- ------------------------------------------------------------ 10 CLARK(A); KING(A); MILLER 20 SMITH(K); JONES(K); FORD(K); SCOTT(K); ADAMS 30 ALLEN(U); WARD(U); BLAKE(U); TURNER(U); MARTIN(U); JAMES 3 rows selected. 這里把DETPNO轉化成ASCII字符作為分隔符,這個一個在分組列上的確定性表達式。其它限制listagg聚合的結果列大小限制在varchar2類型的最大值內(比如4000),例如: ? SQL> SELECT LISTAGG(object_name) WITHIN GROUP (ORDER BY NULL) 2 FROM all_objects; FROM all_objects * ERROR at line 2: ORA-01489: result of string concatenation is too long 這里沒有clob或者更大的varchar2類型類代替,所以更大的字符串必須使用替代方案(比如COLLECTION或者用戶自定義的PL/SQL函數)
性能方面
??????? 下面將比較幾種常用的字符串聚合方法的性能,類比的有:
- LISTAGG (11g Release 2);
- COLLECT + PL/SQL function(10g);
- Oracle Data Cartridge - user-defined aggregate function (9i)
- MODEL SQL (10g).
????? 這里最主要的不同是listagg是一個內建函數,所以其至少與其它方案有可比性。
建立環境
?????? 為了性能比較,下面將建立一張有2000個分組,100萬行數據的表,具體如下:
SQL> CREATE TABLE t 2 AS 3 SELECT ROWNUM AS id 4 , MOD(ROWNUM,2000) AS grp 5 , DBMS_RANDOM.STRING('u',5) AS val 6 , DBMS_RANDOM.STRING('u',30) AS pad 7 FROM dual 8 CONNECT BY ROWNUM <= 1000000; Table created. SQL> SELECT COUNT(*) FROM t; COUNT(*) ---------- 1000000 1 row selected. SQL> exec DBMS_STATS.GATHER_TABLE_STATS(USER, 'T'); PL/SQL procedure successfully completed.?????? 這里將使用wall-clock和autotrace進行性能方面的類比。注:樣例的數據已被緩存,準備環境如下:
SQL> set autotrace traceonly statistics SQL> set timing on SQL> set arrays 500listagg
?????????第一個測試的是listagg,下面將對2000個分組進行聚合,并按照value排序:
?
????? 測試機上,這條語句執行了不到6秒,沒有磁盤物理I/O,所有的sorting都在內存當中。
stragg/wm_concat
?
???????下面將使用廣為流傳的字符串聚合,Tom Kyte的定義的聚合函數STRAGG。在 oracle的10g版本中,oracle在WMSYS的用戶下實現了類似功能的函數,這里直接使用這個函數來測試。注:STRAGG函數不支持字符串的排序。
SQL> SELECT grp 2 , WMSYS.WM_CONCAT(val) AS vals --<-- WM_CONCAT ~= STRAGG 3 FROM t 4 GROUP BY 5 grp; 2000 rows selected. Elapsed: 00:00:19.45 Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 7206 consistent gets 0 physical reads 0 redo size 6039067 bytes sent via SQL*Net to client 552 bytes received via SQL*Net from client 5 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 2000 rows processed???? 這個方法花費了三倍于listagg的時間(沒有排序),用戶自定義的函數會比這個PL/SQL函數效率更低(比如:上下文切換)
collect(without ordering)
?????當10g發布的時候,我就立即使用collect函數和一個“collection-to-string”PL/SQL函數來替代STRAGG。不過10g版本中的collect沒有排序功能。注;To_STRING的源碼可以在相關文檔中查到。
SQL> SELECT grp 2 , TO_STRING( 3 CAST(COLLECT(val) AS varchar2_ntt) 4 ) AS vals 5 FROM t 6 GROUP BY 7 grp; 2000 rows selected. Elapsed: 00:00:02.90 Statistics ---------------------------------------------------------- 10 recursive calls 0 db block gets 7197 consistent gets 0 physical reads 0 redo size 6039067 bytes sent via SQL*Net to client 552 bytes received via SQL*Net from client 5 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 2000 rows processed 沒有排序的情況下,collect/TO_STRING方法比listagg快了兩倍,但是listagg花費了大量的時間在排序上,如果說排序時無關緊要的,那么可以說collect技術是最快的。 collect (with ordering)公平起見,在collect中引入排序(11g中的一個新特性),如下; SQL> SELECT grp 2 , TO_STRING( 3 CAST(COLLECT(val ORDER BY val) AS varchar2_ntt) 4 ) AS vals 5 FROM t 6 GROUP BY 7 grp; 2000 rows selected. Elapsed: 00:00:07.08 Statistics ---------------------------------------------------------- 10 recursive calls 0 db block gets 7197 consistent gets 0 physical reads 0 redo size 6039067 bytes sent via SQL*Net to client 552 bytes received via SQL*Net from client 5 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 2000 rows processed 這次,引入了排序后,collect方法確實比listagg慢多了。model
?????最后一個性能比較是與使用了model子句的實現方法。下面的例子的源代碼來自于Rob van Wijk's About Oracle blog 并做了些修改以適應樣例數據。
SQL> SELECT grp 2 , vals 3 FROM ( 4 SELECT grp 5 , RTRIM(vals, ',') AS vals 6 , rn 7 FROM t 8 MODEL 9 PARTITION BY (grp) 10 DIMENSION BY (ROW_NUMBER() OVER (PARTITION BY grp ORDER BY val) AS rn) 11 MEASURES (CAST(val AS VARCHAR2(4000)) AS vals) 12 RULES 13 ( vals[ANY] ORDER BY rn DESC = vals[CV()] || ',' || vals[CV()+1] 14 ) 15 ) 16 WHERE rn = 1 17 ORDER BY 18 grp; 2000 rows selected. Elapsed: 00:03:28.15 Statistics ---------------------------------------------------------- 3991 recursive calls 0 db block gets 7092 consistent gets 494791 physical reads 0 redo size 6039067 bytes sent via SQL*Net to client 553 bytes received via SQL*Net from client 5 SQL*Net roundtrips to/from client 130 sorts (memory) 0 sorts (disk) 2000 rows processed?????? 這個例子執行了三分鐘,統計信息顯示發生了大量的I/O讀,遞歸調用和內存排序,事實上,這個糟糕的表現主要是由于在查詢中,大量的對臨時表空間的讀和寫(盡管統計信息并未顯示磁盤排序)。
?????? MODEL字符串聚合方法的執行計劃如下:
-------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| -------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | | | 1 | SORT ORDER BY | | 1000K| 1934M| 1953M| |* 2 | VIEW | | 1000K| 1934M| | | 3 | SQL MODEL ORDERED | | 1000K| 9765K| | | 4 | WINDOW SORT | | 1000K| 9765K| 19M| | 5 | TABLE ACCESS FULL| T | 1000K| 9765K| | -------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("RN"=1) 通過SQL的監控報告(使用DBMS_SQLTUNE.REPORT_SQL_MONITOR)在SQL MODEL操作的第三步中,數據的排序使用4Gb的臨時空間,在Gary Myers' Sydney Oracle Lab blog中也闡述了這個現象。 性能總結從以上事例中可以看出,listagg函數是字符串聚合方法中效率最高的一個,并且還是一個內建的函數。如果不需要排序的情況下,collect會更快一些,但如果是需要排序的話,listagg絕對是最快的。 深度
?
總結
以上是生活随笔為你收集整理的listagg 函数--oracle 11g release 2的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: JAVA的内省机制(introspect
- 下一篇: Spring使用注释