多维数据查询效率分析(1)
有時需求需要我們把系統做成靈活的。最常見的形式是,屬性不能是固定的,要用戶可以自定義。這樣的需求往往會在數據庫中建模成一個一對多的關系。
create table person {...
} create table person_attribute {
person_id ...
attribute_name ...
attribute_value ...
...
}
這樣的建模在沒有查詢需要的時候,還是蠻不錯的。但是一旦需要對擴充的屬性值進行查詢,速度往往慘不忍睹。曾經在新加坡做過一個電信的遺留系統的前端,其數據庫的建模就是這樣的。對于中間的屬性表,一個簡單的查詢都需要join好幾次,速度非常慢。好在那次只是做ETL,并不是直接把這樣的數據庫做后端,要不然肯定死的很慘。當時的做法是把所有的數據讀入到內存中,針對屬性的查詢用內存集合遍歷來實現。這樣做的前提是集合的元素數量非常少(幾百而已),總數據量也非常少。但是如果我們需要處理的數據量非常大,那么我們就必須在數據庫中能夠對多維數據進行高效查詢。
為了搞清楚這個問題,我們需要做一系列實驗。先來介紹一下我們實驗的對象。
假定我們有一張contacts表,然后對每個聯系人有一個contact_categories的表,簡稱cc
create table cc(contact_id integer,
cad_id integer,
value integer);
cad_id代表字段的id,value是cateogry的值。數據量是500萬。實驗用的數據庫分別是PostgreSQL(原始數據大小250M),和使用MYISAM引擎的MySQL(原始數據大小73M)。使用的磁盤是普通的筆記本硬盤,沒有raid,普通的ext4分區,峰值傳輸率大概是70M/s。假設沒有資源的爭搶,而且數據庫總是以最快的順序讀的方式從磁盤中加載數據,那么PostgreSQL得用4s,而MySQL也需要1s才能把所有的磁盤內容讀到內存中。
所以如果我們使用基于磁盤的解決方案的話,無論如何也無法把查詢壓縮到1s以內。因為我們需要提供一個Reponsive的界面前端,所以數據必須能夠在內存中被查詢,可能不是所有的數據都能放入內存,但是最起碼被查詢到的數據得一直在內存中。最簡單的辦法把數據庫移到內存中的方式不是改數據庫的設置,而是直接把內存映射成文件夾:
sudo mount -t ramfs -o size=200000m ramfs /mnt/memory然后把數據庫的數據目錄移動到/mnt/memory之中。但是數據庫的緩存設置也是必須修改的,比如PostgreSQL的work_mem如果設置過小的話,在做對一個很大的表做count(distinct xxx)時就會導致中間結果被寫入到臨時表之中。所以我們還是把所有的緩存搞大一些吧。
work_mem = 1000MBshared_buffer = 1000MB
temp_buffer = 1000MB
effective_cache_size = 1000MB
wal_buffers = 1000MB
auto_vacuum = off
讓我們來看看最基本的一個查詢能有多快吧
taowen@dmright-perf:~$ time psql postgres -c 'select count(*) from cc;'count
---------
5904385
(1 row)
real 0m0.448s
user 0m0.024s
sys 0m0.000s
count(*)和count(contact_id)是一樣的么,讓我們來試驗一下
taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from cc;'count
---------
5904385
(1 row)
real 0m0.537s
user 0m0.016s
sys 0m0.008s
有意思!居然比count(*)還要慢。可能是因為我們沒有給contact_id字段加索引的原因。好吧,加上索引。
taowen@dmright-perf:~$ time psql postgres -c 'create index contact_id_idx on cc(contact_id);'CREATE INDEX
real 0m4.848s
user 0m0.016s
sys 0m0.008s
taowen@dmright-perf:~$ time psql postgres -c 'analyze cc;'
ANALYZE
real 0m0.197s
user 0m0.024s
sys 0m0.004s
再試試看
taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from cc;'count
---------
5904385
(1 row)
real 0m0.534s
user 0m0.020s
sys 0m0.008s
taowen@dmright-perf:~$ time psql postgres -c 'select count(*) from cc;'
count
---------
5904385
(1 row)
real 0m0.447s
user 0m0.028s
sys 0m0.008s
基本上沒有變化……count(*)貌似就是比count(contact_id)要快。不管啦,既然小于500ms,也算是夠快了。讓我們給查詢加上個條件吧。
taowen@dmright-perf:~$ time psql postgres -c 'select count(*) from cc where cad_id = 101 and value = 5;'count
--------
998839
(1 row)
real 0m0.686s
user 0m0.024s
sys 0m0.000s
taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from cc where cad_id = 101 and value = 5;'
count
--------
998839
(1 row)
real 0m0.660s
user 0m0.024s
sys 0m0.000s
我們可以看到,在count(*)和count(contact_id)之間沒有特別大的區別。但是我們還沒有給cad_id和value加索引,所以讓我們加上看看如何
taowen@dmright-perf:~$ time psql postgres -c 'create index cad_id_value_idx on cc(cad_id, value);'CREATE INDEX
real 0m10.069s
user 0m0.020s
sys 0m0.008s
taowen@dmright-perf:~$ time psql postgres -c 'analyze cc;'
ANALYZE
real 0m0.199s
user 0m0.016s
sys 0m0.012s
and try again.
taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from cc where cad_id = 101 and value = 5;'
count
--------
998839
(1 row)
real 0m0.283s
user 0m0.020s
sys 0m0.012s
真是很快很快耶!讓我們來回顧一下,我們現在對于contact_id和(cad_id, value)都建立了索引。
接下來我們把查詢弄得更復雜一些,加上AND條件。有三種可能的方式:
1、INTERSECT
2、INNER JOIN
3、IN + SUB QUERY
我們每種做法都試試
count
--------
164788
(1 row)
real 0m1.159s
user 0m0.028s
sys 0m0.008s
taowen@dmright-perf:~$ time psql postgres -c 'select count(*) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7) as temp' count
--------
164788
(1 row)
real 0m1.148s
user 0m0.032s
sys 0m0.000s
顯而易見,這么做很慢。那么INNER JOIN是不是更快一些呢?
taowen@dmright-perf:~$ time psql postgres -c 'select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id and a2.cad_id = 102 and a2.value = 7 where a1.cad_id = 101 and a1.value = 5;'count
--------
164788
(1 row)
real 0m1.162s
user 0m0.036s
sys 0m0.000s
這大概要慢上個200ms了。那么IN + SUB QUERY呢?
taowen@dmright-perf:~$ time psql postgres -c 'select count(a1.contact_id) from cc as a1 where a1.cad_id = 101 and a1.value = 5 and a1.contact_id in (select contact_id from cc as a2 where a2.cad_id = 102 and a2.value = 7)'count
--------
164788
(1 row)
real 0m2.645s
user 0m0.024s
sys 0m0.004s
靠,居然更慢了。總結就是,在有兩個條件的情況下,INTERSECT似乎是最快的。但是即便如此,它也超過了一秒鐘了。為什么會這樣呢?時間都花哪里去了?
taowen@dmright-perf:~$ time psql postgres -c 'explain analyze select count(contact_id) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7) as temp'QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=107853.91..107853.92 rows=1 width=4) (actual time=1471.907..1471.907 rows=1 loops=1)
-> Subquery Scan on temp (cost=7467.63..107375.65 rows=191301 width=4) (actual time=1366.131..1459.419 rows=164788 loops=1)
-> HashSetOp Intersect (cost=7467.63..105462.64 rows=191301 width=4) (actual time=1366.129..1439.781 rows=164788 loops=1)
-> Append (cost=7467.63..103326.69 rows=854380 width=4) (actual time=69.436..797.478 rows=1829408 loops=1)
-> Subquery Scan on "*SELECT* 2" (cost=7467.63..48180.53 rows=351876 width=4) (actual time=69.435..350.031 rows=830569 loops=1)
-> Bitmap Heap Scan on cc (cost=7467.63..44661.77 rows=351876 width=4) (actual time=69.434..264.538 rows=830569 loops=1)
Recheck Cond: ((cad_id = 102) AND (value = 7))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7379.66 rows=351876 width=0) (actual time=64.162..64.162 rows=830569 loops=1)
Index Cond: ((cad_id = 102) AND (value = 7))
-> Subquery Scan on "*SELECT* 1" (cost=10667.56..55146.16 rows=502504 width=4) (actual time=69.846..331.544 rows=998839 loops=1)
-> Bitmap Heap Scan on cc (cost=10667.56..50121.12 rows=502504 width=4) (actual time=69.845..233.794 rows=998839 loops=1)
Recheck Cond: ((cad_id = 101) AND (value = 5))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..10541.94 rows=502504 width=0) (actual time=64.501..64.501 rows=998839 loops=1)
Index Cond: ((cad_id = 101) AND (value = 5))
Total runtime: 1477.484 ms
(15 rows)
首先,INTERSECT不是并行執行的。兩個子查詢分別花費了300ms以上的時間,加起來有800ms是用在搜集contact_id上了。其次,集合之間的并集操作花費了600多ms。其余的時間都花在了數集合的成員個數上了。有一點值得注意的是,根據 http://postgresql.1045698.n5.nabble.com/ANTI-JOIN-needs-table-index-scan-not-possible-td3425340.html index scan和heap scan實際上都用上了索引。然而,SELECT還是在其之上額外耗費了100ms,我猜測它可能是回到原始的表結構中把行取出來,以獲得contact_id的值。這在磁盤上的話速度會更慢,因為會是random seek操作。
看完了INTERSECT,讓我們再來分析分析Join:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=180366.31..180366.32 rows=1 width=4) (actual time=1280.964..1280.964 rows=1 loops=1)
-> Merge Join (cost=174783.00..180104.02 rows=104912 width=4) (actual time=1043.879..1270.197 rows=164788 loops=1)
Merge Cond: (a1.contact_id = a2.contact_id)
-> Sort (cost=97705.18..98961.44 rows=502504 width=4) (actual time=586.735..626.292 rows=998839 loops=1)
Sort Key: a1.contact_id
Sort Method: quicksort Memory: 71397kB
-> Bitmap Heap Scan on cc a1 (cost=10667.56..50121.12 rows=502504 width=4) (actual time=75.377..257.403 rows=998839 loops=1)
Recheck Cond: ((cad_id = 101) AND (value = 5))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..10541.94 rows=502504 width=0) (actual time=69.565..69.565 rows=998839 loops=1)
Index Cond: ((cad_id = 101) AND (value = 5))
-> Sort (cost=77077.83..77957.52 rows=351876 width=4) (actual time=457.131..492.428 rows=830569 loops=1)
Sort Key: a2.contact_id
Sort Method: quicksort Memory: 63509kB
-> Bitmap Heap Scan on cc a2 (cost=7467.63..44661.77 rows=351876 width=4) (actual time=60.039..186.422 rows=830569 loops=1)
Recheck Cond: ((cad_id = 102) AND (value = 7))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7379.66 rows=351876 width=0) (actual time=54.929..54.929 rows=830569 loops=1)
Index Cond: ((cad_id = 102) AND (value = 7))
Total runtime: 1286.846 ms
(18 rows)
貌似大部分時間都花在了排序上。而且它也不是并行執行的。如果后臺同時執行top命令的話,就會發現只有一個核是被實際占用著的。
要是再創建更多的索引呢?會不會有幫助?
real 0m10.683s
user 0m0.020s
sys 0m0.004s
taowen@dmright-perf:~$ time psql postgres -c 'explain analyze select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id and a2.cad_id = 102 and a2.value = 7 where a1.cad_id = 101 and a1.value = 5;'
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=180366.31..180366.32 rows=1 width=4) (actual time=1338.478..1338.478 rows=1 loops=1)
-> Merge Join (cost=174783.00..180104.02 rows=104912 width=4) (actual time=1090.136..1327.312 rows=164788 loops=1)
Merge Cond: (a1.contact_id = a2.contact_id)
-> Sort (cost=97705.18..98961.44 rows=502504 width=4) (actual time=615.843..657.438 rows=998839 loops=1)
Sort Key: a1.contact_id
Sort Method: quicksort Memory: 71397kB
-> Bitmap Heap Scan on cc a1 (cost=10667.56..50121.12 rows=502504 width=4) (actual time=80.926..275.431 rows=998839 loops=1)
Recheck Cond: ((cad_id = 101) AND (value = 5))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..10541.94 rows=502504 width=0) (actual time=75.816..75.816 rows=998839 loops=1)
Index Cond: ((cad_id = 101) AND (value = 5))
-> Sort (cost=77077.83..77957.52 rows=351876 width=4) (actual time=474.279..510.866 rows=830569 loops=1)
Sort Key: a2.contact_id
Sort Method: quicksort Memory: 63509kB
-> Bitmap Heap Scan on cc a2 (cost=7467.63..44661.77 rows=351876 width=4) (actual time=65.335..198.655 rows=830569 loops=1)
Recheck Cond: ((cad_id = 102) AND (value = 7))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7379.66 rows=351876 width=0) (actual time=60.314..60.314 rows=830569 loops=1)
Index Cond: ((cad_id = 102) AND (value = 7))
Total runtime: 1346.587 ms
(18 rows)
還是不行!Merge Join慢的話,升級到9.1然后強制使用hash join會不會好一點?
postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'explain analyze select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id where a2.cad_id = 102 and a2.value = 7 and a1.cad_id = 101 and a1.value = 5';QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=902356.85..902356.86 rows=1 width=4) (actual time=1693.137..1693.137 rows=1 loops=1)
-> Hash Join (cost=59599.78..902135.88 rows=88389 width=4) (actual time=461.788..1682.718 rows=164788 loops=1)
Hash Cond: (a1.contact_id = a2.contact_id)
-> Bitmap Heap Scan on cc a1 (cost=10833.24..50406.32 rows=510472 width=4) (actual time=76.337..224.571 rows=998839 loops=1)
Recheck Cond: ((cad_id = 101) AND (value = 5))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..10705.62 rows=510472 width=0) (actual time=71.028..71.028 rows=998839 loops=1)
Index Cond: ((cad_id = 101) AND (value = 5))
-> Hash (cost=44444.15..44444.15 rows=345792 width=4) (actual time=385.256..385.256 rows=830569 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 29200kB
-> Bitmap Heap Scan on cc a2 (cost=7341.27..44444.15 rows=345792 width=4) (actual time=64.778..258.059 rows=830569 loops=1)
Recheck Cond: ((cad_id = 102) AND (value = 7))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7254.82 rows=345792 width=0) (actual time=59.675..59.675 rows=830569 loops=1)
Index Cond: ((cad_id = 102) AND (value = 7))
Total runtime: 1698.207 ms
(14 rows)
有兩個原因造成這樣的情況:
1、兩個條件意味著掃描兩次,而且不是并行掃描
2、join自身很耗費時間,無論是sort merge join還是hash join
兩個條件都這熊樣了,三個條件呢?看看吧,先上INTERSECT:
postgres@dmright-perf:/mnt/memory/pg$ time psql postgres -c 'select count(contact_id) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7 intersect select contact_id from cc where cad_id = 6 and value = 1) as temp'count
-------
6748
(1 row)
real 0m1.350s
user 0m0.020s
sys 0m0.008s
然后是INNER JOIN
postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id inner join cc as a3 on a1.contact_id = a3.contact_id where a2.cad_id = 102 and a2.value = 7 and a1.cad_id = 101 and a1.value = 5 and a3.cad_id = 6 and a3.value = 1';count
-------
6748
(1 row)
real 0m0.756s
user 0m0.028s
sys 0m0.000s
然后是IN + SUBQUERY:
postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'select count(a1.contact_id) from cc as a1 where a1.cad_id = 101 and a1.value = 5 and a1.contact_id in (select contact_id from cc as a2 where a2.cad_id = 102 and a2.value = 7 and a2.contact_id in (select contact_id from cc as a3 where a3.cad_id = 6 and a3.value = 1))';count
-------
6748
(1 row)
real 0m7.320s
user 0m0.024s
sys 0m0.004s
我們可以看到這回INNER JOIN是最快的了。為什么?
postgres@dmright-perf:/mnt/memory/pg$ time psql postgres -c 'explain analyze select count(contact_id) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7 intersect select contact_id from cc where cad_id = 6 and value = 1) as temp'QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=132357.21..132357.22 rows=1 width=4) (actual time=1659.666..1659.666 rows=1 loops=1)
-> Subquery Scan on temp (cost=306.85..132335.71 rows=8602 width=4) (actual time=1648.818..1659.243 rows=6748 loops=1)
-> HashSetOp Intersect (cost=306.85..132249.69 rows=8602 width=4) (actual time=1648.817..1658.543 rows=6748 loops=1)
-> Append (cost=306.85..131691.89 rows=223118 width=4) (actual time=33.797..1547.811 rows=372016 loops=1)
-> Subquery Scan on "*SELECT* 3" (cost=306.85..26138.13 rows=14239 width=4) (actual time=33.797..153.230 rows=207228 loops=1)
-> Bitmap Heap Scan on cc (cost=306.85..25995.74 rows=14239 width=4) (actual time=33.795..134.292 rows=207228 loops=1)
Recheck Cond: ((cad_id = 6) AND (value = 1))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..303.29 rows=14239 width=0) (actual time=28.558..28.558 rows=207228 loops=1)
Index Cond: ((cad_id = 6) AND (value = 1))
-> Result (cost=7341.27..105553.76 rows=208879 width=4) (actual time=1282.426..1371.255 rows=164788 loops=1)
-> HashSetOp Intersect (cost=7341.27..105553.76 rows=208879 width=4) (actual time=1282.418..1353.422 rows=164788 loops=1)
-> Append (cost=7341.27..103413.10 rows=856264 width=4) (actual time=57.550..734.282 rows=1829408 loops=1)
-> Subquery Scan on "*SELECT* 2" (cost=7341.27..47902.07 rows=345792 width=4) (actual time=57.550..292.595 rows=830569 loops=1)
-> Bitmap Heap Scan on cc (cost=7341.27..44444.15 rows=345792 width=4) (actual time=57.549..218.162 rows=830569 loops=1)
Recheck Cond: ((cad_id = 102) AND (value = 7))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7254.82 rows=345792 width=0) (actual time=52.606..52.606 rows=830569 loops=1)
Index Cond: ((cad_id = 102) AND (value = 7))
-> Subquery Scan on "*SELECT* 1" (cost=10833.24..55511.04 rows=510472 width=4) (actual time=69.129..330.737 rows=998839 loops=1)
-> Bitmap Heap Scan on cc (cost=10833.24..50406.32 rows=510472 width=4) (actual time=69.128..242.416 rows=998839 loops=1)
Recheck Cond: ((cad_id = 101) AND (value = 5))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..10705.62 rows=510472 width=0) (actual time=64.161..64.161 rows=998839 loops=1)
Index Cond: ((cad_id = 101) AND (value = 5))
Total runtime: 1665.691 ms
(23 rows)
對于INTERSECT來說,過程和兩個條件是差不多的,只是集合更大一些罷了。
postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'explain analyze select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id inner join cc as a3 on a1.contact_id = a3.contact_id where a2.cad_id = 102 and a2.value = 7 and a1.cad_id = 101 and a1.value = 5 and a3.cad_id = 6 and a3.value = 1';QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=118979.47..118979.48 rows=1 width=4) (actual time=771.393..771.393 rows=1 loops=1)
-> Nested Loop (cost=49073.39..118977.89 rows=630 width=4) (actual time=399.835..770.587 rows=6748 loops=1)
-> Hash Join (cost=49073.39..97159.97 rows=2466 width=8) (actual time=399.814..691.519 rows=34219 loops=1)
Hash Cond: (a3.contact_id = a2.contact_id)
-> Bitmap Heap Scan on cc a3 (cost=306.85..25995.74 rows=14239 width=4) (actual time=25.655..83.526 rows=207228 loops=1)
Recheck Cond: ((cad_id = 6) AND (value = 1))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..303.29 rows=14239 width=0) (actual time=20.570..20.570 rows=207228 loops=1)
Index Cond: ((cad_id = 6) AND (value = 1))
-> Hash (cost=44444.15..44444.15 rows=345792 width=4) (actual time=373.969..373.969 rows=830569 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 29200kB
-> Bitmap Heap Scan on cc a2 (cost=7341.27..44444.15 rows=345792 width=4) (actual time=59.271..250.932 rows=830569 loops=1)
Recheck Cond: ((cad_id = 102) AND (value = 7))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7254.82 rows=345792 width=0) (actual time=54.030..54.030 rows=830569 loops=1)
Index Cond: ((cad_id = 102) AND (value = 7))
-> Index Scan using cad_id_value_contact_id_idx on cc a1 (cost=0.00..8.83 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=34219)
Index Cond: ((cad_id = 101) AND (value = 5) AND (contact_id = a2.contact_id))
Total runtime: 774.588 ms
(17 rows)
對于INNER JOIN,索引都被利用上了。它不需要取得contact_id然后再來做集合操作。結論是對rowid做hash操作比集合操作更快。
不顯示用INNER JOIN,讓Planner決定Join順序也是一樣的:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=118979.47..118979.48 rows=1 width=4) (actual time=762.969..762.970 rows=1 loops=1)
-> Nested Loop (cost=49073.39..118977.89 rows=630 width=4) (actual time=398.554..762.206 rows=6748 loops=1)
-> Hash Join (cost=49073.39..97159.97 rows=2466 width=8) (actual time=398.531..684.425 rows=34219 loops=1)
Hash Cond: (a3.contact_id = a2.contact_id)
-> Bitmap Heap Scan on cc a3 (cost=306.85..25995.74 rows=14239 width=4) (actual time=34.802..91.672 rows=207228 loops=1)
Recheck Cond: ((cad_id = 6) AND (value = 1))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..303.29 rows=14239 width=0) (actual time=29.524..29.524 rows=207228 loops=1)
Index Cond: ((cad_id = 6) AND (value = 1))
-> Hash (cost=44444.15..44444.15 rows=345792 width=4) (actual time=363.537..363.537 rows=830569 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 29200kB
-> Bitmap Heap Scan on cc a2 (cost=7341.27..44444.15 rows=345792 width=4) (actual time=57.799..245.467 rows=830569 loops=1)
Recheck Cond: ((cad_id = 102) AND (value = 7))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7254.82 rows=345792 width=0) (actual time=52.349..52.349 rows=830569 loops=1)
Index Cond: ((cad_id = 102) AND (value = 7))
-> Index Scan using cad_id_value_contact_id_idx on cc a1 (cost=0.00..8.83 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=34219)
Index Cond: ((cad_id = 101) AND (value = 5) AND (contact_id = a2.contact_id))
Total runtime: 766.107 ms
(17 rows)
結果是一樣的。
基本上對PostgreSQL的實驗就到這里了,差不多也就這樣了,提高空間不大。Google之后發現,PostgreSQL使用的MVCC機制導致其甚至在做SELECT COUNT這樣的操作的時候也會去更新hint bit。也許這就是其慢的重要原因。MySQL的MYISAM引擎以完全不負責事務和著稱,應該會比PostgreSQL有更大的提升,下一篇中,我們將對MySQL重復同樣的實驗。
?
轉載于:https://www.cnblogs.com/taowen/archive/2012/02/26/2367724.html
總結
以上是生活随笔為你收集整理的多维数据查询效率分析(1)的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 2012-02-25工作记录
- 下一篇: 【转】VS2008制作打包程序将安装路径