當前位置：首頁 > 编程资源 > 编程问答 >内容正文

编程问答

多维数据查询效率分析（1）

發布時間：2025/7/14 编程问答 20 豆豆

生活随笔收集整理的這篇文章主要介紹了多维数据查询效率分析（1）小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

有時需求需要我們把系統做成靈活的。最常見的形式是，屬性不能是固定的，要用戶可以自定義。這樣的需求往往會在數據庫中建模成一個一對多的關系。

create table person {

...　　

} create table person_attribute {

person_id ...

attribute_name ...

attribute_value ...

...

}

這樣的建模在沒有查詢需要的時候，還是蠻不錯的。但是一旦需要對擴充的屬性值進行查詢，速度往往慘不忍睹。曾經在新加坡做過一個電信的遺留系統的前端，其數據庫的建模就是這樣的。對于中間的屬性表，一個簡單的查詢都需要join好幾次，速度非常慢。好在那次只是做ETL，并不是直接把這樣的數據庫做后端，要不然肯定死的很慘。當時的做法是把所有的數據讀入到內存中，針對屬性的查詢用內存集合遍歷來實現。這樣做的前提是集合的元素數量非常少（幾百而已），總數據量也非常少。但是如果我們需要處理的數據量非常大，那么我們就必須在數據庫中能夠對多維數據進行高效查詢。

為了搞清楚這個問題，我們需要做一系列實驗。先來介紹一下我們實驗的對象。

假定我們有一張contacts表，然后對每個聯系人有一個contact_categories的表，簡稱cc

create table cc(
contact_id integer,
cad_id integer,
value integer);

cad_id代表字段的id，value是cateogry的值。數據量是500萬。實驗用的數據庫分別是PostgreSQL（原始數據大小250M），和使用MYISAM引擎的MySQL（原始數據大小73M）。使用的磁盤是普通的筆記本硬盤，沒有raid，普通的ext4分區，峰值傳輸率大概是70M/s。假設沒有資源的爭搶，而且數據庫總是以最快的順序讀的方式從磁盤中加載數據，那么PostgreSQL得用4s，而MySQL也需要1s才能把所有的磁盤內容讀到內存中。

所以如果我們使用基于磁盤的解決方案的話，無論如何也無法把查詢壓縮到1s以內。因為我們需要提供一個Reponsive的界面前端，所以數據必須能夠在內存中被查詢，可能不是所有的數據都能放入內存，但是最起碼被查詢到的數據得一直在內存中。最簡單的辦法把數據庫移到內存中的方式不是改數據庫的設置，而是直接把內存映射成文件夾：

sudo mount -t ramfs -o size=200000m ramfs /mnt/memory

然后把數據庫的數據目錄移動到/mnt/memory之中。但是數據庫的緩存設置也是必須修改的，比如PostgreSQL的work_mem如果設置過小的話，在做對一個很大的表做count(distinct xxx)時就會導致中間結果被寫入到臨時表之中。所以我們還是把所有的緩存搞大一些吧。

work_mem = 1000MB
shared_buffer = 1000MB
temp_buffer = 1000MB
effective_cache_size = 1000MB
wal_buffers = 1000MB
auto_vacuum = off

讓我們來看看最基本的一個查詢能有多快吧

taowen@dmright-perf:~$ time psql postgres -c 'select count(*) from cc;'
count
---------
5904385
(1 row)

real 0m0.448s
user 0m0.024s
sys 0m0.000s

count(*)和count(contact_id)是一樣的么，讓我們來試驗一下

taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from cc;'
count
---------
5904385
(1 row)

real 0m0.537s
user 0m0.016s
sys 0m0.008s

有意思！居然比count(*)還要慢。可能是因為我們沒有給contact_id字段加索引的原因。好吧，加上索引。

taowen@dmright-perf:~$ time psql postgres -c 'create index contact_id_idx on cc(contact_id);'
CREATE INDEX

real 0m4.848s
user 0m0.016s
sys 0m0.008s
taowen@dmright-perf:~$ time psql postgres -c 'analyze cc;'
ANALYZE

real 0m0.197s
user 0m0.024s
sys 0m0.004s

再試試看

taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from cc;'
count
---------
5904385
(1 row)

real 0m0.534s
user 0m0.020s
sys 0m0.008s

taowen@dmright-perf:~$ time psql postgres -c 'select count(*) from cc;'
count
---------
5904385
(1 row)

real 0m0.447s
user 0m0.028s
sys 0m0.008s

基本上沒有變化……count(*)貌似就是比count(contact_id)要快。不管啦，既然小于500ms，也算是夠快了。讓我們給查詢加上個條件吧。

taowen@dmright-perf:~$ time psql postgres -c 'select count(*) from cc where cad_id = 101 and value = 5;'
count
--------
998839
(1 row)

real 0m0.686s
user 0m0.024s
sys 0m0.000s

taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from cc where cad_id = 101 and value = 5;'
count
--------
998839
(1 row)

real 0m0.660s
user 0m0.024s
sys 0m0.000s

我們可以看到，在count(*)和count(contact_id)之間沒有特別大的區別。但是我們還沒有給cad_id和value加索引，所以讓我們加上看看如何

taowen@dmright-perf:~$ time psql postgres -c 'create index cad_id_value_idx on cc(cad_id, value);'
CREATE INDEX

real 0m10.069s
user 0m0.020s
sys 0m0.008s

taowen@dmright-perf:~$ time psql postgres -c 'analyze cc;'
ANALYZE

real 0m0.199s
user 0m0.016s
sys 0m0.012s

and try again.

taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from cc where cad_id = 101 and value = 5;'
count
--------
998839
(1 row)

real 0m0.283s
user 0m0.020s
sys 0m0.012s

真是很快很快耶！讓我們來回顧一下，我們現在對于contact_id和(cad_id, value)都建立了索引。
接下來我們把查詢弄得更復雜一些，加上AND條件。有三種可能的方式：
1、INTERSECT
2、INNER JOIN
3、IN + SUB QUERY
我們每種做法都試試

taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7) as temp'
count
--------
164788
(1 row)

real 0m1.159s
user 0m0.028s
sys 0m0.008s

taowen@dmright-perf:~$ time psql postgres -c 'select count(*) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7) as temp' count
--------
164788
(1 row)

real 0m1.148s
user 0m0.032s
sys 0m0.000s

顯而易見，這么做很慢。那么INNER JOIN是不是更快一些呢？

taowen@dmright-perf:~$ time psql postgres -c 'select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id and a2.cad_id = 102 and a2.value = 7 where a1.cad_id = 101 and a1.value = 5;'
count
--------
164788
(1 row)

real 0m1.162s
user 0m0.036s
sys 0m0.000s

這大概要慢上個200ms了。那么IN + SUB QUERY呢？

taowen@dmright-perf:~$ time psql postgres -c 'select count(a1.contact_id) from cc as a1 where a1.cad_id = 101 and a1.value = 5 and a1.contact_id in (select contact_id from cc as a2 where a2.cad_id = 102 and a2.value = 7)'
count
--------
164788
(1 row)

real 0m2.645s
user 0m0.024s
sys 0m0.004s

靠，居然更慢了。總結就是，在有兩個條件的情況下，INTERSECT似乎是最快的。但是即便如此，它也超過了一秒鐘了。為什么會這樣呢？時間都花哪里去了？

taowen@dmright-perf:~$ time psql postgres -c 'explain analyze select count(contact_id) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7) as temp'
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=107853.91..107853.92 rows=1 width=4) (actual time=1471.907..1471.907 rows=1 loops=1)
-> Subquery Scan on temp (cost=7467.63..107375.65 rows=191301 width=4) (actual time=1366.131..1459.419 rows=164788 loops=1)
-> HashSetOp Intersect (cost=7467.63..105462.64 rows=191301 width=4) (actual time=1366.129..1439.781 rows=164788 loops=1)
-> Append (cost=7467.63..103326.69 rows=854380 width=4) (actual time=69.436..797.478 rows=1829408 loops=1)
-> Subquery Scan on "*SELECT* 2" (cost=7467.63..48180.53 rows=351876 width=4) (actual time=69.435..350.031 rows=830569 loops=1)
-> Bitmap Heap Scan on cc (cost=7467.63..44661.77 rows=351876 width=4) (actual time=69.434..264.538 rows=830569 loops=1)
Recheck Cond: ((cad_id = 102) AND (value = 7))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7379.66 rows=351876 width=0) (actual time=64.162..64.162 rows=830569 loops=1)
Index Cond: ((cad_id = 102) AND (value = 7))
-> Subquery Scan on "*SELECT* 1" (cost=10667.56..55146.16 rows=502504 width=4) (actual time=69.846..331.544 rows=998839 loops=1)
-> Bitmap Heap Scan on cc (cost=10667.56..50121.12 rows=502504 width=4) (actual time=69.845..233.794 rows=998839 loops=1)
Recheck Cond: ((cad_id = 101) AND (value = 5))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..10541.94 rows=502504 width=0) (actual time=64.501..64.501 rows=998839 loops=1)
Index Cond: ((cad_id = 101) AND (value = 5))
Total runtime: 1477.484 ms
(15 rows)

首先，INTERSECT不是并行執行的。兩個子查詢分別花費了300ms以上的時間，加起來有800ms是用在搜集contact_id上了。其次，集合之間的并集操作花費了600多ms。其余的時間都花在了數集合的成員個數上了。有一點值得注意的是，根據 http://postgresql.1045698.n5.nabble.com/ANTI-JOIN-needs-table-index-scan-not-possible-td3425340.html index scan和heap scan實際上都用上了索引。然而，SELECT還是在其之上額外耗費了100ms，我猜測它可能是回到原始的表結構中把行取出來，以獲得contact_id的值。這在磁盤上的話速度會更慢，因為會是random seek操作。
看完了INTERSECT，讓我們再來分析分析Join：

taowen@dmright-perf:~$ time psql postgres -c 'explain analyze select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id and a2.cad_id = 102 and a2.value = 7 where a1.cad_id = 101 and a1.value = 5;'
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=180366.31..180366.32 rows=1 width=4) (actual time=1280.964..1280.964 rows=1 loops=1)
-> Merge Join (cost=174783.00..180104.02 rows=104912 width=4) (actual time=1043.879..1270.197 rows=164788 loops=1)
Merge Cond: (a1.contact_id = a2.contact_id)
-> Sort (cost=97705.18..98961.44 rows=502504 width=4) (actual time=586.735..626.292 rows=998839 loops=1)
Sort Key: a1.contact_id
Sort Method: quicksort Memory: 71397kB
-> Bitmap Heap Scan on cc a1 (cost=10667.56..50121.12 rows=502504 width=4) (actual time=75.377..257.403 rows=998839 loops=1)
Recheck Cond: ((cad_id = 101) AND (value = 5))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..10541.94 rows=502504 width=0) (actual time=69.565..69.565 rows=998839 loops=1)
Index Cond: ((cad_id = 101) AND (value = 5))
-> Sort (cost=77077.83..77957.52 rows=351876 width=4) (actual time=457.131..492.428 rows=830569 loops=1)
Sort Key: a2.contact_id
Sort Method: quicksort Memory: 63509kB
-> Bitmap Heap Scan on cc a2 (cost=7467.63..44661.77 rows=351876 width=4) (actual time=60.039..186.422 rows=830569 loops=1)
Recheck Cond: ((cad_id = 102) AND (value = 7))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7379.66 rows=351876 width=0) (actual time=54.929..54.929 rows=830569 loops=1)
Index Cond: ((cad_id = 102) AND (value = 7))
Total runtime: 1286.846 ms
(18 rows)

貌似大部分時間都花在了排序上。而且它也不是并行執行的。如果后臺同時執行top命令的話，就會發現只有一個核是被實際占用著的。
要是再創建更多的索引呢？會不會有幫助？

taowen@dmright-perf:~$ time psql postgres -c 'create index cad_id_value_contact_id on cc(cad_id, value, contact_id);'CREATE INDEX

real 0m10.683s
user 0m0.020s
sys 0m0.004s
taowen@dmright-perf:~$ time psql postgres -c 'explain analyze select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id and a2.cad_id = 102 and a2.value = 7 where a1.cad_id = 101 and a1.value = 5;'
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=180366.31..180366.32 rows=1 width=4) (actual time=1338.478..1338.478 rows=1 loops=1)
-> Merge Join (cost=174783.00..180104.02 rows=104912 width=4) (actual time=1090.136..1327.312 rows=164788 loops=1)
Merge Cond: (a1.contact_id = a2.contact_id)
-> Sort (cost=97705.18..98961.44 rows=502504 width=4) (actual time=615.843..657.438 rows=998839 loops=1)
Sort Key: a1.contact_id
Sort Method: quicksort Memory: 71397kB
-> Bitmap Heap Scan on cc a1 (cost=10667.56..50121.12 rows=502504 width=4) (actual time=80.926..275.431 rows=998839 loops=1)
Recheck Cond: ((cad_id = 101) AND (value = 5))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..10541.94 rows=502504 width=0) (actual time=75.816..75.816 rows=998839 loops=1)
Index Cond: ((cad_id = 101) AND (value = 5))
-> Sort (cost=77077.83..77957.52 rows=351876 width=4) (actual time=474.279..510.866 rows=830569 loops=1)
Sort Key: a2.contact_id
Sort Method: quicksort Memory: 63509kB
-> Bitmap Heap Scan on cc a2 (cost=7467.63..44661.77 rows=351876 width=4) (actual time=65.335..198.655 rows=830569 loops=1)
Recheck Cond: ((cad_id = 102) AND (value = 7))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7379.66 rows=351876 width=0) (actual time=60.314..60.314 rows=830569 loops=1)
Index Cond: ((cad_id = 102) AND (value = 7))
Total runtime: 1346.587 ms
(18 rows)

還是不行！Merge Join慢的話，升級到9.1然后強制使用hash join會不會好一點？

postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'explain analyze select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id where a2.cad_id = 102 and a2.value = 7 and a1.cad_id = 101 and a1.value = 5';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=902356.85..902356.86 rows=1 width=4) (actual time=1693.137..1693.137 rows=1 loops=1)
-> Hash Join (cost=59599.78..902135.88 rows=88389 width=4) (actual time=461.788..1682.718 rows=164788 loops=1)
Hash Cond: (a1.contact_id = a2.contact_id)
-> Bitmap Heap Scan on cc a1 (cost=10833.24..50406.32 rows=510472 width=4) (actual time=76.337..224.571 rows=998839 loops=1)
Recheck Cond: ((cad_id = 101) AND (value = 5))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..10705.62 rows=510472 width=0) (actual time=71.028..71.028 rows=998839 loops=1)
Index Cond: ((cad_id = 101) AND (value = 5))
-> Hash (cost=44444.15..44444.15 rows=345792 width=4) (actual time=385.256..385.256 rows=830569 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 29200kB
-> Bitmap Heap Scan on cc a2 (cost=7341.27..44444.15 rows=345792 width=4) (actual time=64.778..258.059 rows=830569 loops=1)
Recheck Cond: ((cad_id = 102) AND (value = 7))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7254.82 rows=345792 width=0) (actual time=59.675..59.675 rows=830569 loops=1)
Index Cond: ((cad_id = 102) AND (value = 7))
Total runtime: 1698.207 ms
(14 rows)

有兩個原因造成這樣的情況：
1、兩個條件意味著掃描兩次，而且不是并行掃描
2、join自身很耗費時間，無論是sort merge join還是hash join

兩個條件都這熊樣了，三個條件呢？看看吧，先上INTERSECT：

postgres@dmright-perf:/mnt/memory/pg$ time psql postgres -c 'select count(contact_id) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7 intersect select contact_id from cc where cad_id = 6 and value = 1) as temp'
count
-------
6748
(1 row)

real 0m1.350s
user 0m0.020s
sys 0m0.008s

然后是INNER JOIN

postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id inner join cc as a3 on a1.contact_id = a3.contact_id where a2.cad_id = 102 and a2.value = 7 and a1.cad_id = 101 and a1.value = 5 and a3.cad_id = 6 and a3.value = 1';
count
-------
6748
(1 row)

real 0m0.756s
user 0m0.028s
sys 0m0.000s

然后是IN + SUBQUERY：

postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'select count(a1.contact_id) from cc as a1 where a1.cad_id = 101 and a1.value = 5 and a1.contact_id in (select contact_id from cc as a2 where a2.cad_id = 102 and a2.value = 7 and a2.contact_id in (select contact_id from cc as a3 where a3.cad_id = 6 and a3.value = 1))';
count
-------
6748
(1 row)

real 0m7.320s
user 0m0.024s
sys 0m0.004s

我們可以看到這回INNER JOIN是最快的了。為什么？

postgres@dmright-perf:/mnt/memory/pg$ time psql postgres -c 'explain analyze select count(contact_id) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7 intersect select contact_id from cc where cad_id = 6 and value = 1) as temp'
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=132357.21..132357.22 rows=1 width=4) (actual time=1659.666..1659.666 rows=1 loops=1)
-> Subquery Scan on temp (cost=306.85..132335.71 rows=8602 width=4) (actual time=1648.818..1659.243 rows=6748 loops=1)
-> HashSetOp Intersect (cost=306.85..132249.69 rows=8602 width=4) (actual time=1648.817..1658.543 rows=6748 loops=1)
-> Append (cost=306.85..131691.89 rows=223118 width=4) (actual time=33.797..1547.811 rows=372016 loops=1)
-> Subquery Scan on "*SELECT* 3" (cost=306.85..26138.13 rows=14239 width=4) (actual time=33.797..153.230 rows=207228 loops=1)
-> Bitmap Heap Scan on cc (cost=306.85..25995.74 rows=14239 width=4) (actual time=33.795..134.292 rows=207228 loops=1)
Recheck Cond: ((cad_id = 6) AND (value = 1))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..303.29 rows=14239 width=0) (actual time=28.558..28.558 rows=207228 loops=1)
Index Cond: ((cad_id = 6) AND (value = 1))
-> Result (cost=7341.27..105553.76 rows=208879 width=4) (actual time=1282.426..1371.255 rows=164788 loops=1)
-> HashSetOp Intersect (cost=7341.27..105553.76 rows=208879 width=4) (actual time=1282.418..1353.422 rows=164788 loops=1)
-> Append (cost=7341.27..103413.10 rows=856264 width=4) (actual time=57.550..734.282 rows=1829408 loops=1)
-> Subquery Scan on "*SELECT* 2" (cost=7341.27..47902.07 rows=345792 width=4) (actual time=57.550..292.595 rows=830569 loops=1)
-> Bitmap Heap Scan on cc (cost=7341.27..44444.15 rows=345792 width=4) (actual time=57.549..218.162 rows=830569 loops=1)
Recheck Cond: ((cad_id = 102) AND (value = 7))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7254.82 rows=345792 width=0) (actual time=52.606..52.606 rows=830569 loops=1)
Index Cond: ((cad_id = 102) AND (value = 7))
-> Subquery Scan on "*SELECT* 1" (cost=10833.24..55511.04 rows=510472 width=4) (actual time=69.129..330.737 rows=998839 loops=1)
-> Bitmap Heap Scan on cc (cost=10833.24..50406.32 rows=510472 width=4) (actual time=69.128..242.416 rows=998839 loops=1)
Recheck Cond: ((cad_id = 101) AND (value = 5))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..10705.62 rows=510472 width=0) (actual time=64.161..64.161 rows=998839 loops=1)
Index Cond: ((cad_id = 101) AND (value = 5))
Total runtime: 1665.691 ms
(23 rows)

對于INTERSECT來說，過程和兩個條件是差不多的，只是集合更大一些罷了。

postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'explain analyze select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id inner join cc as a3 on a1.contact_id = a3.contact_id where a2.cad_id = 102 and a2.value = 7 and a1.cad_id = 101 and a1.value = 5 and a3.cad_id = 6 and a3.value = 1';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=118979.47..118979.48 rows=1 width=4) (actual time=771.393..771.393 rows=1 loops=1)
-> Nested Loop (cost=49073.39..118977.89 rows=630 width=4) (actual time=399.835..770.587 rows=6748 loops=1)
-> Hash Join (cost=49073.39..97159.97 rows=2466 width=8) (actual time=399.814..691.519 rows=34219 loops=1)
Hash Cond: (a3.contact_id = a2.contact_id)
-> Bitmap Heap Scan on cc a3 (cost=306.85..25995.74 rows=14239 width=4) (actual time=25.655..83.526 rows=207228 loops=1)
Recheck Cond: ((cad_id = 6) AND (value = 1))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..303.29 rows=14239 width=0) (actual time=20.570..20.570 rows=207228 loops=1)
Index Cond: ((cad_id = 6) AND (value = 1))
-> Hash (cost=44444.15..44444.15 rows=345792 width=4) (actual time=373.969..373.969 rows=830569 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 29200kB
-> Bitmap Heap Scan on cc a2 (cost=7341.27..44444.15 rows=345792 width=4) (actual time=59.271..250.932 rows=830569 loops=1)
Recheck Cond: ((cad_id = 102) AND (value = 7))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7254.82 rows=345792 width=0) (actual time=54.030..54.030 rows=830569 loops=1)
Index Cond: ((cad_id = 102) AND (value = 7))
-> Index Scan using cad_id_value_contact_id_idx on cc a1 (cost=0.00..8.83 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=34219)
Index Cond: ((cad_id = 101) AND (value = 5) AND (contact_id = a2.contact_id))
Total runtime: 774.588 ms
(17 rows)

對于INNER JOIN，索引都被利用上了。它不需要取得contact_id然后再來做集合操作。結論是對rowid做hash操作比集合操作更快。
不顯示用INNER JOIN，讓Planner決定Join順序也是一樣的：

postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'explain analyze select count(a1.contact_id) from cc as a1, cc as a2, cc as a3 where a1.contact_id = a2.contact_id and a2.contact_id = a3.contact_id and a2.cad_id = 102 and a2.value = 7 and a1.cad_id = 101 and a1.value = 5 and a3.cad_id = 6 and a3.value = 1';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=118979.47..118979.48 rows=1 width=4) (actual time=762.969..762.970 rows=1 loops=1)
-> Nested Loop (cost=49073.39..118977.89 rows=630 width=4) (actual time=398.554..762.206 rows=6748 loops=1)
-> Hash Join (cost=49073.39..97159.97 rows=2466 width=8) (actual time=398.531..684.425 rows=34219 loops=1)
Hash Cond: (a3.contact_id = a2.contact_id)
-> Bitmap Heap Scan on cc a3 (cost=306.85..25995.74 rows=14239 width=4) (actual time=34.802..91.672 rows=207228 loops=1)
Recheck Cond: ((cad_id = 6) AND (value = 1))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..303.29 rows=14239 width=0) (actual time=29.524..29.524 rows=207228 loops=1)
Index Cond: ((cad_id = 6) AND (value = 1))
-> Hash (cost=44444.15..44444.15 rows=345792 width=4) (actual time=363.537..363.537 rows=830569 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 29200kB
-> Bitmap Heap Scan on cc a2 (cost=7341.27..44444.15 rows=345792 width=4) (actual time=57.799..245.467 rows=830569 loops=1)
Recheck Cond: ((cad_id = 102) AND (value = 7))
-> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7254.82 rows=345792 width=0) (actual time=52.349..52.349 rows=830569 loops=1)
Index Cond: ((cad_id = 102) AND (value = 7))
-> Index Scan using cad_id_value_contact_id_idx on cc a1 (cost=0.00..8.83 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=34219)
Index Cond: ((cad_id = 101) AND (value = 5) AND (contact_id = a2.contact_id))
Total runtime: 766.107 ms
(17 rows)

結果是一樣的。

基本上對PostgreSQL的實驗就到這里了，差不多也就這樣了，提高空間不大。Google之后發現，PostgreSQL使用的MVCC機制導致其甚至在做SELECT COUNT這樣的操作的時候也會去更新hint bit。也許這就是其慢的重要原因。MySQL的MYISAM引擎以完全不負責事務和著稱，應該會比PostgreSQL有更大的提升，下一篇中，我們將對MySQL重復同樣的實驗。

轉載于:https://www.cnblogs.com/taowen/archive/2012/02/26/2367724.html

總結

以上是生活随笔為你收集整理的多维数据查询效率分析（1）的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇： 2012-02-25工作记录
下一篇：【转】VS2008制作打包程序将安装路径

3atv精品不卡视频,97人人超碰国产精品最新,中文字幕av一区二区三区人妻少妇,久久久精品波多野结衣,日韩一区二区三区精品

编程问答

多维数据查询效率分析（1）

總結