MongoDB Facts: 80,000+ inserts per second on commodity hardware
While experimenting with some time-series collections, I needed a large dataset to verify that our aggregation queries don't become a bottleneck as the data load grows. We settled on 50 million documents, since beyond that number we would consider sharding anyway.
Each event looks like this:
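The original sample document did not survive extraction, but the generation scripts below use exactly two fields, `created_on` and `value`. A minimal sketch of one such event in Python (a reconstruction, not the author's original sample):

```python
# Hypothetical reconstruction of one event document, based on the field
# names used by the generation scripts below: a random timestamp within
# 2012 and a random float value.
import random
from datetime import datetime, timedelta

min_date = datetime(2012, 1, 1)
max_date = datetime(2013, 1, 1)
span = (max_date - min_date).total_seconds()

event = {
    'created_on': min_date + timedelta(seconds=random.random() * span),
    'value': random.random(),  # a random float in [0, 1)
}
print(event)
```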
To generate the random values, we considered using JavaScript or Python (we could have tried it in Java, but we wanted to write it as quickly as possible). We didn't know which one would be faster, so we decided to test them.
Our first attempt was a JavaScript file run through the MongoDB shell. It looks like this:
```javascript
var minDate = new Date(2012, 0, 1, 0, 0, 0, 0);
var maxDate = new Date(2013, 0, 1, 0, 0, 0, 0);
var delta = maxDate.getTime() - minDate.getTime();

var job_id = arg2;
var documentNumber = arg1;
var batchNumber = 5 * 1000;

var job_name = 'Job#' + job_id;
var start = new Date();

var batchDocuments = new Array();
var index = 0;
while (index < documentNumber) {
    var date = new Date(minDate.getTime() + Math.random() * delta);
    var value = Math.random();
    var document = {
        created_on: date,
        value: value
    };
    batchDocuments[index % batchNumber] = document;
    if ((index + 1) % batchNumber == 0) {
        db.randomData.insert(batchDocuments);
    }
    index++;
    if (index % 100000 == 0) {
        print(job_name + ' inserted ' + index + ' documents.');
    }
}
print(job_name + ' inserted ' + documentNumber + ' in ' + (new Date() - start) / 1000.0 + 's');
```

This is how we ran it, and what we got:
```
mongo random --eval "var arg1=50000000;arg2=1" create_random.js
Job#1 inserted 100000 documents.
Job#1 inserted 200000 documents.
Job#1 inserted 300000 documents.
...
Job#1 inserted 49900000 documents.
Job#1 inserted 50000000 in 566.294s
```

Well, this already exceeded my expectations (88,293 inserts per second).
Now it's Python's turn. You'll need to install pymongo to run it properly.
```python
import sys
import os
import pymongo
import time
import random

from datetime import datetime

min_date = datetime(2012, 1, 1)
max_date = datetime(2013, 1, 1)
delta = (max_date - min_date).total_seconds()

job_id = '1'

if len(sys.argv) < 2:
    sys.exit("You must supply the item_number argument")
elif len(sys.argv) > 2:
    job_id = sys.argv[2]

documents_number = int(sys.argv[1])
batch_number = 5 * 1000

job_name = 'Job#' + job_id
start = datetime.now()

# obtain a mongo connection
connection = pymongo.Connection("mongodb://localhost", safe=True)

# obtain a handle to the random database
db = connection.random
collection = db.randomData

batch_documents = [i for i in range(batch_number)]

for index in range(documents_number):
    try:
        date = datetime.fromtimestamp(time.mktime(min_date.timetuple()) + int(round(random.random() * delta)))
        value = random.random()
        document = {
            'created_on': date,
            'value': value,
        }
        batch_documents[index % batch_number] = document
        if (index + 1) % batch_number == 0:
            collection.insert(batch_documents)
        index += 1
        if index % 100000 == 0:
            print job_name, ' inserted ', index, ' documents.'
    except:
        print 'Unexpected error:', sys.exc_info()[0], ', for index ', index
        raise

print job_name, ' inserted ', documents_number, ' in ', (datetime.now() - start).total_seconds(), 's'
```

We ran it, and this is what we got this time:
```
python create_random.py 50000000
Job#1 inserted 100000 documents.
Job#1 inserted 200000 documents.
Job#1 inserted 300000 documents.
...
Job#1 inserted 49900000 documents.
Job#1 inserted 50000000 in 1713.501 s
```

This is quite a bit slower than the JavaScript version (29,180 inserts per second), but don't get discouraged. Python is a fully-fledged programming language, so how about making use of all our CPU cores (e.g. 4 cores) and launching one script per core, each inserting a fraction of the total document count (e.g. 12,500,000)?
```python
import sys
import pymongo
import time
import subprocess
import multiprocessing

from datetime import datetime

cpu_count = multiprocessing.cpu_count()

# obtain a mongo connection
connection = pymongo.Connection('mongodb://localhost', safe=True)

# obtain a handle to the random database
db = connection.random
collection = db.randomData

total_documents_count = 50 * 1000 * 1000
inserted_documents_count = 0
sleep_seconds = 1
sleep_count = 0

for i in range(cpu_count):
    documents_number = str(total_documents_count / cpu_count)
    print documents_number
    subprocess.Popen(['python', '../create_random.py', documents_number, str(i)])

start = datetime.now()

while inserted_documents_count < total_documents_count:
    inserted_documents_count = collection.count()
    if sleep_count > 0 and sleep_count % 60 == 0:
        print 'Inserted ', inserted_documents_count, ' documents.'
    if inserted_documents_count < total_documents_count:
        sleep_count += 1
        time.sleep(sleep_seconds)

print 'Inserting ', total_documents_count, ' took ', (datetime.now() - start).total_seconds(), 's'
```

Running the Python scripts in parallel looks like this:
```
python create_random_parallel.py
Job#3 inserted 100000 documents.
Job#2 inserted 100000 documents.
Job#0 inserted 100000 documents.
Job#1 inserted 100000 documents.
Job#3 inserted 200000 documents.
...
Job#2 inserted 12500000 in 571.819 s
Job#0 inserted 12400000 documents.
Job#3 inserted 10800000 documents.
Job#1 inserted 12400000 documents.
Job#0 inserted 12500000 documents.
Job#0 inserted 12500000 in 577.061 s
Job#3 inserted 10900000 documents.
Job#1 inserted 12500000 documents.
Job#1 inserted 12500000 in 578.427 s
Job#3 inserted 11000000 documents.
...
Job#3 inserted 12500000 in 623.999 s
Inserting 50000000 took 624.655 s
```

This is really good (80,044 inserts per second), even if it's still slower than the first JavaScript import. So let's modify this last Python script to run the JavaScript through multiple MongoDB shells instead.
Since I couldn't pass the required arguments to the mongo command when it was launched as a subprocess of the master Python script, I came up with the following alternative:
```python
for i in range(cpu_count):
    documents_number = str(total_documents_count / cpu_count)
    script_name = 'create_random_' + str(i + 1) + '.bat'
    script_file = open(script_name, 'w')
    script_file.write('mongo random --eval "var arg1=' + documents_number + ';arg2=' + str(i + 1) + '" ../create_random.js')
    script_file.close()
    subprocess.Popen(script_name)
```

We generate the shell scripts dynamically and let Python run them for us.
```
Job#1 inserted 100000 documents.
Job#4 inserted 100000 documents.
Job#3 inserted 100000 documents.
Job#2 inserted 100000 documents.
Job#1 inserted 200000 documents.
...
Job#4 inserted 12500000 in 566.438s
Job#3 inserted 12300000 documents.
Job#2 inserted 10800000 documents.
Job#1 inserted 11600000 documents.
Job#3 inserted 12400000 documents.
Job#1 inserted 11700000 documents.
Job#2 inserted 10900000 documents.
Job#1 inserted 11800000 documents.
Job#3 inserted 12500000 documents.
Job#3 inserted 12500000 in 574.782s
Job#2 inserted 11000000 documents.
Job#1 inserted 11900000 documents.
Job#2 inserted 11100000 documents.
Job#1 inserted 12000000 documents.
Job#2 inserted 11200000 documents.
Job#1 inserted 12100000 documents.
Job#2 inserted 11300000 documents.
Job#1 inserted 12200000 documents.
Job#2 inserted 11400000 documents.
Job#1 inserted 12300000 documents.
Job#2 inserted 11500000 documents.
Job#1 inserted 12400000 documents.
Job#2 inserted 11600000 documents.
Job#1 inserted 12500000 documents.
Job#1 inserted 12500000 in 591.073s
Job#2 inserted 11700000 documents.
...
Job#2 inserted 12500000 in 599.005s
Inserting 50000000 took 599.253 s
```

This is fast as well (83,437 inserts per second), but it still couldn't beat our first attempt.
Conclusion
There is nothing special about my PC configuration; the only optimization is the SSD drive that MongoDB runs on.
The first attempt produced the best results, and after monitoring CPU usage I realized that MongoDB makes use of all the cores even from a single shell console. The Python script running on all cores was fast enough too, and it has the advantage that we could turn it into a fully working application if we wanted to.
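For anyone reproducing this today: `pymongo.Connection` has long been removed from PyMongo. A sketch of the same fill-a-batch-then-insert pattern against the modern API, with the batching logic factored out (assumptions: PyMongo 3+ with `MongoClient` and `insert_many`, and a local mongod):

```python
# Same batching pattern as the article's scripts, separated from the
# database call so it can run without a server: generate random events
# and yield them in fixed-size batches.
import random
from datetime import datetime, timedelta

def random_event_batches(total, batch_size):
    """Yield lists of up to `batch_size` random events, `total` overall."""
    min_date = datetime(2012, 1, 1)
    span = (datetime(2013, 1, 1) - min_date).total_seconds()
    batch = []
    for _ in range(total):
        batch.append({
            'created_on': min_date + timedelta(seconds=random.random() * span),
            'value': random.random(),
        })
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

# With a live server, the insert loop would then be (an assumption,
# not the article's code -- it used pymongo.Connection + insert()):
# from pymongo import MongoClient
# collection = MongoClient('mongodb://localhost').random.randomData
# for batch in random_event_batches(50 * 1000 * 1000, 5 * 1000):
#     collection.insert_many(batch, ordered=False)
```

Note that unlike the original scripts, this version also flushes the final partial batch when the total is not a multiple of the batch size.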
- The code is available on GitHub.
Translated from: https://www.javacodegeeks.com/2013/12/mongodb-facts-80000-insertssecond-on-commodity-hardware.html