Neo4j:Cypher –避免热切
當(dāng)心渴望的管道
盡管我喜歡Cypher的LOAD CSV命令使它容易地將數(shù)據(jù)獲取到Neo4j中的方法,但它目前打破了最不驚奇的規(guī)則,因為它急切地在所有行中加載某些查詢,即使是那些使用定期提交的查詢。
這是我的同事Michael在第二篇博客文章中指出的,它解釋了如何成功使用LOAD CSV :
即使遵循我之前的建議,人們遇到的最大問題是,對于超過一百萬行的大量導(dǎo)入,Cypher遇到了內(nèi)存不足的情況。
這與提交大小無關(guān) ,因此即使使用小批量的PERIODIC COMMIT也會發(fā)生。
最近,我花了幾天的時間將數(shù)據(jù)導(dǎo)入具有4GB RAM的Windows機(jī)器上的Neo4j中,所以我發(fā)現(xiàn)這個問題的時間甚至早于Michael的建議。
Michael解釋了如何確定您的查詢是否遭受意外的急切評估:
如果分析該查詢,則會看到查詢計劃中有一個“急切”步驟。
那就是“拉入所有數(shù)據(jù)”的地方。
您可以通過在單詞“ PROFILE”前面加上前綴來配置查詢。 您需要在Web瀏覽器的/ webadmin控制臺中或使用Neo4j shell運行查詢。
我為查詢執(zhí)行了此操作,并且能夠識別得到快速評估的查詢模式,在某些情況下,我們可以解決該問題。
我們將使用Northwind數(shù)據(jù)集來演示Eager管道如何潛入我們的查詢,但是請記住,該數(shù)據(jù)集足夠小,不會引起問題。
文件中的行如下所示:
$ head -n 2 data/customerDb.csv OrderID,CustomerID,EmployeeID,OrderDate,RequiredDate,ShippedDate,ShipVia,Freight,ShipName,ShipAddress,ShipCity,ShipRegion,ShipPostalCode,ShipCountry,CustomerID,CustomerCompanyName,ContactName,ContactTitle,Address,City,Region,PostalCode,Country,Phone,Fax,EmployeeID,LastName,FirstName,Title,TitleOfCourtesy,BirthDate,HireDate,Address,City,Region,PostalCode,Country,HomePhone,Extension,Photo,Notes,ReportsTo,PhotoPath,OrderID,ProductID,UnitPrice,Quantity,Discount,ProductID,ProductName,SupplierID,CategoryID,QuantityPerUnit,UnitPrice,UnitsInStock,UnitsOnOrder,ReorderLevel,Discontinued,SupplierID,SupplierCompanyName,ContactName,ContactTitle,Address,City,Region,PostalCode,Country,Phone,Fax,HomePage,CategoryID,CategoryName,Description,Picture 10248,VINET,5,1996-07-04,1996-08-01,1996-07-16,3,32.38,Vins et alcools Chevalier,59 rue de l'Abbaye,Reims,,51100,France,VINET,Vins et alcools Chevalier,Paul Henriot,Accounting Manager,59 rue de l'Abbaye,Reims,,51100,France,26.47.15.10,26.47.15.11,5,Buchanan,Steven,Sales Manager,Mr.,1955-03-04,1993-10-17,14 Garrett Hill,London,,SW1 8JR,UK,(71) 555-4848,3453,\x,"Steven Buchanan graduated from St. Andrews University, Scotland, with a BSC degree in 1976. Upon joining the company as a sales representative in 1992, he spent 6 months in an orientation program at the Seattle office and then returned to his permanent post in London. He was promoted to sales manager in March 1993. Mr. Buchanan has completed the courses ""Successful Telemarketing"" and ""International Sales Management."" He is fluent in French.",2,http://accweb/emmployees/buchanan.bmp,10248,11,14,12,0,11,Queso Cabrales,5,4,1 kg pkg.,21,22,30,30,0,5,Cooperativa de Quesos 'Las Cabras',Antonio del Valle Saavedra,Export Administrator,Calle del Rosal 4,Oviedo,Asturias,33007,Spain,(98) 598 76 54,,,4,Dairy Products,Cheeses,\x合并,合并,合并
我們要做的第一件事是為每個員工和每個訂單創(chuàng)建一個節(jié)點,然后在它們之間創(chuàng)建一個關(guān)系。
我們可以從以下查詢開始:
USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row MERGE (employee:Employee {employeeId: row.EmployeeID}) MERGE (order:Order {orderId: row.OrderID}) MERGE (employee)-[:SOLD]->(order)這樣就可以了,但是如果我們像這樣對查詢進(jìn)行概要分析……
PROFILE LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row WITH row LIMIT 0 MERGE (employee:Employee {employeeId: row.EmployeeID}) MERGE (order:Order {orderId: row.OrderID}) MERGE (employee)-[:SOLD]->(order)…我們會在第三行看到“渴望”:
==> +----------------+------+--------+----------------------------------+-----------------------------------------+ ==> | Operator | Rows | DbHits | Identifiers | Other | ==> +----------------+------+--------+----------------------------------+-----------------------------------------+ ==> | EmptyResult | 0 | 0 | | | ==> | UpdateGraph(0) | 0 | 0 | employee, order, UNNAMED216 | MergePattern | ==> | Eager | 0 | 0 | | | ==> | UpdateGraph(1) | 0 | 0 | employee, employee, order, order | MergeNode; :Employee; MergeNode; :Order | ==> | Slice | 0 | 0 | | { AUTOINT0} | ==> | LoadCSV | 1 | 0 | row | | ==> +----------------+------+--------+----------------------------------+-----------------------------------------+您會注意到,當(dāng)我們分析每個查詢時,我們將刪除定期提交部分,并添加“ WITH row LIMIT 0”。 這使我們能夠生成足夠的查詢計劃來標(biāo)識“急切”運算符,而無需實際導(dǎo)入任何數(shù)據(jù)。
我們希望將該查詢分為兩個查詢,以便可以不急于處理它:
USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row WITH row LIMIT 0 MERGE (employee:Employee {employeeId: row.EmployeeID}) MERGE (order:Order {orderId: row.OrderID})==> +-------------+------+--------+----------------------------------+-----------------------------------------+ ==> | Operator | Rows | DbHits | Identifiers | Other | ==> +-------------+------+--------+----------------------------------+-----------------------------------------+ ==> | EmptyResult | 0 | 0 | | | ==> | UpdateGraph | 0 | 0 | employee, employee, order, order | MergeNode; :Employee; MergeNode; :Order | ==> | Slice | 0 | 0 | | { AUTOINT0} | ==> | LoadCSV | 1 | 0 | row | | ==> +-------------+------+--------+----------------------------------+-----------------------------------------+現(xiàn)在我們已經(jīng)創(chuàng)建了員工和訂單,我們可以將他們加入在一起:
USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row MATCH (employee:Employee {employeeId: row.EmployeeID}) MATCH (order:Order {orderId: row.OrderID}) MERGE (employee)-[:SOLD]->(order)==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+ ==> | Operator | Rows | DbHits | Identifiers | Other | ==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+ ==> | EmptyResult | 0 | 0 | | | ==> | UpdateGraph | 0 | 0 | employee, order, UNNAMED216 | MergePattern | ==> | Filter(0) | 0 | 0 | | Property(order,orderId) == Property(row,OrderID) | ==> | NodeByLabel(0) | 0 | 0 | order, order | :Order | ==> | Filter(1) | 0 | 0 | | Property(employee,employeeId) == Property(row,EmployeeID) | ==> | NodeByLabel(1) | 0 | 0 | employee, employee | :Employee | ==> | Slice | 0 | 0 | | { AUTOINT0} | ==> | LoadCSV | 1 | 0 | row | | ==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+眼中沒有渴望!
比賽,比賽,比賽,合并,合并
如果我們快進(jìn)幾步,我們現(xiàn)在可能已經(jīng)將導(dǎo)入腳本重構(gòu)到了在一個查詢中創(chuàng)建節(jié)點并在另一個查詢中創(chuàng)建關(guān)系的地步。
我們的create查詢按預(yù)期工作:
USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row MERGE (employee:Employee {employeeId: row.EmployeeID}) MERGE (order:Order {orderId: row.OrderID}) MERGE (product:Product {productId: row.ProductID})==> +-------------+------+--------+----------------------------------------------------+--------------------------------------------------------------+ ==> | Operator | Rows | DbHits | Identifiers | Other | ==> +-------------+------+--------+----------------------------------------------------+--------------------------------------------------------------+ ==> | EmptyResult | 0 | 0 | | | ==> | UpdateGraph | 0 | 0 | employee, employee, order, order, product, product | MergeNode; :Employee; MergeNode; :Order; MergeNode; :Product | ==> | Slice | 0 | 0 | | { AUTOINT0} | ==> | LoadCSV | 1 | 0 | row | | ==> +-------------+------+--------+----------------------------------------------------+------------------------------------------------------------現(xiàn)在,我們在圖表中有了員工,產(chǎn)品和訂單。 現(xiàn)在,讓我們創(chuàng)建三者之間的關(guān)系:
USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row MATCH (employee:Employee {employeeId: row.EmployeeID}) MATCH (order:Order {orderId: row.OrderID}) MATCH (product:Product {productId: row.ProductID}) MERGE (employee)-[:SOLD]->(order) MERGE (order)-[:PRODUCT]->(product)如果我們描述一下,我們會發(fā)現(xiàn)Eager再次潛入了!
==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+ ==> | Operator | Rows | DbHits | Identifiers | Other | ==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+ ==> | EmptyResult | 0 | 0 | | | ==> | UpdateGraph(0) | 0 | 0 | order, product, UNNAMED318 | MergePattern | ==> | Eager | 0 | 0 | | | ==> | UpdateGraph(1) | 0 | 0 | employee, order, UNNAMED287 | MergePattern | ==> | Filter(0) | 0 | 0 | | Property(product,productId) == Property(row,ProductID) | ==> | NodeByLabel(0) | 0 | 0 | product, product | :Product | ==> | Filter(1) | 0 | 0 | | Property(order,orderId) == Property(row,OrderID) | ==> | NodeByLabel(1) | 0 | 0 | order, order | :Order | ==> | Filter(2) | 0 | 0 | | Property(employee,employeeId) == Property(row,EmployeeID) | ==> | NodeByLabel(2) | 0 | 0 | employee, employee | :Employee | ==> | Slice | 0 | 0 | | { AUTOINT0} | ==> | LoadCSV | 1 | 0 | row | | ==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+在這種情況下,“急切”發(fā)生在我們第二次致電MERGE時,正如Michael在他的帖子中指出的那樣:
問題是,在單個Cypher語句中,您必須隔離會進(jìn)一步影響匹配的更改,例如,當(dāng)您創(chuàng)建帶有標(biāo)簽的節(jié)點時,該標(biāo)簽突然被以后的MATCH或MERGE操作所匹配。
在這種情況下,我們可以通過使用單獨的查詢來創(chuàng)建關(guān)系來解決該問題:
LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row MATCH (employee:Employee {employeeId: row.EmployeeID}) MATCH (order:Order {orderId: row.OrderID}) MERGE (employee)-[:SOLD]->(order)==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+ ==> | Operator | Rows | DbHits | Identifiers | Other | ==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+ ==> | EmptyResult | 0 | 0 | | | ==> | UpdateGraph | 0 | 0 | employee, order, UNNAMED236 | MergePattern | ==> | Filter(0) | 0 | 0 | | Property(order,orderId) == Property(row,OrderID) | ==> | NodeByLabel(0) | 0 | 0 | order, order | :Order | ==> | Filter(1) | 0 | 0 | | Property(employee,employeeId) == Property(row,EmployeeID) | ==> | NodeByLabel(1) | 0 | 0 | employee, employee | :Employee | ==> | Slice | 0 | 0 | | { AUTOINT0} | ==> | LoadCSV | 1 | 0 | row | | ==> +----------------+------+--------+-------------------------------+-----------------------------------------------------------+USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row MATCH (order:Order {orderId: row.OrderID}) MATCH (product:Product {productId: row.ProductID}) MERGE (order)-[:PRODUCT]->(product)==> +----------------+------+--------+------------------------------+--------------------------------------------------------+ ==> | Operator | Rows | DbHits | Identifiers | Other | ==> +----------------+------+--------+------------------------------+--------------------------------------------------------+ ==> | EmptyResult | 0 | 0 | | | ==> | UpdateGraph | 0 | 0 | order, product, UNNAMED229 | MergePattern | ==> | Filter(0) | 0 | 0 | | Property(product,productId) == Property(row,ProductID) | ==> | NodeByLabel(0) | 0 | 0 | product, product | :Product | ==> | Filter(1) | 0 | 0 | | Property(order,orderId) == Property(row,OrderID) | ==> | NodeByLabel(1) | 0 | 0 | order, order | :Order | ==> | Slice | 0 | 0 | | { AUTOINT0} | ==> | LoadCSV | 1 | 0 | row | | ==> +----------------+------+--------+------------------------------+--------------------------------------------------------+合并,設(shè)置
我嘗試使LOAD CSV腳本盡可能地冪等,這樣,如果我們將更多行或更多列的數(shù)據(jù)添加到CSV中,我們可以重新運行查詢而不必重新創(chuàng)建所有內(nèi)容。
這可以引導(dǎo)您進(jìn)入以下創(chuàng)建供應(yīng)商的模式:
USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row MERGE (supplier:Supplier {supplierId: row.SupplierID}) SET supplier.companyName = row.SupplierCompanyName我們要確保只有一個具有該SupplierID的Supplier,但是我們可能會逐步添加新屬性,并決定僅使用'SET'命令替換所有內(nèi)容。 如果我們分析該查詢,則“渴望”會潛伏:
==> +----------------+------+--------+--------------------+----------------------+ ==> | Operator | Rows | DbHits | Identifiers | Other | ==> +----------------+------+--------+--------------------+----------------------+ ==> | EmptyResult | 0 | 0 | | | ==> | UpdateGraph(0) | 0 | 0 | | PropertySet | ==> | Eager | 0 | 0 | | | ==> | UpdateGraph(1) | 0 | 0 | supplier, supplier | MergeNode; :Supplier | ==> | Slice | 0 | 0 | | { AUTOINT0} | ==> | LoadCSV | 1 | 0 | row | | ==> +----------------+------+--------+--------------------+----------------------+我們可以使用“ ON CREATE SET”和“ ON MATCH SET”以一些重復(fù)的代價來解決此問題:
USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM "file:/Users/markneedham/projects/neo4j-northwind/data/customerDb.csv" AS row MERGE (supplier:Supplier {supplierId: row.SupplierID}) ON CREATE SET supplier.companyName = row.SupplierCompanyName ON MATCH SET supplier.companyName = row.SupplierCompanyName==> +-------------+------+--------+--------------------+----------------------+ ==> | Operator | Rows | DbHits | Identifiers | Other | ==> +-------------+------+--------+--------------------+----------------------+ ==> | EmptyResult | 0 | 0 | | | ==> | UpdateGraph | 0 | 0 | supplier, supplier | MergeNode; :Supplier | ==> | Slice | 0 | 0 | | { AUTOINT0} | ==> | LoadCSV | 1 | 0 | row | | ==> +-------------+------+--------+--------------------+----------------------+使用我一直在使用的數(shù)據(jù)集,在某些情況下可以避免OutOfMemory異常,而在其他情況下,可以將運行查詢所花費的時間減少3倍。
隨著時間的流逝,我希望所有這些情況都將得到解決,但是從Neo4j 2.1.5開始,這些是我已經(jīng)確定過急的模式。
如果您知道其他任何人,請告訴我,我可以將其添加到帖子中或撰寫第二部分。
翻譯自: https://www.javacodegeeks.com/2014/10/neo4j-cypher-avoiding-the-eager.html
總結(jié)
以上是生活随笔為你收集整理的Neo4j:Cypher –避免热切的全部內(nèi)容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 网络拒绝接入是什么意思
- 下一篇: 手机电池耗电快的原因是什么