Neo4j slower than MySQL in performing recursive query

匿名 (未验证) 提交于 2019-12-03 08:57:35

问题:

I would like to compare Neo4j(ver. 3.1) and MySQL in performing recursive queries. Therefore I created two tables in MySQL database - Customer and CustomerFriend.

Second table consists of CustomerID and FriendID columns, both of them point to CustomerID column in Customer table. In Neo4j were created corresponding entities:

Customer nodes and FRIEND_OF relations (c:Customer)-[f:FRIEND_OF]->(cc:Customer). Databases are filled with the same data: 100000 Customers, each Customer has 100 relations. Executed below queries:

MySQL (60s)

SELECT distinct cf4.FriendID FROM customerfriend cf1 join customerfriend cf2 on cf1.FriendID = cf2.CustomerID join customerfriend cf3 on cf2.FriendID = cf3.CustomerID join customerfriend cf4 on cf3.FriendID = cf4.CustomerID where cf1.CustomerID =99; 

Neo4j (240s)

match (c:Customer{CustomerID:99})-[:FRIEND_OF*4]->(cc:Customer) return distinct cc.CustomerID; 

Queries are run from simple Java app, which just connect to database (using available connectors), run queries, and measure execution times.

Measured times clearly indicate that Neo4j is slower in performing above queries than MySQL (MySQL 60s, Neo4j 240s). I have tested above queries for 50 relations per Customer and I achieved same results (MySQL 7s faster than Neo4j 17s ).

I read some articles about performing recursive queries in Neo4j which indicate that Neo4j should manage better for this type of queries than MySQL. That's why I have started wondering if I'm doing something wrong or execution times are proper (??).

I'm wondering if in Neo4j exists any possibilities to tune system performance. In case of MySQL I set up innodb_buffer_pool_size to 3g which affected better query performance(shorter execution time).

--------------------------------EDIT---------------------------

I have considered below suggestions to rewrite my Noe4j query to new form:

match (c:Customer{CustomerID:99})-[:FRIEND_OF]->(c1)-[:FRIEND_OF]->(c2) with distinct c2 match (c2)-[:FRIEND_OF]->(c3) with distinct c3 match (c3)-[:FRIEND_OF]->(cc:Customer) with distinct cc return cc.CustomerID; 

And achieved better query time: 40s

In case of MySQL I have figured out way to optimise previous query, similar to idea of Neo4j query optimisation:

select distinct FriendID as depth4 from customerfriend where CustomerID in (select distinct FriendID as depth3 from customerfriend where CustomerID in (select distinct FriendID as depth2 from customerfriend where CustomerID in (select distinct FriendID as depth from customerfriend where CustomerID =99 ))); 

And execution of this query took 24s

Neo4j still worse than MySQL...

回答1:

Can you try:

match (c:Customer{CustomerID:99})-[:FRIEND_OF]->(c1)-[:FRIEND_OF]->(c2) with distinct c2 match (c2)-[:FRIEND_OF]->(c3) with distinct c3 match (c3)-[:FRIEND_OF]->(cc) with distinct cc return cc.CustomerID; 

and share your query plan and the query plan for this query?

Update

To just measure the query time without wire transport, can you try to run this one:

match (c:Customer{CustomerID:99})-[:FRIEND_OF]->(c1)-[:FRIEND_OF]->(c2) with distinct c2 match (c2)-[:FRIEND_OF]->(c3) with distinct c3 match (c3)-[:FRIEND_OF]->(cc) with distinct cc with cc.CustomerID  return count(*); 


回答2:

You can make a small modification to make neo4j about 50% faster, or for even more speed, use the bitset dance shown on the bottom of this blog post => https://maxdemarzi.com/2013/12/31/the-power-of-open-source-software/

Update:

I went ahead and built a custom procedure for you.

You can grab it in the releases tab of https://github.com/maxdemarzi/distinct_network

It takes 2.9 seconds on my laptop with 10002045 relationships.

Second Update:

Wrote a blog post on the subject: https://maxdemarzi.com/2017/02/06/neo4j-is-faster-than-mysql-in-performing-recursive-query/



回答3:

Don't know what version of Neo4j you're running, but this might improve your speed while reducing db hits:

MATCH (c:Customer{CustomerID:99}) MATCH (c)-[:FRIEND_OF*4]->(cc:Customer) return distinct cc.CustomerID; 


回答4:

I'd recommend installing APOC Procedures for this one, the Path Expander functionality is a more efficient means of finding nodes along a path without the extra cost of finding all possible paths.

match (c:Customer{CustomerID:99}) call apoc.path.expandConfig(c, {relationshipFilter:"FRIEND_OF>", minLevel:4, maxLevel:4}) yield path with distinct last(nodes(path)) as cc where cc:Custumer return cc.CustomerID 

EDIT

Looks like NODE_GLOBAL uniqueness won't work in this case. I typically use it when getting all nodes in subgraphs, but it won't apply to this particular case and removed it to use default uniqueness settings.

Unsure how this compares to the equivalent variable-length pattern match. DB hits will be lower, as a procedure call only counts as one db hit and abstracts out the work it's doing, but unsure if it's faster.



易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!