How to improve performance for massive MERGE insert?


Question


I'm trying to insert data from my SQL database into Neo4j. I have a CSV file where every row generates 4-5 entities and some relationships between them. Entities may be duplicated across rows, and I want to enforce uniqueness.

What I currently do is (a Cypher sketch follows the list):

  • create constraints for each label to force uniqueness.
  • iterate the CSV:
    • start transaction
    • create merge statements for the entities
    • create merge statements for the relations
    • commit transaction
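
To make that concrete, here is roughly what the statements look like. The labels (Person, Company), the id properties, and the WORKS_AT relationship type are placeholders for illustration, not my actual schema:

```cypher
// Placeholder schema for illustration -- my real labels/properties differ.
// Constraint syntax is the Neo4j 2.x form current at the time;
// newer versions use CREATE CONSTRAINT ... FOR ... REQUIRE.
CREATE CONSTRAINT ON (p:Person) ASSERT p.personId IS UNIQUE;
CREATE CONSTRAINT ON (c:Company) ASSERT c.companyId IS UNIQUE;

// Per CSV row, inside the open transaction
// ($-parameters; older versions use the {param} syntax)
MERGE (p:Person  {personId:  $personId})
MERGE (c:Company {companyId: $companyId})
MERGE (p)-[:WORKS_AT]->(c);
```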

The results were poor. I then tried committing the transaction every X rows (X was 100, 500, 1000, and 5000). That is better, but I still have two problems (one way to batch is sketched after the list):

  • It's slow: on average around 1-1.5 seconds per 100 rows (each row produces 4-5 entities and 4-5 relationships).
  • It gets worse as I keep adding data. I usually start at 400-500 ms per 100 rows, and after ~5000 rows I'm at ~4-5 seconds per 100 rows.
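
For reference, by "committing every X rows" I mean batching roughly like this: the client collects X rows into a list parameter and sends them in a single transaction (UNWIND needs Neo4j 2.1+; placeholder names as before):

```cypher
// One transaction per batch: $rows is a list of up to X maps,
// one per CSV row, built on the client side
UNWIND $rows AS row
MERGE (p:Person  {personId:  row.personId})
MERGE (c:Company {companyId: row.companyId})
MERGE (p)-[:WORKS_AT]->(c);
```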

From what I know, the uniqueness constraint also creates an index on that property, and that is the property I match on when I create the new node with MERGE. Is there any chance it isn't using the index?
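
Is there a reliable way to check? My understanding is that prefixing the statement with PROFILE should reveal whether the MERGE does an index seek or falls back to a label scan:

```cypher
// If the constraint's index is being used, the plan should contain a
// unique index seek on :Person(personId) rather than a NodeByLabelScan.
// (PROFILE actually executes the statement; EXPLAIN, where available,
// only shows the plan.)
PROFILE MERGE (p:Person {personId: $personId});
```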

What's the best practice for improving performance here? I saw BatchInserter, but I'm not sure whether it can be used with MERGE operations.

Thanks

Source: https://stackoverflow.com/questions/22129783/how-to-improve-performance-for-massive-merge-insert
