The right way to hydrate lots of entities in py2neo

Submitted by 大兔子大兔子 on 2020-01-13 14:59:06

Question


This is more of a best-practices question. I am implementing a search back-end for highly structured data that, in essence, consists of ontologies, terms, and a complex set of mappings between them. Neo4j seemed like a natural fit and, after some prototyping, I've decided to go with py2neo as the way to communicate with Neo4j, mostly because of its nice support for batch operations.

What I'm getting frustrated with is that I'm having trouble introducing the kind of higher-level abstractions I'd like into my code: if I use the objects directly as a mini-ORM, I end up making lots and lots of atomic REST calls, which kills performance (I have a fairly large data set).

What I've been doing is taking my query results and calling get_properties on them to batch-hydrate my objects, which performs great (and is why I went down this route in the first place), but it forces me to pass tuples of (node, properties) around in my code. That gets the job done, but it isn't pretty. At all.

So I guess what I'm asking is whether there's a best practice for working with a fairly rich object graph in py2neo: getting the niceties of an ORM-like layer while retaining performance (which in my case means doing as much as possible in batch queries).
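One way to keep the batch-hydration performance while hiding the (node, properties) tuples is a thin wrapper class. This is only a minimal sketch: the Entity class and hydrate_all helper below are hypothetical names, not py2neo API, and the idea is simply to wrap each pair once so the rest of the code deals with objects instead of tuples.

```python
# Illustrative sketch only: "Entity" and "hydrate_all" are hypothetical helpers,
# not part of py2neo. Each (node, properties) pair is wrapped once.

class Entity:
    """Thin wrapper around a node handle and its batch-hydrated properties."""

    def __init__(self, node, props):
        self._props = props   # plain dict from a batch get_properties call
        self._node = node     # the underlying py2neo node handle

    def __getattr__(self, name):
        # Expose hydrated properties as attributes (entity.name, entity.term_id, ...)
        try:
            return self._props[name]
        except KeyError:
            raise AttributeError(name)

    @classmethod
    def hydrate_all(cls, pairs):
        """Build entities from an iterable of (node, properties) pairs,
        e.g. zip(nodes, batch-fetched property dicts)."""
        return [cls(node, props) for node, props in pairs]


# Usage with dummy data (a real node handle would come from a query):
terms = Entity.hydrate_all([(object(), {"name": "apoptosis", "term_id": "GO:0006915"})])
print(terms[0].name)  # -> apoptosis
```

The wrapper costs one object per result row but keeps the batch fetch as the single round trip, so the ORM-like layer doesn't reintroduce per-entity REST calls.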


Answer 1:


I am not sure whether I understand exactly what you want, but I had a similar issue: I wanted to make a lot of calls and create a lot of nodes, indexes and relationships (around 1.2 million). Here is an example of adding nodes, relationships, indexes and labels in batches using py2neo:

from py2neo import neo4j, node, rel

gdb = neo4j.GraphDatabaseService("<url_of_db>")
batch = neo4j.WriteBatch(gdb)

a = batch.create(node(name='Alice'))
b = batch.create(node(name='Bob'))

batch.set_labels(a, "Female")
batch.set_labels(b, "Male")

batch.add_indexed_node("Name", "first_name", "alice", a)  # creates the index 'Name' if it does not exist
batch.add_indexed_node("Name", "first_name", "bob", b)

batch.create(rel(a, "KNOWS", b))  # adding a relationship in the batch

batch.submit()  # sends the queued jobs to the server; ideally around 2k-5k records per submission



Answer 2:


Since you're asking for best practices, here is an issue I ran into:

When adding a lot of nodes (~1M) with py2neo in a single batch, my program often got slow, or crashed when the neo4j server ran out of memory. As a workaround, I split the submission into multiple batches:

from py2neo import neo4j

def chunker(seq, size):
    """
    Yield slices of the input list,
    each with at most the given size.
    """
    for pos in range(0, len(seq), size):  # use xrange on Python 2
        yield seq[pos:pos + size]


def submit(graph_db, list_of_elements, size):
    """
    Batch-submit lots of nodes, one chunk at a time.
    """
    for chunk in chunker(list_of_elements, size):

        batch = neo4j.WriteBatch(graph_db)

        for element in chunk:
            n = batch.create(element)
            batch.add_labels(n, 'Label')

        # submit the batch for this chunk
        batch.submit()
        batch.clear()

I tried this with different chunk sizes. For me, it's fastest with ~1000 nodes per batch. But I guess this depends on the RAM/CPU of your neo4j server.
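The chunking pattern itself is independent of py2neo, so the slicing logic can be checked on its own (Python 3 range; the data here is just a list of ints standing in for node abstracts):

```python
# Standalone check of the chunking pattern from the answer above (no py2neo required).

def chunker(seq, size):
    """Yield successive slices of seq, each with at most `size` elements."""
    for pos in range(0, len(seq), size):
        yield seq[pos:pos + size]

chunks = list(chunker(list(range(10)), 4))
print(chunks)  # -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Note that the last chunk may be shorter than the requested size, and concatenating the chunks reproduces the original sequence, so no elements are dropped between batch submissions.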



Source: https://stackoverflow.com/questions/15865397/the-right-way-to-hydrate-lots-of-entities-in-py2neo
