Importing a large XML file into Neo4j with py2neo

Submitted by 懵懂的女人 on 2019-12-06 18:58:31

I think you should use a streaming parser; otherwise you may run out of memory on the Python side before anything even reaches Neo4j.
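Here is a minimal sketch of a streaming parse with the standard library's `xml.etree.ElementTree.iterparse`. It assumes each `<user>` element is a direct child of the root and holds simple child elements such as `<id>` and `<name>` (those tag names are assumptions about your file). `iter_users` is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, Optional


def iter_users(source, tag: str = "user") -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per <user> element without keeping the whole tree in memory.

    Assumes <user> elements are direct children of the root element.
    """
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the "start" of the root element; keep a handle to it.
    _, root = next(context)
    for event, elem in context:
        # On "end", the <user> element and all of its children are fully parsed.
        if event == "end" and elem.tag == tag:
            # Map each child tag to its text (None for empty elements like <name/>).
            yield {child.tag: child.text for child in elem}
            # Drop the already-processed users from the root so memory stays flat.
            root.clear()


if __name__ == "__main__":
    sample = io.BytesIO(b"<users><user><id>1</id><name>Ann</name></user></users>")
    for user in iter_users(sample):
        print(user)
```

Clearing the root after each user is what keeps memory roughly constant. Calling `elem.clear()` alone would still leave one empty element per user attached to the root.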

I also recommend sending the updates to Neo4j in batches of 10k to 100k per transaction.
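The batching can be sketched like this, assuming the rows arrive from an iterator such as a streaming parser. `batches` is a hypothetical helper, and each list it yields would be sent to Neo4j as one transaction.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], size: int = 10_000) -> Iterator[List[T]]:
    """Split any iterable into lists of at most `size` items, lazily."""
    if size < 1:
        raise ValueError("size must be positive")
    it = iter(items)
    while True:
        # Pull up to `size` items; an empty list means the input is exhausted.
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

Because it consumes the input lazily, only one batch is held in memory at a time.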

Don't store "NO xxxx" fields; just leave them off, since storing them is a waste of space and effort.
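A minimal sketch of leaving such fields off before a row is written or sent. It assumes the placeholders look like `NO URL` or `NO LOCATION`; the exact pattern in your data is an assumption, so adjust the hypothetical `PLACEHOLDER` regex to match it.

```python
import re
from typing import Dict, Optional

# Hypothetical placeholder format: "NO" followed by upper-case words, e.g. "NO URL".
PLACEHOLDER = re.compile(r"NO [A-Z_ ]+")


def drop_placeholders(props: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Return a copy of `props` without missing, blank or placeholder values."""
    return {
        key: value
        for key, value in props.items()
        if value is not None
        and value.strip()
        and not PLACEHOLDER.fullmatch(value.strip())
    }
```

`fullmatch` keeps ordinary values that merely start with "NO", such as "NOTE".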

I don't know how merge(node) works internally, so instead I recommend creating a unique constraint on :User(userId) and using a Cypher query like this:

UNWIND {data} AS row
MERGE (u:User {userId: row.userId}) ON CREATE SET u += row

where the {data} parameter is a list (e.g. 10k entries) of dictionaries, each holding one user's properties.
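A minimal sketch of driving that query in batches. To stay independent of any particular py2neo version, the hypothetical `import_users` helper takes a `run(query, parameters)` callable. It uses the `$data` parameter syntax of current Neo4j versions (older servers used `{data}`, as in the query above) and the `CREATE CONSTRAINT ... FOR ... REQUIRE` form available from Neo4j 4.4 onward; on 3.x the constraint is written `CREATE CONSTRAINT ON (u:User) ASSERT u.userId IS UNIQUE`. Both syntax choices are assumptions about your server version.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable

# Neo4j 4.4+ syntax (assumption); 3.x: CREATE CONSTRAINT ON (u:User) ASSERT u.userId IS UNIQUE
CONSTRAINT = (
    "CREATE CONSTRAINT user_id_unique IF NOT EXISTS "
    "FOR (u:User) REQUIRE u.userId IS UNIQUE"
)

# The UNWIND/MERGE query from the answer, using $data and SET u += row.
MERGE_USERS = (
    "UNWIND $data AS row "
    "MERGE (u:User {userId: row.userId}) ON CREATE SET u += row"
)


def import_users(
    run: Callable[[str, Dict[str, Any]], Any],
    rows: Iterable[Dict[str, Any]],
    batch_size: int = 10_000,
) -> int:
    """Create the constraint, then MERGE users batch by batch; returns rows sent."""
    run(CONSTRAINT, {})
    it = iter(rows)
    sent = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return sent
        # One statement per batch; with an auto-commit runner, one transaction per batch.
        run(MERGE_USERS, {"data": batch})
        sent += len(batch)
```

With py2neo you could pass something like `lambda q, p: graph.run(q, p)`, where `graph` is a `py2neo.Graph`. `Graph.run` executes each statement in an auto-commit transaction, so each batch should be committed on its own.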

If you are importing data into a new database, you may want to try the import-tool: https://neo4j.com/docs/operations-manual/current/#import-tool

In that case, parse your XML file as you already do, but instead of using py2neo to insert the data into Neo4j, write a CSV file and then run the import-tool on it.

Here is one possible way to do it:

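import csv
from xml.dom import minidom

def getAttribute(node, attribute, default=None):
    # Return the text of the first <attribute> element under `node`,
    # or `default` when that element is missing or empty.
    elements = node.getElementsByTagName(attribute)
    if not elements or not elements[0].firstChild:
        return default
    return elements[0].firstChild.data

# Note: minidom loads the whole document into memory.
xml_doc = minidom.parse("users.xml")
persons = xml_doc.getElementsByTagName('user')

# XML child elements exported as node properties.
attrs = ['name', 'screen_name', 'location', 'description',
         'profile_image_url', 'friends_count', 'url']

# CSV headers in the import-tool format: ':ID(User)' marks the node id
# in the "User" id space; ':string' / ':int' set the property types.
mapping = {'user_id': 'user_id:ID(User)',
           'name': 'name:string',
           'screen_name': 'screen_name:string',
           'location': 'location:string',
           'description': 'description:string',
           'profile_image_url': 'profile_image_url:string',
           'friends_count': 'friends_count:int',
           'url': 'url:string'}

# newline='' is what the csv module expects for files it writes.
with open('users.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=list(mapping.values()))
    writer.writeheader()
    for person in persons:
        # One CSV row per <user>; missing values are written as empty fields.
        user = {mapping[attr]: getAttribute(person, attr) for attr in attrs}
        user[mapping['user_id']] = getAttribute(person, 'id')
        writer.writerow(user)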

Once you have converted the XML to a CSV file, run the import-tool:

$ neo4j-import --into neo4j-community-3.0.3/data/databases/users.db --nodes:User users.csv

You will probably also want to create relationships between nodes. If so, read the import-tool docs and call the import-tool with CSV files for both nodes and relationships.
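A minimal sketch of the relationship side. It assumes you can extract (follower, followed) id pairs from your XML and that the relationship type is `FOLLOWS`; both are assumptions, and `write_relationships` is a hypothetical helper. The `:START_ID(User)` and `:END_ID(User)` headers refer to the `User` id space declared by `user_id:ID(User)` in users.csv, and the `:TYPE` column gives each row's relationship type.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_relationships(out: TextIO, pairs: Iterable[Tuple[str, str]],
                        rel_type: str = "FOLLOWS") -> int:
    """Write an import-tool relationship CSV to `out`; returns the row count."""
    writer = csv.writer(out)
    # Start/end ids must match values in the user_id:ID(User) column of users.csv.
    writer.writerow([":START_ID(User)", ":END_ID(User)", ":TYPE"])
    count = 0
    for start, end in pairs:
        writer.writerow([start, end, rel_type])
        count += 1
    return count


if __name__ == "__main__":
    # For a real file: open("follows.csv", "w", newline="", encoding="utf-8")
    buf = io.StringIO()
    write_relationships(buf, [("1", "2")])
    print(buf.getvalue())
```

Both files could then be passed in a single run, e.g. `neo4j-import --into neo4j-community-3.0.3/data/databases/users.db --nodes:User users.csv --relationships follows.csv`.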
