How to parse big datasets using RDFLib?

Submitted by 冷暖自知 on 2019-12-02 21:12:06

How many triples are in those RDF files? I have tested rdflib, and it does not scale much beyond a few tens of thousands of triples, if you are lucky. It certainly does not perform well on files with millions of triples.

The best parser out there is rapper, from the Redland libraries. My first piece of advice is to avoid RDF/XML and go for N-Triples instead. N-Triples is a much lighter, line-oriented format than RDF/XML, and you can convert RDF/XML to N-Triples with rapper:

rapper -i rdfxml -o ntriples YOUR_FILE.rdf > YOUR_FILE.ntriples
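The reason N-Triples is so much easier to handle at scale is that it puts exactly one triple on each line, so a file can be processed as a stream without ever building a full parse tree. A minimal sketch of that idea, using only the standard library (the regex is deliberately simplified and the sample data is made up; a real N-Triples parser must also handle blank nodes, language tags, datatypes, and escapes):

```python
import re

# Simplified pattern: <subject> <predicate> <object-or-literal> .
# Real N-Triples is more permissive; this covers only the common shape.
TRIPLE_RE = re.compile(r'^(<[^>]+>)\s+(<[^>]+>)\s+(.+?)\s*\.\s*$')

sample = [
    '<http://example.org/a> <http://example.org/p> <http://example.org/b> .',
    '<http://example.org/a> <http://example.org/name> "Alice" .',
]

def parse_lines(lines):
    """Yield (subject, predicate, object) one line at a time,
    skipping lines that do not match the simplified pattern."""
    for line in lines:
        m = TRIPLE_RE.match(line)
        if m:
            yield m.groups()

triples = list(parse_lines(sample))
print(triples[0])
```

Because each triple is consumed independently, memory use stays flat no matter how large the input file is, which is exactly what RDF/XML cannot give you.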

If you prefer Python, you can use the Redland Python bindings:

import RDF

# Parse an N-Triples file directly into an in-memory model.
parser = RDF.Parser(name="ntriples")
model = RDF.Model()
parser.parse_into_model(model, "file://file_path",
                        "http://your_base_uri.org")
for triple in model:
    print(triple.subject, triple.predicate, triple.object)

I have parsed fairly big files (a couple of gigabytes) with the Redland libraries without any problem.

Eventually, if you are handling big datasets, you might need to load your data into a scalable triple store; the one I normally use is 4store. 4store internally uses Redland to parse RDF files. In the long term, I think moving to a scalable triple store is what you'll have to do, and with it you'll be able to query your data with SPARQL and insert and delete triples with SPARQL/Update.
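To give a feel for that workflow, here is a sketch of the kind of SPARQL/Update and SPARQL statements you would send to the store's HTTP endpoint. The endpoint URL and all URIs are placeholders, not real services (4store exposes its SPARQL server through 4s-httpd, so adjust host and port to your own setup); only the request body is built here, no network call is made:

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with your own 4store instance.
endpoint = "http://localhost:8000/sparql/"

# SPARQL/Update: insert a triple (placeholder URIs).
update = """
INSERT DATA {
  <http://example.org/a> <http://example.org/name> "Alice" .
}
"""

# SPARQL: read it back.
query = """
SELECT ?s ?name WHERE {
  ?s <http://example.org/name> ?name .
}
"""

# This is the form-encoded body you would POST to the endpoint.
body = urlencode({"query": query})
print(body[:30])
```

The point of moving to a store is exactly this separation: loading is done once, and afterwards all access goes through declarative queries instead of re-parsing files.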
