Java n-triple RDF parsing

前提是你 提交于 2019-12-03 12:40:09

If you just want to parse the NTriples and don't need to do anything other than basic processing and querying then you could try the NxParser. It is a very simple bit of Java code that'll pass any NTriples like format (so NQuads etc) which gives you an iterator over the statements in the file. If you only want NTriples you can easily ignore statements with less/more than 3 items.

Adapting the example on the linked page would give the following simple code:

NxParser nxp = new NxParser(new FileInputStream("filetoparse.nq"),false);

while (nxp.hasNext()) 
{
  Node[] ns = nxp.next();
  if (ns.length == 3)
  {
    //Only Process Triples  
    //Replace the print statements with whatever you want
    for (Node n: ns) 
    {
      System.out.print(n.toN3());
      System.out.print(" ");
    }
    System.out.println(".");
  }
}

With Jena it is not so difficult:

Given a file rdfexample.ntriple containing the following RDF in N-TRIPLE form (example taken from here):

<http://www.recshop.fake/cd/Hide your heart> <http://www.recshop.fake/cd#year> "1988" .
<http://www.recshop.fake/cd/Hide your heart> <http://www.recshop.fake/cd#price> "9.90" .
<http://www.recshop.fake/cd/Hide your heart> <http://www.recshop.fake/cd#company> "CBS Records" .
<http://www.recshop.fake/cd/Hide your heart> <http://www.recshop.fake/cd#country> "UK" .
<http://www.recshop.fake/cd/Hide your heart> <http://www.recshop.fake/cd#artist> "Bonnie Tyler" .
<http://www.recshop.fake/cd/Empire Burlesque> <http://www.recshop.fake/cd#year> "1985" .
<http://www.recshop.fake/cd/Empire Burlesque> <http://www.recshop.fake/cd#price> "10.90" .
<http://www.recshop.fake/cd/Empire Burlesque> <http://www.recshop.fake/cd#company> "Columbia" .
<http://www.recshop.fake/cd/Empire Burlesque> <http://www.recshop.fake/cd#country> "USA" .
<http://www.recshop.fake/cd/Empire Burlesque> <http://www.recshop.fake/cd#artist> "Bob Dylan" .

the following code

public static void main(String[] args) {
    String fileNameOrUri = "src/a/rdfexample.ntriple";
    Model model = ModelFactory.createDefaultModel();
    InputStream is = FileManager.get().open(fileNameOrUri);
    if (is != null) {
        model.read(is, null, "N-TRIPLE");
        model.write(System.out, "TURTLE");
    } else {
        System.err.println("cannot read " + fileNameOrUri);;
    }
}

reads the file, and prints it out in TURTLE form:

<http://www.recshop.fake/cd/Hide your heart>
      <http://www.recshop.fake/cd#artist>
              "Bonnie Tyler" ;
      <http://www.recshop.fake/cd#company>
              "CBS Records" ;
      <http://www.recshop.fake/cd#country>
              "UK" ;
      <http://www.recshop.fake/cd#price>
              "9.90" ;
      <http://www.recshop.fake/cd#year>
              "1988" .

<http://www.recshop.fake/cd/Empire Burlesque>
      <http://www.recshop.fake/cd#artist>
              "Bob Dylan" ;
      <http://www.recshop.fake/cd#company>
              "Columbia" ;
      <http://www.recshop.fake/cd#country>
              "USA" ;
      <http://www.recshop.fake/cd#price>
              "10.90" ;
      <http://www.recshop.fake/cd#year>
              "1985" .

So, with Jena you can easily parse RDF (in any form) into a com.hp.hpl.jena.rdf.model.Model object, which allows you to programmatically manipulate it.

Old question, but since you explicitly ask about different libraries, I'd thought I'd show how to do simple RDF parsing with Eclipse RDF4J's Rio parser (disclosure: I am one of the RDF4J developers).

For example, to parse the file and put all triples in a Model, just do this:

FileInputStream in = new FileInputStream("/path/to/file.nt");

Model m = Rio.parse(in, RDFFormat.NTRIPLES);

If you want to immediately print the parser output to stdout (for example in Turtle format), do something like this:

FileInputStream in = new FileInputStream("/path/to/file.nt");

RDFParser parser = Rio.createParser(RDFFormat.NTRIPLES);
parser.parse(in, "", Rio.createWriter(RDFFormat.TURTLE, System.out));

And of course there are more ways to play with these basic tools, have a look at the toolkit documentation for details.

The Rio parsers are available as separate maven artifacts by the way, so if you wish to use only the parsers, without the rest of the RDF4J tools, you can do so.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!