Import Freebase to Triplestore

Submitted by 我怕爱的太早我们不能终老 on 2019-12-03 12:53:08

Question


I'm currently planning a large project involving big data.

I have already searched, and all the results tell me that it's not possible to import Freebase into any triplestore without using third-party tools such as BaseKB or Freebase to RDF.

As far as I can see, the dump is already available as RDF, so what is the problem if I want to import the dump into my 4store triplestore and access the data via SPARQL?


Answer 1:


For everybody having problems importing the Freebase dump:

1) Keep your RDF/Turtle parser up to date. (The latest version of raptor2 can handle the '.' inside names, e.g. in ns:common.topic.notable_for.example.)

2) The dump must be cleaned up before you can import it. I used this script: http://people.apache.org/~andy/Freebase20121223/ (fixit)

3) The Turtle specification restricts which characters may appear in a URI:

IRIREF ::= '<' ([^#x00-#x20<>"{}|^`\] | UCHAR)* '>'

So it's very important to add this line to the fixit script at line 80:

$X =~ s/\\>/%3E/g ;
$X =~ s/\\.//g ;

# Add this line: strip the characters Turtle does not allow inside <...>
$X =~ s/[\x00-\x20<>"{}|^`]//g ;

$obj = "<".$X.">" ;

As a result, invalid syntax like this:

<http://www.wikipedia.org/object?key={invalid_braces}>

becomes

<http://www.wikipedia.org/object?key=invalid_braces>
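For anyone not using the Perl fixit script, the same clean-up can be sketched in Python. This is only an illustration of the substitution above, not part of the fixit script: the function name clean_iri is made up here, and it assumes each URI term is handed to it as a complete <...> string.

```python
import re

# Characters the Turtle IRIREF production excludes from the body of <...>:
# control characters and space (#x00-#x20), plus < > " { } | ^ and backtick.
# The backslash is deliberately left alone here, as in the Perl line above,
# because it may introduce a \uXXXX escape.
_FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`]')


def clean_iri(term: str) -> str:
    """Remove forbidden characters from the body of an <...> term.

    Hypothetical helper: anything that is not wrapped in angle
    brackets is returned unchanged.
    """
    if not (term.startswith("<") and term.endswith(">")):
        return term
    return "<" + _FORBIDDEN.sub("", term[1:-1]) + ">"


print(clean_iri("<http://www.wikipedia.org/object?key={invalid_braces}>"))
# prints <http://www.wikipedia.org/object?key=invalid_braces>
```

Deleting the characters (rather than percent-encoding them) matches the before/after example above; percent-encoding would preserve more information if that matters for your data.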



Answer 2:


You are probably getting search results from at least two, if not three, different data sets:

  1. the old quad format dump
  2. the early RDF dumps
  3. (perhaps) the current RDF dump

The format in #1 is what required conversion. The early RDF dumps (#2) were syntactically invalid, so they wouldn't import into most tools. The RDF dump has been improving over time. I'm not sure whether it's still true that it won't import at all without preprocessing, but regardless, it'll almost certainly be more useful if you pre-process it to remove redundancy, normalize it to the format that works best for your application, etc.
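As a rough illustration of that kind of pre-processing, here is a minimal Python sketch. It assumes the dump has one triple per line with tab-separated subject, predicate and object, and that you only care about a known set of predicates; the function name filter_triples and the predicate set are hypothetical, so adjust them to whatever your application actually needs.

```python
from typing import Iterable, Iterator, Set


def filter_triples(lines: Iterable[str], keep_predicates: Set[str]) -> Iterator[str]:
    """Yield only the triples worth loading (hypothetical helper).

    Assumes one triple per line, with tab-separated fields. Exact
    consecutive duplicate lines are dropped, and so is any line whose
    predicate is not in keep_predicates.
    """
    previous = None
    for line in lines:
        line = line.rstrip("\n")
        if line == previous:
            continue  # exact repeat of the previous line
        previous = line
        parts = line.split("\t")
        if len(parts) >= 3 and parts[1] in keep_predicates:
            yield line


sample = [
    'ns:m.01\tns:type.object.name\t"Example"@en\t.\n',
    'ns:m.01\tns:type.object.name\t"Example"@en\t.\n',
    'ns:m.01\tns:type.object.key\t"/en/example"\t.\n',
]
for triple in filter_triples(sample, {"ns:type.object.name"}):
    print(triple)
```

Because it works line by line, the same function can be fed from a compressed dump opened with gzip.open(path, "rt", encoding="utf-8") without loading the whole file into memory.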

Did you try importing the current dump? What were your results?




Answer 3:


The problem with the Freebase Turtle dump is that it is not compliant with the W3C Turtle specification.

1) According to http://www.w3.org/TR/turtle/#sec-grammar, the character '.' can only appear at the end of a triple; however, the Freebase dump has lots of '.' characters before the end of the triple. I read somewhere that '/' is not allowed outside a URI either, so they chose to use '.' instead.

The latest raptor2 library can get around this ('.'), but older versions cannot.

2) I think the way blank nodes are emitted is also invalid, e.g. line 141567: ns:m.01000m1 ns:common.topic.notable_for .
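To find lines like that one before handing the dump to a parser, a quick scan can flag statements that have a subject and predicate but no object. This is only a sketch in Python, assuming one statement per line with whitespace-separated terms; is_incomplete_triple is a made-up name, and the check is a heuristic, not a Turtle parser.

```python
def is_incomplete_triple(line: str) -> bool:
    """Return True if the line ends before an object term appears.

    Hypothetical heuristic: assumes one statement per line. Blank
    lines, comments and @prefix-style directives are never flagged.
    """
    body = line.strip()
    if not body or body.startswith(("#", "@")):
        return False
    if body.endswith("."):
        body = body[:-1].rstrip()  # drop the statement terminator
    # Split into at most three fields so a literal containing spaces
    # still counts as a single object term.
    return len(body.split(None, 2)) < 3


print(is_incomplete_triple("ns:m.01000m1 ns:common.topic.notable_for ."))
# prints True
```

Running it over the dump with enumerate(file, start=1) gives the line numbers of the suspicious statements, which you can then drop or inspect by hand.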



Source: https://stackoverflow.com/questions/17760747/import-freebase-to-triplestore
