text file of all titles / topic titles in Freebase

Submitted by 落花浮王杯 on 2019-12-06 06:23:00

Filter the RDF dump on the predicate/property ns:type.object.name. If you only want a particular language, also filter on that language's tag, e.g. @en.
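As a minimal sketch of that name-only filter: the helper name filter_en_names is made up for illustration, and the pattern assumes the dump's fields are tab-separated and that each line ends with a period, as in the combined regex further down. It builds the tab with printf rather than $'...' quoting so it also runs in a plain POSIX shell.

```shell
# Sketch: keep only English-language name triples from a gzipped Freebase RDF dump.
# filter_en_names is a hypothetical helper name, not part of any existing tool.
filter_en_names() {
    tab=$(printf '\t')   # literal tab; assumes tab-separated fields
    # Match lines whose predicate is ns:type.object.name and whose
    # object carries the @en language tag right before the final period.
    gzip -dc "$1" | grep -E "${tab}ns:type\\.object\\.name${tab}.*@en\\.\$"
}
```

It could be called as `filter_en_names freebase-rdf-2013-06-30-00-00.gz | gzip > names-en.gz`, where names-en.gz is just a placeholder output name.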

EDIT: I missed the second part of the question, about descriptions being wanted as well. Here's a three-part regex which will get you all the lines with:

  1. English names
  2. English descriptions
  3. a type of /common/topic

Combining the three is left as an exercise for the reader.

zegrep $'\tns:(((type\\.object\\.name|common\\.topic\\.description)\t.*@en)|type\\.object\\.type\tns:common\\.topic)\\.$' freebase-rdf-2013-06-30-00-00.gz | gzip > freebase-rdf-2013-06-30-00-00-names-descriptions.gz

The combined regex seems to have a performance issue. A simple grep of the entire file takes ~11 min on my laptop, but this command has already been running for several times that long. I'll have to look into it later.
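For the combining step left as an exercise above, here is one possible sketch, not a tested recipe for the full dump. It reads the filtered file and prints one line per topic with subject, name, and description. The helper name combine_topics is hypothetical, and the script assumes (unverified) that all triples for a given subject sit on adjacent lines, so it never has to hold more than one topic in memory.

```shell
# Sketch: merge filtered triples into "subject<TAB>name<TAB>description" lines.
# combine_topics is a hypothetical helper name.
# Assumption: all triples for one subject are adjacent in the input.
combine_topics() {
    gzip -dc "$1" | awk -F '\t' '
        # Print the collected topic, but only if it was typed as common.topic
        # and has an English name.
        function emit() {
            if (is_topic && name != "") print subj "\t" name "\t" desc
        }
        # A new subject starts: output the previous one and reset state.
        $1 != subj { emit(); subj = $1; name = ""; desc = ""; is_topic = 0 }
        # If a subject has several names or descriptions, the last one wins.
        $2 == "ns:type.object.name"         { name = $3; sub(/\.$/, "", name) }
        $2 == "ns:common.topic.description" { desc = $3; sub(/\.$/, "", desc) }
        $2 == "ns:type.object.type" && $3 ~ /^ns:common\.topic\.?$/ { is_topic = 1 }
        END { emit() }
    '
}
```

It would be run on the output of the command above, e.g. `combine_topics freebase-rdf-2013-06-30-00-00-names-descriptions.gz`. If the adjacency assumption does not hold for your dump, sort the filtered lines by subject first.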
