Text file of all titles / topic titles in Freebase

Posted by 微笑、不失礼 on 2019-12-10 11:17:49

Question


I need a .txt file containing the title of every topic / item in Freebase, with each title on its own line.

How can I create this file, given that I have already downloaded a Freebase RDF dump?

If possible, I also need a separate text file containing each topic's / item's description, with each description on a single line of its own.

How can I do that?

I would greatly appreciate it if someone could help me make either of these files from a Freebase RDF dump.

Thanks in advance!


Answer 1:


Filter the RDF dump on the predicate/property ns:type.object.name. If you only want a particular language, also filter by that language, e.g. @en.
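As a rough illustration of that filter, here is a minimal Python sketch. It assumes each line of the dump is tab-separated as subject, predicate, object, with the statement-terminating period either attached to the object or in a fourth field; that layout is inferred from the regex in this answer, not checked against every dump release. The names literal_text, iter_names and write_names are hypothetical helpers, not part of any Freebase tooling.

```python
import gzip

NAME_PREDICATE = "ns:type.object.name"


def literal_text(obj, lang="en"):
    """Return the text of a literal such as '"Paris"@en.', or None if it is not tagged `lang`."""
    obj = obj.strip()
    # Assumption: the terminating period may be glued to the object.
    if obj.endswith("."):
        obj = obj[:-1].rstrip()
    suffix = "@" + lang
    if not obj.endswith(suffix):
        return None
    text = obj[: -len(suffix)]
    # Drop exactly one pair of surrounding quotes; escapes inside are left as-is.
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        text = text[1:-1]
    return text


def iter_names(lines, lang="en"):
    """Yield the name text from every name statement in the given language."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[1] != NAME_PREDICATE:
            continue
        text = literal_text(fields[2], lang)
        if text is not None:
            yield text


def write_names(dump_path, out_path, lang="en"):
    """Stream a gzipped dump and write one name per line to out_path."""
    with gzip.open(dump_path, "rt", encoding="utf-8") as dump, \
            open(out_path, "w", encoding="utf-8") as out:
        for name in iter_names(dump, lang):
            out.write(name + "\n")
```

Calling write_names("freebase-rdf-2013-06-30-00-00.gz", "names.txt") would stream the dump once without loading it into memory. Escape sequences inside the literals are not decoded, so each name stays on one line.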

EDIT: I missed the second part about descriptions being desired as well. Here's a three-part regex which will get you all the lines with:

  1. English names
  2. English descriptions
  3. a type of /common/topic

Combining the three is left as an exercise for the reader.

zegrep $'\tns:(((type\\.object\\.name|common\\.topic\\.description)\t.*@en)|type\\.object\\.type\tns:common\\.topic)\\.$' freebase-rdf-2013-06-30-00-00.gz | gzip > freebase-rdf-2013-06-30-00-00-names-descriptions.gz

This command seems to have a performance problem that I'll have to look into. A simple grep of the entire file takes ~11 minutes on my laptop, but this has been running for several times that long. I'll have to look at it later though...
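One possible way to do the combining step, sketched in Python under some assumptions: the input is the filtered, gzipped output of the command above, each line is tab-separated as subject, predicate, object (with the terminating period either attached to the object or in a fourth field), and everything fits in memory, which may not hold for the full dump. The function names combine and write_files are hypothetical.

```python
import gzip

NAME = "ns:type.object.name"
DESCRIPTION = "ns:common.topic.description"
TYPE = "ns:type.object.type"
TOPIC = "ns:common.topic"


def combine(lines, lang="en"):
    """Map each /common/topic subject to a (name, description) pair; missing values are None."""
    topics, names, descriptions = set(), {}, {}
    suffix = "@" + lang
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        subject, predicate, obj = fields[0], fields[1], fields[2].strip()
        # Assumption: the terminating period may be glued to the object.
        if obj.endswith("."):
            obj = obj[:-1].rstrip()
        if predicate == TYPE:
            if obj == TOPIC:
                topics.add(subject)
        elif predicate in (NAME, DESCRIPTION) and obj.endswith(suffix):
            text = obj[: -len(suffix)]
            if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
                text = text[1:-1]
            target = names if predicate == NAME else descriptions
            # Keep the first value seen for each subject.
            target.setdefault(subject, text)
    return {s: (names.get(s), descriptions.get(s)) for s in topics}


def write_files(filtered_path, names_out, descriptions_out, lang="en"):
    """Write one name per line and one description per line for all topics."""
    with gzip.open(filtered_path, "rt", encoding="utf-8") as src:
        combined = combine(src, lang)
    with open(names_out, "w", encoding="utf-8") as n_out, \
            open(descriptions_out, "w", encoding="utf-8") as d_out:
        for subject in sorted(combined):
            name, description = combined[subject]
            if name is not None:
                n_out.write(name + "\n")
            if description is not None:
                d_out.write(description + "\n")
```

Because names and descriptions are written independently, line N of one file does not necessarily correspond to line N of the other; if the two need to stay aligned, writing an empty line for a missing value would be one option.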



Source: https://stackoverflow.com/questions/18263401/text-file-of-all-titles-topic-titles-in-freebase
