How can I get a list of all film ids from Freebase?

Submitted by 陌路散爱 on 2020-01-01 05:30:12

Question


On a project I worked on a couple of years back, I built a set of movie data from Freebase. A simple shell script downloaded the "film.tsv" file (from http://download.freebase.com/datadumps/latest/browse/film/film.tsv). I then used the "id" field in that file to build the necessary MQL requests for each film, retrieving the other properties I was interested in (e.g. actors and genres).
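
For reference, that old workflow (read the id column, then issue one MQL query per film) might be sketched in Python roughly as follows. This is a minimal sketch, not the original script: it assumes film.tsv has a header row containing an "id" column, and the query's property names (genre, and starring with actor) are assumptions about the /film/film schema rather than something stated above. The helper names are hypothetical.

```python
import csv
import json
from typing import Iterator, TextIO


def read_film_ids(tsv: TextIO) -> Iterator[str]:
    """Yield non-empty values of the "id" column (a header row is assumed)."""
    # QUOTE_NONE: treat quote characters in film titles as ordinary text.
    reader = csv.DictReader(tsv, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        film_id = (row.get("id") or "").strip()
        if film_id:
            yield film_id


def film_details_query(film_id: str) -> dict:
    """Build one MQL read query for a film (property names are assumed)."""
    return {
        "id": film_id,
        "type": "/film/film",
        "name": None,
        "genre": [],                    # empty list: ask for all genre names
        "starring": [{"actor": None}],  # performances, each with its actor's name
    }


def queries_from_tsv(path: str) -> Iterator[str]:
    """Yield one JSON-encoded query per film id found in the file at `path`."""
    with open(path, newline="", encoding="utf-8") as fh:
        for film_id in read_film_ids(fh):
            yield json.dumps(film_details_query(film_id))
```

Each JSON string from queries_from_tsv("film.tsv") would then be sent as the query parameter of an MQL read request.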

After looking at the developer's guide today, I realise that Freebase has moved on a fair bit; significantly, the dump file I used before is no longer available. The dump files are also now in RDF format, and from what I can tell they are only available as a single 22 GB archive.

If at all possible, I would like to avoid downloading a 22 GB file each time I want to rebuild my data set. Is it still possible to retrieve individual dump files, such as film.tsv?

If not, is there an alternative way to obtain a full list of movie IDs?


Answer 1:


There's no replacement planned for film.tsv right now. You can get the current list of film IDs from the RDF dump like this:

zgrep $'\ttype\.object\.type\tfilm\.film' freebase-rdf.gz
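
If you want just the IDs rather than whole matching lines, the same filter can be sketched in Python. This is a minimal sketch, assuming the dump is tab-separated with subject, predicate and object in the first three fields; because the exact term syntax (full URIs versus an "ns:" prefix) is an assumption here, the hypothetical normalise helper strips both. Comparing the whole object field also avoids matching types that merely start with film.film, such as film.film_character, which the substring pattern above would also catch.

```python
import gzip
from typing import Iterable, Iterator

# Assumed spelling of the Freebase namespace; both the full-URI form and the
# "ns:" prefixed form are stripped so either dump syntax compares equal.
NS_URI = "http://rdf.freebase.com/ns/"


def normalise(term: str) -> str:
    """Reduce <http://rdf.freebase.com/ns/film.film> or ns:film.film. to film.film."""
    term = term.strip()
    if term.endswith("."):  # Turtle-style statement terminator
        term = term[:-1].rstrip()
    if term.startswith("<") and term.endswith(">"):
        term = term[1:-1]
    for prefix in (NS_URI, "ns:"):
        if term.startswith(prefix):
            return term[len(prefix):]
    return term


def film_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield subjects of tab-separated triples whose type is exactly film.film."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        subject, predicate, obj = (normalise(f) for f in fields[:3])
        # Whole-field comparison, so film.film_character etc. do not match.
        if predicate == "type.object.type" and obj == "film.film":
            yield subject


def film_ids_from_dump(path: str) -> Iterator[str]:
    """Stream the gzipped dump line by line without unpacking it to disk."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        yield from film_ids(fh)
```

Usage would be along the lines of `for mid in film_ids_from_dump("freebase-rdf.gz"): print(mid)`; streaming through gzip keeps memory use flat even for a 22 GB archive.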

Then, when you need to update the list, query the MQL Read API for any films that have been added since your last update:

[{
  "type": "/film/film",
  "id": null,
  "name": null,
  "timestamp": null,
  "timestamp>=": "2013-12",
  "sort": "-timestamp"
}]
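
When refreshing the list, the "timestamp>=" value would normally come from the date of your previous run rather than being hard-coded. A minimal Python sketch, assuming you store that date yourself; the function name new_films_query and the example date are hypothetical:

```python
import datetime
import json


def new_films_query(since: datetime.date) -> list:
    """Return the MQL query above with "timestamp>=" set to `since`."""
    return [{
        "type": "/film/film",
        "id": None,
        "name": None,
        "timestamp": None,
        "timestamp>=": since.isoformat(),  # e.g. "2013-12-01"
        "sort": "-timestamp",              # newest first
    }]


if __name__ == "__main__":
    last_run = datetime.date(2013, 12, 1)  # hypothetical date of the previous update
    print(json.dumps(new_films_query(last_run), indent=2))
```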

Since the API returns 200 results at a time, you'll need to use a cursor to get the full list of results.
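
The cursor loop can be sketched as follows. It assumes each mqlread response is a JSON object with a "result" list and a "cursor" field that is false on the last page, and it takes the HTTP call as a hypothetical fetch(query_json, cursor) callable so the paging logic does not depend on any particular HTTP library.

```python
import json
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical signature: performs one mqlread request and returns the decoded
# JSON response, assumed to look like {"result": [...], "cursor": "..." or False}.
Fetch = Callable[[str, str], Dict[str, Any]]


def mqlread_all(query: List[dict], fetch: Fetch) -> Iterator[dict]:
    """Yield every result of `query`, following cursors page by page."""
    query_json = json.dumps(query)
    cursor = ""  # an empty cursor requests the first page
    while True:
        page = fetch(query_json, cursor)
        yield from page.get("result", [])
        next_cursor = page.get("cursor")
        if not next_cursor:  # false (or missing) means there are no more pages
            break
        cursor = next_cursor
```

For example, `ids = [r["id"] for r in mqlread_all(query, fetch)]` collects every ID across all pages. Injecting fetch also makes the loop easy to exercise offline with canned responses.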




Answer 2:


You can try the MQL query just by opening the following link:

https://www.googleapis.com/freebase/v1/mqlread?query=[{%22type%22:%20%22/film/film%22,%22id%22:%20null,%22limit%22:300}]&cursor=

You will have to make many requests though.

Each response includes a cursor, which you pass as the cursor= parameter of the next request. AFAIK the default limit is 200, and you can't increase it at will. The query could perhaps be optimized so that the response doesn't include the type.
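
Rather than editing the cursor= parameter by hand, the request URL can be assembled programmatically. A minimal sketch using only the Python standard library; MQLREAD_URL is taken from the link above, and mqlread_url is a hypothetical helper name:

```python
import json
from urllib.parse import urlencode

MQLREAD_URL = "https://www.googleapis.com/freebase/v1/mqlread"


def mqlread_url(query: list, cursor: str = "") -> str:
    """Build an mqlread GET URL; an empty cursor requests the first page."""
    params = {
        "query": json.dumps(query, separators=(",", ":")),  # compact JSON
        "cursor": cursor,  # token from the previous response, or ""
    }
    return MQLREAD_URL + "?" + urlencode(params)  # percent-encodes both values
```

For the first page, call `mqlread_url([{"type": "/film/film", "id": None, "limit": 300}])`; for each following page, pass the cursor string from the previous response as the second argument.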

You can edit the query here: http://tinyurl.com/pn5o52w. In the top-right corner there is a 'link' button, and its 'MQLRead link' option shows you the URL to execute. I added the 'cursor=' parameter manually; I thought the query editor offered an option for this, but I couldn't find it.



Source: https://stackoverflow.com/questions/20353337/how-can-i-get-a-list-of-all-film-ids-from-freebase
