Retrieving image license and author information in wiki commons

☆樱花仙子☆ 提交于 2019-12-04 00:42:47
Philippe Green

Late answer but you can request the "extmetadata" data with the following query:

http://en.wikipedia.org/w/api.php?action=query&prop=imageinfo&iiprop=extmetadata&titles=File%3aBrad_Pitt_at_Incirlik2.jpg&format=json

Look under imageinfo.extmetadata.UsageTerms, Artist, Credit, etc.

You could try using Magnus Manske's Commons API tool on the Wikimedia Toolserver. It's not an official service, and the documentation seem to be rather sparse (that is to say, almost nonexistent), but the XML output seems pretty self-explanatory.

I can't seem to find the source for Magnus's script anywhere, but I assume it extracts the licensing information from the categories the file belongs to. If you wanted, you could do that yourself: just fetch the list of categories and, if necessary, walk up the category tree until you find a license category you recognize. Alas, the tree-walking part requires either multiple API requests or a database of Commons categories (either live access on the Toolserver, or a reconstructed copy from the database dumps).

Yes, I realize that this answer may seem unsatisfactory. The fact is that Magnus's script seems to be the closest currently existing thing to what you want, and even it's marked as experimental and incomplete. Basically, this is a problem waiting for someone to implement a (better) solution.

I've used Magnus' Commons API tool. It's not designed to be just dropped into a project, but if you copy the source of the wiki page it calls and cache it locally, then move the logic into a class you can make it more easily callable. Here's the source for Magnus' version. If you want the class I created from it let me know and I'll dig it out.

From http://www.mediawiki.org/wiki/API_talk:Main_page#Image_license_information Is there a way to get the license of an image through the api? By category is probably easiest, assuming the site categorizes by license. There is no built in module though for license information. Splarka 08:45, 22 January 2010 (UTC)

However, I find that using categories doesn't return anything for many images even though they have a license specified. Maybe the best way is to parse the rendered html of the image page.

have a look at Mediawiki and try this function:

import json, requests
def extract_image_license(image_name):

    start_of_end_point_str = 'https://commons.wikimedia.org' \
                         '/w/api.php?action=query&titles=File:'
    end_of_end_point_str = '&prop=imageinfo&iiprop=user' \
                       '|userid|canonicaltitle|url|extmetadata&format=json'
    result = requests.get(start_of_end_point_str + image_name+end_of_end_point_str)
    result = result.json()
    page_id = next(iter(result['query']['pages']))
    image_info = result['query']['pages'][page_id]['imageinfo']

    return image_info

then you call the function and pass in the image name you want to query for example:

extract_image_license('Albert_Einstein_Head.jpg')

see page: http://www.mediawiki.org/wiki/API:Meta

You can use foreach image the tag 'meta=siteinfo' and the tag 'siprop=rightsinfo' (siprop is the prop of the siteinfo) Then you will see the rightsinfo of the picture.

In your case of Brad Pitt it would be like:

http://en.wikipedia.org/w/api.php?format=jsonfm&action=query&titles=File:Brad_Pitt_at_Incirlik2.jpg&prop=imageinfo&iiprop=url&meta=siteinfo&siprop=rightsinfo

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!