How to get all images in a category that are not in another one with the MediaWiki API?

…衆ロ難τιáo~ submitted on 2020-01-04 03:56:09

Question


I am entirely new to the API, so sorry if the question is silly.

I would like to get all images in a Commons category, let's say X, but exclude those that are also in another category (Y). I do not understand whether I can actually do this.

https://commons.wikimedia.org/w/api.php?action=query&list=categorymembers&cmtype=file&cmtitle=Category:X

will get all of them; how can I exclude some?

Moreover, I would like the result to include the description of each image, not just the file name. Is that possible?


Answer 1:


By default, MediaWiki has no built-in support for building and querying category intersections. To accomplish this task, you need extensions, external tools, or multiple API queries combined with processing of the results.

CirrusSearch API

On Wikimedia Commons, as on the rest of the Wikimedia wiki farm, CirrusSearch powers filtered search, including searches for category intersections, and it is also available through the API (action=query&list=search&srsearch=incategory:A+-incategory:B, which returns pages in Category:A that are not in Category:B).
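A minimal Python sketch of the CirrusSearch route, assuming Python 3.9+ and only the standard library. The function names are hypothetical, and two details are my additions rather than part of the answer: srnamespace=6 (the File: namespace) to restrict hits to files, and replacing spaces in category names with underscores so each incategory: term stays a single token. It only builds the URL and reads titles from an already-decoded JSON response; paging with the returned continue values is left out.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def search_difference_url(cat_a: str, cat_b: str, limit: int = 50) -> str:
    """Build a list=search URL for files in cat_a but not in cat_b.

    Category names are given without the "Category:" prefix.
    Assumption: underscores stand in for spaces inside incategory: terms.
    """
    a = cat_a.replace(" ", "_")
    b = cat_b.replace(" ", "_")
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"incategory:{a} -incategory:{b}",  # A minus B
        "srnamespace": 6,  # assumption: only File: pages are wanted
        "srlimit": limit,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


def titles_from_search(response: dict) -> list[str]:
    """Extract page titles from a decoded list=search JSON response."""
    return [hit["title"] for hit in response.get("query", {}).get("search", [])]
```

The URL can be fetched with urllib.request (Wikimedia expects a descriptive User-Agent header) and decoded with json.load before being passed to titles_from_search.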

FastCCI

One tool I can recommend, because it is a dedicated high-performance solution and is actually running, is fastcci, developed by Daniel Schwen. For Wikimedia Commons specifically, a database is already maintained and a web service is running, but it is possible to set it up for any wiki, provided the tool set has a host to run on and database access.

Query

Consider the following query URL:

https://fastcci.wmflabs.org/?c1=3302993&c2=15516712&d1=0&d2=0&s=200&a=not&t=js

  • https://fastcci.wmflabs.org/ - host that the Wikimedia Commons fastcci instance runs on
  • c1 - ID of category 1
  • c2 - ID of category 2
  • d1 - depth of category 1 to search in (fastcci by default considers sub-categories)
  • d2 - depth of category 2 to search in (fastcci by default considers sub-categories)
  • s - number of results to return
  • o - Offset
  • a - operation used to combine the two categories (a=not: files in category 1 that are not in category 2)
  • t - connection type (t=js for a JSONP response; otherwise the tool assumes it is being used over a WebSocket)
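The parameters above can be assembled programmatically. This sketch (hypothetical function name, standard library only) reproduces the example query URL when called with the two category IDs. It adds o only when a non-zero offset is requested, and it does not cover values of a other than not.

```python
from urllib.parse import urlencode

FASTCCI = "https://fastcci.wmflabs.org/"


def fastcci_url(c1: int, c2: int, d1: int = 0, d2: int = 0,
                size: int = 200, offset: int = 0, op: str = "not") -> str:
    """Build a fastcci JSONP query URL; c1 and c2 are category page IDs."""
    params = {"c1": c1, "c2": c2, "d1": d1, "d2": d2, "s": size}
    if offset:
        params["o"] = offset  # only sent when paging past the first results
    params["a"] = op
    params["t"] = "js"  # request a JSONP response instead of a WebSocket
    return FASTCCI + "?" + urlencode(params)
```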

Response

fastcciCallback( [ 'RESULT 27572680,0,0|1675043,0,0|27577015,0,0|27577043,0,0|27577106,0,0|27576896,0,0|27576790,0,0|23481936,0,0|17560964,0,0|11009066,0,0', 'OUTOF 10', 'DBAGE 378310', 'DONE'] );

RESULT is followed by a |-separated list of up to 50 integer triplets of the form pageId,depth,tag. Each triplet stands for one image or category.
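A sketch of decoding that response in Python, assuming the JSONP payload always looks like the sample above: a JavaScript array of single-quoted strings, which is why a regular expression is used instead of a JSON parser. The function name is hypothetical, and the OUTOF, DBAGE and DONE messages are simply skipped.

```python
import re


def parse_fastcci(jsonp: str) -> list[tuple[int, int, int]]:
    """Return (page_id, depth, tag) triplets from every RESULT message."""
    triplets = []
    # The payload is not strict JSON (single quotes), so pull out each quoted message.
    for message in re.findall(r"'([^']*)'", jsonp):
        if not message.startswith("RESULT "):
            continue  # skip OUTOF, DBAGE, DONE, ...
        for item in message[len("RESULT "):].split("|"):
            if item:
                page_id, depth, tag = (int(x) for x in item.split(","))
                triplets.append((page_id, depth, tag))
    return triplets
```

Applied to the sample response, it yields ten triplets; the page IDs can then be converted to titles as described in the note on page IDs.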

Resources

  • Sample client-side implementation - to see it in action, visit any category page and look next to the Good pictures button.
    • Example: FilesOf('Category:Saaleck') - FilesOf('Category:Rapeseed fields in Saxony-Anhalt')
  • Server application
  • Presentation on YouTube
  • Slides

A note on page IDs

  • page IDs → page titles: GET /w/api.php?action=query&pageids=page_IDs_separated_by_pipe
  • page titles → page IDs: GET /w/api.php?action=query&titles=Titles_separated_by_pipe
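Because fastcci returns page IDs, the first conversion is the one usually needed. A sketch, assuming Python 3.9+: it sends IDs in batches of 50 (the usual per-request limit for multi-value parameters for non-bot users), requests format=json&formatversion=2 so pages come back as a list, and takes an injectable fetch function so it can be tested offline. The function names and the User-Agent string are hypothetical placeholders.

```python
import json
from typing import Callable, Iterable
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://commons.wikimedia.org/w/api.php"


def http_get_json(url: str) -> dict:
    # Placeholder User-Agent; Wikimedia asks clients to identify themselves.
    req = Request(url, headers={"User-Agent": "pageid-lookup-example/0.1 (placeholder)"})
    with urlopen(req) as resp:
        return json.load(resp)


def titles_for_page_ids(page_ids: Iterable[int],
                        fetch: Callable[[str], dict] = http_get_json,
                        batch: int = 50) -> dict[int, str]:
    """Map page ID -> title, querying the API in batches of `batch` IDs."""
    ids = list(page_ids)
    result = {}
    for start in range(0, len(ids), batch):
        chunk = ids[start:start + batch]
        params = {"action": "query", "pageids": "|".join(map(str, chunk)),
                  "format": "json", "formatversion": 2}
        data = fetch(f"{API}?{urlencode(params)}")
        for page in data.get("query", {}).get("pages", []):
            # Missing pages come back without a title and are skipped.
            if "pageid" in page and "title" in page:
                result[page["pageid"]] = page["title"]
    return result
```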



Answer 2:


AFAIK, there is no way to get that directly using the API. But, assuming both categories are reasonably small, you could get all images from both of them and then compute the difference (files in X that are not in Y) in your code.
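A sketch of that client-side approach, assuming Python 3.9+ and the standard library. It follows the API's continue values until each category is exhausted, then subtracts one set of titles from the other. Function names and the User-Agent string are hypothetical placeholders; fetch is injectable so the logic can be exercised without network access.

```python
import json
from typing import Callable
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://commons.wikimedia.org/w/api.php"


def http_get_json(url: str) -> dict:
    # Placeholder User-Agent; Wikimedia asks clients to identify themselves.
    req = Request(url, headers={"User-Agent": "category-diff-example/0.1 (placeholder)"})
    with urlopen(req) as resp:
        return json.load(resp)


def category_files(category: str,
                   fetch: Callable[[str], dict] = http_get_json) -> set[str]:
    """Collect all file titles in `category` (e.g. "Category:X")."""
    params = {"action": "query", "list": "categorymembers", "cmtitle": category,
              "cmtype": "file", "cmlimit": "max", "format": "json", "formatversion": 2}
    titles = set()
    while True:
        data = fetch(f"{API}?{urlencode(params)}")
        titles.update(m["title"] for m in data.get("query", {}).get("categorymembers", []))
        if "continue" not in data:
            return titles
        # Continuation: copy the returned continue values into the next request.
        params.update(data["continue"])


def files_in_x_not_in_y(x: str, y: str,
                        fetch: Callable[[str], dict] = http_get_json) -> set[str]:
    """Set difference: files in category x that are not in category y."""
    return category_files(x, fetch) - category_files(y, fetch)
```

For example, files_in_x_not_in_y("Category:X", "Category:Y") performs the two listings and returns the remaining titles.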

To retrieve the description, you can use prop=imageinfo&iiprop=extmetadata&iiextmetadatafilter=ImageDescription.

In the context of your example query, it would look like this:

https://commons.wikimedia.org/w/api.php?action=query&generator=categorymembers&gcmtype=file&gcmtitle=Category:X&prop=imageinfo&iiprop=extmetadata&iiextmetadatafilter=ImageDescription
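To consume that query from code, one option is to add format=json&formatversion=2 and read the ImageDescription value out of each page's extmetadata. A sketch under those assumptions (hypothetical function names, Python 3.9+). The value may contain HTML, and neither continuation nor gcmlimit is handled here, so only the first batch of files is read.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def description_query_url(category: str) -> str:
    """The query from the answer, plus JSON output options (an assumption)."""
    params = {"action": "query", "generator": "categorymembers", "gcmtype": "file",
              "gcmtitle": category, "prop": "imageinfo", "iiprop": "extmetadata",
              "iiextmetadatafilter": "ImageDescription",
              "format": "json", "formatversion": 2}
    return f"{API}?{urlencode(params)}"


def descriptions(response: dict) -> dict[str, str]:
    """Map file title -> ImageDescription value from a decoded response."""
    out = {}
    for page in response.get("query", {}).get("pages", []):
        info = page.get("imageinfo") or [{}]
        meta = info[0].get("extmetadata", {})
        if "ImageDescription" in meta:
            out[page["title"]] = meta["ImageDescription"]["value"]
    return out
```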



Source: https://stackoverflow.com/questions/27433744/how-to-get-with-mediawiki-api-all-images-in-a-category-which-are-not-in-another
