Find main category for article using Wikipedia API

时光总嘲笑我的痴心妄想 submitted on 2019-12-06 06:44:05

Question


I have a list of articles and I want to find the main category of each article.

Wikipedia lists its main categories here - http://en.wikipedia.org/wiki/Portal:Contents/Categories.

I am able to find the categories of each article using:

http://en.wikipedia.org/w/api.php?action=query&prop=categories&titles=%s&format=xml
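A minimal Python sketch of that query, assuming JSON output (`format=json`) instead of XML and the default response layout, in which pages are keyed by page ID under `query.pages`. The function names and the User-Agent string are hypothetical placeholders, not part of the original question.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_categories_url(title):
    """Build the prop=categories query URL for one article title."""
    params = {
        "action": "query",
        "prop": "categories",
        "titles": title,
        "cllimit": "max",  # ask for as many categories per request as allowed
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_categories(response):
    """Extract category titles from a decoded JSON response."""
    pages = response.get("query", {}).get("pages", {})
    names = []
    for page in pages.values():
        # A page with no categories (or a missing page) has no "categories" key.
        for cat in page.get("categories", []):
            names.append(cat["title"])
    return names


def fetch_categories(title):
    """Perform the request (needs network access) and return category titles."""
    request = urllib.request.Request(
        build_categories_url(title),
        # Placeholder User-Agent; replace with your own tool name and contact.
        headers={"User-Agent": "category-example/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(request) as resp:
        return parse_categories(json.load(resp))
```

Calling `fetch_categories("Dog")` would return titles of the form `Category:...`. This sketch does not follow continuation, so very long category lists could be truncated.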

I am also able to check whether an article is in a given category:

http://en.wikipedia.org/w/api.php?action=query&titles=Dog&prop=categories&clcategories=Domesticated animals&format=xml
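The same check can be sketched in Python. This assumes JSON output and that the category is passed with its `Category:` prefix; the function names are hypothetical. With `clcategories` set, a page in the response carries a `categories` entry only if it belongs to one of the listed categories, so the test reduces to checking for that key.

```python
import urllib.parse

API_URL = "https://en.wikipedia.org/w/api.php"


def build_membership_url(title, category):
    """Build a query asking whether `title` is in `category`.

    `category` is assumed to include the "Category:" prefix.
    """
    params = {
        "action": "query",
        "prop": "categories",
        "titles": title,
        "clcategories": category,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def is_in_category(response):
    """Return True if any page in the decoded response lists a category."""
    pages = response.get("query", {}).get("pages", {})
    return any("categories" in page for page in pages.values())
```

Because `titles` also accepts category pages, passing `Category:Domesticated animals` as the title asks the analogous question one level up: is this category directly inside some other category?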

This tells me whether the article Dog is in the category "Domesticated animals", but this is not quite what I want. I want to be able to check which main category "Domesticated animals" is in. Is this possible using the API?


Answer 1:


First, there is no such thing as a "Wikipedia API". There is a MediaWiki (web) API. Knowing this will help you find information on the existing tools. https://www.mediawiki.org/wiki/API:Main_Page

The documentation shows that there is no API module that does the full category recursion for you. Why? Because (1) it's extremely inefficient, and (2) the recursion might go anywhere or never end.

However, there is now a tool by Magnus Manske that walks the tree of parent categories: https://tools.wmflabs.org/catscan2/reverse_tree.php?doit=1&language=en&project=wikipedia&title=Dog&namespace=0

For Dog it reports "Maximum depth: 61 levels Total categories along the way : 7988". By that definition, the "root" category for [[Dog]], i.e. the most distant ancestor category, is "Industry by country". Probably not what you expected! From the English Wikipedia's perspective, however, the root category for any article is always the same: [[Category:Contents]].
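If you still want to climb the category graph yourself, one option is a breadth-first walk with a depth limit and a visited set, so that loops and runaway recursion are cut off. The following is only a sketch under stated assumptions: `nearest_main_categories`, `get_parents`, and `main_categories` are hypothetical names, `get_parents` is any callable that returns the parent categories of a page (for example, a wrapper around the `prop=categories` query), and the set of main categories is supplied by you.

```python
from collections import deque


def nearest_main_categories(start, get_parents, main_categories, max_depth=5):
    """Walk up the category graph breadth-first from `start`.

    Returns a dict mapping each main category reached within `max_depth`
    steps to the number of steps needed to reach it.
    """
    seen = {start}                # guards against cycles in the graph
    queue = deque([(start, 0)])
    found = {}
    while queue:
        node, depth = queue.popleft()
        if node in main_categories:
            found[node] = depth
            continue              # do not climb above a main category
        if depth == max_depth:
            continue              # depth limit reached on this branch
        for parent in get_parents(node):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return found
```

Since the walk is breadth-first, the recorded depth is the shortest path to each main category, which gives a rough way to rank candidates. As the numbers above suggest, a generous `max_depth` will reach most of the category tree, so a small limit is probably the more useful choice.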



Source: https://stackoverflow.com/questions/25573810/find-main-category-for-article-using-wikipedia-api
