How to get all article pages under a Wikipedia Category and its sub-categories?

Submitted by 冷眼眸甩不掉的悲伤 on 2019-11-27 02:07:54

Question


I want to get the names of all articles under a category and its sub-categories.

Options I'm aware of:

  1. Using the Wikipedia API. Does it have such an option?
  2. Downloading the dump. Which format would be better for my usage?
  3. Searching Wikipedia with something like incategory:"music", but I didn't see a way to get those results as XML.

Please share your thoughts.


Answer 1:


The following resource will help you to download all pages from the category and all its subcategories:

http://en.wikipedia.org/wiki/Wikipedia:CatScan

There is also an API available here:

https://www.mediawiki.org/wiki/API:Categorymembers




Answer 2:


You can do this through the following two API methods:

To list the members of a category (by default this returns pages, subcategories, and files; add cmtype=page to return pages only):

YOUR_URL/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Music

To list its subcategories:

YOUR_URL/api.php?action=query&format=json&list=categorymembers&cmtype=subcat&cmtitle=Category:Music

You can find more information in the MediaWiki API documentation.
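The two queries above return only one batch of results per request, so a large category needs follow-up requests. Here is a minimal Python sketch of that loop, assuming the standard `cmlimit` and `cmcontinue` parameters of `list=categorymembers`. The helper names `build_url` and `list_members` are made up for illustration, and `get_json` is a hypothetical callable you supply that fetches a URL and returns the decoded JSON.

```python
from urllib.parse import urlencode


def build_url(api_url, category, member_type=None, cont=None):
    """Build a list=categorymembers query URL for one batch of results."""
    params = {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "500",  # assumed batch size; lower it if your wiki rejects it
    }
    if member_type:
        params["cmtype"] = member_type  # e.g. "page" or "subcat"
    if cont:
        params["cmcontinue"] = cont  # continuation token from the previous batch
    return api_url + "?" + urlencode(params)


def list_members(api_url, category, get_json, member_type=None):
    """Return every member title, following cmcontinue until it runs out."""
    titles = []
    cont = None
    while True:
        data = get_json(build_url(api_url, category, member_type, cont))
        titles += [m["title"] for m in data["query"]["categorymembers"]]
        cont = data.get("continue", {}).get("cmcontinue")
        if cont is None:
            return titles
```

With a real HTTP client, `get_json` could be something like `lambda url: requests.get(url).json()`, and `api_url` would be `YOUR_URL/api.php`.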




Answer 3:


Note that Wikipedia's categorization system is not a tree, or even an acyclic graph. It is quite possible that by continually following subcategory links you will eventually wind up back where you started.
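Because of those cycles, any recursive walk has to remember which categories it has already visited. Below is a rough Python sketch of a breadth-first traversal that does so. The function name `collect_articles` is illustrative, and `get_subcats` and `get_pages` are hypothetical callables that return the subcategory titles and page titles of one category (for example, thin wrappers around list=categorymembers).

```python
from collections import deque


def collect_articles(root, get_subcats, get_pages):
    """Collect page titles under root and all categories reachable from it."""
    seen = {root}          # categories already queued, guards against cycles
    queue = deque([root])
    articles = set()       # a set, since a page can sit in several categories
    while queue:
        category = queue.popleft()
        articles.update(get_pages(category))
        for sub in get_subcats(category):
            if sub not in seen:
                seen.add(sub)
                queue.append(sub)
    return articles
```

Each category is expanded at most once, so the loop terminates even when a subcategory links back to one of its ancestors.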

If you are going to make many such queries, you would be best served by downloading a database dump. If you only need this occasionally and only for small categories, you could probably get away with making repeated queries to list=categorymembers.

incategory:"music" does not appear to search subcategories.



Source: https://stackoverflow.com/questions/5771745/how-to-get-all-article-pages-under-a-wikipedia-category-and-its-sub-categories
