How to get all article pages under a Wikipedia Category and its sub-categories?

既然无缘 2020-12-09 09:03

I want to get the names of all articles under a category and its sub-categories.

Options I'm aware of:

  1. Using the Wikipedia API. Does it have such an opt
3 Answers
  • 2020-12-09 09:32

    Note that Wikipedia's categorization system is not a tree, or even an acyclic graph. It is quite possible that by continually following subcategory links you will eventually wind up back where you started.

    If you are going to be making many such queries, you would be best served by downloading a database dump. If this will be an infrequent thing and will only be dealing with small categories, you could probably get away with making repeated queries to list=categorymembers.

    incategory:"music" does not appear to do subcategory searching.
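
Because the category graph can contain cycles, any recursive walk needs to remember which categories it has already seen. The sketch below shows one way to do that with a breadth-first traversal. `collect_articles`, `fetch_members`, `max_depth`, and the toy graph are all hypothetical names and data made up for illustration; in practice `fetch_members` would wrap repeated `list=categorymembers` queries.

```python
from collections import deque
from typing import Callable, Iterable, Set, Tuple

# Hypothetical callback type: given a category title, return
# (article_titles, subcategory_titles) for that category.
FetchMembers = Callable[[str], Tuple[Iterable[str], Iterable[str]]]


def collect_articles(root: str, fetch_members: FetchMembers, max_depth: int = 5) -> Set[str]:
    """Breadth-first walk over a category graph that may contain cycles."""
    articles: Set[str] = set()
    visited: Set[str] = {root}      # categories already queued, so none is fetched twice
    queue = deque([(root, 0)])

    while queue:
        category, depth = queue.popleft()
        pages, subcats = fetch_members(category)
        articles.update(pages)
        if depth >= max_depth:
            continue                # depth cap reached: do not descend further
        for sub in subcats:
            if sub not in visited:
                visited.add(sub)
                queue.append((sub, depth + 1))
    return articles


# Toy in-memory graph (invented data) with a cycle: Music -> Jazz -> Music.
TOY_GRAPH = {
    "Category:Music": (["Melody"], ["Category:Jazz"]),
    "Category:Jazz": (["Bebop"], ["Category:Music"]),
}

result = collect_articles("Category:Music", lambda c: TOY_GRAPH.get(c, ([], [])))
print(sorted(result))  # ['Bebop', 'Melody']
```

The `visited` set is what makes the walk terminate on a cyclic graph; the depth cap is an extra, optional guard against wandering into very large and loosely related parts of the category system.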

  • 2020-12-09 09:39

    The following resource will help you to download all pages from the category and all its subcategories:

    http://en.wikipedia.org/wiki/Wikipedia:CatScan

    There is also an API available here:

    https://www.mediawiki.org/wiki/API:Categorymembers

  • 2020-12-09 09:39

    You can do this through the following two API methods:

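    To get the article pages in this category (without cmtype, the result also includes subcategories and files; add cmtype=page to restrict it to pages):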

    YOUR_URL/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Music
    

    To get the subcategories:

    YOUR_URL/api.php?action=query&format=json&list=categorymembers&cmtype=subcat&cmtitle=Category:Music
    

    You can find more information in the MediaWiki API documentation.
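
As a rough illustration of the two queries above, the following sketch builds the request URL and pulls titles out of a JSON response. `build_query`, `parse_members`, the `https://example.org/w/api.php` endpoint, and the sample response are hypothetical stand-ins. The `cmcontinue` handling assumes the API's usual continuation mechanism, where a response that has more results carries a `continue.cmcontinue` token to send back on the next request.

```python
import json
from typing import List, Optional, Tuple
from urllib.parse import urlencode


def build_query(api_url: str, category: str,
                member_type: Optional[str] = None,
                cmcontinue: Optional[str] = None) -> str:
    """Build a list=categorymembers URL; member_type is e.g. 'page' or 'subcat'."""
    params = {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": category,
    }
    if member_type:
        params["cmtype"] = member_type
    if cmcontinue:
        params["cmcontinue"] = cmcontinue   # token from the previous response
    # Keep ':' and '|' unescaped so the URL stays readable.
    return api_url + "?" + urlencode(params, safe=":|")


def parse_members(response_text: str) -> Tuple[List[str], Optional[str]]:
    """Return (titles, continuation token or None) from a JSON response body."""
    data = json.loads(response_text)
    members = data.get("query", {}).get("categorymembers", [])
    titles = [m["title"] for m in members]
    token = data.get("continue", {}).get("cmcontinue")
    return titles, token


# Invented sample response, shaped like a format=json categorymembers result.
SAMPLE = json.dumps({
    "continue": {"cmcontinue": "page|ABC|123", "continue": "-||"},
    "query": {"categorymembers": [
        {"pageid": 1, "ns": 0, "title": "Melody"},
        {"pageid": 2, "ns": 0, "title": "Rhythm"},
    ]},
})

url = build_query("https://example.org/w/api.php", "Category:Music", "subcat")
titles, token = parse_members(SAMPLE)
print(url)
print(titles, token)
```

Fetching the URL is left out on purpose so the sketch runs offline; any HTTP client can be used, repeating the request with the returned token until no `cmcontinue` comes back.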
