Wikipedia API: search for famous people

醉话见心 2021-02-20 01:41

I have the following Wikipedia API search query:

http://en.wikipedia.org/w/api.php?&action=query&generator=search&gsrnamespace=0&gsrlimit=20&prop=pag

4 Answers
  • 2021-02-20 02:22

    There is no exact way to limit your search results to famous people. However, Wikipedia's CirrusSearch supports a few filters that can roughly narrow the results to people:

    • incategory: Can you find a category that includes the people you want? Categories may not be a great solution, since they may be inconveniently specific.
    • linksto: Do articles about people link to a common article?
    • hastemplate: Can you find a template that is used on biographies of famous people? The template {{birth date}} may be a good solution (if it's fine to limit your search to mostly non-fictional people with non-disputed known birthdates).
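
These filters are ordinary search keywords, so they go in front of the search terms in the gsrsearch value. Below is a minimal sketch, assuming Python and the parameters from the question's URL; the function name build_people_search_url and its defaults are illustrative and not part of the API.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_people_search_url(query, template="Birth_date", limit=20):
    """Build a search URL restricted to pages that use `template`.

    The function name and defaults are illustrative; the parameters
    mirror the ones in the question's URL.
    """
    params = {
        "action": "query",
        "format": "json",
        "generator": "search",
        # The filter keyword goes in front of the free-text query.
        "gsrsearch": f"hastemplate:{template} {query}",
        "gsrnamespace": 0,
        "gsrlimit": limit,
        "prop": "pageimages|extracts",
        "pilimit": "max",
        "pithumbsize": 100,
        "exintro": 1,
        "exsentences": 1,
        "exlimit": "max",
    }
    return f"{API_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_people_search_url("Albert Einstein"))
```

Swapping the prefix for incategory: or linksto: should work the same way, since only the gsrsearch string changes.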

    For example, here is your same search with hastemplate:Birth_date added, so that the results are limited to people:

    https://en.wikipedia.org/w/api.php?&action=query&generator=search&gsrnamespace=0&gsrlimit=20&prop=pageimages|extracts&pilimit=max&exintro&exsentences=1&exlimit=max&continue&pithumbsize=100&gsrsearch=hastemplate%3ABirth_date+Albert%20Einstein

    {
    "batchcomplete": "",
    "continue": {
        "gsroffset": 20,
        "continue": "gsroffset||"
    },
    "query": {
        "pages": {
            "92733": {
                "pageid": 92733,
                "ns": 0,
                "title": "Albert A. Michelson",
                "index": 14,
                "thumbnail": {
                    "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/9/9e/Albert_Abraham_Michelson2.jpg/71px-Albert_Abraham_Michelson2.jpg",
                    "width": 71,
                    "height": 100
                },
                "pageimage": "Albert_Abraham_Michelson2.jpg",
                "extract": "<p><b>Albert Abraham Michelson</b> (surname pronunciation anglicized as \"Michael-son\", December 19, 1852 \u2013 May 9, 1931) was an American physicist known for his work on the measurement of the speed of light and especially for the Michelson\u2013Morley experiment.</p>"
            },
            "736": {
                "pageid": 736,
                "ns": 0,
                "title": "Albert Einstein",
                "index": 1,
                "thumbnail": {
                    "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3e/Einstein_1921_by_F_Schmutzer_-_restoration.jpg/76px-Einstein_1921_by_F_Schmutzer_-_restoration.jpg",
                    "width": 76,
                    "height": 100
                },
                "pageimage": "Einstein_1921_by_F_Schmutzer_-_restoration.jpg",
                "extract": "<p><b>Albert Einstein</b> (<span><span>/<span><span title=\"/\u02c8/ primary stress follows\">\u02c8</span><span title=\"/a\u026a/ long 'i' in 'tide'\">a\u026a</span><span title=\"'n' in 'no'\">n</span><span title=\"'s' in 'sigh'\">s</span><span title=\"'t' in 'tie'\">t</span><span title=\"/a\u026a/ long 'i' in 'tide'\">a\u026a</span><span title=\"'n' in 'no'\">n</span></span>/</span></span>; <small>German:</small> <span title=\"Representation in the International Phonetic Alphabet (IPA)\">[\u02c8alb\u025b\u0250\u032ft \u02c8a\u026an\u0283ta\u026an]</span>; 14 March 1879&#160;\u2013 18 April 1955) was a German-born theoretical physicist.</p>"
            },
            "1139788": {
                "pageid": 1139788,
                "ns": 0,
                "title": "Alfred Einstein",
                "index": 6,
                "thumbnail": {
                    "source": "https://upload.wikimedia.org/wikipedia/en/thumb/1/12/Alfred_Einstein.jpg/70px-Alfred_Einstein.jpg",
                    "width": 70,
                    "height": 100
                },
                "pageimage": "Alfred_Einstein.jpg",
                "extract": "<p><b>Alfred Einstein</b> (December 30, 1880&#160;\u2013 February 13, 1952) was a German-American musicologist and music editor.</p>"
            },
    
            ...
    

    Someday, you should be able to use Wikidata to search for entities on Wikipedia that are an instance of human. For now, we'll have to work with search filters.
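
Note that the response shown earlier in this answer keys the pages by page ID, so they do not arrive in search order; each page carries an "index" field with its rank instead. A minimal sketch of restoring that order, assuming Python and a response already parsed into a dict (ranked_titles is an illustrative name):

```python
def ranked_titles(response):
    """Return page titles ordered by their search rank ("index")."""
    pages = response.get("query", {}).get("pages", {})
    ordered = sorted(pages.values(), key=lambda page: page["index"])
    return [page["title"] for page in ordered]


if __name__ == "__main__":
    # Trimmed copy of the three pages visible in the response above.
    sample = {
        "query": {
            "pages": {
                "92733": {"pageid": 92733, "title": "Albert A. Michelson", "index": 14},
                "736": {"pageid": 736, "title": "Albert Einstein", "index": 1},
                "1139788": {"pageid": 1139788, "title": "Alfred Einstein", "index": 6},
            }
        }
    }
    print(ranked_titles(sample))
```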

  • 2021-02-20 02:23

    There are two URLs you can use to search for famous people:

    https://en.wikipedia.org/w/api.php?action=query&generator=search&format=json&exintro&exsentences=1&exlimit=max&gsrlimit=20&gsrsearch=hastemplate:Birth_date_and_age+Melanie_laurent&pithumbsize=100&pilimit=max&prop=pageimages%7Cextracts
    https://en.wikipedia.org/w/api.php?action=query&generator=search&format=json&exintro&exsentences=1&exlimit=max&gsrlimit=20&gsrsearch=hastemplate:Birth_date+Melanie_laurent&pithumbsize=100&pilimit=max&prop=pageimages%7Cextracts
    

    The only difference between the two URLs is the gsrsearch parameter:

    To get living people, use hastemplate:Birth_date_and_age

    To get dead people, use hastemplate:Birth_date

    In my case, I have to make two requests.

    In these example URLs, just replace Melanie_laurent with your query.
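
The two-request approach can be sketched as follows, assuming Python and responses that have already been fetched and parsed into dicts. The helper names people_queries and merge_pages are illustrative; the merge assumes that a response with no hits may lack the "query" key.

```python
BIRTH_TEMPLATES = ("Birth_date", "Birth_date_and_age")


def people_queries(name):
    """Return one gsrsearch string per birth-date template."""
    return [f"hastemplate:{template} {name}" for template in BIRTH_TEMPLATES]


def merge_pages(*responses):
    """Merge the "pages" objects of several responses, keyed by page ID.

    A page returned by more than one request is kept once
    (the first occurrence wins).
    """
    merged = {}
    for response in responses:
        pages = response.get("query", {}).get("pages", {})
        for page_id, page in pages.items():
            merged.setdefault(page_id, page)
    return merged


if __name__ == "__main__":
    for search in people_queries("Melanie Laurent"):
        print(search)
```

Each string from people_queries goes into the gsrsearch parameter of its own request, and merge_pages combines the two parsed responses afterwards.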

  • 2021-02-20 02:33

    I think every article about a person will have "... birthDate)" (if still alive) or "birthDate - died)" in the first sentence of the extract. So I guess you can keep only the records whose extract matches this regex:

    ^[^.]*\d{4}\)[^.]*\..*
    

    This only matches text that has something like "2001)" before the first period.

    If it's safe to assume that other records don't have it (I'm not sure that it is), then you can stop there. If not, you have at least filtered out a few more records before checking the revision.
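
Here is a minimal sketch of applying that regex to the extracts, assuming Python and a list of page dicts that each carry an "extract" string (looks_like_person and filter_people are illustrative names). One caveat: an extract with a period before the year, such as an initial in the name, will not match.

```python
import re

# The regex from this answer: four digits followed by ")" must appear
# before the first period of the extract.
BIRTH_YEAR_PATTERN = re.compile(r"^[^.]*\d{4}\)[^.]*\..*")


def looks_like_person(extract):
    """Return True if the extract has a 'YYYY)' before its first period."""
    return BIRTH_YEAR_PATTERN.match(extract) is not None


def filter_people(pages):
    """Keep only the pages whose extract looks like a biography."""
    return [page for page in pages if looks_like_person(page.get("extract", ""))]


if __name__ == "__main__":
    extract = (
        "<p><b>Alfred Einstein</b> (December 30, 1880&#160;\u2013 February 13, 1952) "
        "was a German-American musicologist and music editor.</p>"
    )
    print(looks_like_person(extract))
```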

  • 2021-02-20 02:38

    My workaround for now is to filter search results server-side, by only showing articles that have birth_date in their revision content.
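
A rough sketch of that workaround, assuming Python, the revisions module requested with formatversion=2 and rvslots=main, and a response already parsed into a dict. The function names are illustrative, and the actual server-side code may differ.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_revision_url(titles):
    """Build a URL that requests the current wikitext of the given titles."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": "|".join(titles),
    }
    return f"{API_URL}?{urlencode(params)}"


def titles_with_birth_date(response, marker="birth_date"):
    """Return titles of pages whose wikitext contains `marker`."""
    kept = []
    # With formatversion=2, "pages" is a list rather than a dict.
    for page in response.get("query", {}).get("pages", []):
        revisions = page.get("revisions") or []
        if not revisions:
            continue  # Missing page or no content returned.
        wikitext = revisions[0].get("slots", {}).get("main", {}).get("content", "")
        if marker in wikitext:
            kept.append(page["title"])
    return kept


if __name__ == "__main__":
    print(build_revision_url(["Albert Einstein", "Alfred Einstein"]))
```

The plain substring check is deliberately loose: it treats any page whose wikitext mentions birth_date as a person.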

    The bounty is still available if someone finds a way around this.
