Wikipedia API: search for famous people

Submitted by 可紊 on 2019-12-05 16:03:59

Question


I have the following Wikipedia API search query:

http://en.wikipedia.org/w/api.php?&action=query&generator=search&gsrnamespace=0&gsrlimit=20&prop=pageimages|extracts&pilimit=max&exintro&exsentences=1&exlimit=max&continue&pithumbsize=100&gsrsearch=Albert%20Einstein

I just want to list famous people - is there a way to do that?


Answer 1:


There isn't an exact way to limit your search results to only famous people. However, you can use a few different filters with Wikipedia's CirrusSearch to roughly narrow your results to people:

  • incategory: Can you find a category that includes the people you want? Categories may not be a great solution, since they may be inconveniently specific.
  • linksto: Do articles about people link to a common article?
  • hastemplate: Can you find a template that is used on biographies of famous people? The template {{birth date}} may be a good solution (if it's fine to limit your search to mostly non-fictional people with non-disputed known birthdates).
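A filter is just a prefix inside the gsrsearch value, so the query URL can be assembled programmatically. Here is a minimal sketch, assuming Python's standard library; the function name build_search_url and its template argument are made up for illustration, and the other parameters mirror the query from the question:

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_search_url(query, template=None, limit=20):
    """Return a generator=search URL, optionally restricted by hastemplate.

    `template` is a hypothetical convenience argument: when given, it is
    prepended to the search text as a CirrusSearch `hastemplate:` filter.
    """
    search = f"hastemplate:{template} {query}" if template else query
    params = {
        "action": "query",
        "format": "json",
        "generator": "search",
        "gsrnamespace": 0,
        "gsrlimit": limit,
        "gsrsearch": search,
        "prop": "pageimages|extracts",
        "pilimit": "max",
        "pithumbsize": 100,
        "exintro": 1,  # boolean API flag: being present is what enables it
        "exsentences": 1,
        "exlimit": "max",
    }
    # urlencode percent-encodes ":" and "|" and turns spaces into "+".
    return f"{API_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("Albert Einstein", template="Birth_date"))
```

The same helper could take an incategory: or linksto: prefix instead; only the text placed in front of the search terms changes.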

For example, here is your same search with hastemplate:Birth_date added, which restricts the results to people:

https://en.wikipedia.org/w/api.php?&action=query&generator=search&gsrnamespace=0&gsrlimit=20&prop=pageimages|extracts&pilimit=max&exintro&exsentences=1&exlimit=max&continue&pithumbsize=100&gsrsearch=hastemplate%3ABirth_date+Albert%20Einstein

{
"batchcomplete": "",
"continue": {
    "gsroffset": 20,
    "continue": "gsroffset||"
},
"query": {
    "pages": {
        "92733": {
            "pageid": 92733,
            "ns": 0,
            "title": "Albert A. Michelson",
            "index": 14,
            "thumbnail": {
                "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/9/9e/Albert_Abraham_Michelson2.jpg/71px-Albert_Abraham_Michelson2.jpg",
                "width": 71,
                "height": 100
            },
            "pageimage": "Albert_Abraham_Michelson2.jpg",
            "extract": "<p><b>Albert Abraham Michelson</b> (surname pronunciation anglicized as \"Michael-son\", December 19, 1852 \u2013 May 9, 1931) was an American physicist known for his work on the measurement of the speed of light and especially for the Michelson\u2013Morley experiment.</p>"
        },
        "736": {
            "pageid": 736,
            "ns": 0,
            "title": "Albert Einstein",
            "index": 1,
            "thumbnail": {
                "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3e/Einstein_1921_by_F_Schmutzer_-_restoration.jpg/76px-Einstein_1921_by_F_Schmutzer_-_restoration.jpg",
                "width": 76,
                "height": 100
            },
            "pageimage": "Einstein_1921_by_F_Schmutzer_-_restoration.jpg",
            "extract": "<p><b>Albert Einstein</b> (<span><span>/<span><span title=\"/\u02c8/ primary stress follows\">\u02c8</span><span title=\"/a\u026a/ long 'i' in 'tide'\">a\u026a</span><span title=\"'n' in 'no'\">n</span><span title=\"'s' in 'sigh'\">s</span><span title=\"'t' in 'tie'\">t</span><span title=\"/a\u026a/ long 'i' in 'tide'\">a\u026a</span><span title=\"'n' in 'no'\">n</span></span>/</span></span>; <small>German:</small> <span title=\"Representation in the International Phonetic Alphabet (IPA)\">[\u02c8alb\u025b\u0250\u032ft \u02c8a\u026an\u0283ta\u026an]</span>; 14 March 1879&#160;\u2013 18 April 1955) was a German-born theoretical physicist.</p>"
        },
        "1139788": {
            "pageid": 1139788,
            "ns": 0,
            "title": "Alfred Einstein",
            "index": 6,
            "thumbnail": {
                "source": "https://upload.wikimedia.org/wikipedia/en/thumb/1/12/Alfred_Einstein.jpg/70px-Alfred_Einstein.jpg",
                "width": 70,
                "height": 100
            },
            "pageimage": "Alfred_Einstein.jpg",
            "extract": "<p><b>Alfred Einstein</b> (December 30, 1880&#160;\u2013 February 13, 1952) was a German-American musicologist and music editor.</p>"
        },

        ...

Someday, you should be able to use Wikidata to search for entities on Wikipedia that are an instance of human. For now, we'll have to work with search filters.




Answer 2:


My workaround for now is to filter search results server-side, by only showing articles that have birth_date in their revision content.
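A rough sketch of that filtering step in Python, assuming the wikitext of each result has already been fetched (for example with prop=revisions and rvprop=content) into a dict keyed by title. The function names and the sample data are invented for illustration:

```python
def mentions_birth_date(wikitext):
    """True if the raw wikitext contains 'birth_date' (case-insensitive)."""
    return "birth_date" in wikitext.lower()


def filter_by_birth_date(wikitext_by_title):
    """Return the titles whose wikitext mentions birth_date, in input order.

    `wikitext_by_title` maps article title -> raw revision content.
    """
    return [
        title
        for title, wikitext in wikitext_by_title.items()
        if mentions_birth_date(wikitext)
    ]


if __name__ == "__main__":
    # Made-up sample content, not real article text.
    sample = {
        "Albert Einstein": "{{Infobox scientist\n| birth_date = {{Birth date|1879|3|14}}\n}}",
        "Einstein (crater)": "'''Einstein''' is a lunar impact crater.",
    }
    print(filter_by_birth_date(sample))
```

A plain substring test is deliberately loose: it also matches pages that mention birth_date anywhere else in the wikitext, so treat it as a heuristic.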

The bounty is still available if someone finds a way around this.




Answer 3:


I think every article about a person will have "... birthDate)" (if the person is still alive) or "birthDate - died)" in the first sentence of the extract. So I guess you can keep only the records whose extract matches this regex:

^[^.]*\d{4}\)[^.]*\..*

This only matches text that contains something like 2001) before the first period, i.e. in the first sentence.

If it's safe to assume that other records don't have it (I'm not sure that it is), then you can stop there. If not, at least you filtered a few more records before checking the revision.
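Applied in Python, the check could look like the sketch below. The regex is the one given above; the helper name and the sample extracts are only illustrative:

```python
import re

# Same pattern as above: a four-digit year followed by ")" must appear
# before the first "." in the extract.
YEAR_IN_FIRST_SENTENCE = re.compile(r"^[^.]*\d{4}\)[^.]*\..*")


def has_year_in_first_sentence(extract):
    """True if the extract looks like '... (born/died YYYY) ... .'"""
    return YEAR_IN_FIRST_SENTENCE.match(extract) is not None


if __name__ == "__main__":
    person = (
        "<p><b>Alfred Einstein</b> (December 30, 1880&#160;\u2013 "
        "February 13, 1952) was a German-American musicologist and music editor.</p>"
    )
    place = "<p>The <b>Einstein Tower</b> is an observatory in Potsdam.</p>"
    # A period before the dates (here an initial) defeats the pattern.
    initial = "<p><b>Albert A. Michelson</b> (1852 \u2013 1931) was a physicist.</p>"
    for text in (person, place, initial):
        print(has_year_in_first_sentence(text))
```

Note the third sample: because the pattern forbids any period before the year, a name with an initial or abbreviation ahead of the dates is rejected even though it is a person. That is one more reason to use this as a pre-filter, not a final test.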




Answer 4:


There are two URLs for searching for famous people:

https://en.wikipedia.org/w/api.php?action=query&generator=search&format=json&exintro&exsentences=1&exlimit=max&gsrlimit=20&gsrsearch=hastemplate:Birth_date_and_age+Melanie_laurent&pithumbsize=100&pilimit=max&prop=pageimages%7Cextracts
https://en.wikipedia.org/w/api.php?action=query&generator=search&format=json&exintro&exsentences=1&exlimit=max&gsrlimit=20&gsrsearch=hastemplate:Birth_date+Melanie_laurent&pithumbsize=100&pilimit=max&prop=pageimages%7Cextracts

The only difference between the two URLs is the gsrsearch parameter:

To get living people, use hastemplate:Birth_date_and_age

To get dead people, use hastemplate:Birth_date

In my case, I have to make two requests.

In these example URLs, just replace Melanie_laurent with your own query.
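Since this takes two requests, the two result sets need to be combined afterwards. A small sketch of that in Python, assuming both JSON responses have already been fetched and decoded; people_search_terms and merge_pages are hypothetical helper names, and the sample responses are made up:

```python
TEMPLATES = ("Birth_date_and_age", "Birth_date")


def people_search_terms(query):
    """Return one gsrsearch value per template (living first, then dead)."""
    return [f"hastemplate:{template} {query}" for template in TEMPLATES]


def merge_pages(*responses):
    """Combine the query.pages maps of several responses, de-duplicated by page id.

    Pages keep the order in which they were first seen. A response without
    a "query" key is treated as having no results (an assumption about how
    empty searches come back).
    """
    merged = {}
    for response in responses:
        pages = response.get("query", {}).get("pages", {})
        for page_id, page in pages.items():
            merged.setdefault(page_id, page)
    return list(merged.values())


if __name__ == "__main__":
    # Invented page ids and titles, only to show the merge.
    living = {"query": {"pages": {"1": {"pageid": 1, "title": "A"}}}}
    dead = {"query": {"pages": {"1": {"pageid": 1, "title": "A"},
                                "2": {"pageid": 2, "title": "B"}}}}
    print(people_search_terms("Melanie laurent"))
    print([page["title"] for page in merge_pages(living, dead)])
```

De-duplicating by page id matters because an article can use both templates and therefore show up in both responses.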



Source: https://stackoverflow.com/questions/30418378/wikipedia-api-search-for-famous-people
