Get all Wikipedia Infobox Templates and all Pages using them

抹茶落季 2021-01-02 05:17

Given a Wikipedia page such as Wikipedia: Stack Overflow, there is often an infobox (usually at the top right of the page).

3 Answers
  •  青春惊慌失措
    2021-01-02 05:56

    The previous answers seem to have stopped working. Only a small change is needed to get them working against the new DBpedia query endpoint at http://live.dbpedia.org/sparql, though.

    To list all pages together with the templates they use, this query works:

    SELECT * WHERE {  ?page  dbpprop:wikiPageUsesTemplate ?template . }
    

    See results (limited to 100)

    If you're looking for a specific template:

    SELECT * WHERE {  
       ?page  
       dbpprop:wikiPageUsesTemplate 
       <http://dbpedia.org/resource/Template:Infobox_website> . 
    }
    

    See results

    For my use case I'm interested in the Wikipedia URL rather than the DBpedia page, so I'm using this query:

    SELECT ?wikipedia_url WHERE {  
       ?page  
       dbpprop:wikiPageUsesTemplate 
       <http://dbpedia.org/resource/Template:Infobox_website> . 
       ?page foaf:isPrimaryTopicOf ?wikipedia_url .
    }
    

    See results

    I'm also using curl to pull the results into a script:

    $ curl -s "http://live.dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+%3Fwikipedia_url+WHERE+%7B+%0D%0A%09+%3Fpage+%0D%0A%09+dbpprop%3AwikiPageUsesTemplate+%0D%0A%09+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FTemplate%3AInfobox_website%3E+.+%0D%0A+%3Fpage+foaf%3AisPrimaryTopicOf+%3Fwikipedia_url+.%0D%0A%0D%0A%09%7D&format=text%2Ftab-separated-values" \
    | tr -d \" | grep -v "^wikipedia_url$" | head
    http://en.wikipedia.org/wiki/U.S._News_&_World_Report
    http://en.wikipedia.org/wiki/FriendFinder
    http://en.wikipedia.org/wiki/Debkafile
    http://en.wikipedia.org/wiki/GTPlanet
    http://en.wikipedia.org/wiki/Lithuanian_Wikipedia
    http://en.wikipedia.org/wiki/Connexions
    http://en.wikipedia.org/wiki/Hypno5ive
    http://en.wikipedia.org/wiki/Scoop_(website)
    http://en.wikipedia.org/wiki/Bhoomi_(software)
    http://en.wikipedia.org/wiki/Brainwashed_(website)
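
    The percent-encoded URL above is hard to read and edit. As a sketch, curl can do the encoding itself via -G and --data-urlencode. The helper names pages_query, clean_tsv and fetch_pages are made up for illustration; the endpoint, parameters and template URI are taken from the command above:

```shell
# Sketch only: pages_query, clean_tsv and fetch_pages are hypothetical helper names.

# Print the SPARQL query for one template name, e.g. Infobox_website.
pages_query() {
  cat <<EOF
SELECT ?wikipedia_url WHERE {
  ?page dbpprop:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:$1> .
  ?page foaf:isPrimaryTopicOf ?wikipedia_url .
}
EOF
}

# Strip the double quotes and drop the header row from the TSV response.
clean_tsv() {
  tr -d '"' | grep -v '^wikipedia_url$'
}

# Send the query as a GET request; curl URL-encodes each parameter value.
# Needs network access to the endpoint.
fetch_pages() {
  curl -s -G "http://live.dbpedia.org/sparql" \
    --data-urlencode "default-graph-uri=http://dbpedia.org" \
    --data-urlencode "query=$(pages_query "$1")" \
    --data-urlencode "format=text/tab-separated-values" \
  | clean_tsv
}
```

    Calling fetch_pages Infobox_website | head should then give the same kind of listing as the one-liner, assuming the endpoint is reachable.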
    

    I'm not sure if this gives the full result set though, because it returns 1698 results whereas wmflabs.org seems to suggest there should be 4439.
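
    One way to narrow this down is to let the endpoint count the matching rows itself. The sketch below wraps the same graph pattern in COUNT(*); count_query, parse_count and fetch_count are hypothetical helper names, and the shape of the TSV response (one header row, one value row) is an assumption. If the count matches the number of rows returned, nothing is being truncated on the way, and the gap to the wmflabs.org figure would have to come from the data itself:

```shell
# Sketch only: count_query, parse_count and fetch_count are hypothetical helper names.

# Print a query that counts the rows matched for one template name.
count_query() {
  cat <<EOF
SELECT (COUNT(*) AS ?count) WHERE {
  ?page dbpprop:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:$1> .
  ?page foaf:isPrimaryTopicOf ?wikipedia_url .
}
EOF
}

# Assumption: the TSV response is one header row followed by one value row.
# Remove any quotes and keep the last line, i.e. the number.
parse_count() {
  tr -d '"' | tail -n 1
}

# Run the count query against the endpoint (needs network access).
fetch_count() {
  curl -s -G "http://live.dbpedia.org/sparql" \
    --data-urlencode "default-graph-uri=http://dbpedia.org" \
    --data-urlencode "query=$(count_query "$1")" \
    --data-urlencode "format=text/tab-separated-values" \
  | parse_count
}
```

    For example, fetch_count Infobox_website can be compared directly with the line count of the URL listing.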


    For the second part of your question, only a small change is needed from the previous query to get a list of all templates:

    SELECT DISTINCT ?template WHERE { 
        ?page  
        dbpprop:wikiPageUsesTemplate  
        ?template . 
        FILTER (regex(str(?template), "Infobox")) . 
    } ORDER BY ?template
    

    See results
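
    The regex in that query matches "Infobox" anywhere in the template URI. If only templates whose name starts with "Infobox" are wanted (an assumption about the use case), the TSV output can be narrowed down locally. A minimal sketch; template_names is a hypothetical helper name and the URI prefix is the one used in the queries above:

```shell
# Sketch only: template_names is a hypothetical helper name.

# Read TSV rows of template URIs on stdin, strip the quotes, and print the
# bare template name for every URI of the form .../Template:Infobox...
# The header row and non-matching URIs are dropped because sed -n prints
# only lines where the substitution succeeded.
template_names() {
  tr -d '"' \
    | sed -n 's|^http://dbpedia.org/resource/Template:\(Infobox.*\)$|\1|p'
}
```

    It would sit at the end of a curl pipeline in place of the tr and grep steps.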
