Get all Wikipedia Infobox Templates and all Pages using them

旧巷老猫 submitted on 2019-11-30 12:43:22

The previous answers seem to have stopped working. Only a small change is required to get them working at the new DBpedia query endpoint at http://live.dbpedia.org/sparql, though.

To get a list of all pages and the templates they use, this query works:

SELECT * WHERE {  ?page  dbpprop:wikiPageUsesTemplate ?template . }


If you're looking for a specific template:

SELECT * WHERE {  
   ?page  
   dbpprop:wikiPageUsesTemplate 
   <http://dbpedia.org/resource/Template:Infobox_website> . 
}


For my use case I'm interested in the Wikipedia URL rather than the DBpedia page, so I'm using this query:

SELECT ?wikipedia_url WHERE {  
   ?page  
   dbpprop:wikiPageUsesTemplate 
   <http://dbpedia.org/resource/Template:Infobox_website> . 
   ?page foaf:isPrimaryTopicOf ?wikipedia_url .
}


I'm also using curl to pull the results into a script:

$ curl -s "http://live.dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=SELECT+%3Fwikipedia_url+WHERE+%7B+%0D%0A%09+%3Fpage+%0D%0A%09+dbpprop%3AwikiPageUsesTemplate+%0D%0A%09+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FTemplate%3AInfobox_website%3E+.+%0D%0A+%3Fpage+foaf%3AisPrimaryTopicOf+%3Fwikipedia_url+.%0D%0A%0D%0A%09%7D&format=text%2Ftab-separated-values" \
| tr -d \" | grep -v "^wikipedia_url$" | head
http://en.wikipedia.org/wiki/U.S._News_&_World_Report
http://en.wikipedia.org/wiki/FriendFinder
http://en.wikipedia.org/wiki/Debkafile
http://en.wikipedia.org/wiki/GTPlanet
http://en.wikipedia.org/wiki/Lithuanian_Wikipedia
http://en.wikipedia.org/wiki/Connexions
http://en.wikipedia.org/wiki/Hypno5ive
http://en.wikipedia.org/wiki/Scoop_(website)
http://en.wikipedia.org/wiki/Bhoomi_(software)
http://en.wikipedia.org/wiki/Brainwashed_(website)
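
If the script is in Python rather than shell, the same request can be sketched with the standard library. This is only an illustration under assumptions: the function names `build_url`, `parse_tsv`, and `fetch_urls` are made up for this example, and it assumes the endpoint still accepts the same `default-graph-uri`, `query`, and `format` parameters as the curl call above.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "http://live.dbpedia.org/sparql"  # endpoint named in this answer

QUERY = """SELECT ?wikipedia_url WHERE {
   ?page dbpprop:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:Infobox_website> .
   ?page foaf:isPrimaryTopicOf ?wikipedia_url .
}"""


def build_url(query, endpoint=ENDPOINT):
    """Return a GET URL carrying the same three parameters as the curl call."""
    params = {
        "default-graph-uri": "http://dbpedia.org",
        "query": query,
        "format": "text/tab-separated-values",
    }
    return endpoint + "?" + urlencode(params)


def parse_tsv(text):
    """Drop the header row and the quotes, roughly like `tr -d` and `grep -v`."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    next(rows, None)  # skip the "wikipedia_url" header line
    return [row[0] for row in rows if row]


def fetch_urls(query=QUERY):
    """Run the query over HTTP and return the list of Wikipedia URLs."""
    with urlopen(build_url(query)) as resp:
        return parse_tsv(resp.read().decode("utf-8"))
```

Calling `fetch_urls()[:10]` would then play the role of the `| head` at the end of the shell pipeline.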

I'm not sure this gives the full result set, though: it returns 1698 results, whereas wmflabs.org seems to suggest there should be 4439.
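
One way to rule out truncation as the cause would be to page through the results with `LIMIT` and `OFFSET`. I have not confirmed that the endpoint caps results, so treat this as a check rather than a fix; the gap may just as well come from the data itself. In this sketch `paged_query`, `fetch_all`, and the `run_query` callable are hypothetical names: `run_query` stands for whatever sends a SPARQL string to the endpoint and returns a list of values.

```python
def paged_query(template_uri, limit, offset):
    """Build one page of the Wikipedia-URL query.

    ORDER BY is included so that consecutive pages do not overlap.
    """
    return (
        "SELECT ?wikipedia_url WHERE {\n"
        f"   ?page dbpprop:wikiPageUsesTemplate <{template_uri}> .\n"
        "   ?page foaf:isPrimaryTopicOf ?wikipedia_url .\n"
        "}\n"
        f"ORDER BY ?wikipedia_url LIMIT {limit} OFFSET {offset}"
    )


def fetch_all(run_query, template_uri, page_size=1000):
    """Collect pages until one comes back shorter than page_size.

    run_query is an assumed callable: SPARQL string in, list of values out.
    """
    results, offset = [], 0
    while True:
        page = run_query(paged_query(template_uri, page_size, offset))
        results.extend(page)
        if len(page) < page_size:
            return results
        offset += page_size
```

If the paged total still comes out at 1698, the difference from the wmflabs.org figure is probably not a paging issue.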


For the second part of your question, only a small change is needed from the previous query to get a list of all templates:

SELECT DISTINCT ?template WHERE { 
    ?page  
    dbpprop:wikiPageUsesTemplate  
    ?template . 
    FILTER (regex(?template, "Infobox")) . 
} ORDER BY ?template


OK, since I seem to have found a solution (most probably not the best one), I want to share it.

1) This SPARQL query can be used to find all pages that include a specific Infobox type:

SELECT * WHERE {
   ?page dbpedia2:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:Infobox_website> .
   ?page dbpedia2:name ?name .
}

Link at SNORQL


2) This SPARQL query can be used to find all Infobox types:

SELECT DISTINCT ?template WHERE {
   ?page dbpedia2:wikiPageUsesTemplate ?template .
   FILTER (regex(?template, "Infobox")) .
} ORDER BY ?template

Link at SNORQL

You can also use the MediaWiki API's embeddedin query to return a list of all pages that include a given template. You'll want to use a library for accessing the API, though. Which language would you prefer? For Ruby, I'd suggest MediaWiki::Gateway.
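
For illustration only, here is roughly what the embeddedin call looks like without a client library, in Python. Assumptions in this sketch: it targets English Wikipedia, restricts results to the article namespace, and uses a placeholder User-Agent string that you should replace with your own. The helper names (`embeddedin_params`, `http_get_json`, `pages_using`) are made up for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"  # assumption: English Wikipedia
USER_AGENT = "infobox-example/0.1 (replace with your contact details)"  # placeholder


def embeddedin_params(template, cont=None):
    """Parameters for one list=embeddedin request, plus continuation values."""
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": template,   # e.g. "Template:Infobox website"
        "einamespace": 0,      # assumption: only articles are wanted
        "eilimit": "max",
        "format": "json",
    }
    params.update(cont or {})
    return params


def http_get_json(params):
    """Send one GET request to the API and decode the JSON reply."""
    req = Request(API + "?" + urlencode(params), headers={"User-Agent": USER_AGENT})
    with urlopen(req) as resp:
        return json.load(resp)


def pages_using(template, get_json=http_get_json):
    """Yield titles of pages that transclude the template, following 'continue'."""
    cont = None
    while True:
        data = get_json(embeddedin_params(template, cont))
        for page in data["query"]["embeddedin"]:
            yield page["title"]
        cont = data.get("continue")
        if not cont:
            break
```

Something like `for title in pages_using("Template:Infobox website"): print(title)` would then list the pages. A proper client library still handles things this sketch ignores, such as rate limits and error responses.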
