问题
I'm trying to get the most famous movies in the world from Wikidata with SPARQL.
I have the following query:
SELECT ?item WHERE {
?item wdt:P31 wd:Q11424.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
which returns ALL movies (about 214143).
I basically only need movies that have, let's say, more than 10 language entries on wikipedia, as I'm guessing these will be the most famous ones.
Is there a way to do this inside the query itself, without checking all entries ?
回答1:
A naive answer to your question is:
SELECT ?movie (count(?wikipage) AS ?count) WHERE {
hint:Query hint:optimizer "None" .
?movie wdt:P31 wd:Q11424 .
?wikipage schema:about ?movie .
?wikipage schema:isPartOf/wikibase:wikiGroup "wikipedia"
} GROUP BY ?movie HAVING (?count > 10) ORDER BY DESC(?count)
Try it!
Alternatively, you could consider total number of sitelinks. Sitelinks include links to Wikipedia and also links to Wikiquote, Wikivoyage etc. The advantage is that total number of sitelinks is precomputed.
SELECT ?movie ?sitelinks WHERE {
?movie wdt:P31 wd:Q11424 .
?movie wikibase:sitelinks ?sitelinks .
FILTER (?sitelinks > 10)
} ORDER BY DESC(?sitelinks)
Try it!
See also these questions:
- Get Wikipedia URLs (sitelinks) in Wikidata SPARQL query
- Wikidata results sorted by something similar to a PageRank
As @TallTed and @AKSW have pointed out, the number of labels in different languages may be differ from the number of Wikipedia articles in different languages. Here below a comparison.
Top 5 movies by Wikipedia articles
| title | articles | sitelinks | labels |
|---------------------|----------|-----------|--------|
| Avatar | 92 | 103 | 99 |
| Titanic | 86 | 100 | 101 |
| The Godfather | 79 | 103 | 82 |
| Slumdog Millionaire | 72 | 75 | 80 |
| Forrest Gump | 71 | 101 | 84 |
Top 5 movies by sitelinks
| title | articles | sitelinks | labels |
|---------------|----------|-----------|--------|
| Avatar | 92 | 103 | 99 |
| The Godfather | 79 | 103 | 82 |
| Forrest Gump | 71 | 101 | 84 |
| Titanic | 86 | 100 | 101 |
| The Matrix | 67 | 94 | 77 |
Top 5 movies by labels
| title | articles | sitelinks | labels |
|------------------------------|----------|-----------|--------|
| The 25th Reich | 2 | 2 | 227 |
| Time Is But Brief | 0 | 0 | 224 |
| Michael Moore in TrumpLand | 6 | 6 | 222 |
| Magnus - The Mozart of Chess | 1 | 1 | 221 |
| Lee Chong Wei | 1 | 1 | 196 |
来源:https://stackoverflow.com/questions/48478064/get-all-wikidata-items-that-have-more-than-10-languages