Get all Wikidata items that have more than 10 languages?

人盡茶涼 提交于 2019-12-13 02:36:21

问题


I'm trying to get the most famous movies in the world from Wikidata with SPARQL.

I have the following query:

SELECT ?item WHERE {
  ?item wdt:P31 wd:Q11424.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

which returns ALL movies (about 214143).

I basically only need movies that have, let's say, more than 10 language entries on wikipedia, as I'm guessing these will be the most famous ones.

Is there a way to do this inside the query itself, without checking all entries ?


回答1:


A naive answer to your question is:

SELECT ?movie (count(?wikipage) AS ?count) WHERE {
   hint:Query hint:optimizer "None" .
   ?movie wdt:P31 wd:Q11424 .
   ?wikipage schema:about ?movie .
   ?wikipage schema:isPartOf/wikibase:wikiGroup "wikipedia" 
} GROUP BY ?movie HAVING (?count > 10) ORDER BY DESC(?count)

Try it!

Alternatively, you could consider total number of sitelinks. Sitelinks include links to Wikipedia and also links to Wikiquote, Wikivoyage etc. The advantage is that total number of sitelinks is precomputed.

SELECT ?movie ?sitelinks WHERE {
   ?movie wdt:P31 wd:Q11424 .
   ?movie wikibase:sitelinks ?sitelinks .
   FILTER (?sitelinks > 10) 
} ORDER BY DESC(?sitelinks)

Try it!

See also these questions:

  • Get Wikipedia URLs (sitelinks) in Wikidata SPARQL query
  • Wikidata results sorted by something similar to a PageRank

As @TallTed and @AKSW have pointed out, the number of labels in different languages may be differ from the number of Wikipedia articles in different languages. Here below a comparison.

Top 5 movies by Wikipedia articles

|        title        | articles | sitelinks | labels |
|---------------------|----------|-----------|--------|
| Avatar              |       92 |       103 |     99 |
| Titanic             |       86 |       100 |    101 |
| The Godfather       |       79 |       103 |     82 |
| Slumdog Millionaire |       72 |        75 |     80 |
| Forrest Gump        |       71 |       101 |     84 |

Top 5 movies by sitelinks

|     title     | articles | sitelinks | labels |
|---------------|----------|-----------|--------|
| Avatar        |       92 |       103 |     99 |
| The Godfather |       79 |       103 |     82 |
| Forrest Gump  |       71 |       101 |     84 |
| Titanic       |       86 |       100 |    101 |
| The Matrix    |       67 |        94 |     77 |

Top 5 movies by labels

|            title             | articles | sitelinks | labels |
|------------------------------|----------|-----------|--------|
| The 25th Reich               |        2 |         2 |    227 |
| Time Is But Brief            |        0 |         0 |    224 |
| Michael Moore in TrumpLand   |        6 |         6 |    222 |
| Magnus - The Mozart of Chess |        1 |         1 |    221 |
| Lee Chong Wei                |        1 |         1 |    196 |


来源:https://stackoverflow.com/questions/48478064/get-all-wikidata-items-that-have-more-than-10-languages

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!