How to use the Wikipedia API to get the page view statistics of a particular page in Wikipedia?

Submitted by 僤鯓⒐⒋嵵緔 on 2019-11-27 01:51:33

Question


The stats.grok.se tool provides the pageview statistics of a particular page in Wikipedia. Is there a way to get the same information through the Wikipedia API? What does the page views counter property actually mean?


Answer 1:


The Pageview API was released a few days ago: https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}

  • https://wikimedia.org/api/rest_v1/?doc#/
  • https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageview_API

For example, https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/all-agents/Foo/daily/20151010/20151012 will give you:

{
  "items": [
    {
      "project": "en.wikipedia",
      "article": "Foo",
      "granularity": "daily",
      "timestamp": "2015101000",
      "access": "all-access",
      "agent": "all-agents",
      "views": 79
    },
    {
      "project": "en.wikipedia",
      "article": "Foo",
      "granularity": "daily",
      "timestamp": "2015101100",
      "access": "all-access",
      "agent": "all-agents",
      "views": 81
    }
  ]
}
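
The URL template above can be filled in programmatically. The following is a minimal Python sketch, not an official client: the function names (`build_pageviews_url`, `views_by_day`, `fetch_pageviews`) and the default `User-Agent` string are made up for illustration, and the parsing assumes the response has the shape shown in the sample output above.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base of the per-article endpoint quoted in the answer above.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def build_pageviews_url(article, start, end, project="en.wikipedia",
                        access="all-access", agent="all-agents",
                        granularity="daily"):
    """Fill in the URL template; start and end are YYYYMMDD strings."""
    # Titles use underscores for spaces; percent-encode everything else,
    # including "/", so the title stays in a single path segment.
    title = quote(article.replace(" ", "_"), safe="")
    return "/".join([BASE, project, access, agent, title,
                     granularity, start, end])


def views_by_day(payload):
    """Map each item's timestamp (YYYYMMDDHH) to its view count."""
    return {item["timestamp"]: item["views"] for item in payload["items"]}


def fetch_pageviews(url,
                    user_agent="pageviews-example/0.1 (you@example.org)"):
    """Fetch and decode the JSON response (performs a network request).

    The User-Agent value is a placeholder; replace it with something
    that identifies your own script.
    """
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        return json.load(response)
```

Calling `build_pageviews_url("Foo", "20151010", "20151012")` reproduces the example URL above, and passing the sample response to `views_by_day` gives a dictionary keyed by timestamp that can be summed or plotted.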



Answer 2:


No, there is not.

The counter property returned from prop=info would tell you how many times the page was viewed from the server. It is disabled on Wikipedia and other Wikimedia wikis because the aggressive squid/varnish caching means only a tiny fraction of page views would make it to the actual server in order to affect that counter, and even then the increased database write load for updating that counter would probably be prohibitive.

The stats.grok.se tool uses anonymized logs from the cache servers to calculate page views; the raw log files are available from http://dammit.lt/wikistats. If you need an API to access the data from stats.grok.se, you should contact the operator of stats.grok.se to request one be created.


Note this was written 4 years ago, and an API has since been created (see this answer). There's not yet a way to access that via api.php, though.




Answer 3:


You can get the daily JSON for the last 30 days like this:

http://stats.grok.se/json/en/latest30/Britney_Spears




Answer 4:


You can look into the stats here. Has anyone come across an API for getting the pageview stats? I have also looked into the available raw data, but could not find a way to extract the pageview count from it.




Answer 5:


There doesn't seem to be any API; however, you can make HTTP requests to stats.grok.se and parse the HTML or JSON result to extract the page view counts.

I created a website http://wikipediaviews.org that does exactly that in order to facilitate easier comparison for multiple pages across multiple months and years. To speed things up, and minimize the number of requests to stats.grok.se, I keep all past query results stored locally.

The code I used is available at http://github.com/vipulnaik/wikipediaviews.

The file with the actual retrieval code is in https://github.com/vipulnaik/wikipediaviews/blob/master/backend/pageviewqueries.inc

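function getpageviewsonline($page, $month, $language)
{
  // Fetch the stats.grok.se page for this article, month, and language.
  $url = getpageviewsurl($page,$month,$language);
  $html = file_get_contents($url);
  // Capture the token that follows "has been viewed" in the page text.
  preg_match('/(?<=\bhas been viewed)\s+\K[^\s]+/',$html,$numberofpageviews);
  return $numberofpageviews[0];
}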

The code for getpageviewsurl is in https://github.com/vipulnaik/wikipediaviews/blob/master/backend/stringfunctions.inc:

function getpageviewsurl($page,$month,$language)
{
  $page = str_replace(" ","_",$page);
  $page = str_replace("'","%27",$page);
  return "http://stats.grok.se/" . $language . "/" . $month . "/" . $page;
}

PS: In case the link to wikipediaviews.org doesn't work, it's because I registered the domain quite recently. Try http://wikipediaviews.subwiki.org instead in the interim.




Answer 6:


This question was asked 6 years ago. Back then, the official site had no such API.

That has changed.

A simple example:

https://en.wikipedia.org/w/api.php?action=query&format=json&prop=pageviews&titles=Buckingham+Palace%7CBank+of+England%7CBritish+Museum

See the documentation:

prop=pageviews

Shows per-page pageview data (the number of daily pageviews for each of the last pvipdays days). The result format is page title (with underscores) => date (Ymd) => count.
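
As a rough illustration of the `prop=pageviews` query, here is a short Python sketch that builds the same request URL and totals the daily counts per page. The helper names (`build_query_url`, `total_views`) are hypothetical. The parser also assumes that the JSON response nests each page under `query.pages` with a `pageviews` object mapping dates to counts, and that a day with no data is `null`; check this against a live response before relying on it.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL above.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(titles):
    """Build a prop=pageviews query for one or more page titles."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "pageviews",
        # Multiple titles are separated with "|" in a single parameter.
        "titles": "|".join(titles),
    }
    return API_URL + "?" + urlencode(params)


def total_views(response):
    """Sum the daily counts per page title.

    Assumes the response shape described in the lead-in; days whose
    count is None (JSON null) are skipped.
    """
    totals = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        daily = page.get("pageviews") or {}
        totals[page["title"]] = sum(
            count for count in daily.values() if count is not None
        )
    return totals
```

`build_query_url(["Buckingham Palace", "Bank of England", "British Museum"])` yields the example URL shown earlier in this answer; the decoded JSON from that URL can then be handed to `total_views`.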



Source: https://stackoverflow.com/questions/5323589/how-to-use-wikipedia-api-to-get-the-page-view-statistics-of-a-particular-page-in
