Question
I'm trying to collect a time series of Wikipedia page view statistics covering the last five years for a particular article ("Bitcoin"). I found http://stats.grok.se useful for getting this data, but I have two issues:
1. The website returns an "internal server error" whenever 2016 is selected as the year for which to obtain data.
2. Is there an existing tool that can put this output into a more usable form, such as a .csv file?
Answer 1:
I don't know about stats.grok.se, as it doesn't appear to be hosted on a Wikimedia production or Labs server. However, Wikimedia provides an API for page view statistics starting from July 2015:
https://wikimedia.org/api/rest_v1/#!/Pageviews_data/get_metrics_pageviews_per_article_project_access_agent_article_granularity_start_end
E.g., daily page views to https://en.wikipedia.org/wiki/Bitcoin over the past year: https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia.org/all-access/all-agents/Bitcoin/daily/20151105/20161105
all-access = desktop+mobile-web+mobile-app
all-agents = user+spider+bot
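The API also covers the .csv part of the question. Below is a rough sketch using only the Python standard library. It rebuilds the example URL above, downloads the JSON, and writes one `date,views` row per day. It assumes the response is a JSON object whose `items` list holds entries with a `timestamp` field (formatted YYYYMMDDHH) and a `views` field. The function names and the User-Agent contact string are hypothetical placeholders, not part of the API.

```python
import csv
import json
import urllib.parse
import urllib.request

# Base of the per-article pageviews endpoint used in the example URL above.
API_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(article, start, end, project="en.wikipedia.org",
                  access="all-access", agent="all-agents", granularity="daily"):
    """Build a per-article pageviews URL; start and end are YYYYMMDD strings."""
    # Titles use underscores for spaces; everything else is percent-encoded
    # (safe="" also encodes "/" so titles like "AC/DC" stay one path segment).
    title = urllib.parse.quote(article.replace(" ", "_"), safe="")
    return "/".join([API_BASE, project, access, agent, title, granularity, start, end])


def fetch_pageviews(url):
    """Download and decode the JSON response (requires network access)."""
    # Hypothetical contact string: Wikimedia asks clients to send a descriptive User-Agent.
    headers = {"User-Agent": "pageview-sketch/0.1 (contact: you@example.org)"}
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def items_to_rows(data):
    """Turn the assumed "items" list into (YYYY-MM-DD, views) tuples."""
    rows = []
    for item in data.get("items", []):
        ts = item["timestamp"]  # assumed format: YYYYMMDDHH
        rows.append((f"{ts[0:4]}-{ts[4:6]}-{ts[6:8]}", item["views"]))
    return rows


def write_csv(rows, path):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "views"])
        writer.writerows(rows)
```

For example, `write_csv(items_to_rows(fetch_pageviews(pageviews_url("Bitcoin", "20151105", "20161105"))), "bitcoin_pageviews.csv")` would save the year of daily counts from the example URL. The API starts in July 2015, so earlier years of a five-year series still have to come from the dumps.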
Historical data can be downloaded from https://dumps.wikimedia.org/other/pagecounts-raw/
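The raw dumps are gzipped hourly files. Below is a minimal Python sketch that pulls one article's count out of such a file. It assumes each line has the space-separated form `project page_title count_views bytes_transferred` and that English Wikipedia rows use the project code `en`. It counts only exact title matches. Percent-encoded or differently capitalized variants of a title would have to be listed in `titles` explicitly. The function names are hypothetical.

```python
import gzip
from collections import Counter


def parse_pagecounts_lines(lines, project="en", titles=("Bitcoin",)):
    """Sum view counts for the given titles from pagecounts-raw lines.

    Assumed line format: project page_title count_views bytes_transferred
    """
    wanted = set(titles)
    counts = Counter()
    for line in lines:
        parts = line.split()
        # Skip malformed lines rather than failing on a single bad row.
        if len(parts) != 4 or not parts[2].isdigit():
            continue
        proj, title, views, _size = parts
        if proj == project and title in wanted:
            counts[title] += int(views)
    return counts


def parse_pagecounts_file(path, project="en", titles=("Bitcoin",)):
    """Stream one gzipped hourly file line by line instead of loading it whole."""
    # errors="replace" keeps a stray undecodable byte from aborting the scan.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return parse_pagecounts_lines(f, project, titles)
```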
Answer 2:
I found an archive of page view statistics from 2007 to 2016 here: https://dumps.wikimedia.org/other/pagecounts-raw/
At the bottom of the page they list several other sources covering various time periods.
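The archive files are hourly, so a daily time series means summing the per-file counts by date. Below is a small sketch that assumes you already have one `(filename, views)` pair per hourly file. It also assumes the filenames follow the pattern `pagecounts-YYYYMMDD-HHMMSS.gz`. The code groups the counts by the date embedded in each name and renders a `date,views` CSV string. Names that don't match the pattern are skipped. The function names are hypothetical.

```python
import csv
import io
import re
from collections import defaultdict

# Assumed hourly filename pattern, e.g. pagecounts-20151105-130000.gz
FILENAME_RE = re.compile(r"pagecounts-(\d{4})(\d{2})(\d{2})-\d{6}\.gz$")


def day_of(filename):
    """Return 'YYYY-MM-DD' for an hourly dump filename, or None if it doesn't match."""
    m = FILENAME_RE.search(filename)
    return "-".join(m.groups()) if m else None


def daily_totals(hourly_counts):
    """Sum (filename, views) pairs into per-day totals, sorted by date."""
    totals = defaultdict(int)
    for filename, views in hourly_counts:
        day = day_of(filename)
        if day is not None:
            totals[day] += views
    return sorted(totals.items())


def to_csv_text(rows):
    """Render (date, views) rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["date", "views"])
    writer.writerows(rows)
    return buf.getvalue()
```

The CSV text can then be written to a `.csv` file as-is. The `csv` module's default line terminator is `\r\n`, which spreadsheet tools read without trouble.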
Source: https://stackoverflow.com/questions/40445113/getting-wikipedia-page-view-statistics