Setting “an informative User-Agent string” in getURL

不羁岁月 submitted on 2019-12-07 00:02:31

Question


I am trying to access a Wikipedia page to get a list of pages, and I get the following error:

library(RCurl)
u <- "http://en.wikipedia.org/w/index.php?title=Special%3APrefixIndex&prefix=tal&namespace=4"
getURL(u)
[1] "Scripts should use an informative User-Agent string with contact information, or they may be IP-blocked without notice.\n"

I hoped to reach that page through the Wikipedia API instead, but I am not sure that would work.

The odd thing is that other pages are read without any problem, for example:

u <- "http://en.wikipedia.org/wiki/Wikipedia:Talk"
getURL(u)

Any suggestions?

Side note: in general I would rather not scrape wiki pages and would go through the API instead, but I fear that these specific pages are not yet available through the API.
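For the API route, the MediaWiki `list=allpages` query module appears to cover the same listing as Special:PrefixIndex: `apprefix` plays the role of `prefix`, and `apnamespace=4` selects the "Wikipedia:" project namespace. The sketch below assumes the standard `https://en.wikipedia.org/w/api.php` endpoint; `build_allpages_url` and `fetch_allpages` are hypothetical helpers, and the API request still needs an informative User-Agent.

```r
# Hypothetical helper: build a MediaWiki API URL that lists pages by prefix.
# Assumes the list=allpages module and its apprefix/apnamespace/aplimit parameters.
build_allpages_url <- function(prefix, namespace = 4, limit = 50) {
  params <- c(
    action      = "query",
    list        = "allpages",
    apprefix    = prefix,
    apnamespace = as.character(namespace),
    aplimit     = as.character(limit),
    format      = "json"
  )
  # Percent-encode each value; reserved = TRUE also escapes characters such as '&'
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  query <- paste0(names(params), "=", encoded, collapse = "&")
  paste0("https://en.wikipedia.org/w/api.php?", query)
}

# Hypothetical wrapper: fetch the listing while sending a User-Agent header.
# RCurl is only required when this function is actually called.
fetch_allpages <- function(prefix, user_agent) {
  RCurl::getURL(build_allpages_url(prefix),
                httpheader = c("User-Agent" = user_agent))
}

url <- build_allpages_url("tal")
print(url)
```

Calling `fetch_allpages("tal", "<your tool and contact info>")` needs network access and returns the JSON text, which still has to be parsed.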


Answer 1:


According to the RCurl documentation, you can specify additional headers by adding an httpheader parameter:

getURL(u, httpheader = c('User-Agent' = "Informative string with your contact info"))
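A slightly fuller sketch of the same idea. Wikimedia's User-Agent policy asks for a string roughly of the form `ToolName/version (contact information) library/version`; `make_user_agent` and `get_with_ua` are hypothetical helpers, and the tool name and e-mail address are placeholders to replace with your own. RCurl should also accept the libcurl `useragent` option directly (`getURL(u, useragent = ...)`), which ought to have the same effect as setting the header by hand.

```r
# Hypothetical helper: compose a User-Agent in the
# "ToolName/version (contact) library" shape suggested by Wikimedia.
make_user_agent <- function(tool, version, contact) {
  sprintf("%s/%s (%s) R-RCurl", tool, version, contact)
}

# Hypothetical wrapper around RCurl::getURL that always sends the User-Agent.
# RCurl is only needed when this function is actually called.
get_with_ua <- function(url, user_agent) {
  RCurl::getURL(url, httpheader = c("User-Agent" = user_agent))
}

# Placeholder tool name and contact address
ua <- make_user_agent("PrefixLister", "0.1", "someone@example.org")
print(ua)

# Usage (needs network access and the RCurl package):
# u <- "http://en.wikipedia.org/w/index.php?title=Special%3APrefixIndex&prefix=tal&namespace=4"
# html <- get_with_ua(u, ua)
```

Wrapping the call keeps the header in one place, so every request a script makes identifies itself the same way.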


Source: https://stackoverflow.com/questions/9056705/setting-an-informative-user-agent-string-in-geturl
