How to stop execution of RCurl::getURL() if it is taking too long?

喜你入骨 submitted on 2019-12-03 16:21:31

I believe the Web server is getting itself into a confused state: it tells us that the URL has temporarily moved and then points us to a new URL:

http://photos.prnewswire.com/medias/switch.do?prefix=/appnb&page=/getStoryRemapDetails.do&prnid=20110713%252fNY34814%252db&action=details

When we follow that, it redirects us again to .... the same URL!!!

So the timeout is not the problem. The response comes back very quickly, so the timeout duration is never exceeded. It is the fact that we go round and round in circles that causes the apparent hang.

The way I found this is by adding verbose = TRUE to the list of .opts. Then we see all the communication between us and the server.
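As a minimal sketch of that diagnostic step, the function below turns on verbose output. As an assumption that is not part of the original answer, it also caps the number of redirects with libcurl's maxredirs option, so that a redirect loop should end in an error rather than an apparent hang. The name fetch_verbose and the limit of 5 are hypothetical choices for illustration.

```r
# Sketch only: fetch a URL with verbose output and a cap on redirects.
# fetch_verbose and max_redirects = 5L are illustrative, not from the original answer.
fetch_verbose <- function(url, max_redirects = 5L) {
  tryCatch(
    RCurl::getURL(
      url,
      .opts = list(
        verbose = TRUE,            # print the exchange between client and server
        followlocation = TRUE,     # follow HTTP redirects
        maxredirs = max_redirects  # stop after this many redirects
      )
    ),
    error = function(e) {
      # Report the curl error and return NA instead of stopping the script
      message("Request failed: ", conditionMessage(e))
      NA_character_
    }
  )
}
```

Calling fetch_verbose(u) on the problem URL would then be expected to print each redirect as it happens, which makes a loop back to the same Location easy to spot.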

D.

timeout and connecttimeout are curl options, so they need to be passed in a list to the .opts parameter of getURL. I am not sure which of the two you need, but start with

getURL(u, followLocation = TRUE, .opts = list(timeout = 3))
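If it is unclear which of the two limits applies, one option is to set both. In libcurl, connecttimeout bounds only the connection phase, while timeout bounds the whole transfer. The sketch below assumes that behaviour; the function name fetch_with_timeouts and the default values are hypothetical.

```r
# Sketch only: set both time limits on a single request.
# fetch_with_timeouts and the default values are illustrative.
fetch_with_timeouts <- function(url, connect_secs = 3, total_secs = 10) {
  RCurl::getURL(
    url,
    .opts = list(
      followlocation = TRUE,          # follow HTTP redirects
      connecttimeout = connect_secs,  # seconds allowed to establish the connection
      timeout = total_secs            # seconds allowed for the entire transfer
    )
  )
}
```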

EDIT:

I can reproduce the hang; changing buffered output doesn't fix it for me (tested under R 2.13.0 and R 2.13.1), and it happens with or without the timeout argument. If you try getURL on the page that is the target of the redirect, the result appears blank.

u2 <- "http://photos.prnewswire.com/medias/switch.do?prefix=/appnb&page=/getStoryRemapDetails.do&prnid=20110713%252fNY34814%252db&action=details"
getURL(u2)

If you remove the page argument, it redirects you to a login page; maybe PR Newswire is doing something funny with asking for credentials.

u3 <- "http://photos.prnewswire.com/medias/switch.do?prefix=/appnb&prnid=20110713%252fNY34814%252db&action=details"
getURL(u3)