RCurl: url.exists returns FALSE when the URL does exist

Happy的楠姐 2020-12-19 04:51

I'm trying to download information from a specific web page. Although it opens fine in any browser, RCurl says it does not exist:

url.exists("http://www.t


        
1 Answer
  •  既然无缘
    2020-12-19 05:10

    That webserver appears to return a 403 Forbidden error when your HTTP request does not include a user-agent string. RCurl by default does not pass a user-agent. You can set one with the useragent= parameter.

    library(RCurl)
    library(XML)  # provides htmlTreeParse()

    myurl <- "http://www.transfermarkt.es/liga-mx-apertura/startseite/wettbewerb/MEXA"
    url.exists(myurl, useragent = "curl/7.39.0 Rcurl/1.95.4.5")
    # [1] TRUE
    htmlTreeParse(getURL(myurl, useragent = "curl/7.39.0 Rcurl/1.95.4.5"))
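
    To check that a FALSE result comes from an HTTP error status rather than a missing page, it can help to look at the status code itself. The following is a minimal sketch, assuming the installed RCurl version accepts the .header = TRUE argument of url.exists(), which returns the parsed response header instead of a logical. The name status_of is a hypothetical helper, not part of RCurl.

    ```r
    # Hypothetical helper (not part of RCurl): report the HTTP status code that
    # url.exists() received, or NA if no HTTP header was returned at all.
    status_of <- function(url, ...) {
      # Assumption: .header = TRUE makes url.exists() return the parsed header.
      hdr <- RCurl::url.exists(url, ..., .header = TRUE)
      # A plain logical means the request failed or the URL was not HTTP.
      if (is.logical(hdr)) return(NA_integer_)
      as.integer(hdr["status"])
    }

    myurl <- "http://www.transfermarkt.es/liga-mx-apertura/startseite/wettbewerb/MEXA"

    # The requests need network access, so they only run in an interactive session.
    if (interactive()) {
      print(status_of(myurl))                                          # no user-agent
      print(status_of(myurl, useragent = "curl/7.39.0 Rcurl/1.95.4.5")) # with user-agent
    }
    ```

    If the two calls report different status codes, the user-agent header is what the server is reacting to.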
    

    In my opinion, the httr package is a bit nicer than RCurl for making HTTP requests (and it sets a user-agent string by default). Here's the corresponding code:

    library(httr)
    GET(myurl)
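
    If an explicit user-agent is preferred over the httr default, it can be passed with user_agent(), and the response status can be checked before the body is parsed. This is only a sketch: fetch_html is a hypothetical wrapper and the agent string "my-r-script/0.1" is a made-up placeholder, neither comes from httr or from the question.

    ```r
    # Hypothetical wrapper (not part of httr): fetch a page with an explicit
    # user-agent and stop with an error on a 4xx/5xx response.
    fetch_html <- function(url, agent = "my-r-script/0.1") {  # placeholder agent
      resp <- httr::GET(url, httr::user_agent(agent))
      httr::stop_for_status(resp)  # turns e.g. 403 Forbidden into an R error
      httr::content(resp, as = "text", encoding = "UTF-8")
    }

    myurl <- "http://www.transfermarkt.es/liga-mx-apertura/startseite/wettbewerb/MEXA"

    # The request needs network access, so it only runs in an interactive session.
    if (interactive()) {
      html <- fetch_html(myurl)
      doc <- XML::htmlTreeParse(html, asText = TRUE)
    }
    ```

    Calling stop_for_status() makes a rejected request fail loudly instead of silently handing an error page to the parser.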
    
