How to configure the curl package in R with default web proxy settings?

Posted by 我与影子孤独终老i on 2020-02-24 10:15:11

Question


I'm using R in a commercial environment where external connectivity all goes via a web proxy, so we need to specify the proxy server address and ensure we connect to it with Windows authentication.

I already have code that will configure the RCurl and httr packages to use those settings by default - i.e.

httr::set_config(httr::config(
  proxy = "my.proxy.address",
  proxyuserpwd = ":",   # blank user/password, so Windows credentials are used
  proxyauth = 4         # libcurl CURLAUTH_NEGOTIATE
))

or

opts <- list(
  proxy = "my.proxy.address",
  proxyuserpwd = ":", 
  proxyauth = 4
)
options(RCurlOptions = opts)  # base R options(); RCurl reads the RCurlOptions entry
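
For readers wondering what the bare 4 in proxyauth means: it is a libcurl authentication bitmask. The sketch below lists the common values as defined in libcurl's curl.h; the vector name curl_auth and the variable negotiate_or_ntlm are made up here for illustration and are not part of httr, RCurl or curl.

```r
# libcurl auth bitmask values (CURLAUTH_* in curl.h); names are illustrative.
curl_auth <- c(basic = 1L, digest = 2L, negotiate = 4L, ntlm = 8L)

# proxyauth = 4 in the snippets above corresponds to Negotiate only.
names(curl_auth)[curl_auth == 4L]

# Values can be OR-ed together to allow several schemes,
# for example Negotiate or NTLM.
negotiate_or_ntlm <- bitwOr(curl_auth[["negotiate"]], curl_auth[["ntlm"]])
```

Whether a given proxy accepts Negotiate, NTLM, or both depends on how it is configured, so the right value may differ between sites.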

However, in a couple of cases recently, I've found packages that depend on the curl package to make web requests - for instance xml2::read_xml - and I can't find any way to set the same proxy options so they're picked up by default and used by curl.

If I use curl directly myself, I can set the options on a new handle and the following code is sufficient to work successfully:

  library(curl)

  url <- "https://example.com/data.xml"  # placeholder URL
  h <- new_handle(proxy = "my.proxy.address",
                  proxyuserpwd = ":")
  con <- curl(url, handle = h)
  page <- xml2::read_xml(con)

... but this isn't any help when the use of curl is buried within someone else's function!

Alternatively, I know I can set up an environment variable for the proxy address, like this:

Sys.setenv(https_proxy = "https://my.proxy.address")

... and libcurl picks it up. But if I do just this, then I end up with an HTTP 407 proxy authentication error. Is there a way to specify blank username / password (as the proxyuserpwd setting does), so we authenticate with Windows credentials? It also doesn't seem possible to specify the proxyauth option as an environment variable.

Can anyone offer a solution or any suggestions, please?


Answer 1:


I was having similar issues. Here are the steps that worked for me:

  1. Download your company's proxy auto-config file (PAC file). In Internet Explorer: click the gear icon --> Internet Options --> Connections --> LAN Settings, then copy the http address shown there into a new browser window to download the text file.
  2. Locate the line in the PAC file that specifies the proxy (e.g. "auth-proxy.xxxxxxx.com:9999").
  3. In a new R session, test these proxy settings by temporarily setting them with a command similar to the following, substituting your values from your PAC file:

    Sys.setenv(http_proxy = "auth-proxy.xxxxxxx.com:9999")
    Sys.setenv(https_proxy = "auth-proxy.xxxxxxx.com:9999")
    
  4. Rerun your code in the same session to see if these new settings solve the issue. This is the test I used.

    xml2::read_html(curl::curl('http://google.com', handle = curl::new_handle("useragent" = "Mozilla/5.0")))
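
Steps 2 and 3 above could also be scripted. The following is only a sketch under a few assumptions: the PAC file contains return strings of the usual "PROXY host:port" form, the helper extract_pac_proxies is hypothetical (it is not part of any package), and auth-proxy.example.com:9999 is a placeholder standing in for your real proxy.

```r
# Hypothetical helper: pull "host:port" entries out of PAC file text.
# PAC files are JavaScript; proxies appear in strings such as
# "PROXY auth-proxy.example.com:9999; DIRECT".
extract_pac_proxies <- function(pac_text) {
  hits <- regmatches(
    pac_text,
    gregexpr("PROXY\\s+[^;\"'\\s]+", pac_text, perl = TRUE)
  )
  unique(sub("^PROXY\\s+", "", unlist(hits), perl = TRUE))
}

# Made-up PAC content for illustration; in practice read the downloaded
# file instead, e.g. with readLines().
pac_example <- 'function FindProxyForURL(url, host) {
  if (isPlainHostName(host)) return "DIRECT";
  return "PROXY auth-proxy.example.com:9999; DIRECT";
}'

proxies <- extract_pac_proxies(pac_example)

# Apply the first proxy found to the current session only.
if (length(proxies) > 0) {
  Sys.setenv(http_proxy = proxies[1], https_proxy = proxies[1])
}
```

A real PAC file may return different proxies for different hosts, so it is worth checking by eye that the first match is the one that applies to external sites.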
    

Setting the proxy with Sys.setenv only lasts until the end of the current R session. To make the change permanent, consider adding these calls to your R profile file (R_PROFILE, for example .Rprofile), which R runs at startup.
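
As a rough sketch of what such a profile entry might look like, assuming the profile is ~/.Rprofile and reusing the placeholder address from step 3 (substitute the value from your own PAC file):

```r
# Sketch of an entry for ~/.Rprofile; the proxy value is a placeholder.
local({
  proxy <- "auth-proxy.xxxxxxx.com:9999"
  # Only set the variables if nothing has configured them already.
  if (!nzchar(Sys.getenv("http_proxy")))  Sys.setenv(http_proxy = proxy)
  if (!nzchar(Sys.getenv("https_proxy"))) Sys.setenv(https_proxy = proxy)
})
```

Wrapping the code in local() keeps the temporary proxy variable out of the global workspace, and the nzchar() checks leave any value already set by the system or by an .Renviron file untouched.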



Source: https://stackoverflow.com/questions/53011866/how-to-configure-the-curl-package-in-r-with-default-web-proxy-settings
