How to use Tor socks5 in R getURL

终归单人心 2021-02-06 09:45

I want to use Tor with the getURL function in R. Tor is working (checked in Firefox), with SOCKS5 on port 9050. But when I set this in R, I get the f

4 answers
  •  别跟我提以往
    2021-02-06 10:10

    RCurl defaults to an HTTP proxy, but Tor provides a SOCKS proxy. Tor recognizes that the proxy client (RCurl) is trying to use it as an HTTP proxy, hence the HTML error message that Tor returns.

    To make RCurl (and curl) use a SOCKS proxy, add a protocol prefix to the proxy address. There are two prefixes for SOCKS5: "socks5" and "socks5h" (see the curl manual). The latter lets the SOCKS server handle DNS queries, which is the preferred method with Tor (in fact, Tor will warn you if you let the proxy client resolve the hostname).
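    The difference between the two prefixes can be made explicit with a small helper. This is only a sketch: `tor_proxy_url` and its arguments are hypothetical names introduced here for illustration, not part of RCurl.

```r
# Hypothetical helper (not part of RCurl): build the proxy URL string.
# remote_dns = TRUE  -> "socks5h": hostnames are resolved by the SOCKS server (Tor).
# remote_dns = FALSE -> "socks5":  hostnames are resolved locally by the client.
tor_proxy_url <- function(host = "127.0.0.1", port = 9050, remote_dns = TRUE) {
  scheme <- if (remote_dns) "socks5h" else "socks5"
  sprintf("%s://%s:%d", scheme, host, as.integer(port))
}

tor_proxy_url()                    # "socks5h://127.0.0.1:9050"
tor_proxy_url(remote_dns = FALSE)  # "socks5://127.0.0.1:9050"
```

    The resulting string is what goes into the `proxy` option.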

    Here is a pure R solution that uses Tor for DNS queries.

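    library(RCurl)
    # Set the Tor SOCKS5 proxy as a default RCurl option; "socks5h" lets Tor resolve hostnames
    options(RCurlOptions = list(proxy = "socks5h://127.0.0.1:9050"))
    # The handle picks up the global RCurlOptions when it is created
    my.handle <- getCurlHandle()
    html <- getURL(url = "https://www.torproject.org", curl = my.handle)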
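
    If changing the global `RCurlOptions` is not desirable, the same proxy string can probably be set on a single handle instead. The following is a minimal sketch under that assumption; `fetch_via_tor` is a hypothetical wrapper name, and the address is the one used in this answer.

```r
# Hypothetical wrapper: the proxy is set on one curl handle only,
# so the global RCurlOptions are left untouched.
# RCurl must be installed; it is called via RCurl:: instead of library().
fetch_via_tor <- function(url, proxy = "socks5h://127.0.0.1:9050") {
  handle <- RCurl::getCurlHandle(proxy = proxy)  # curl options are passed by name
  RCurl::getURL(url, curl = handle)
}

# Usage (needs a running Tor instance on port 9050):
# html <- fetch_via_tor("https://www.torproject.org")
```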
    

    To set additional options, add them to the same list:

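    library(RCurl)
    options(RCurlOptions = list(
      proxy = "socks5h://127.0.0.1:9050",  # Tor SOCKS5 proxy, hostnames resolved by Tor
      useragent = "Mozilla",               # User-Agent header
      followlocation = TRUE,               # follow HTTP redirects
      referer = "",                        # Referer value (empty here)
      cookiejar = "my.cookies.txt"         # file that cookies are written to
    ))
    my.handle <- getCurlHandle()
    html <- getURL(url = "https://www.torproject.org", curl = my.handle)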
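
    To confirm that requests really leave through Tor, one option is to query a check endpoint through the proxy. The sketch below assumes that `https://check.torproject.org/api/ip` is reachable and returns JSON containing an `"IsTor"` field; both the URL and the response shape are assumptions not taken from this thread, and `is_tor_response` and `check_tor` are hypothetical helper names.

```r
# Hypothetical helper: TRUE if the response body reports "IsTor": true.
# The JSON shape is an assumption about the check endpoint.
is_tor_response <- function(body) {
  grepl('"IsTor"\\s*:\\s*true', body, perl = TRUE)
}

# Hypothetical helper: fetch the check endpoint through the Tor proxy.
# RCurl must be installed; the proxy is passed per request via .opts.
check_tor <- function(proxy = "socks5h://127.0.0.1:9050",
                      check_url = "https://check.torproject.org/api/ip") {
  body <- RCurl::getURL(check_url, .opts = list(proxy = proxy))
  is_tor_response(body)
}

# Usage (needs a running Tor instance on port 9050):
# check_tor()
```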
    
