How do I use cookies with RCurl?

断了今生、忘了曾经 submitted on 2019-11-26 12:12:00

Question


I am trying to write an R package that accesses some data via a REST API. The API, however, doesn't use HTTP authentication, but rather relies on cookies to keep credentials with the session.

Essentially, I'd like to replace the following two lines from a bash script with two R functions: one to perform the login and store the session cookie, and a second to GET the data.

curl -X POST -c cookies.txt -d "username=xxx&password=yyy" http://api.my.url/login
curl         -b cookies.txt                                http://api.my.url/data

I'm clearly not understanding how RCurl works with curl options. My script as it stands has:

library(RCurl)
curl <- getCurlHandle()
curlSetOpt(cookiejar='cookies.txt', curl=curl)
postForm("http://api.my.url/login", username='xxx', password='yyy', curl=curl)
getURL("http://api.my.url/data", curl=curl)

The final getURL() fails with a "Not logged in." message from the server, and after the postForm() no cookies.txt file exists.


Answer 1:


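In general you don't need to create a cookie file unless you want to study the cookies: once the cookie engine is enabled, libcurl keeps the cookies in memory for the life of the handle, and every request made with that same handle sends them back.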
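If you do want to study the cookies, the file libcurl writes for the cookiejar option is a plain-text, tab-separated "Netscape" cookie file. Below is a minimal base-R sketch for loading it into a data frame. read_cookie_jar is a hypothetical helper, not part of RCurl, and it assumes the seven-column layout libcurl writes (domain, subdomain flag, path, secure flag, expiry, name, value), including its "#HttpOnly_" domain prefix.

```r
# read_cookie_jar() is a hypothetical helper, not part of RCurl. It assumes
# libcurl's seven-column, tab-separated cookie file layout.
read_cookie_jar <- function(path) {
  lines <- readLines(path, warn = FALSE)

  # libcurl prefixes HttpOnly cookies with "#HttpOnly_"; strip the marker so
  # those lines are not discarded as comments (read.table would drop them).
  http_only <- startsWith(lines, "#HttpOnly_")
  lines[http_only] <- sub("^#HttpOnly_", "", lines[http_only])

  # Keep only data lines: skip blanks and the remaining "#" comment lines.
  keep <- nzchar(trimws(lines)) & !startsWith(lines, "#")

  split_line <- function(line) {
    f <- strsplit(line, "\t", fixed = TRUE)[[1]]
    length(f) <- 7     # strsplit drops a trailing empty value; pad with NA
    f[is.na(f)] <- ""  # ...and turn the padding into empty strings
    f
  }
  m <- t(vapply(lines[keep], split_line, character(7), USE.NAMES = FALSE))

  data.frame(
    domain     = m[, 1],
    subdomains = m[, 2] == "TRUE",
    path       = m[, 3],
    secure     = m[, 4] == "TRUE",
    expires    = as.numeric(m[, 5]),  # Unix time; 0 marks a session cookie
    name       = m[, 6],
    value      = m[, 7],
    http_only  = http_only[keep],
    stringsAsFactors = FALSE
  )
}
```

Call it as read_cookie_jar("cookies.txt") once the jar has actually been written. The manual handling of "#HttpOnly_" is the reason for not simply using read.table(..., comment.char = "#"), which would silently drop exactly the session cookies a login usually sets.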

That said, in the real world web servers also check the user agent, use redirects, and rely on hidden POST data, but this should help:

library(RCurl)

#Set your browsing links 
loginurl = "http://api.my.url/login"
dataurl  = "http://api.my.url/data"

#Set user account data and agent
pars=list(
     username="xxx",
     password="yyy"
)
agent="Mozilla/5.0" #or whatever 

#Set RCurl options on one handle, reused for every request
curl = getCurlHandle()
curlSetOpt(cookiejar="cookies.txt", useragent = agent, followlocation = TRUE, curl=curl)
#Alternatively, if you do not need to read the cookies yourself, enable the
#in-memory cookie engine without writing any file:
#curlSetOpt(cookiefile="", useragent = agent, followlocation = TRUE, curl=curl)

#Post login form
html=postForm(loginurl, .params = pars, curl=curl)

#Go wherever you want
html=getURL(dataurl, curl=curl)

#Start parsing your page
matchref=gregexpr("... my regexp ...", html)

#... .... ...

#Clean up. Releasing the handle is also what writes cookies.txt to disk
rm(curl)
gc()
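The question asked for two R functions (log in, then GET) rather than a script. Below is a minimal sketch of that shape, assuming the placeholder URLs and the username/password field names from the question; api_login and api_get are hypothetical names. It keeps the session cookie only in memory via cookiefile = "", and passes style = "POST" so the form is URL-encoded like curl -d (postForm otherwise defaults to "HTTPPOST", i.e. multipart/form-data, which some login endpoints reject).

```r
library(RCurl)

# Hypothetical wrappers; URLs and form field names are the question's
# placeholders, not a real API.
api_login <- function(username, password,
                      url = "http://api.my.url/login") {
  # cookiefile = "" enables libcurl's cookie engine without reading a file;
  # the session cookie is kept in memory inside this handle.
  curl <- getCurlHandle(cookiefile = "", followlocation = TRUE)
  # style = "POST" sends application/x-www-form-urlencoded, like `curl -d`.
  postForm(url, username = username, password = password,
           style = "POST", curl = curl)
  curl  # the caller keeps the logged-in handle
}

api_get <- function(curl, url = "http://api.my.url/data") {
  # Reusing the login handle sends back the cookies set during login.
  getURL(url, curl = curl)
}

# Usage (placeholder credentials):
# h   <- api_login("xxx", "yyy")
# txt <- api_get(h)
```

Returning the handle, rather than storing it in a global variable, lets a package user hold several independent sessions at once.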

Important

There is often hidden POST data beyond the username and password. To find it, use your browser's developer tools, e.g. in Chrome open Developer Tools (Ctrl+Shift+I) -> Network tab, which shows the posted field names and values.
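When the hidden fields are plain `<input type="hidden">` tags in the login page, they can also be collected programmatically. The following is a rough base-R sketch under that assumption; hidden_fields is a hypothetical helper, not part of RCurl, and a regex only handles simple, quoted markup (an HTML parser is the safer choice for anything unusual).

```r
# hidden_fields() is a hypothetical helper, not part of RCurl. Regex matching
# is a rough heuristic: it expects quoted attribute values and will miss
# unusual markup, where an HTML parser is the safer choice.
hidden_fields <- function(html) {
  # Every <input ...> tag on the page.
  tags <- regmatches(html, gregexpr("<input[^>]*>", html, ignore.case = TRUE))[[1]]
  # Only those declared as type="hidden".
  is_hidden <- grepl("[[:space:]]type[[:space:]]*=[[:space:]]*[\"']?hidden",
                     tags, ignore.case = TRUE)
  tags <- tags[is_hidden]

  # Value of one quoted attribute in a tag, or "" if it is absent.
  attr_of <- function(tag, key) {
    pattern <- paste0("[[:space:]]", key,
                      "[[:space:]]*=[[:space:]]*[\"']([^\"']*)[\"']")
    m <- regmatches(tag, regexec(pattern, tag, ignore.case = TRUE))[[1]]
    if (length(m) == 2) m[2] else ""
  }

  # Named list, ready to merge into postForm()'s .params.
  fields <- lapply(tags, attr_of, key = "value")
  names(fields) <- vapply(tags, attr_of, character(1), key = "name",
                          USE.NAMES = FALSE)
  fields
}

# Usage sketch (assumes the login form itself is served at loginurl):
# page <- getURL(loginurl, curl = curl)
# pars <- c(hidden_fields(page), list(username = "xxx", password = "yyy"))
# html <- postForm(loginurl, .params = pars, curl = curl)
```

Fetching the form with the same handle before posting also picks up any cookie the server sets on that first page, which some sites check together with the hidden token.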




Answer 2:


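My bad. Neal Richter pointed me to http://www.omegahat.org/RCurl/RCurlJSS.pdf, which explains the difference between cookiefile and cookiejar more clearly: cookiefile names a file libcurl reads cookies from, while cookiejar names a file it writes cookies to. The sample script in the question actually does work; it only writes the file to disk once the curl handle is no longer being used, i.e. when it is released and garbage-collected.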
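To make the cookiefile/cookiejar distinction concrete, here is a sketch that mirrors the two bash lines one-to-one: cookiejar plays the role of curl -c and cookiefile the role of curl -b. The function names, arguments and URLs are hypothetical placeholders taken from the question. Forcing the write with rm() plus gc() relies on the behaviour described in this thread (the handle writes its jar when it is finalized); R does not formally guarantee when that happens, so treat the file check as a sanity test rather than a promise.

```r
library(RCurl)

# Like `curl -c cookies.txt -d ...`: cookiejar names the file cookies are
# WRITTEN to, which happens when the handle is cleaned up.
login_to_file <- function(username, password, jar = "cookies.txt",
                          url = "http://api.my.url/login") {
  curl <- getCurlHandle(cookiejar = jar)
  postForm(url, username = username, password = password,
           style = "POST", curl = curl)
  # Drop the only reference to the handle and collect garbage, so the
  # handle is finalized and libcurl flushes its cookies to `jar`.
  rm(curl)
  invisible(gc())
  file.exists(jar)  # expected TRUE if the jar was flushed
}

# Like `curl -b cookies.txt`: cookiefile names the file cookies are READ
# from when the new handle starts.
get_with_file <- function(jar = "cookies.txt",
                          url = "http://api.my.url/data") {
  curl <- getCurlHandle(cookiefile = jar)
  getURL(url, curl = curl)
}

# Usage (placeholder credentials):
# login_to_file("xxx", "yyy")
# txt <- get_with_file()
```

Going through a file like this is only worth it when the session must survive across R processes; within one session, reusing a single handle avoids the flush timing issue entirely.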



Source: https://stackoverflow.com/questions/2388974/how-do-i-use-cookies-with-rcurl
