What's my user agent when I parse a website with the rvest package in R?

Submitted by 十年热恋 on 2019-11-30 13:39:47

Question


I am using the rvest package to parse HTML and extract information from a website, since this is easy to do in R.

I am wondering what my User-Agent is during the request (if there is one), since a User-Agent is normally assigned by the web browser. Is there a way to set it somehow?

My code, which opens a session and extracts information from the HTML, is below:

library(rvest)

# Open a session, select the link, and pull out its href attribute
se <- html_session("http://www.wp.pl") %>%
  html_nodes("[data-st-area=Glonews-mozaika] li:nth-child(7) a") %>%
  html_attr(name = "href")

Answer 1:


I used https://httpbin.org/user-agent to find out:

library(rvest)
se <- html_session("https://httpbin.org/user-agent")
se$response$request$options$useragent

Answer:

[1] "libcurl/7.37.1 r-curl/0.9.1 httr/1.0.0"
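The three tokens in that string are the libcurl version, the version of the R curl package, and the httr version. As a rough sketch of how a string of this shape can be assembled from the versions installed locally (an approximation for illustration only, not necessarily how httr builds it internally; `build_default_ua` is a hypothetical helper name):

```r
# Hypothetical helper: rebuilds a string shaped like the default User-Agent
# from the library versions installed on this machine.
build_default_ua <- function() {
  versions <- c(
    libcurl  = curl::curl_version()$version,
    `r-curl` = as.character(utils::packageVersion("curl")),
    httr     = as.character(utils::packageVersion("httr"))
  )
  # Join each name/version pair with "/" and the pairs with spaces
  paste0(names(versions), "/", versions, collapse = " ")
}

build_default_ua()
```

The version numbers printed will depend on the local installation, so they will most likely differ from the ones shown in the output above.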

See this bug report for a way to override it.
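One way to override it, not necessarily the one described in that report, is httr's global configuration. Since rvest sessions are built on httr requests (as the httr/1.0.0 token above suggests), a default set through `set_config()` should apply to them as well. A minimal sketch, assuming a made-up User-Agent string:

```r
library(httr)

# Hypothetical User-Agent string, used only for illustration.
custom_ua <- "my-rvest-scraper/0.1"

# Register it as a global httr default; set_config() returns the
# previous configuration invisibly.
old_config <- set_config(user_agent(custom_ua))

# ... httr requests made here send custom_ua ...

# Restore httr's built-in defaults when done.
reset_config()
```

A global default is convenient when many requests share the same User-Agent, but it affects every httr call in the R session, so resetting it afterwards avoids surprises.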




Answer 2:


I found this in a tutorial somewhere; it looks like an easier, faster way to do it:

library(rvest)
library(httr)  # user_agent() comes from httr
uastring <- "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
session <- html_session("https://www.linkedin.com/job/", user_agent(uastring))
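To check that a custom string is really what the server receives, it can be sent to the same echo service used in the first answer. The following is a sketch under that assumption: it inspects the config object locally, then (only if a network connection is available) asks httpbin.org what it saw. The variable names `ua_config`, `resp`, and `seen` are illustrative.

```r
library(httr)

uastring <- "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"

# user_agent() only builds a config object; no request is sent here.
ua_config <- user_agent(uastring)
ua_config$options$useragent

# Round-trip through the echo endpoint, which replies with JSON
# containing the User-Agent header it received.
if (curl::has_internet()) {
  resp <- GET("https://httpbin.org/user-agent", ua_config)
  seen <- content(resp, as = "parsed")[["user-agent"]]
  print(identical(seen, uastring))
}
```

If the comparison prints FALSE, something between the client and the server (a proxy, for example) may be rewriting the header.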


Source: https://stackoverflow.com/questions/31406503/whats-my-user-agent-when-i-parse-website-with-rvest-package-in-r
