rselenium

Login to a website (billboard.com) for scraping purposes using R, when the login is done through a pop-up window

限于喜欢 submitted on 2020-04-11 06:45:20
Question: I want to scrape some "pro" Billboard data, access to which requires a premium Billboard account. I already have one, but obviously I need to log in to the Billboard website through R in order to scrape this data. I have no issues doing this with regular login pages (for instance, Stack Overflow):

    ##### Stackoverflow login #####
    # Packages installation and loading ---------------------------------------
    if (!require("pacman")) install.packages("pacman")
    pacman::p_load(rvest,dplyr

R - Waiting for page to load in RSelenium with PhantomJS

无人久伴 submitted on 2020-01-31 12:41:49
Question: I put together a crude scraper that scrapes prices/airlines from Expedia:

    # Start the Server
    rD <- rsDriver(browser = "phantomjs", verbose = FALSE)
    # Assign the client
    remDr <- rD$client
    # Establish a wait for an element
    remDr$setImplicitWaitTimeout(1000)
    # Navigate to Expedia.com
    appurl <- "https://www.expedia.com/Flights-Search?flight-type=on&starDate=04/30/2017&mode=search&trip=oneway&leg1=from:Denver,+Colorado,to:Oslo,+Norway,departure:04/30/2017TANYT&passengers=children:0,adults:1"
    remDr
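One generic way to wait for a page to finish loading is to poll a readiness check until it succeeds or a timeout expires. The sketch below is a minimal illustration in base R and is not taken from the question: `wait_until()` and `page_ready()` are hypothetical names, and the readiness check is a stand-in counter rather than a real browser call. With RSelenium, the check could instead test whether `remDr$findElements()` returns a non-empty list for the element of interest.

```r
# Hypothetical helper: poll `check` until it returns TRUE or `timeout` seconds pass.
# Errors raised by `check` are treated as "not ready yet".
wait_until <- function(check, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    ok <- tryCatch(isTRUE(check()), error = function(e) FALSE)
    if (ok) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Stand-in readiness check (assumption): becomes TRUE on the third call.
attempts <- 0
page_ready <- function() {
  attempts <<- attempts + 1
  attempts >= 3
}

loaded <- wait_until(page_ready, timeout = 5, interval = 0.1)
```

Returning `FALSE` on timeout, rather than raising an error, leaves it to the caller to decide whether to retry, skip the page, or stop.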

Triggering doPostBack JavaScript with RSelenium to scrape a multi-page table

社会主义新天地 submitted on 2020-01-25 06:39:07
Question: I am struggling to web-scrape data from a table that spans several pages. The pages are linked via JavaScript. The data I am interested in is based on the website's search function:

    url <- "http://aims.niassembly.gov.uk/plenary/searchresults.aspx?tb=0&tbv=0&tbt=All%20Members&pt=7&ptv=7&ptt=Petition%20of%20Concern&mc=0&mcv=0&mct=All%20Categories&mt=0&mtv=0&mtt=All%20Types&sp=1&spv=0&spt=Tabled%20Between&ss=jc7icOHu4kg=&tm=2&per=1&fd=01/01/2011&td=17/04/2018&tit=0&txt=0&pm=0&it=0&pid=1

Catch R Selenium error message and write it to log

拥有回忆 submitted on 2020-01-16 10:03:41
Question: I have a few scrapes scheduled via RSelenium. Sometimes the scraping fails and I would like to know the reason. I note that the error messages (in red) are quite informative, but I don't know how to log them. Let's say I provided a malformed URL:

    tryCatch(
      expr = remDr$navigate("i.am.not.an.url"),
      error = function(error){
        print(error)
        # write.table(error, file = ...)
      }
    )

This is what I get, but it doesn't say much about what triggered the error:

    <simpleError: Summary:
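A condition object caught by `tryCatch()` carries its text in `conditionMessage()`, so one way to log it is to append that string to a file. The sketch below is a minimal illustration, not the asker's solution: `log_error()` and `failing_step()` are hypothetical names, the log path is a temporary file, and the failure is simulated with `stop()` instead of a real `remDr$navigate()` call.

```r
# Hypothetical helper: run `step`; on error, append a timestamped message to `log_path`.
log_error <- function(step, log_path) {
  tryCatch(
    step(),
    error = function(e) {
      entry <- sprintf(
        "[%s] %s",
        format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
        conditionMessage(e)
      )
      cat(entry, "\n", sep = "", file = log_path, append = TRUE)
      invisible(NULL)
    }
  )
}

# Simulated failure (assumption): stands in for a failing RSelenium call.
failing_step <- function() stop("simulated navigation failure")

log_file <- tempfile(fileext = ".log")
result <- log_error(failing_step, log_file)
log_lines <- readLines(log_file)
```

Because `conditionMessage()` returns a plain character string, it can be written with `cat()` or `writeLines()`, whereas passing the whole condition object to `write.table()` is not meaningful.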

How to specify Firefox browser version for `RSelenium::rsDriver()`?

白昼怎懂夜的黑 submitted on 2020-01-16 09:00:08
Question: RSelenium::rsDriver() has the argument version, specifying "what version of Selenium Server to run." The function RSelenium::rsDriver() calls RSelenium::remoteDriver(), which also has the argument version, but in this case it specifies "The browser version" (RSelenium documentation). RSelenium::rsDriver() passes any ... args into RSelenium::remoteDriver() but, given the duplicate name (both rsDriver() and remoteDriver() have a version argument), there does not seem to be an argument to supply to change

RSelenium rsDriver gives error can't kill an exited process

拈花ヽ惹草 submitted on 2020-01-15 11:22:00
Question: I am struggling to make RSelenium work on a Unix server. It has Mozilla Firefox 60.6.1, and running the two commands:

    binman::list_versions("geckodriver")
    $linux64
    [1] "0.22.0" "0.23.0" "0.24.0"

    binman::list_versions("seleniumserver")
    $generic
    [1] "3.141.59" "4.0.0-alpha-1" "4.0.0-alpha-2"

it seems that geckodriver is available (is it?). But when I try to launch a driver:

    > library(RSelenium)
    > rD <- rsDriver(browser = "firefox",
    +                extraCapabilities = list(
    +                  "moz:firefoxOptions" = list

Can't set up a profile with the latest version of RSelenium for Firefox

安稳与你 submitted on 2020-01-07 04:36:34
Question: This code ran with the last version I used in July:

    fprof <- getFirefoxProfile("c:/Users/cp/AppData/Roaming/Mozilla/Firefox/Profiles/j7a5lq0r.selenium", useBase = T)
    rS <- rsDriver(browser = "firefox", port = 4567L, extraCapabilities = fprof)

but now the output is:

    [1] "Connecting to remote server"
    $`moz:profile`
    [1] "C:\\Users\\cp\\AppData\\Local\\Temp\\rust_mozprofile.7n9y6eVCqnpt"

So it creates a temporary profile instead of loading the one I set. Does anyone know what the issue is? This profile is valid

Read values in dropdown menu element with RSelenium

…衆ロ難τιáo~ submitted on 2020-01-03 15:56:54
Question: I am using RSelenium to navigate to sites and interact with their elements. Using RSelenium, how can I read the list of options in a dropdown menu so that I can identify the latest month available and use that to set the dropdown to the correct value? On a certain site, a dropdown menu lets the user set the month of the year, defining the end point of a date range that is in turn used to display or download monthly data. As additional months of data become available through