rselenium | 易学教程

Scraping password protected forum in r

I have a problem with logging in from my script. Despite the many good answers I found on Stack Overflow, none of the solutions worked for me. I am scraping a web forum for my PhD research; its URL is http://forum.axishistory.com. The page I want to scrape is the memberlist, a page that lists the links to all member profiles. The memberlist is only accessible when logged in. If you try to access it without logging in, the site shows the login form instead. The URL of the memberlist is http://forum.axishistory.com/memberlist.php. I tried the httr package: library(httr) members

How to open Google Chrome with RSelenium?

Question: I am using RSelenium and I want to open and navigate Google Chrome. However, I always get an error when I try to open the browser from R. I use the following code: library("RSelenium") startServer() mybrowser <- remoteDriver(browserName = "chrome") mybrowser$open() [1] "Connecting to remote server" Error: Summary: UnknownError Detail: An unknown server-side error occurred while processing the command. class: java.lang.IllegalStateException The same code works for Firefox. What can I do

RSelenium: server signals port is already in use

Question: I'm using the following code in RSelenium to open a browser. After I close the browser, or even close the client by running remDr$close(), the port is still in use. I have to go to the terminal and kill the process manually before the same port becomes available again. Is there an automated way to make RSelenium free the port once it finishes scraping? Here is my code: library(RSelenium) rD <- rsDriver(verbose = FALSE, port = 4444L) remDr <- rD$client remDr$close() Thanks. Answer 1: The
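One way to release the port, sketched under the assumption that the server was started with rsDriver() as in the question, is to stop the server object that rsDriver() returns in addition to closing the client. The helper name with_selenium and its arguments are hypothetical and only serve the illustration.

```r
# Hypothetical helper: runs scrape_fn(remDr) and always shuts down both the
# browser session and the Selenium server, even if scraping fails.
with_selenium <- function(scrape_fn, port = 4444L) {
  rD <- RSelenium::rsDriver(verbose = FALSE, port = port)
  on.exit({
    # Closing the client only ends the browser session.
    try(rD$client$close(), silent = TRUE)
    # Stopping the server process is what frees the port.
    rD$server$stop()
  }, add = TRUE)
  scrape_fn(rD$client)
}

# Usage (needs a local browser and driver, so it is left commented out):
# title <- with_selenium(function(remDr) {
#   remDr$navigate("https://example.com")
#   remDr$getTitle()[[1]]
# })
```

Putting the cleanup in on.exit() means the server is stopped whether the scraping function returns normally or raises an error.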

Downloading a pdf using RSelenium

Question: What I am trying to do with the RSelenium package is this. Step 1: Access a website (my own electric utility provider). Step 2: Access my account by explicitly providing my username and password (that is the reason I am unable to share the code). Step 3: Click 'VIEW MY BILL'. The bill is displayed in PDF format. Is there a way to download that file and save it to a specific folder? When I use the download.file() command, it does not save the document; instead I get a 3 KB PDF file that I am not able to open. Adobe
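One possible explanation for the small, unreadable file is that download.file() does not carry the browser's login session, so the server may return a login or error page instead of the bill. That is an assumption, not something the question confirms. Under that assumption, a minimal sketch is to copy the cookies from the RSelenium session into an httr request. The function names cookies_to_vector and download_with_session, and the arguments pdf_url and dest_file, are hypothetical.

```r
# Convert RSelenium's cookie list (one list per cookie, with `name` and
# `value` entries) into the named character vector that httr expects.
cookies_to_vector <- function(cookie_list) {
  values <- vapply(cookie_list, function(ck) as.character(ck$value), character(1))
  names(values) <- vapply(cookie_list, function(ck) as.character(ck$name), character(1))
  values
}

# Hypothetical helper: fetch pdf_url with the browser's cookies and write the
# response body to dest_file.
download_with_session <- function(remDr, pdf_url, dest_file) {
  cookies <- cookies_to_vector(remDr$getAllCookies())
  resp <- httr::GET(
    pdf_url,
    httr::set_cookies(.cookies = cookies),
    httr::write_disk(dest_file, overwrite = TRUE)
  )
  httr::stop_for_status(resp)  # fail loudly on 4xx/5xx responses
  invisible(dest_file)
}

# Small check of the cookie conversion with made-up cookie data.
example_cookies <- list(list(name = "sid", value = "abc"))
print(cookies_to_vector(example_cookies))
```

This only helps when the bill is served from a plain URL that accepts the session cookies; if the site builds the PDF through JavaScript or a one-time token, a different approach is needed.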

How to read an html table using Rselenium?

Question: I'm using RSelenium to navigate to a webpage with the following code. I haven't provided the URL because it is a company URL that requires a VPN connection: RSelenium::startServer() require(RSelenium) remDr <- remoteDriver() remDr$navigate("some url") After I navigate to the webpage, the HTML source contains the following table: <font size="2"> <table border="1"> <tbody> <tr> <td> item1 </td> <td> 0 </td> <td> 0.05 </td> <td> 2.43 </td> <td align="center"> Pct </td> <td
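A table like this can usually be read by handing the page source to an HTML parser. The sketch below uses the rvest package, which is a choice made here rather than something the question mentions, and a shortened, made-up copy of the markup in place of the real page. With a live session, the string would come from remDr$getPageSource()[[1]].

```r
library(rvest)

# Stand-in for remDr$getPageSource()[[1]]; the second row is invented
# because the original markup is cut off.
page_source <- '<html><body><font size="2"><table border="1"><tbody>
<tr><td> item1 </td><td> 0 </td><td> 0.05 </td><td> 2.43 </td><td align="center"> Pct </td></tr>
<tr><td> item2 </td><td> 1 </td><td> 0.10 </td><td> 3.10 </td><td align="center"> Pct </td></tr>
</tbody></table></font></body></html>'

doc <- read_html(page_source)

# Take the first <table> and convert it to a data frame. header = FALSE
# because the sample rows contain no header cells.
tbl <- html_table(html_element(doc, "table"), header = FALSE)
print(tbl)
```

If the page holds several tables, html_elements(doc, "table") returns all of them and html_table() then yields a list of data frames.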

download file with Rselenium & docker toolbox

Question: I'm trying to download files with RSelenium, but it seems impossible. I can't manage a download even with an easy example: 1) I have installed Docker Toolbox (https://cran.r-project.org/web/packages/RSelenium/vignettes/RSelenium-docker.html) 2) I ran the Firefox standalone image 3.1.0, and I am now testing the older 2.52.0 3) I have installed the RSelenium package on my R x64 3.3.2 and I have read all the questions and answers on Stack Overflow 4) I have tried the following code, by the way, when I

Using R to “click” a download file button on a webpage

I am attempting to scrape data from this webpage: http://volcano.si.edu/search_eruption.cfm. There are two drop-down boxes that offer filters for the data. I do not need filtered data, so I leave those blank and continue to the next page by clicking "Search Eruptions". What I have noticed, though, is that the resulting table includes only 5 of the 24 columns it should have. However, all 24 columns are present if you click the "Download Results to Excel" button and open the downloaded file. This is what I need. So,

RSelenium UnknownError - java.lang.IllegalStateException with Google Chrome

I am running the following script, based on the RSelenium Basics page on CRAN: library(RSelenium) startServer(args = c("-port 4455"), log = FALSE, invisible = FALSE) remDr <- remoteDriver(browserName = "chrome") remDr$open() This produces the following error: Exception in thread "main" java.net.BindException: Selenium is already running on port 4444. Or some other service is. at org.openqa.selenium.server.SeleniumServer.start(SeleniumServer.java:492) at org.openqa.selenium.server.SeleniumServer.boot(SeleniumServer.java:305) at org.openqa.selenium.server.SeleniumServer.main(SeleniumServer.java:245

RSelenium on docker: where are files downloaded?

Question: I am running Selenium from a Docker image: require(RSelenium) if (length(system("docker ps -l", intern = TRUE)) < 2) try({system("docker run -d -p 4445:4444 selenium/standalone-firefox:2.53.0")}) It works: I can connect to any URL and navigate. However, when I click a button to download a file, it sometimes saves it (partially, as xxxxxxx.csv.part) to /tmp/mozilla_mozillaUser0, and sometimes to ... nowhere, or maybe to another location I cannot find. Is there a reason for that? Also I