Question
Following the post "Accessing the Selenium API in R" on this site, I can create a webdriver session from R. However, I am unable to get the page's element details the way I can in Python. How can I do this?
I would like to scrape the soccer match table for every round...
# using R
library(RCurl)
library(RJSONIO)
library(XML)
# start the Selenium server in the background; wait = FALSE keeps system() from blocking the R session
system("java -jar selenium-server-standalone-2.35.0.jar", wait = FALSE)
baseURL<-"http://localhost:4444/wd/hub/"
server<-list(desiredCapabilities=list(browserName='firefox',javascriptEnabled=TRUE))
getURL(paste0(baseURL,"session"),
customrequest="POST",
httpheader=c('Content-Type'='application/json;charset=UTF-8'),
postfields=toJSON(server))
serverDetails<-fromJSON(rawToChar(getURLContent('http://localhost:4444/wd/hub/sessions',binary=TRUE)))
serverId<-serverDetails$value[[1]]$id
# navigate to 7m.cn
URL = "http://data2.7m.cn/history_Matches_Data/2009-2010/92/en/index.shtml"
getURL(paste0(baseURL,"session/",serverId,"/url"),
customrequest="POST",
httpheader=c('Content-Type'='application/json;charset=UTF-8'),
postfields=toJSON(list(url=URL)))
Below is the Python code I use to get the HTML page source from 7m.cn. Are there any better approaches? Thanks.
# using Python
import codecs
import lxml.html as lh
from selenium import webdriver
URL = 'http://data2.7m.cn/history_Matches_Data/2009-2010/92/en/index.shtml'
browser = webdriver.Firefox()
browser.get(URL)
content = browser.page_source
browser.quit()
Answer 1:
You can use the package relenium (Selenium for R). Disclaimer: I'm one of the developers.
require(relenium)
firefox <- firefoxClass$new()
firefox$get('http://data2.7m.cn/history_Matches_Data/2009-2010/92/en/index.shtml')
content <- firefox$getPageSource()
firefox$close()
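If you would rather stay with the standalone server from the question, the JSON wire protocol spoken by Selenium 2 exposes the current page source at GET /session/:sessionId/source, with the HTML returned in the "value" field of the JSON response. Below is a minimal sketch in Python using only the standard library. The helper names are hypothetical, and it assumes the server is listening on localhost:4444 and that a session id was obtained as in the question. From R, the equivalent should be a getURL() call on paste0(baseURL,"session/",serverId,"/source") followed by fromJSON(...)$value, though that is an untested assumption for this server version.

```python
import json
from urllib.request import Request, urlopen

# Assumed server address, taken from the question's baseURL.
BASE_URL = "http://localhost:4444/wd/hub/"


def build_source_request(session_id, base_url=BASE_URL):
    """Build the GET request for the page source of an existing session."""
    return Request(base_url + "session/" + session_id + "/source", method="GET")


def extract_value(response_body):
    """Pull the 'value' field (the HTML string) out of the JSON response."""
    return json.loads(response_body)["value"]


def get_page_source(session_id, base_url=BASE_URL):
    """Fetch the page source; needs a running server and a live session."""
    with urlopen(build_source_request(session_id, base_url)) as resp:
        return extract_value(resp.read().decode("utf-8"))
```

With a live session, get_page_source(session_id) plays the same role as browser.page_source in the question's Python snippet.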
Source: https://stackoverflow.com/questions/19321739/using-r-to-connect-selenium-server-standalone