Scraping an XML/JavaScript table with R [closed]

I want to scrape a table like the one at http://www.oddsportal.com//hockey/usa/nhl/carolina-hurricanes-ottawa-senators-80YZhBGC/, specifically the bookmakers and their odds. The problem is that I don't know what kind of table it is or how to scrape it.

These threads might help ("Scraping javascript with R" and "What type of HTML table is this and what type of webscraping techniques can you use?"), but I'd appreciate it if someone could point me in the right direction or, better yet, give instructions here.

So what kind of table is that odds table? Is it possible to scrape it with R, and if so, how?

Edit: I should have been clearer. I have been scraping data with R for some time and probably don't need help with the basics. On further inspection, the table is indeed generated by JavaScript; that is the problem I need help with.

You can use Selenium and RSelenium to get the relevant data:

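library(RSelenium)
library(XML)  # provides readHTMLTable()
appURL <- "http://www.oddsportal.com//hockey/usa/nhl/carolina-hurricanes-ottawa-senators-80YZhBGC"
# Start a local Selenium server and open a browser session.
# (startServer() is defunct in later RSelenium releases, which use rsDriver().)
RSelenium::startServer()
remDr <- remoteDriver()
remDr$open()
remDr$navigate(appURL)
# Run JavaScript in the rendered page and return the first table's HTML.
# tbls is assumed to be a global defined by the page's own scripts.
tblSource <- remDr$executeScript("return tbls[0].outerHTML;")[[1]]
# Parse the HTML string into a list of data frames.
readHTMLTable(tblSource)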
> readHTMLTable(tblSource)
$`NULL`
     Bookmakers    1    X    2 Payout
1   bet-at-home 2.25 3.80 2.60  91.6%
2        bet365 2.29 3.79 2.64  92.7%
3       Betsson 2.35 3.75 2.65  93.5%
4          bwin 2.30 3.75 2.70  93.3%
5   MarathonBet 2.35 3.80 2.78  95.4%
6      Titanbet 2.30 3.95 2.50  91.9%
7       TonyBet 2.35 3.70 2.70  93.8%
8        Unibet 2.35 3.85 2.60  93.5%
9  William Hill 2.30 3.90 2.50  91.6%
10       Winner 2.30 3.95 2.50  91.9%
11       youwin 2.40 3.75 2.55  93.0%
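A table scraped this way usually needs a little cleanup before analysis: the values come back as text, and the bookmaker names can carry non-breaking-space padding from the page markup. The following is a minimal sketch of that cleanup in base R. The two-row `odds` data frame is a hypothetical stand-in for the result of `readHTMLTable()`, and `clean_odds()` is a helper name made up for this example.

```r
# Hypothetical stand-in for the data frame returned by readHTMLTable().
# "\u00a0" is the non-breaking space assumed to pad the bookmaker names.
odds <- data.frame(
  Bookmakers = c("\u00a0bet-at-home\u00a0\u00a0", "\u00a0bet365\u00a0\u00a0"),
  `1` = c("2.25", "2.29"),
  X = c("3.80", "3.79"),
  `2` = c("2.60", "2.64"),
  Payout = c("91.6%", "92.7%"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Illustrative helper (not part of any package).
clean_odds <- function(df) {
  # Drop non-breaking spaces, then trim ordinary whitespace.
  df$Bookmakers <- trimws(gsub("\u00a0", "", as.character(df$Bookmakers), fixed = TRUE))
  # The odds columns arrive as text (or factors); convert them to numbers.
  for (col in c("1", "X", "2")) {
    df[[col]] <- as.numeric(as.character(df[[col]]))
  }
  # Turn "91.6%" into the number 91.6.
  df$Payout <- as.numeric(sub("%", "", as.character(df$Payout), fixed = TRUE))
  df
}

cleaned <- clean_odds(odds)
print(cleaned)
```

With the real result you would pass the first list element instead, for example `clean_odds(readHTMLTable(tblSource)[[1]])`.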

The "bookies" data comes from a request for a JavaScript callback resource:

GET /x/bookies-140619144601-1403252087.js HTTP/1.1
Host: rb.oddsportal.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:30.0) Gecko/20100101 Firefox/30.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://www.oddsportal.com//hockey/usa/nhl/carolina-hurricanes-ottawa-senators-80YZhBGC/
Connection: keep-alive

It returns a callback resource that contains the bookmaker info but no odds. Other AJAX callback requests fetch the odds data, but you'll have to dig for them.
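Once you have the text of such a callback resource, the JSON inside it can be pulled out and parsed in R. The sketch below assumes a JSONP-style wrapper of the form `callbackName({...});`. The actual response format is not shown in this thread, so the callback name `bookmakersData`, the keys, and the `name` field are all hypothetical, as is the helper `extract_callback_json()`.

```r
library(jsonlite)

# Hypothetical payload: the callback name, the numeric keys and the "name"
# field are assumptions made for illustration, not the site's real format.
js_text <- 'bookmakersData({"16":{"name":"bet365"},"49":{"name":"bet-at-home"}});'

# Illustrative helper: return the text between the first "(" and the last ")".
extract_callback_json <- function(txt) {
  start <- as.integer(regexpr("(", txt, fixed = TRUE))
  ends <- gregexpr(")", txt, fixed = TRUE)[[1]]
  end <- as.integer(ends[length(ends)])
  if (start < 0 || end < start) stop("no callback wrapper found")
  substr(txt, start + 1, end - 1)
}

# A JSON object of objects parses into a named list of lists.
bookies <- fromJSON(extract_callback_json(js_text))
bookie_names <- vapply(bookies, function(b) b$name, character(1))
print(bookie_names)
```

In practice `js_text` would be the body of the HTTP response rather than a literal string, and the extraction step would need adjusting to whatever wrapper the resource really uses.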

Burp Proxy is a great way to see the URI calls, but DOM inspection (as @Spacedman suggested) should always be your first line of investigation.
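One quick check along these lines is whether the table exists in the raw HTML at all, or only appears after scripts have run. Here is a rough sketch using the XML package. The `static_html` string is a made-up example of what a plain download of a JavaScript-driven page might look like; the `id` value and the script path are hypothetical.

```r
library(XML)

# Hypothetical page source: an empty placeholder <div> that a script fills in
# later. The id and the script path are invented for this example.
static_html <- paste0(
  '<html><body>',
  '<div id="odds-data-table"></div>',
  '<script src="/x/feed.js"></script>',
  '</body></html>'
)

doc <- htmlParse(static_html, asText = TRUE)

# Count <table> elements present before any JavaScript has run.
n_tables <- length(getNodeSet(doc, "//table"))

# List external scripts, which are candidates for loading the data.
script_srcs <- xpathSApply(doc, "//script[@src]", xmlGetAttr, "src")

if (n_tables == 0) {
  message("No <table> in the static HTML; the data is probably loaded by: ",
          paste(script_srcs, collapse = ", "))
}
```

If the count is zero while the browser's DOM inspector does show a table, the content is being built client-side, and a browser-driven approach or the underlying data requests are the places to look.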

Source: https://stackoverflow.com/questions/24327980/scraping-xml-javascript-table-with-r
