scraping xml/javascript table with R [closed]

孤街醉人 submitted on 2019-12-04 16:07:32

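The odds table is built by JavaScript after the page loads, so it is not present in the raw HTML. You can use Selenium through the RSelenium package to render the page in a real browser and read the table out of the DOM: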

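library(RSelenium)
library(XML)  # provides readHTMLTable()
appURL <- "http://www.oddsportal.com//hockey/usa/nhl/carolina-hurricanes-ottawa-senators-80YZhBGC"
# Start a local Selenium server and connect a browser session to it.
# Note: startServer() is defunct in newer RSelenium releases, which provide rsDriver() instead.
RSelenium::startServer()
remDr <- remoteDriver()
remDr$open()
remDr$navigate(appURL)
# Run JavaScript in the rendered page and return the first table's HTML.
tblSource <- remDr$executeScript("return tbls[0].outerHTML;")[[1]]
# Parse the HTML string into a list of data frames.
readHTMLTable(tblSource)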
> readHTMLTable(tblSource)
$`NULL`
     Bookmakers    1    X    2 Payout
1   bet-at-home 2.25 3.80 2.60  91.6%
2        bet365 2.29 3.79 2.64  92.7%
3       Betsson 2.35 3.75 2.65  93.5%
4          bwin 2.30 3.75 2.70  93.3%
5   MarathonBet 2.35 3.80 2.78  95.4%
6      Titanbet 2.30 3.95 2.50  91.9%
7       TonyBet 2.35 3.70 2.70  93.8%
8        Unibet 2.35 3.85 2.60  93.5%
9  William Hill 2.30 3.90 2.50  91.6%
10       Winner 2.30 3.95 2.50  91.9%
11       youwin 2.40 3.75 2.55  93.0%
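
The values scraped this way are text, and bookmaker names can carry non-breaking spaces from the page markup, so some cleanup is usually needed before doing arithmetic on the odds. The following is a minimal sketch of that step, not part of the original answer. The helper name clean_odds is made up for illustration, and the small data frame is rebuilt by hand from three rows of the output above so the code runs without a browser; in practice you would pass in readHTMLTable(tblSource)[[1]].

```r
# Three rows copied from the printed table, with non-breaking spaces (U+00A0)
# added around one name to mimic what the page markup may contain.
odds <- data.frame(
  Bookmakers = c("bet-at-home", "\u00a0bet365\u00a0\u00a0", "Betsson"),
  `1` = c("2.25", "2.29", "2.35"),
  X = c("3.80", "3.79", "3.75"),
  `2` = c("2.60", "2.64", "2.65"),
  Payout = c("91.6%", "92.7%", "93.5%"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: tidy names and convert the text columns to numbers.
clean_odds <- function(df) {
  # Turn non-breaking spaces into ordinary spaces, then trim both ends.
  names_txt <- gsub("\u00a0", " ", as.character(df$Bookmakers), fixed = TRUE)
  df$Bookmakers <- trimws(names_txt)
  # as.character() first, in case the columns arrived as factors.
  for (col in c("1", "X", "2")) {
    df[[col]] <- as.numeric(as.character(df[[col]]))
  }
  # "91.6%" becomes 0.916
  df$Payout <- as.numeric(sub("%", "", as.character(df$Payout), fixed = TRUE)) / 100
  df
}

cleaned <- clean_odds(odds)
str(cleaned)
```

With numeric columns, comparisons such as cleaned[which.max(cleaned$Payout), ] work as expected.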

The "bookies" data comes from a separate request the page makes for a JavaScript callback resource:

GET /x/bookies-140619144601-1403252087.js HTTP/1.1
Host: rb.oddsportal.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:30.0) Gecko/20100101 Firefox/30.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://www.oddsportal.com//hockey/usa/nhl/carolina-hurricanes-ottawa-senators-80YZhBGC/
Connection: keep-alive

It returns a callback resource that contains the bookmaker info but no odds. The odds themselves are loaded by other AJAX callback requests, so you'll have to dig through the page's network traffic to find them.
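
If you do go after the callback resources directly, the usual chore is stripping the JavaScript function wrapper so that the payload can be parsed as JSON. Below is a rough sketch of that idea only. The response body, the callback name someCallback, the numeric ids, and the "name" field are all invented for illustration, since the real file's layout is not shown here; unwrap_callback is a hypothetical helper, and using the jsonlite package is my assumption rather than something the original answer relied on.

```r
library(jsonlite)

# Invented example body; the real resource will have a different shape.
body <- 'someCallback({"16": {"name": "bet365"}, "2": {"name": "bwin"}});'

# Hypothetical helper: return the text between the first "(" and the last ")".
unwrap_callback <- function(txt) {
  start <- as.integer(regexpr("(", txt, fixed = TRUE))
  ends <- as.integer(gregexpr(")", txt, fixed = TRUE)[[1]])
  # Both searches return -1 when nothing matches.
  stopifnot(start > 0, ends[1] > 0)
  substr(txt, start + 1, max(ends) - 1)
}

# A JSON object parses into a named list keyed by the object's keys.
bookies <- fromJSON(unwrap_callback(body))
names(bookies)
```

Taking the last closing parenthesis rather than the first keeps the sketch working when the payload itself contains parentheses inside strings.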

Burp Proxy is a great way to see the URI calls a page makes, but DOM inspection (as @Spacedman suggested) should always be your first line of investigation.
