Hi I am trying to extract the table from the premierleague website.
The package I am using is rvest package and the code I am using in th
Since the data is loaded with JavaScript, grabbing the HTML with rvest will not get you what you want, but if you use PhantomJS as a headless browser within RSelenium, it's not all that complicated (by RSelenium standards):
library(RSelenium)
library(rvest)
# initialize browser and driver with RSelenium
ptm <- phantom()
rd <- remoteDriver(browserName = 'phantomjs')
rd$open()
# grab source for page
rd$navigate('https://fantasy.premierleague.com/a/entry/767830/history')
html <- rd$getPageSource()[[1]]
# clean up
rd$close()
ptm$stop()
# parse with rvest
df <- html %>% read_html() %>%
html_node('#ismr-event-history table.ism-table') %>%
html_table() %>%
setNames(gsub('\\S+\\s+(\\S+)', '\\1', names(.))) %>% # clean column names
setNames(gsub('\\s', '_', names(.)))
str(df)
## 'data.frame': 20 obs. of 10 variables:
## $ Gameweek : chr "GW1" "GW2" "GW3" "GW4" ...
## $ Gameweek_Points : int 34 47 53 51 66 66 65 63 48 90 ...
## $ Points_Bench : int 1 6 9 7 14 2 9 3 8 2 ...
## $ Gameweek_Rank : chr "2,406,373" "2,659,789" "541,258" "905,524" ...
## $ Transfers_Made : int 0 0 2 0 3 2 2 0 2 0 ...
## $ Transfers_Cost : int 0 0 0 0 4 4 4 0 0 0 ...
## $ Overall_Points : chr "34" "81" "134" "185" ...
## $ Overall_Rank : chr "2,406,373" "2,448,674" "1,914,025" "1,461,665" ...
## $ Value : chr "£100.0" "£100.0" "£99.9" "£100.0" ...
## $ Change_Previous_Gameweek: logi NA NA NA NA NA NA ...
As always, more cleaning is necessary, but overall, it's in pretty good shape without too much work. (If you're using the tidyverse, df %>% mutate_if(is.character, parse_number) will do pretty well.) The arrows are images which is why the last column is all NA, but you can calculate those anyway.