Using rvest package when HTML table has two headers

礼貌的吻别 2021-01-07 06:17

I am using the following code to scrape an HTML table on AFL player data:

library(rvest)

website <- read_html("https://afltables.com/afl/stats/teams/adel

2 Answers
  •  长发绾君心
    2021-01-07 07:06

    Firstly, and unrelated to your question: don't use table as a name for your objects, because table() is already a base R function. Reusing the name is considered bad practice: it makes code harder to read, and I've been told it can come back to bite you somewhere down the line.
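
    As a small illustration of the naming point (a sketch with made-up values, not taken from the question), the snippet below reuses table as a variable name. R still finds the base function when the name is called, but the same word now means two different things on one line:

```r
# Hypothetical data, for illustration only
table <- c(1, 2, 2)

# R skips non-function objects when resolving a call, so base::table()
# still runs here, but the line is hard to read
counts <- table(table)
counts
```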

    Moving on to the question: the difficulty comes from the type of object that html_table() returns. Applied to a set of nodes, it returns a list, and that list contains a regular data.frame. Calling ncol() and nrow() on the list gives NULL, because a list has no dimensions; it simply has one element, the data.frame. Selecting that first (and only) element gives you the data frame you are actually interested in, which has 27 columns and 34 rows.

    library(rvest)

    website <- read_html("https://afltables.com/afl/stats/teams/adelaide/2017_gbg.html")
    scraped <- website %>%
      html_nodes("table") %>%
      .[1] %>%            # keep only the first table node
      html_table() %>%    # returns a list of data frames
      `[[`(1)             # select the first element of the list, like scraped[[1]]
    ncol(scraped) 
    # 27
    nrow(scraped)
    # 34
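
    The list-versus-data-frame distinction can also be checked without hitting the website. The sketch below is an assumption-laden stand-in: the inline HTML, the column names, and the object names page, tables and result are all hypothetical and are not the real AFL page.

```r
library(rvest)

# Hypothetical inline table standing in for the scraped page
page <- read_html("<table>
  <tr><th>Player</th><th>Goals</th></tr>
  <tr><td>A</td><td>3</td></tr>
  <tr><td>B</td><td>1</td></tr>
</table>")

# html_table() on a set of nodes gives a list, one element per table
tables <- page %>%
  html_nodes("table") %>%
  html_table()

is.list(tables)   # the result is a list, not a data frame
ncol(tables)      # NULL, because a list has no dimensions

# Extract the single data frame from the list
result <- tables[[1]]
ncol(result)
nrow(result)
```

    Using [[1]] rather than [1] matters here: single brackets would return another list of length one, and ncol() would still give NULL.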
    
