Scraping a table from a section in Wikipedia

爷,独闯天下 submitted on 2019-12-04 17:44:07

You can scrape everything at once by looping over the URLs with lapply and pulling out the tables with a carefully chosen XPath selector:

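library(rvest)

seasons <- 1920:2015

# One list element per season; each element is itself a list of the matching tables
standings <- lapply(paste0('https://en.wikipedia.org/wiki/', seasons, '_NFL_season'),
       function(url){
           url %>% read_html() %>%
               html_nodes(xpath = '//span[contains(@id, "tandings")]/following::*[@title="Winning percentage" or text()="PCT"]/ancestor::table') %>%
               # fill = TRUE pads ragged rows; rvest >= 1.0 does this by default and warns that fill is deprecated
               html_table(fill = TRUE)
       })
names(standings) <- seasons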
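
Because the loop above downloads 96 pages, it can help to check the selector offline first. The sketch below runs the same XPath and html_table() call on a small hand-written HTML string. The markup (a headline span with id "Final_standings" followed by a table whose header cell is "PCT") is an assumption modelled on the older Wikipedia layout, and the team rows are made up for illustration.

```r
library(rvest)

# Hypothetical, hand-written HTML (not copied from Wikipedia) mimicking the
# layout the selector expects: a headline span, then a standings table.
page <- read_html('
<html><body>
  <h2><span class="mw-headline" id="Final_standings">Final standings</span></h2>
  <table>
    <tr><th>Team</th><th>W</th><th>L</th><th>PCT</th></tr>
    <tr><td>Chicago</td><td>10</td><td>2</td><td>.833</td></tr>
    <tr><td>Detroit</td><td>7</td><td>5</td><td>.583</td></tr>
  </table>
</body></html>')

# Same XPath as in the answer; html_table() returns a list of tables
demo_tables <- page %>%
  html_nodes(xpath = '//span[contains(@id, "tandings")]/following::*[@title="Winning percentage" or text()="PCT"]/ancestor::table') %>%
  html_table()

# The first row consists of <th> cells, so it is used as the header
print(demo_tables[[1]])
```

If this returns the expected table, the same pipeline can be pointed at the real season URLs.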

The XPath selector looks for

  • //span[contains(@id, "tandings")]
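    • all span elements whose id contains "tandings" (e.g., "Standings" or "Final_standings"); because contains() is case-sensitive, dropping the leading S lets one pattern match both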
  • /following::*[@title="Winning percentage" or text()="PCT"]
    • followed, anywhere later in the document, by an element that has
      • either a title attribute of "Winning percentage"
      • or a text node that is exactly "PCT"
  • /ancestor::table
    • and selects the table element(s) that enclose that element, going up the tree.
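
To see what each part of the path contributes, the sketch below evaluates the three steps one at a time on a made-up page (hypothetical markup, not real Wikipedia HTML). It deliberately places a table with a "PCT" header before the standings heading, to show that following:: only looks forward from the matching span.

```r
library(rvest)

# Hypothetical toy page: a results table BEFORE the standings heading,
# then the standings table whose header cell carries the title attribute.
page <- read_html('
<html><body>
  <table id="results"><tr><th>Date</th><th>PCT</th></tr></table>
  <h2><span class="mw-headline" id="Standings">Standings</span></h2>
  <table id="standings">
    <tr><th>Team</th><th title="Winning percentage">PCT</th></tr>
    <tr><td>Green Bay</td><td>.750</td></tr>
  </table>
</body></html>')

# Step 1: spans whose id contains "tandings"
step1 <- html_nodes(page, xpath = '//span[contains(@id, "tandings")]')

# Step 2: elements after that span (document order) with the title
# "Winning percentage" or a text child equal to "PCT"
step2 <- html_nodes(page, xpath = '//span[contains(@id, "tandings")]/following::*[@title="Winning percentage" or text()="PCT"]')

# Step 3: the table(s) enclosing the elements from step 2
step3 <- html_nodes(page, xpath = '//span[contains(@id, "tandings")]/following::*[@title="Winning percentage" or text()="PCT"]/ancestor::table')

length(step1)           # 1
length(step2)           # 1: the <th> in the standings table only
html_attr(step3, "id")  # "standings": the earlier results table is skipped
```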