Scraping a complex HTML table into a data.frame in R

盖世英雄少女心 2020-12-17 01:10

I am trying to load Wikipedia's data on US Supreme Court Justices into R:

library(rvest)

html = html("http://en.wikipedia.org/wiki/List_of_Justices_of_the


        
2 Answers
  • 2020-12-17 01:25

    Maybe like this:

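    library(XML)
    library(rvest)
    # html() is the page reader from early rvest releases; later releases replaced it with read_html()
    html = html("http://en.wikipedia.org/wiki/List_of_Justices_of_the_Supreme_Court_of_the_United_States")
    # The second table on the page holds the list of justices
    judges = html_table(html_nodes(html, "table")[[2]])
    # The names come back with extra "Last, First" text prepended
    head(judges[,2])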
    # [1] "Wilson, JamesJames Wilson"       "Jay, JohnJohn Jay†"              "Cushing, WilliamWilliam Cushing" "Blair, JohnJohn Blair, Jr."
    # [5] "Rutledge, JohnJohn Rutledge"     "Iredell, JamesJames Iredell"
    
    # Remove the <span> in the second column of each row, then parse the table again
    removeNodes(getNodeSet(html, "//table/tr/td[2]/span"))
    judges = html_table(html_nodes(html, "table")[[2]])
    head(judges[,2])
    # [1] "James Wilson"    "John Jay†"       "William Cushing" "John Blair, Jr." "John Rutledge"   "James Iredell" 
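
The same idea (delete the extra span, then parse the table) can be sketched for rvest releases that are built on xml2, where the XML-package helpers may not apply to the parsed document. This is only a sketch under assumptions: the HTML snippet is a hypothetical stand-in for the Wikipedia table, the `sortkey` class is invented for illustration, and the XPath may need adjusting for the live page.

```r
library(rvest)
library(xml2)

# Hypothetical two-row table imitating cells that carry an extra "Last, First" span
snippet <- '<html><body><table>
  <tr><th>#</th><th>Judge</th></tr>
  <tr><td>1</td><td><span class="sortkey">Wilson, James</span><a href="#">James Wilson</a></td></tr>
  <tr><td>2</td><td><span class="sortkey">Jay, John</span><a href="#">John Jay</a></td></tr>
</table></body></html>'

page <- read_html(snippet)

# Parsed as-is, the span text is glued onto the name in column 2
before <- html_table(html_nodes(page, "table")[[1]])

# Delete the spans from the document in place, then parse the table again
xml_remove(xml_find_all(page, "//table//td[2]/span"))
after <- html_table(html_nodes(page, "table")[[1]])

print(after[[2]])
```

Using `//table//td[2]` rather than `//table/tr/td[2]` keeps the XPath working whether or not the rows sit inside a `tbody` element.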
    
  • 2020-12-17 01:42

    You could use rvest:

    library(rvest)
    
    # Select every link that directly follows a <span> and extract its text
    read_html("http://en.wikipedia.org/wiki/List_of_Justices_of_the_Supreme_Court_of_the_United_States") %>%
      html_nodes("span + a") %>%
      html_text()
    

    It's not perfect, so you might want to refine the CSS selector, but it gets you fairly close.
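
One way the selector might be refined is to scope it to table cells, so that links following a span elsewhere on the page are skipped. The sketch below runs against a hypothetical inline snippet rather than the live page; the markup and the narrower selector `table td > span + a` are assumptions for illustration, not something taken from the answer.

```r
library(rvest)

# Hypothetical page: one stray span + link outside the table, two inside it
snippet <- '<html><body>
<p><span>Note</span> <a href="#">unrelated link</a></p>
<table>
  <tr><td><span>Wilson, James</span><a href="#">James Wilson</a></td></tr>
  <tr><td><span>Jay, John</span><a href="#">John Jay</a></td></tr>
</table></body></html>'

page <- read_html(snippet)

# The broad selector also picks up the link in the paragraph
broad <- page %>% html_nodes("span + a") %>% html_text()

# Requiring the span to be a direct child of a table cell narrows the match
narrow <- page %>% html_nodes("table td > span + a") %>% html_text()

print(broad)
print(narrow)
```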
