Extract Links from Webpage using R

庸人自扰 2020-12-23 02:07

The two posts below are great examples of different approaches to extracting data from websites and parsing it into R.

Scraping html tables into R data frames usin

3 answers
  •  南方客 (original poster)
    2020-12-23 03:10

    Even easier with rvest:

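    library(xml2)   # provides read_html()
    library(rvest)  # provides html_nodes() and html_attr()
    
    URL <- "http://stackoverflow.com/questions/3746256/extract-links-from-webpage-using-r"
    
    # Download and parse the page
    pg <- read_html(URL)
    
    # Select every <a> element and show the first six href values
    head(html_attr(html_nodes(pg, "a"), "href"))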
    
    ## [1] "//stackoverflow.com"
    ## [2] "http://chat.stackoverflow.com"
    ## [3] "//stackoverflow.com"
    ## [4] "http://meta.stackoverflow.com"
    ## [5] "//careers.stackoverflow.com?utm_source=stackoverflow.com&utm_medium=site-ui&utm_campaign=multicollider"
    ## [6] "https://stackoverflow.com/users/signup?ssrc=site_switcher&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2f3746256%2fextract-links-from-webpage-using-r"
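
Several of the hrefs above are protocol-relative (they start with "//"), and a page can also contain path-relative links or anchors with no href at all. The following is a minimal sketch of one way to clean that up, assuming xml2's url_absolute() is acceptable for resolving links against the page URL. The inline HTML string and the names html, base_url, and abs_links are made up for illustration so the example runs without network access.

```r
library(xml2)
library(rvest)

# Hypothetical inline page (not the real Stack Overflow markup)
html <- '<html><body>
  <a href="//stackoverflow.com">Home</a>
  <a href="/questions/3746256">Question</a>
  <a href="http://meta.stackoverflow.com">Meta</a>
  <a href="//stackoverflow.com">Home again</a>
  <a name="top">Anchor without href</a>
</body></html>'

# URL the links are resolved against (the page address from the answer above)
base_url <- "http://stackoverflow.com/questions/3746256/extract-links-from-webpage-using-r"

pg <- read_html(html)

# Same extraction as above: the href attribute of every <a> element
links <- html_attr(html_nodes(pg, "a"), "href")

# Anchors without an href come back as NA, so drop them
links <- links[!is.na(links)]

# Resolve relative and protocol-relative links, then remove duplicates
abs_links <- unique(url_absolute(links, base_url))

print(abs_links)
```

With a live page, replacing read_html(html) with read_html(base_url) should give the same pipeline over the downloaded document.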
    
