The two posts below are great examples of different approaches to extracting data from websites and parsing it into R.
Scraping html tables into R data frames usin
Even easier with rvest:
library(xml2)
library(rvest)
URL <- "http://stackoverflow.com/questions/3746256/extract-links-from-webpage-using-r"
pg <- read_html(URL)
head(html_attr(html_nodes(pg, "a"), "href"))
## [1] "//stackoverflow.com"
## [2] "http://chat.stackoverflow.com"
## [3] "//stackoverflow.com"
## [4] "http://meta.stackoverflow.com"
## [5] "//careers.stackoverflow.com?utm_source=stackoverflow.com&utm_medium=site-ui&utm_campaign=multicollider"
## [6] "https://stackoverflow.com/users/signup?ssrc=site_switcher&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2f3746256%2fextract-links-from-webpage-using-r"
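Note that some of the hrefs above are protocol-relative (they start with `//`), so they may not be usable as-is for follow-up requests. A minimal sketch of one way to turn them into absolute URLs, assuming `xml2::url_absolute()` is acceptable for this; the inline HTML and its links are made up so the example runs without a network connection:

```r
library(xml2)
library(rvest)

# Inline HTML stands in for a downloaded page so the sketch runs offline;
# the hrefs are hypothetical examples of the link styles seen in the output above.
html <- '<html><body>
  <a href="//stackoverflow.com">home</a>
  <a href="/questions/3746256">question</a>
  <a href="http://meta.stackoverflow.com">meta</a>
  <a>no href</a>
</body></html>'
base_url <- "http://stackoverflow.com/questions/3746256/extract-links-from-webpage-using-r"

pg <- read_html(html)
links <- html_attr(html_nodes(pg, "a"), "href")

# Anchors without an href come back as NA; drop them before resolving.
links <- links[!is.na(links)]

# Resolve protocol-relative and path-relative links against the page URL.
abs_links <- url_absolute(links, base_url)
print(abs_links)
```

With a real page you would pass the same URL you gave to `read_html()` as `base_url`.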