I'm trying (fairly unsuccessfully) to scrape some data from a website (www.majidata.co.ke) using R. I've managed to scrape the HTML and parse it but now a little unsure how to
Using XPath expressions to query HTML is almost always a better choice than regular expressions. Given this page, you can extract the town IDs and names from the "town" drop-down with:
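library(XML)
# majidata_html is assumed to be the document you have already parsed (e.g. with htmlParse())
# Select every <option> inside the <select id="town"> element
town_options <- getNodeSet(xmlRoot(majidata_html), "//select[@id='town']/option")
ids <- sapply(town_options, xmlGetAttr, "value")   # each option's "value" attribute
town_names <- sapply(town_options, xmlValue)       # each option's visible text
data.frame(ID = ids, Name = town_names)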
which returns
   ID          Name
1   0 [SELECT TOWN]
2 611         AHERO
3 635         AKALA
4 625         AWASI
5 628        AWENDO
6 749        BAHATI
...
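
If you want to try the same approach without hitting the site, here is a minimal self-contained sketch. It assumes the XML package is installed; `sample_html` is a made-up stand-in for the real page (only the `town` id and a few option values are taken from the output above), and `towns` and `town_names` are just illustrative names. It uses `xpathSApply`, which runs the XPath query and applies a function to each matching node in one step, and then drops the `[SELECT TOWN]` placeholder row.

```r
library(XML)

# Hypothetical stand-in for the real page: a <select id="town"> element
# with one <option> per town (values copied from the output above).
sample_html <- '<html><body>
<select id="town">
  <option value="0">[SELECT TOWN]</option>
  <option value="611">AHERO</option>
  <option value="635">AKALA</option>
</select>
</body></html>'

# Parse the string as HTML (asText = TRUE means "this is markup, not a file name")
doc <- htmlParse(sample_html, asText = TRUE)

# Run the XPath query and apply a function to every matching <option> node
ids <- xpathSApply(doc, "//select[@id='town']/option", xmlGetAttr, "value")
town_names <- xpathSApply(doc, "//select[@id='town']/option", xmlValue)

# Attribute values come back as character strings, so convert the IDs to integers
towns <- data.frame(ID = as.integer(ids), Name = town_names,
                    stringsAsFactors = FALSE)

# Drop the placeholder entry whose value is 0
towns <- towns[towns$ID != 0, ]
print(towns)
```

Converting the IDs to integers is optional; it just makes it easier to filter or join on them later.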