Scrape values from HTML select/option tags in R

长情又很酷 2021-01-21 18:59

I'm trying (fairly unsuccessfully) to scrape some data from a website (www.majidata.co.ke) using R. I've managed to scrape the HTML and parse it, but now I'm a little unsure how to…

2 answers

感动是毒 (original poster)

2021-01-21 19:30

For HTML, XPath expressions are almost always a better choice than regular expressions. Given the parsed document, you can extract what you're after with:

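library(XML)

# majidata_html is assumed to be the document returned by htmlParse()
# Select every <option> inside the <select> whose id is "town"
town_options <- getNodeSet(xmlRoot(majidata_html), "//select[@id='town']/option")

# The value attribute holds the ID; the option's text holds the town name
ids <- sapply(town_options, xmlGetAttr, "value")
town_names <- sapply(town_options, xmlValue)

data.frame(ID = ids, Name = town_names)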

which returns

   ID          Name
1   0 [SELECT TOWN]
2 611         AHERO
3 635         AKALA
4 625         AWASI
5 628        AWENDO
6 749        BAHATI
...
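
A self-contained sketch of the same idea, for trying it without fetching the site. The inline HTML is a hypothetical stand-in for the page's `<select id="town">` element; only the IDs and town names come from the output above. It uses `xpathSApply()` from the same XML package, which runs the query and applies a function to each matching node in one step, and it drops the `[SELECT TOWN]` placeholder (value 0), which is usually not wanted in the result.

```r
library(XML)

# Hypothetical markup mimicking the site's <select id="town"> element
html <- '<html><body>
<select id="town">
  <option value="0">[SELECT TOWN]</option>
  <option value="611">AHERO</option>
  <option value="635">AKALA</option>
  <option value="625">AWASI</option>
</select>
</body></html>'

doc <- htmlParse(html, asText = TRUE)
xpath <- "//select[@id='town']/option"

# Extra arguments after the function are passed on to it for each node
ids <- xpathSApply(doc, xpath, xmlGetAttr, "value")
town_names <- xpathSApply(doc, xpath, xmlValue)

towns <- data.frame(ID = as.integer(ids), Name = town_names,
                    stringsAsFactors = FALSE)

# Remove the "[SELECT TOWN]" placeholder, whose value is 0
towns <- towns[towns$ID != 0, ]
rownames(towns) <- NULL
print(towns)
```

Converting `ID` to integer makes the placeholder filter a numeric comparison rather than a string match, and it is a reasonable type if the IDs are later used as lookup keys.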
