Scrape values from HTML select/option tags in R

长情又很酷 2021-01-21 18:59

I'm trying (fairly unsuccessfully) to scrape some data from a website (www.majidata.co.ke) using R. I've managed to scrape the HTML and parse it but now a little unsure how to

2 answers
  •  感动是毒
    2021-01-21 19:30

    Using XPath expressions with HTML is almost always a better choice than regex. Given this data, you can extract what you're after with:

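    library(XML)

    # majidata_html is assumed to be the parsed document from the question
    town_options <- getNodeSet(xmlRoot(majidata_html), "//select[@id='town']/option")

    # The value attribute holds the ID; the node text holds the town name
    ids <- sapply(town_options, xmlGetAttr, "value")
    town_names <- sapply(town_options, xmlValue)

    data.frame(ID = ids, Name = town_names)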
    

    which returns

       ID          Name
    1   0 [SELECT TOWN]
    2 611         AHERO
    3 635         AKALA
    4 625         AWASI
    5 628        AWENDO
    6 749        BAHATI
    ...
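
    For anyone who wants to try the approach without downloading the page, here is a minimal self-contained sketch. It assumes the XML package is installed. The `sample_html` string is a hypothetical stand-in for the real page, built only from rows shown in the output above. `xpathSApply` is used as a shortcut for `getNodeSet` followed by `sapply`, and dropping the placeholder row is an optional extra step, not part of the original answer.

    ```r
    library(XML)

    # Hypothetical stand-in for the downloaded page (assumption: the real
    # markup has the same select/option shape as implied by the output above)
    sample_html <- '<html><body>
      <select id="town">
        <option value="0">[SELECT TOWN]</option>
        <option value="611">AHERO</option>
        <option value="635">AKALA</option>
      </select>
    </body></html>'

    # asText = TRUE tells htmlParse the argument is markup, not a file name
    doc <- htmlParse(sample_html, asText = TRUE)

    # xpathSApply applies the function to every node matched by the XPath
    ids <- xpathSApply(doc, "//select[@id='town']/option", xmlGetAttr, "value")
    town_names <- xpathSApply(doc, "//select[@id='town']/option", xmlValue)

    towns <- data.frame(ID = as.integer(ids), Name = town_names,
                        stringsAsFactors = FALSE)

    # Drop the "[SELECT TOWN]" placeholder, which is not a real town
    towns <- towns[towns$ID != 0, ]
    ```

    Converting the IDs with `as.integer` makes them easier to filter and join on later; leave them as character if the site ever uses non-numeric values.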
    
