Scraping with rvest - complete with NAs when tag is not present

清酒与你 2020-11-30 12:20

I want to parse this HTML and get these elements from it:

a) p tag with class "normal_encontrado".
b) div with c

4 answers
  •  感动是毒
    2020-11-30 12:40

    If the tag is not found, html_nodes() matches nothing and html_text() returns character(0). Assuming each div.product_price contains at most one current and one regular price, you can use this:

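    library(rvest)
    library(dplyr)
    
    # Extract both prices from one div.product_price node, using NA when a tag is absent
    get_prices <- function(node) {
      r.precio.antes <- html_nodes(node, "p.normal_encontrado") %>% html_text()
      r.precio.actual <- html_nodes(node, "div.price") %>% html_text()
    
      data.frame(
        # html_text() yields character(0) when nothing matched; substitute NA
        precio.antes = if (length(r.precio.antes) == 0) NA_character_ else r.precio.antes[1],
        precio.actual = if (length(r.precio.actual) == 0) NA_character_ else r.precio.actual[1],
        stringsAsFactors = FALSE
      )
    }
    
    doc <- read_html("test.html") %>% html_nodes("div.product_price")
    # rbind_all() is no longer available in current dplyr; bind_rows() replaces it
    lapply(doc, get_prices) %>%
      bind_rows()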
    

    Edited: I misunderstood the input data, so I changed the script to work with just a single HTML page.
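
A shorter variant of the same idea, as a minimal sketch: html_node() (singular) returns exactly one result per input node, and html_text() turns a missing match into NA, so the length check may not be needed at all. The inline HTML and its values below are made up for illustration, since the original page is not shown in the question; only the selectors come from the answer above.

```r
library(rvest)

# Hypothetical markup standing in for the page in the question:
# the second product has no p.normal_encontrado tag.
html <- '
<div class="product_price">
  <p class="normal_encontrado">100</p>
  <div class="price">80</div>
</div>
<div class="product_price">
  <div class="price">50</div>
</div>'

products <- read_html(html) %>% html_nodes("div.product_price")

# html_node() yields one entry per product, missing where nothing matched,
# so both columns always have the same length.
prices <- data.frame(
  precio.antes  = products %>% html_node("p.normal_encontrado") %>% html_text(),
  precio.actual = products %>% html_node("div.price") %>% html_text(),
  stringsAsFactors = FALSE
)

print(prices)
```

This keeps one row per div.product_price without looping over nodes. It only picks up the first match inside each product, which fits the "at most one price of each kind" assumption stated in the answer.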
