Equivalent of which in scraping?

清歌不尽 2020-12-12 02:36

I'm trying to scrape a page where the action I take on a node is conditional on the contents of that node.

This should be a minimal example:

XML          


        
2 answers
  • 2020-12-12 03:19

    Based on the question "R Conditional evaluation when using the pipe operator %>%", you can do something like:

    page %>% 
       html_nodes(xpath='//td[@class="id-tag"]') %>% 
       {ifelse(is.na(html_node(.,xpath="span")), 
               html_text(.),
               {html_node(.,xpath="span") %>% html_attr("title")}
       )}
    

    I think it is probably simpler to drop the pipe and save the intermediate objects along the way:

    nodes <- html_nodes(page, xpath='//td[@class="id-tag"]')
    text <- html_text(nodes)
    title <- html_attr(html_node(nodes,xpath='span'),"title")
    value <- ifelse(is.na(html_node(nodes, xpath="span")), text ,title)
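
    The reason the non-piped version lines up is that html_node() (singular) returns exactly one result per input node, with a missing placeholder where there is no match. A minimal sketch of that idea follows; the sample markup is an assumption borrowed from the other answer in this thread and wrapped in a table so the parser keeps the cells. It tests the extracted title for NA rather than the node itself, so a span without a title attribute would also fall back to the cell text.

    ```r
    library(rvest)

    # Assumed sample markup (not from the original question)
    html <- '<table><tr>
    <td class="id-tag"><span title="Really Long Text">Really L...</span></td>
    <td class="id-tag">Short</td>
    </tr></table>'
    page <- read_html(html)

    nodes <- html_nodes(page, xpath = '//td[@class="id-tag"]')
    spans <- html_node(nodes, xpath = "span")  # one entry per td, missing where no span
    title <- html_attr(spans, "title")         # NA where the span (or its title) is missing
    text  <- html_text(nodes)

    # Vectorised choice: fall back to the cell text when there is no title
    value <- ifelse(is.na(title), text, title)
    print(value)
    ```

    Because title and text have the same length as nodes, ifelse() can pick element by element without any explicit loop.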
    

    An XPath-only approach might be:

    page %>% 
     html_nodes(xpath='//td[@class="id-tag"]/span/@title|//td[@class="id-tag"][not(.//span)]') %>%
     html_text()
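
    To see what each side of the `|` union contributes, the two branches can be evaluated separately. This is only a sketch against assumed sample markup (taken from the other answer and wrapped in a table), not the page from the question.

    ```r
    library(rvest)

    # Assumed sample markup (not from the original question)
    html <- '<table><tr>
    <td class="id-tag"><span title="Really Long Text">Really L...</span></td>
    <td class="id-tag">Short</td>
    </tr></table>'
    page <- read_html(html)

    # Branch 1: title attributes of spans sitting directly inside the cells
    from_span <- html_text(html_nodes(page, xpath = '//td[@class="id-tag"]/span/@title'))

    # Branch 2: cells that contain no span at all
    plain <- html_text(html_nodes(page, xpath = '//td[@class="id-tag"][not(.//span)]'))

    # Union of both branches, as in the expression above
    combined <- html_text(html_nodes(
      page,
      xpath = '//td[@class="id-tag"]/span/@title|//td[@class="id-tag"][not(.//span)]'
    ))
    print(from_span)
    print(plain)
    print(combined)
    ```

    The combined node set is typically returned in document order, so the values should line up with the cells as long as every cell matches exactly one of the two branches.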
    
  • 2020-12-12 03:19

    An alternate approach:

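    library(tidyverse)  # provides purrr::map_chr() and the %>% pipe
    library(rvest)
    library(xml2)       # xml_find_lgl(); not attached by recent rvest versions
    
    XML <- '
    <td class="id-tag">
        <span title="Really Long Text">Really L...</span>
    </td>
    <td class="id-tag">Short</td>
    '
    
    pg <- read_html(XML)
    
    html_nodes(pg, "td[class='id-tag']") %>%
      map_chr(function(x) {
        # If the cell contains a <span>, use its title attribute instead of the cell text
        if (xml_find_lgl(x, "boolean(.//span)")) {
          x <- html_nodes(x, xpath = ".//span/@title")
        }
        html_text(x)
      })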
    
    ## [1] "Really Long Text" "Short"
    