I'm trying to run some scraping where the action I take on a node is conditional on the contents of the node.
This should be a minimal example:
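XML <- '
<td class="id-tag">
<span title="Really Long Text">Really L...</span>
</td>
<td class="id-tag">Short</td>
'
page <- read_html(XML)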
Based on the question "R Conditional evaluation when using the pipe operator %>%", you can do something like:
page %>%
  html_nodes(xpath = '//td[@class="id-tag"]') %>%
  {ifelse(is.na(html_node(., xpath = "span")),
          html_text(.),
          {html_node(., xpath = "span") %>% html_attr("title")}
  )}
I think it is possibly simpler to discard the pipe and save some of the objects created along the way:
nodes <- html_nodes(page, xpath='//td[@class="id-tag"]')
text <- html_text(nodes)
title <- html_attr(html_node(nodes,xpath='span'),"title")
value <- ifelse(is.na(html_node(nodes, xpath="span")), text ,title)
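For reference, here is a self-contained sketch of the same idea, assuming the two-cell sample markup used in this thread. One adjustment that is not in the snippet above: the NA test is applied to the extracted `title` vector rather than to the node set, since `html_attr()` returns `NA` for cells that have no `<span>`.

```r
library(rvest)

# Sample markup from this thread: one cell with a <span title=...>, one without
XML <- '
<td class="id-tag">
<span title="Really Long Text">Really L...</span>
</td>
<td class="id-tag">Short</td>
'
page <- read_html(XML)

nodes <- html_nodes(page, xpath = '//td[@class="id-tag"]')
text  <- html_text(nodes)
# NA for every cell without a <span> child
title <- html_attr(html_node(nodes, xpath = "span"), "title")
# Fall back to the cell text wherever no title was found
value <- ifelse(is.na(title), text, title)
value
```

Because both `text` and `title` have one entry per `<td>`, the `ifelse()` call stays aligned with the cells.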
An XPath-only approach might be:
page %>%
  html_nodes(xpath = '//td[@class="id-tag"]/span/@title|//td[@class="id-tag"][not(.//span)]') %>%
  html_text()
An alternate approach:
library(tidyverse)
library(rvest)
library(xml2)  # xml_find_first() lives in xml2, which newer rvest versions no longer attach
XML <- '
<td class="id-tag">
<span title="Really Long Text">Really L...</span>
</td>
<td class="id-tag">Short</td>
'
pg <- read_html(XML)
html_nodes(pg, "td[class='id-tag']") %>%
  map_chr(function(x) {
    if (xml_find_first(x, "boolean(.//span)")) {
      x <- html_nodes(x, xpath = ".//span/@title")
    }
    html_text(x)
  })
## [1] "Really Long Text" "Short"
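A possible variant of the per-node approach, offered as a sketch rather than a drop-in replacement: it swaps in `xml_find_lgl()` from xml2, which is meant for XPath expressions that evaluate to a boolean, and reads the attribute with `html_attr()` instead of selecting the attribute node. The `td.id-tag` selector and the `trim = TRUE` argument are my own choices here, not part of the answer above.

```r
library(rvest)
library(xml2)
library(purrr)

XML <- '
<td class="id-tag">
<span title="Really Long Text">Really L...</span>
</td>
<td class="id-tag">Short</td>
'
pg <- read_html(XML)

result <- map_chr(html_nodes(pg, "td.id-tag"), function(x) {
  # TRUE when the cell contains a <span> anywhere below it
  if (xml_find_lgl(x, "boolean(.//span)")) {
    html_attr(html_node(x, xpath = ".//span"), "title")
  } else {
    # No <span>: use the cell's own text, with surrounding whitespace removed
    html_text(x, trim = TRUE)
  }
})
result
```

Each branch returns a single string, which is what `map_chr()` expects for every element.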