Inserting NA in blank values from web scraping

Submitted by 无人久伴 on 2021-02-10 20:11:25

Question


I am scraping some data into a data frame and am getting some empty fields where I would prefer to have NA. I have tried na.strings, but either I am putting it in the wrong place or it simply isn't working. I also tried using gsub to replace anything that was only whitespace from the beginning of the line to the end, but that didn't work either.

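library(rvest)

# Read the page and extract the text of every ".meta-wrapper" node
htmlpage <- read_html("http://www.gourmetsleuth.com/features/wine-cheese-pairing-guide")
sugPairings <- html_nodes(htmlpage, ".meta-wrapper")
suggestions <- html_text(sugPairings)
# Remove Windows-style line breaks
suggestions <- gsub("\\r\\n", '', suggestions)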

How can I replace the blank fields with NA, either before the data is added to the data frame or afterwards?


Answer 1:


rvest::html_text has a built-in trimming option: set trim=TRUE. Once the text is trimmed, you can use e.g. ifelse to test for an empty string (== "") or use nzchar.

In full, you could do this:

html_nodes(htmlpage, ".meta-wrapper") %>% html_text(trim=TRUE) %>% ifelse(. == "", NA, .)

or this:

res <- html_nodes(htmlpage, ".meta-wrapper") %>% html_text(trim=TRUE)
res[!nzchar(res)] <- NA_character_

An improvement suggested by @Richard Scriven:

html_nodes(htmlpage, ".meta-wrapper") %>% html_text(trim=TRUE) %>% replace(!nzchar(.), NA)
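
To show the same idea without depending on the live page, here is a minimal sketch that runs on a small made-up character vector. The sample values and the data frame name pairings are assumptions for illustration only, and base R's trimws stands in for html_text(trim=TRUE). It covers both cases from the question: replacing blanks before building the data frame and after.

```r
# Hypothetical sample standing in for the scraped text; not real site data.
suggestions <- c("  Chardonnay \r\n", "", "   ", "Merlot")

# trimws() plays the role of html_text(trim = TRUE) here:
# it strips leading and trailing spaces, tabs, and line breaks.
cleaned <- trimws(suggestions)

# Before building the data frame: zero-length strings become NA.
cleaned[!nzchar(cleaned)] <- NA_character_

# After building the data frame: the same test applied to a column.
pairings <- data.frame(wine = trimws(suggestions), stringsAsFactors = FALSE)
pairings$wine[!nzchar(pairings$wine)] <- NA_character_

print(cleaned)
print(is.na(pairings$wine))
```

Trimming first matters because nzchar only checks for zero-length strings, so a field containing just spaces would otherwise count as non-empty.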


Source: https://stackoverflow.com/questions/36824794/inserting-na-in-blank-values-from-web-scraping
