Web scraping with R and rvest

Submitted by 青春壹個敷衍的年華 on 2019-12-05 19:49:59

I'm not really up to speed on all of the pipes and associated code, so there are probably some newfangled tools to do this, but given that the other answer gets you to "83/100", you can do something like this:

as.numeric(unlist(strsplit("83/100", "/")))[1]
[1] 83

Which I guess would look something like this with the pipes:

lego_movie %>% 
  html_node(".star-box-details a:nth-child(4)") %>%
  html_text(trim=TRUE) %>%
  strsplit(., "/") %>%
  unlist(.) %>%
  as.numeric(.) %>% 
  head(., 1)

[1] 83

Or as Frank suggested, you could evaluate the expression "83/100" with something like:

lego_movie %>% 
  html_node(".star-box-details a:nth-child(4)") %>%
  html_text(trim=TRUE) %>%
  parse(text = .) %>%
  eval(.)
[1] 0.83
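
Evaluating scraped text with parse() and eval() runs whatever the page returns as R code, so a more defensive variant may be worth considering. Here is a minimal sketch that works on the extracted string directly, assuming the text always has the form "numerator/denominator". The helper name score_fraction is hypothetical and not part of rvest:

```r
# Hypothetical helper: turn "83/100" into 0.83 without calling eval().
score_fraction <- function(txt) {
  # Trim whitespace, split on the literal "/", and convert both parts.
  parts <- suppressWarnings(
    as.numeric(strsplit(trimws(txt), "/", fixed = TRUE)[[1]])
  )
  # Expect exactly two numeric parts: numerator and denominator.
  if (length(parts) != 2 || anyNA(parts)) return(NA_real_)
  parts[1] / parts[2]
}

score_fraction(" 83/100\n")
```

Anything that does not look like two numbers separated by a slash comes back as NA rather than being executed.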
user227710

Before any conversion to numeric, the node text comes back as " 83/100\n":

lego_movie %>% 
  html_node(".star-box-details a:nth-child(4)") %>%
  html_text()
# [1] " 83/100\n"

Use trim=TRUE to strip the surrounding whitespace, including the \n. The result still can't be converted to numeric directly because it contains a /:

lego_movie %>% 
  html_node(".star-box-details a:nth-child(4)") %>%
  html_text(trim=TRUE)
# [1] "83/100"

If you pass this to as.numeric(), you get NA with a warning, which is expected:

# [1] NA
# Warning message:
# In function_list[[k]](value) : NAs introduced by coercion
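
The NA above can be reproduced without the page at all. A small sketch on the literal string (the variable name x is only for illustration):

```r
# as.numeric() only accepts strings that are entirely a number.
x <- "83/100"
suppressWarnings(as.numeric(x))   # NA: the "/" makes the string non-numeric
as.numeric("83")                  # 83
```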

If you want the numeric 83 as the final answer, you can use a regular-expression tool such as gsub to remove the 100 and the / (assuming the full score is 100 for every movie):

lego_movie %>% 
  html_node(".star-box-details a:nth-child(4)") %>%
  html_text(trim=TRUE) %>%
  gsub("100|\\/", "", .) %>%
  as.numeric()
# [1] 83
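
One caveat: the gsub pattern above deletes every "100" it finds, so a perfect score of "100/100" would collapse to an empty string and convert to NA. Here is a sketch of one alternative, assuming the text always looks like "score/total": drop everything from the slash onward and keep what comes before it. The helper name extract_score is hypothetical:

```r
# Hypothetical helper: keep only the part before the first "/".
extract_score <- function(txt) {
  as.numeric(sub("/.*$", "", trimws(txt)))
}

extract_score("83/100")    # 83
extract_score("100/100")   # 100
```

In the pipeline this would take the place of the gsub and as.numeric steps, e.g. `... %>% html_text(trim=TRUE) %>% extract_score()`.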