Return value of for() loop as if it were a function in R

Submitted by 送分小仙女 on 2019-12-14 04:07:27

Question


I have this for loop in an R script:

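library(rvest)  # html_session(), html_nodes(), html_attr(), %>%
library(httr)   # config()

# Note: in rvest >= 1.0, html_session() is deprecated in favour of session().
url <- "https://example.com"
page <- html_session(url, config(ssl_verifypeer = FALSE))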

links <- page %>% 
  html_nodes("td") %>% 
  html_nodes("tr") %>%
  html_nodes("a") %>% 
  html_attr("href")

base_names <- basename(links)  # the same hrefs, reduced to their file names

for (i in seq_along(links)) {  # seq_along() also handles an empty `links`

  site <- html_session(URLencode(
    paste0("https://example.com", links[i])),
    config(ssl_verifypeer = FALSE))

  writeBin(site$response$content, base_names[i])
} 

This loops through the links and downloads each text file to my working directory. I'm wondering whether I can put a return somewhere so that it returns the document.

The reason is that I'm executing my script in NiFi (using ExecuteProcess), and it isn't sending my scraped documents down the line. Instead, it just shows the head of my R script. I assume I would wrap the for loop in fun <- function(x) {}, but I'm not sure how to integrate x into an already working scraper.

I need it to send the documents down the flow, not just this: [screenshot not preserved]

Processor config: [screenshot not preserved]

Even if you are not familiar with NiFi, help with the R part would be great! Thanks
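A minimal sketch of the "wrap it in a function" idea, assuming the goal is simply to get the downloaded bytes back as a value. In R a for loop itself evaluates to NULL, so the results have to be collected into a list and returned explicitly. fake_fetch() and download_all() are hypothetical names; fake_fetch() stands in for html_session(...)$response$content so the sketch runs offline.

```r
# fake_fetch() is a hypothetical stand-in for the real download step
# (html_session(...)$response$content); it returns raw bytes without any network.
fake_fetch <- function(link) charToRaw(paste("contents of", link))

# A for loop evaluates to NULL, so results are collected explicitly.
download_all <- function(links, base_names, fetch = fake_fetch) {
  out <- vector("list", length(links))  # one preallocated slot per link
  for (i in seq_along(links)) {
    content <- fetch(links[i])
    writeBin(content, base_names[i])    # side effect: save to disk, as before
    out[[i]] <- content                 # value: keep the bytes for the caller
  }
  names(out) <- basename(base_names)
  out                                   # last expression is the return value
}

# Usage: write into a temporary directory instead of the working directory
links <- c("/docs/a.txt", "/docs/b.txt")
docs <- download_all(links, file.path(tempdir(), basename(links)))
rawToChar(docs[["a.txt"]])  # "contents of /docs/a.txt"
```

Passing the fetcher as an argument keeps the loop itself unchanged; in the real script the default would be replaced by the html_session() call.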


Answer 1:


If your intent is to both (1) save the output (with writeBin) and (2) return the values (in a list), then try this:

out <- Map(function(ln, bn) {
  site <- html_session(URLencode(
    paste0("https://example.com", ln)),
    config(ssl_verifypeer = FALSE))
  writeBin(site$response$content, bn)  # (1) save the file, as in the loop
  site$response$content                # (2) return the raw bytes
}, links, base_names)                  # pairs each link with its file name
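To see what out looks like without hitting the network, here is the same Map pattern with a hypothetical fetch_stub() in place of the html_session() call (an assumption for illustration only). Because links is an unnamed character vector, Map names each element of out after its link.

```r
# fetch_stub() is hypothetical; it replaces
# html_session(URLencode(paste0("https://example.com", ln)), ...)$response$content
fetch_stub <- function(ln) charToRaw(paste("body of", ln))

links <- c("/files/one.txt", "/files/two.txt")
base_names <- file.path(tempdir(), basename(links))

out <- Map(function(ln, bn) {
  content <- fetch_stub(ln)
  writeBin(content, bn)  # save to disk
  content                # and return the bytes
}, links, base_names)

names(out)           # "/files/one.txt" "/files/two.txt"
rawToChar(out[[2]])  # "body of /files/two.txt"
```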

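Map "zips" the individual elements together. In the base case of a single list, the following are equivalent (except that Map also names its result when the first argument is an unnamed character vector):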

Map(myfunc, list1)
lapply(list1, myfunc)
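A quick check of that equivalence, with a made-up myfunc and list1. For a named list the two results are identical; for an unnamed character vector, Map (through mapply's USE.NAMES) names the result after the values, while lapply does not.

```r
myfunc <- function(x) x * 10
list1 <- list(a = 1, b = 2, c = 3)

# Single list: same result from both
identical(Map(myfunc, list1), lapply(list1, myfunc))  # TRUE

# Unnamed character vector: only Map adds names
chars <- c("x", "yy")
names(Map(nchar, chars))     # "x" "yy"
names(lapply(chars, nchar))  # NULL
```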

But if you want to use same-index elements from multiple lists, you can do either of these:

lapply(seq_along(list1), function(i) myfunc(list1[[i]], list2[[i]], list3[[i]]))
Map(myfunc, list1, list2, list3)

where unrolling Map effectively results in:

myfunc(list1[[1]], list2[[1]], list3[[1]])
myfunc(list1[[2]], list2[[2]], list3[[2]])
# ...
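The three forms can be compared directly; myfunc and the short lists below are made up for the demonstration. Note the [[i]] indexing, which passes each element itself rather than a one-element sub-list.

```r
myfunc <- function(x, y, z) paste(x, y, z)
list1 <- list("a1", "a2")
list2 <- list("b1", "b2")
list3 <- list("c1", "c2")

via_map    <- Map(myfunc, list1, list2, list3)
via_lapply <- lapply(seq_along(list1),
                     function(i) myfunc(list1[[i]], list2[[i]], list3[[i]]))
unrolled   <- list(myfunc(list1[[1]], list2[[1]], list3[[1]]),
                   myfunc(list1[[2]], list2[[2]], list3[[2]]))

identical(via_map, via_lapply)  # TRUE
identical(via_map, unrolled)    # TRUE
```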

The biggest difference between lapply and Map here is that lapply iterates over only one vector (any extra arguments are passed unchanged to every call), whereas Map iterates over one or more (practically unlimited), zipping them together. All of the lists should be the same length or length 1 (a length-1 argument is recycled), so it's legitimate to do something like

Map(myfunc, list1, list2, "constant string")
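A small illustration of that recycling, with a hypothetical three-argument myfunc: the length-1 "!" is reused for every pair of elements.

```r
myfunc <- function(x, y, suffix) paste0(x, "-", y, suffix)
list1 <- list("a", "b", "c")
list2 <- list(1, 2, 3)

res <- Map(myfunc, list1, list2, "!")  # "!" is recycled to length 3
unlist(res)                            # "a-1!" "b-2!" "c-3!"
```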

Note: Map-versus-mapply is similar to lapply-versus-sapply. In both pairs, the first always returns a list, while the second simplifies the result when it can: to a vector if every return value has length 1, to a matrix if they all share a common length greater than 1, and otherwise it too returns a list.
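The simplification rules can be seen with a few toy functions (chosen only for illustration):

```r
sq     <- function(x) x^2
rep_it <- function(x) rep(x, 2)
grow   <- function(x) seq_len(x)

sapply(1:3, sq)      # length-1 results: numeric vector 1 4 9
sapply(1:3, rep_it)  # all length 2: a 2 x 3 matrix
sapply(1:3, grow)    # lengths 1, 2, 3 differ: a list
lapply(1:3, sq)      # always a list

mapply(function(x, y) x + y, 1:3, 4:6)  # simplified: 5 7 9
Map(function(x, y) x + y, 1:3, 4:6)     # list of 5, 7, 9
```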



Source: https://stackoverflow.com/questions/54484628/return-value-of-for-loop-as-if-it-were-a-function-in-r
