Use lapply on a subset of list elements and return list of same length as original in R

Submitted by 拜拜、爱过 on 2019-12-07 07:56:46

Question


I want to apply a regex operation to a subset of list elements (which are character strings) using lapply and return a list of the same length as the original. The list elements are long strings (derived from reading in long text files and collapsing the paragraphs into a single string). The regex operation is valid only for that subset of strings. I want the list elements outside the subset to be returned in their original state.

The regex operation is str_extract from the stringr package, i.e. I want to extract a substring from a longer string. I subset the list elements based on a regex pattern in the filename.

An example with simplified data:

library(stringr)
texts <- as.list(c("abcdefghijkl", "mnopqrstuvwxyz", "ghijklmnopqrs", "uvwxyzabcdef"))
filenames <- c("AB1997R.txt", "BG2000S.txt", "MN1999R.txt", "DC1997S.txt")
names(texts) <- filenames
regexp <- "abcdef"

I know in advance to which strings I want to apply the regex operation, and hence I want to subset these strings. That is, I don't want to run the regex over all elements in the list, as doing so will return some invalid results (which is not apparent in this simplified example).

I've made a few naive efforts, e.g.:

x <- lapply(texts[str_detect(names(texts), "1997")], str_extract, regexp)
> x
$AB1997R.txt
[1] "abcdef"

$DC1997S.txt
[1] "abcdef"

which returns a reduced-length list containing just the substrings found. But the results I want to get are:

> x
$AB1997R.txt
[1] "abcdef"

$BG2000S.txt
[1] "mnopqrstuvwxyz"

$MN1999R.txt
[1] "ghijklmnopqrs"

$DC1997S.txt
[1] "abcdef"

where the strings outside the subset (here, those whose filenames do not contain "1997") are returned in their original state.

I have read up on stringr, lapply and llply (in the plyr package), but many operations are illustrated with data frames rather than lists, and don't involve regex operations on character strings. I can achieve my goal with a for loop, but I'm trying to get away from that, as is generally advised, and get better at using the apply family of functions.


Answer 1:


You can use subset assignment ([<-) to overwrite only the selected elements in a copy of the list:

x <- texts
is1997 <- str_detect(names(texts), "1997")
x[is1997] <- lapply(texts[is1997], str_extract, regexp)
x
# $AB1997R.txt
# [1] "abcdef"
#
# $BG2000S.txt
# [1] "mnopqrstuvwxyz"
#
# $MN1999R.txt
# [1] "ghijklmnopqrs"
#
# $DC1997S.txt
# [1] "abcdef"
#
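
If you need this pattern in more than one place, it can be wrapped in a small function. The following is only a sketch of that idea, not part of the original answer: apply_where and extract_first are hypothetical names made up for illustration, and extract_first is a base R stand-in for str_extract so that the example runs without stringr.

```r
# Hypothetical helper (not from any package): apply `f` only to the
# elements selected by the logical vector `keep`; every other element
# is returned untouched, so the result has the same length as `lst`.
apply_where <- function(lst, keep, f, ...) {
  stopifnot(is.logical(keep), length(keep) == length(lst))
  out <- lst
  out[keep] <- lapply(lst[keep], f, ...)
  out
}

# Base R stand-in for stringr::str_extract on a single string:
# returns the first match, or NA if there is none.
extract_first <- function(string, pattern) {
  m <- regmatches(string, regexpr(pattern, string))
  if (length(m) == 0) NA_character_ else m
}

# Same simplified data as in the question.
texts <- list(
  "AB1997R.txt" = "abcdefghijkl",
  "BG2000S.txt" = "mnopqrstuvwxyz",
  "MN1999R.txt" = "ghijklmnopqrs",
  "DC1997S.txt" = "uvwxyzabcdef"
)

result <- apply_where(texts, grepl("1997", names(texts)),
                      extract_first, "abcdef")
str(result)
```

With stringr loaded, passing str_extract instead of extract_first should behave the same way for this data.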



Answer 2:


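You can try sub, which returns a string unchanged when the pattern does not match. Note that this runs the regex over every element and returns a named character vector rather than a list: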

  sub(paste0('.*(', regexp, ').*'), '\\1', texts)
  # AB1997R.txt      BG2000S.txt      MN1999R.txt      DC1997S.txt 
  #  "abcdef" "mnopqrstuvwxyz"  "ghijklmnopqrs"         "abcdef" 
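
To see why sub can stand in for an extraction here, it helps to look at a match and a non-match side by side. This is a minimal sketch using strings from the question; the last call uses a made-up pattern and input to illustrate one caveat: because of the greedy leading .*, the captured group is the last occurrence in the string, whereas str_extract returns the first.

```r
regexp  <- "abcdef"
pattern <- paste0(".*(", regexp, ").*")

# Pattern matches: the whole string is replaced by the captured group.
matched <- sub(pattern, "\\1", "uvwxyzabcdef")

# Pattern does not match: sub() returns the input unchanged.
unmatched <- sub(pattern, "\\1", "mnopqrstuvwxyz")

# Caveat (made-up input): with several candidate matches, the greedy
# leading ".*" makes the group capture the LAST one.
last_match <- sub(".*(a.).*", "\\1", "abac")

print(matched)     # "abcdef"
print(unmatched)   # "mnopqrstuvwxyz"
print(last_match)  # "ac"
```

For a literal pattern like the one in the question, first and last occurrence give the same text, so the difference only matters for patterns that can match different substrings.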

Also, if you need to restrict the substitution to the elements whose names contain 1997, you can select them with grep (note that this overwrites texts in place):

  indx <- grep('1997', names(texts))
  texts[indx] <- sub(paste0('.*(', regexp, ').*'), '\\1', texts[indx])
  as.list(texts)


Source: https://stackoverflow.com/questions/30562107/use-lapply-on-a-subset-of-list-elements-and-return-list-of-same-length-as-origin
