How to avoid a loop in R: selecting items from a list

99封情书 提交于 2019-11-27 10:34:27
liebke

You can use apply (or sapply)

t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
f <- function(s) strsplit(s, "_")[[1]][1]
sapply(t, f)

bob_smith    mary_jane   jose_chung michael_marx charlie_ivan 

       "bob"       "mary"       "jose"    "michael"    "charlie" 

See: A brief introduction to “apply” in R

And one more approach:

t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
pieces <- strsplit(t,"_")
sapply(pieces, "[", 1)

In words, the last line extracts the first element of each component of the list and then simplifies it into a vector.

How does this work? Well, you need to realise an alternative way of writing x[1] is "["(x, 1), i.e. there is a function called [ that does subsetting. The sapply call applies calls this function once for each element of the original list, passing in two arguments, the list element and 1.

The advantage of this approach over the others is that you can extract multiple elements from the list without having to recompute the splits. For example, the last name would be sapply(pieces, "[", 2). Once you get used to this idiom, it's pretty easy to read.

How about:

tlist <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
fnames <- gsub("(_.*)$", "", tlist)
# _.* matches the underscore followed by a string of characters
# the $ anchors the search at the end of the input string
# so, underscore followed by a string of characters followed by the end of the input string

for the RegEx approach?

what about:

t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")

sub("_.*", "", t)

I doubt this is the most elegant solution, but it beats looping:

t.df <- data.frame(tsplit)
t.df[1, ]

Converting lists to data frames is about the only way I can get them to do what I want. I'm looking forward to reading answers by people who actually understand how to handle lists.

You almost had it. It really is just a matter of

  1. using one of the *apply functions to loop over your existing list, I often start with lapply and sometimes switch to sapply
  2. add an anonymous function that operates on one of the list elements at a time
  3. you already knew it was strsplit(string, splitterm) and that you need the odd [[1]][1] to pick off the first term of the answer
  4. just put it all together, starting with a preferred variable namne (as we stay clear of t or c and friends)

which gives

> tlist <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan") 
> fnames <- sapply(tlist, function(x) strsplit(x, "_")[[1]][1]) 
> fnames 
  bob_smith    mary_jane   jose_chung michael_marx charlie_ivan   
      "bob"       "mary"       "jose"    "michael"    "charlie" 
>

You could use unlist():

> tsplit <- unlist(strsplit(t,"_"))
> tsplit
 [1] "bob"     "smith"   "mary"    "jane"    "jose"    "chung"   "michael"
 [8] "marx"    "charlie" "ivan"   
> t_out <- tsplit[seq(1, length(tsplit), by = 2)]
> t_out
[1] "bob"     "mary"    "jose"    "michael" "charlie"

There might be a better way to pull out only the odd-indexed entries, but in any case you won't have a loop.

And one other approach, based on brentonk's unlist example...

tlist <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
tsplit <- unlist(strsplit(tlist,"_"))
fnames <- tsplit[seq(1:length(tsplit))%%2 == 1]

jmc200

I would use the following unlist()-based method:

> t <- c("bob_smith","mary_jane","jose_chung","michael_marx","charlie_ivan")
> tsplit <- strsplit(t,"_")
> 
> x <- matrix(unlist(tsplit), 2)
> x[1,]
[1] "bob"     "mary"    "jose"    "michael" "charlie"

The big advantage of this method is that it solves the equivalent problem for surnames at the same time:

> x[2,]
[1] "smith" "jane"  "chung" "marx"  "ivan" 

The downside is that you'll need to be certain that all of the names conform to the firstname_lastname structure; if any don't then this method will break.

Virginie

from the original tsplit list object given at the beginning, this command will do:

unlist(lapply(tsplit,function(x) x[1]))

it extracts the first element of all list elements, then transforms a list to a vector. Unlisting first to a matrix, then extracting the fist column is also ok, but then you are dependent on the fact that all list elements have the same length. Here is the output:

> tsplit

[[1]]
[1] "bob"   "smith"

[[2]]
[1] "mary" "jane"

[[3]]
[1] "jose"  "chung"

[[4]]
[1] "michael" "marx"   

[[5]]
[1] "charlie" "ivan"   

> lapply(tsplit,function(x) x[1])

[[1]]
[1] "bob"

[[2]]
[1] "mary"

[[3]]
[1] "jose"

[[4]]
[1] "michael"

[[5]]
[1] "charlie"

> unlist(lapply(tsplit,function(x) x[1]))

[1] "bob"     "mary"    "jose"    "michael" "charlie"
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!