Finding Elements of Lists in R

执念已碎 2020-12-16 01:33

Right now I'm working with a character vector in R, which I split word by word using strsplit. I'm wondering if there's a function that I can use to check the whole li

3 Answers
  • 2020-12-16 02:08

    As alexwhan says, grep is the function to use. However, be careful about using it with a list. It isn't doing what you might think it's doing. For example:

    grep("c", z)
    [1] 1 2 3   # ?
    
    grep(",", z)
    [1] 1 2 3   # ???
    

    What's happening behind the scenes is that grep coerces its 2nd argument to character, using as.character. When applied to a list, what as.character returns is the character representation of that list as obtained by deparsing it. (Modulo an unlist.)

    as.character(z)
    [1] "c(\"a\", \"b\", \"c\")" "c(\"b\", \"d\", \"e\")" "c(\"a\", \"e\", \"f\")"
    
    cat(as.character(z))
    c("a", "b", "c") c("b", "d", "e") c("a", "e", "f")
    

    This is what grep is working on.
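
    A minimal sketch of this coercion, assuming `z` is the list implied by the `as.character` output above (the question's actual `z` is not shown):

    ```r
    # Assumed reconstruction of z from the deparsed strings shown above
    z <- list(c("a", "b", "c"), c("b", "d", "e"), c("a", "e", "f"))

    # grep() on a list searches the deparsed strings, not the elements
    deparsed <- as.character(z)
    on_list <- grep("c", z)
    on_deparsed <- grep("c", deparsed)

    # Every deparsed string starts with the literal text "c(",
    # so the pattern "c" matches all three entries
    starts_with_c <- startsWith(deparsed, "c(")

    # Only the first vector actually contains the element "c"
    has_c <- vapply(z, function(ch) "c" %in% ch, logical(1))
    ```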

    If you want to run grep on a list, a safer method is to use lapply. This returns another list, which you can operate on to extract what you're interested in.

    res <- lapply(z, function(ch) grep("a", ch))
    res
    [[1]]
    [1] 1
    
    [[2]]
    integer(0)
    
    [[3]]
    [1] 1
    
    
    # which vectors contain a search term
    sapply(res, function(x) length(x) > 0)
    [1]  TRUE FALSE  TRUE
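
    The two steps above can also be collapsed into one call. A sketch, assuming the same reconstructed `z` and that the search term should be treated as a literal string (`fixed = TRUE`) rather than a regular expression:

    ```r
    # Assumed reconstruction of z (not shown in the question)
    z <- list(c("a", "b", "c"), c("b", "d", "e"), c("a", "e", "f"))

    # TRUE for each vector in which at least one element contains "a"
    contains_a <- vapply(z, function(ch) any(grepl("a", ch, fixed = TRUE)), logical(1))

    # Positions of those vectors within the list
    which_a <- which(contains_a)
    ```

    `vapply` is used here instead of `sapply` so that the result is always a logical vector, even if `z` is empty.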
    
  • 2020-12-16 02:09

    You're looking for grep():

    grep("a", z)
    #[1] 1 3
    
    grep("b", z)
    #[1] 1 2
    
  • 2020-12-16 02:11

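    Faster than grep (roughly twice as fast in the timing below) is %in%, which tests for exact equality rather than a regular-expression match. To search for "A":

    sapply(x, function(y) "A" %in% y)
    

    and if you want the indices, just use which():

    which(sapply(x, function(y) "A" %in% y))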
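
    A small sketch of how `%in%` differs from `grep` in practice, assuming a hypothetical list `words` (not from the question): `%in%` compares whole elements, while `grepl` also matches substrings.

    ```r
    # Hypothetical data for illustration only
    words <- list(c("cat", "dog"), c("a", "bird"), c("fish"))

    # Exact match: only the second vector has an element equal to "a"
    exact <- which(sapply(words, function(y) "a" %in% y))

    # Substring match: "cat" also contains "a", so the first vector matches too
    partial <- which(sapply(words, function(y) any(grepl("a", y, fixed = TRUE))))
    ```

    Which behavior is wanted depends on whether the search term is a whole word or a fragment.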
    

    Evidence!

    x <- setNames(replicate(26, sample(LETTERS, 10, replace = TRUE), simplify = FALSE), LETTERS)
    
    head(x)
    
    $A
     [1] "A" "M" "B" "X" "B" "J" "P" "L" "M" "L"
    
    $B
     [1] "H" "G" "F" "R" "B" "E" "D" "I" "L" "R"
    
    $C
     [1] "P" "R" "C" "N" "K" "E" "R" "S" "N" "P"
    
    $D
     [1] "F" "B" "B" "Z" "E" "Y" "J" "R" "H" "P"
    
    $E
     [1] "O" "P" "E" "X" "S" "Q" "S" "A" "H" "B"
    
    $F
     [1] "Y" "P" "T" "T" "P" "N" "K" "P" "G" "P"
    
    system.time(replicate(1000, grep("A", x)))
    
       user  system elapsed 
       0.11    0.00    0.11 
    
    system.time(replicate(1000, sapply(x, function(y) "A" %in% y)))
    
       user  system elapsed 
       0.05    0.00    0.05 
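
    Before comparing timings, it may be worth checking that both approaches agree on the data. A sketch, assuming a seeded version of the list above (the seed value is arbitrary). Agreement is expected here only because the elements are single upper-case letters, which cannot collide with the lower-case `c(` of the deparsed text:

    ```r
    set.seed(42)  # arbitrary seed, only for reproducibility
    x <- setNames(replicate(26, sample(LETTERS, 10, replace = TRUE), simplify = FALSE), LETTERS)

    # Indices of the vectors that contain "A", computed both ways
    via_grep <- grep("A", x)
    via_in <- which(sapply(x, function(y) "A" %in% y))

    # as.integer() drops names so that only the indices are compared
    same <- identical(as.integer(via_grep), as.integer(via_in))
    ```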
    