How to grep a word exactly

礼貌的吻别 2020-12-03 22:11

I'd like to grep for "nitrogen" in the following character vector and get back only the entry that contains "nitrogen" and none of the rest (e.g. nitro

3 answers
  •  生来不讨喜
    2020-12-03 22:47

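    Or use fixed = TRUE if you want to match the literal string (no regex). Keep in mind that this is still a substring match; it returns only the exact entries here because no other element of v contains "nitrogen":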

    v <- sample(c("nitrogen", "potassium", "hidrogen"), size = 100, replace = TRUE, prob = c(.8, .1, .1))
    grep("nitrogen", v, fixed = TRUE)
    # [1]   3   4   5   6   7   8   9  11  12  13  14  16  19  20  21  22  23  24  25
    # [20]  26  27  29  31  32  35  36  38  39  40  41  43  44  46  47  48  49  50  51
    # [39]  52  53  54  56  57  60  61  62  65  66  67  69  70  71  72  73  74  75  76
    # [58]  78  79  80  81  82  83  84  85  86  87  88  89  91  92  93  94  95  96  97
    # [77]  98  99 100
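
    To make the difference between a literal substring match and a whole-string match concrete, here is a minimal sketch. The vector x is made up for illustration and is not part of the original question:

```r
# Made-up test vector (assumption): two entries merely contain "nitrogen"
x <- c("nitrogen", "dinitrogen", "nitrogenium", "potassium")

# Literal substring match: every element containing "nitrogen"
sub_idx <- grep("nitrogen", x, fixed = TRUE)

# Anchored regex: only elements that are exactly "nitrogen"
exact_regex <- grep("^nitrogen$", x)

# Plain equality test: same result as the anchored regex
exact_eq <- which(x == "nitrogen")

print(sub_idx)      # 1 2 3
print(exact_regex)  # 1
print(exact_eq)     # 1
```

    So fixed = TRUE is only a safe stand-in for an exact match when no other entry can contain the search string.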
    

    Dunno about the speed issues. I like to test stuff before claiming that approach A is faster than approach B, but in theory, at least in my experience, indexing with binary operators should be the fastest, so I vote for @Dason's approach. Also note that regex matching is generally slower than grepping with fixed = TRUE.

    A little proof is attached below. Note that this is a lame test: system.time should be put inside replicate to get (more) accurate differences, you should take outliers into account, etc. But surely this one proves that you should use which! =)

    (a0 <- system.time(replicate(1e5, grep("^nitrogen$", v))))
    # user  system elapsed 
    # 5.700   0.023   5.724  
    (a1 <- system.time(replicate(1e5, grep("nitrogen", v, fixed = TRUE))))
    # user  system elapsed 
    # 1.147   0.020   1.168 
    (a2 <- system.time(replicate(1e5, which(v == "nitrogen"))))
    # user  system elapsed 
    # 1.013   0.020   1.033 
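
    The caveat above (put system.time inside replicate and watch for outliers) could be sketched roughly as follows. The helper time_runs, the seed, and the repetition counts are assumptions chosen for illustration, and no timings are shown because they depend on the machine:

```r
set.seed(1)  # assumption: fixed seed, only so the vector is reproducible
v <- sample(c("nitrogen", "potassium", "hidrogen"), size = 100,
            replace = TRUE, prob = c(.8, .1, .1))

# Hypothetical helper: run f() n_inner times inside system.time, repeat that
# n_outer times, and return the elapsed seconds of each repetition.
time_runs <- function(f, n_outer = 5, n_inner = 1e3) {
  replicate(n_outer,
            system.time(for (i in seq_len(n_inner)) f())[["elapsed"]])
}

timings <- list(
  regex = time_runs(function() grep("^nitrogen$", v)),
  fixed = time_runs(function() grep("nitrogen", v, fixed = TRUE)),
  which = time_runs(function() which(v == "nitrogen"))
)

# The median is less sensitive to outliers than the mean
print(sapply(timings, median))
```

    Comparing medians over several repetitions gives a steadier picture than a single system.time call, though a dedicated benchmarking package would be the more thorough option.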
    
