Select rows from data.frame ending with a specific character string in R

孤街浪徒 2020-11-30 12:56

I'm using R and I have a data.frame with nearly 2,000 entries that looks as follows:

> head(PVs,15)
     LogFreq   Word PhonCV  FreqDev
1593     140    w         

3 Answers
  • 2020-11-30 13:22

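    Using grep on a plain-text copy of the table (file.txt). The pattern relies on the fixed-width layout: .{17} skips the first 17 characters, so (de|te) has to sit in columns 18-19, which assumes the right-aligned Word column ends at column 19 as in the printed head() output. -x makes the pattern match the whole line and -v prints only the lines that do not match, so this drops the words ending in de or te: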

    grep -xvE '.{17}(de|te).*' file.txt
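
    If the table is already loaded in R, the same filter can be applied to the Word column itself, so it no longer depends on character positions in a text dump. A minimal sketch using base R's grep() with invert = TRUE, the R counterpart of the shell -v flag; the small data frame below is made up for illustration:

    ```r
    # Hypothetical sample data; only the Word column matters here
    PVs <- data.frame(
      LogFreq = c(140, 139, 138, 137),
      Word = c("blahte", "had", "aaaade", "zei"),
      stringsAsFactors = FALSE
    )

    # grep() returns the row indices of words that do NOT end in "de" or "te"
    # (invert = TRUE plays the role of grep -v)
    keep <- grep("(de|te)$", PVs$Word, invert = TRUE)
    PVs_kept <- PVs[keep, ]
    print(PVs_kept)
    ```

    Unlike the shell version, this works no matter how wide the other columns are.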
    
  • 2020-11-30 13:26

    I modified the data a bit so that there were words that ended in te or de.

    > PV
         LogFreq   Word PhonCV  FreqDev
    1593     140 blahte    CVC 5.480774
    482      139    had    CVC 5.438114
    1681     138 aaaade   CVVC 5.395454
    1662     137    zei    CVV 5.352794
    1619     136   werd   CVCC 5.310134
    1592     135  waren CVV-CV 5.267474
    620      134    kon    CVC 5.224814
    646      133 kwamde   CCVC 5.182154
    483      132 hadden CVC-CV 5.139494
    436      131   ging    CVC 5.096834
    734      130 moeste  CVVCC 5.054174
    1171     129  stond  CCVCC 5.011514
    1654     128  zagde    CVC 4.968854
    1620     127 werden CVC-CV 4.926194
    1683     126 zouden CVV-CV 4.883534
    
    # Add a column to PV so you can visually check which words the regular expression matches.
    PV$Match <- grepl(pattern = "(de|te)$", PV$Word)
    
    # Keep only the rows that do NOT match, i.e. drop words ending in de or te
    # (use PV[PV$Match, ] instead to keep only the matching rows)
    PV <- PV[!PV$Match, ]
    

    The result is shown below:

         LogFreq   Word PhonCV  FreqDev Match
    482      139    had    CVC 5.438114 FALSE
    1662     137    zei    CVV 5.352794 FALSE
    1619     136   werd   CVCC 5.310134 FALSE
    1592     135  waren CVV-CV 5.267474 FALSE
    620      134    kon    CVC 5.224814 FALSE
    483      132 hadden CVC-CV 5.139494 FALSE
    436      131   ging    CVC 5.096834 FALSE
    1171     129  stond  CCVCC 5.011514 FALSE
    1620     127 werden CVC-CV 4.926194 FALSE
    1683     126 zouden CVV-CV 4.883534 FALSE
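
    To make this reproducible, the modified PV table above can be rebuilt with data.frame(). The sketch below skips the helper Match column, uses ! on the logical vector rather than == FALSE, and shows both directions: dropping the words that end in de or te (the same rows as the output above) and keeping only those words:

    ```r
    # Rebuild the modified PV table printed above, keeping its row names
    PV <- data.frame(
      LogFreq = 140:126,
      Word = c("blahte", "had", "aaaade", "zei", "werd", "waren", "kon", "kwamde",
               "hadden", "ging", "moeste", "stond", "zagde", "werden", "zouden"),
      PhonCV = c("CVC", "CVC", "CVVC", "CVV", "CVCC", "CVV-CV", "CVC", "CCVC",
                 "CVC-CV", "CVC", "CVVCC", "CCVCC", "CVC", "CVC-CV", "CVV-CV"),
      FreqDev = c(5.480774, 5.438114, 5.395454, 5.352794, 5.310134, 5.267474,
                  5.224814, 5.182154, 5.139494, 5.096834, 5.054174, 5.011514,
                  4.968854, 4.926194, 4.883534),
      row.names = c(1593, 482, 1681, 1662, 1619, 1592, 620, 646, 483, 436,
                    734, 1171, 1654, 1620, 1683),
      stringsAsFactors = FALSE
    )

    # TRUE for words ending in "de" or "te"
    ends_de_te <- grepl("(de|te)$", PV$Word)

    # Drop the matching rows
    PV_without <- PV[!ends_de_te, ]

    # Or keep only the matching rows
    PV_with <- PV[ends_de_te, ]

    print(PV_without)
    print(PV_with)
    ```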
    
  • 2020-11-30 13:30

    Method 1

    You can use grepl with an appropriate regular expression. Consider the following:

    x <- c("blank","wade","waste","rubbish","dedekind","bated")
    grepl("^.+(de|te)$",x)
    [1] FALSE  TRUE  TRUE FALSE FALSE FALSE
    

    The regular expression says: start at the beginning of the string (^), match one or more characters of any kind (.+), then either de or te ((de|te)), and then the end of the string ($).
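
    Since .+ requires at least one character before the suffix, this pattern does not match a word that is exactly "de" or "te", whereas the shorter pattern "(de|te)$" checks only the ending. A small comparison, using made-up words in x2:

    ```r
    x2 <- c("de", "te", "wade", "waste", "dedekind")

    # At least one character must precede the suffix
    with_prefix <- grepl("^.+(de|te)$", x2)
    print(with_prefix)  # FALSE FALSE TRUE TRUE FALSE

    # Only the last two characters are checked
    suffix_only <- grepl("(de|te)$", x2)
    print(suffix_only)  # TRUE TRUE TRUE TRUE FALSE
    ```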

    So for your data.frame, try:

    subset(PVs,grepl("^.+(de|te)$",Word))
    

    Method 2

    To avoid regular expressions altogether, you can use substr() instead:

    # extract the last two characters with substr() and test them against the suffixes
    substr(x,nchar(x)-1,nchar(x)) %in% c("de","te")
    [1] FALSE  TRUE  TRUE FALSE FALSE FALSE
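
    Base R (3.3.0 and later) also has endsWith(), which tests a suffix without regular expressions or substr(). A minimal sketch of the same check on x. Note that both nchar() and endsWith() reject factors, so a Word column stored as a factor (the data.frame() default before R 4.0.0) should be converted with as.character() first:

    ```r
    x <- c("blank", "wade", "waste", "rubbish", "dedekind", "bated")

    # endsWith() is vectorised over x; combine the two suffix tests with |
    ends_de_te <- endsWith(x, "de") | endsWith(x, "te")
    print(ends_de_te)
    # [1] FALSE  TRUE  TRUE FALSE FALSE FALSE

    # If the words are a factor, convert them to character first
    f <- factor(x)
    ends_de_te_f <- endsWith(as.character(f), "de") | endsWith(as.character(f), "te")
    ```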
    

    So try:

    subset(PVs,substr(Word,nchar(Word)-1,nchar(Word)) %in% c("de","te"))
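
    The substr() version hard-codes suffixes of length two. If the suffixes differ in length (say "de", "te" and "den"), one option is a small helper that checks each suffix with endsWith() and combines the results with |. ends_with_any() is a hypothetical name, not a base R function, and the sample data frame is made up:

    ```r
    # Hypothetical helper: TRUE where x ends with any of the given suffixes
    ends_with_any <- function(x, suffixes) {
      x <- as.character(x)  # also accepts factors
      hits <- lapply(suffixes, function(s) endsWith(x, s))
      # start from all-FALSE so an empty suffix vector selects nothing
      Reduce(`|`, hits, init = rep(FALSE, length(x)))
    }

    PVs <- data.frame(
      Word = c("blahte", "had", "hadden", "zagde", "werd"),
      stringsAsFactors = FALSE
    )

    # Rows whose Word ends in "de", "te" or "den"
    selected <- subset(PVs, ends_with_any(Word, c("de", "te", "den")))
    print(selected)
    ```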
    