R: Subset / index an object by substring of the wanted entries

Submitted by 和自甴很熟 on 2019-12-02 13:12:02

Question


Is it possible to extract/subset a dataframe by giving only a leading chunk of the wanted ID strings?

The filter criteria are stored in a factor vector, but it holds only the first three digits of each ID. The subset should contain every row of the dataframe whose ID starts with one of those prefixes.

Example:

 # Input dataframe
 data <- read.table(header=T, text='
             ID sex size
        0120010   M    7
        0120020   F    6
        0121031   F    9
        0130010   M   11
        0130020   M   11
        0130030   F   14
        0130040   M   11
        0150030   F   11
        0150110   F   12
        0180030   F    9
        1150110   F   12
        9180030   F    9
        ', colClasses = c("character", "factor", "integer"))

 # Input vector/factor with the ID chunk, containing only the first three digits
 # of the targeted entries in data$ID
 IDfilter <- c("012", "015", "115")

 # My try/idea which sadly is not working - PLEASE HELP HERE
 subset <- data[ID %in% paste(IDfilter, "?.", sep=""),]

 # Expected subset
 > subset
           ID sex size
 1    0120010   M    7
 2    0120020   F    6
 3    0121031   F    9
 4    0150030   F   11
 5    0150110   F   12
 6    1150110   F   12

Thank you! :)


Answer 1:


Something like this?

data <- read.table(header=T, text='
             ID sex size
        0120010   M    7
        0120020   F    6
        0121031   F    9
        0130010   M   11
        0130020   M   11
        0130030   F   14
        0130040   M   11
        0150030   F   11
        0150110   F   12
        0180030   F    9
        1150110   F   12
        9180030   F    9
        ', colClasses =c("character", "factor", "integer"))

 IDfilter <- c("012", "015", "115") # filter must be character vector



   data[substr(data[,"ID"], 1,3) %in% IDfilter, ]
#        ID sex size
#1  0120010   M    7
#2  0120020   F    6
#3  0121031   F    9
#8  0150030   F   11
#9  0150110   F   12
#11 1150110   F   12

Note the colClasses argument. ID has to be read as character so that a leading 0 is preserved, as in 0120010; if the column were read as numeric or integer, that value would become 120010.
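To make the leading-zero point concrete, here is a small sketch that reads a single row twice, once with the default type guessing and once with colClasses. The names txt, as_default and as_char are only illustrative and are not part of the original answer.

```r
# One-row sample with an ID that starts with 0
txt <- "ID sex size
0120010 M 7"

# Default: read.table guesses the column type and turns the ID into an integer
as_default <- read.table(header = TRUE, text = txt)

# With colClasses the ID stays a character string
as_char <- read.table(header = TRUE, text = txt,
                      colClasses = c("character", "factor", "integer"))

as_default$ID   # 120010, the leading zero is gone
as_char$ID      # "0120010"
```

With the integer version, substr(ID, 1, 3) would return "120" rather than "012", so the prefix filter would silently miss these rows.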

Another alternative is

data[substr(data[,"ID"], 1,nchar(IDfilter)[1]) %in% IDfilter, ]

Here the third argument of substr is set to the number of characters in the first element of IDfilter. This assumes that every element of IDfilter has the same number of characters.
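If the filters do not all have the same length, one possible workaround is to test each prefix separately with startsWith (available in base R since 3.3.0) and combine the results. This is only a sketch; the vectors ids and prefixes are made-up sample values, not taken from the question.

```r
# Hypothetical sample IDs and prefixes of unequal length
ids      <- c("0120010", "0130010", "1150110", "9180030")
prefixes <- c("012", "1150")

# One logical vector per prefix, then OR them together element-wise
keep <- Reduce(`|`, lapply(prefixes, function(p) startsWith(ids, p)))

ids[keep]   # "0120010" "1150110"
```

For the dataframe from the question the same logical vector could be used as a row index, for example data[keep, ] with ids replaced by data$ID.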




Answer 2:


A regex approach:

subset(data, grepl(paste0("^",IDfilter,collapse="|"), ID))

        ID sex size
1  0120010   M    7
2  0120020   F    6
3  0121031   F    9
8  0150030   F   11
9  0150110   F   12
11 1150110   F   12

Note: "^" is to match the beginning of the string. I'm assuming there are only digits in your filters.
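The digits-only assumption matters because characters such as "." have a special meaning in a regex. If the filters could contain such characters, one option is to wrap each filter in \Q...\E and use perl = TRUE so that it is matched literally. The sketch below uses invented IDs to show the difference; it assumes that no filter itself contains the sequence \E.

```r
# Hypothetical IDs and a filter containing a regex metacharacter
ids     <- c("A.1-001", "AB1-002", "A.2-003")
filters <- c("A.1")

# Plain pattern: "." matches any character, so "AB1-002" matches too
naive <- grepl(paste0("^", filters, collapse = "|"), ids)

# Quoted pattern: \Q...\E makes PCRE treat the filter as literal text
literal <- grepl(paste0("^\\Q", filters, "\\E", collapse = "|"), ids, perl = TRUE)

naive     # TRUE  TRUE FALSE
literal   # TRUE FALSE FALSE
```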



Source: https://stackoverflow.com/questions/18522516/r-subset-index-an-object-by-substring-of-the-wanted-entries
