R: Subsetting a data frame using a list of dates as the filter

臣服心动 2020-12-24 07:19

I have a data frame with a date column and some other value columns. I would like to extract from the data frame those rows in which the date column matches any of the eleme

3 Answers
  • 2020-12-24 07:38

    Or you can go the other way round from what @RYogi suggested and convert the Date column to strings:

    testdf[as.character(testdf$mydate) %in% c('2012-01-05', '2012-01-09'),]
          mydate col1 col2 col3
    5 2012-01-05    5   15   25
    9 2012-01-09    9   19   29
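
    The `testdf` used above is not defined in this thread. Here is a minimal self-contained sketch, assuming ten consecutive days with integer value columns (inferred from the printed rows) and a hypothetical vector `wanted` holding the dates to keep:

    ```r
    # Assumed sample data: inferred from the output above, not given in the thread
    testdf <- data.frame(mydate = seq(as.Date("2012-01-01"), length.out = 10, by = "day"),
                         col1 = 1:10, col2 = 11:20, col3 = 21:30)

    # Hypothetical list of dates to keep, stored as "YYYY-MM-DD" strings
    wanted <- c("2012-01-05", "2012-01-09")

    # as.character() on a Date yields "YYYY-MM-DD", so this compares string with string
    result <- testdf[as.character(testdf$mydate) %in% wanted, ]
    print(result)
    ```

    This only matches if the strings in `wanted` use the same `YYYY-MM-DD` layout that `as.character()` produces for a Date.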
    

    Edit: timing

    Converting Date to a string is slightly faster, but it doesn't really make a difference:

    library(rbenchmark)
    benchmark(asDate=testdf[testdf$mydate %in% as.Date(c('2012-01-05', '2012-01-09')),],
      asString=testdf[as.character(testdf$mydate) %in% c('2012-01-05', '2012-01-09'),], 
      replications=1000)
    
    #     test replications elapsed relative user.self sys.self user.child
    # 1   asDate         1000   0.211 1.076531     0.212        0          0
    # 2 asString         1000   0.196 1.000000     0.192        0          0
    #  sys.child
    # 1         0
    # 2         0
    
  • 2020-12-24 07:44

    You have to convert the date strings into Date objects using as.Date (try ?as.Date at the console). Bonus: you can drop which, because %in% already returns a logical vector that can index the rows directly:

    > testdf[testdf$mydate %in% as.Date(c('2012-01-05', '2012-01-09')),]
          mydate col1 col2 col3
    5 2012-01-05    5   15   25
    9 2012-01-09    9   19   29
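
    To illustrate why `which` is optional here, the following sketch indexes the rows both ways and compares the results. The sample data and the names `wanted_chr`, `keep`, `by_logical`, and `by_which` are assumptions made for illustration and do not come from the thread:

    ```r
    # Assumed sample data: ten consecutive days with integer value columns
    testdf <- data.frame(mydate = seq(as.Date("2012-01-01"), length.out = 10, by = "day"),
                         col1 = 1:10, col2 = 11:20, col3 = 21:30)

    # Hypothetical filter, supplied as strings and converted once to Date
    wanted_chr <- c("2012-01-05", "2012-01-09")
    wanted <- as.Date(wanted_chr)  # as.Date() parses "YYYY-MM-DD" by default

    # %in% returns one TRUE/FALSE per row and never NA
    keep <- testdf$mydate %in% wanted

    by_logical <- testdf[keep, ]         # index with the logical vector
    by_which   <- testdf[which(keep), ]  # index with integer positions

    # Both forms select the same rows
    print(identical(by_logical$mydate, by_which$mydate))
    ```

    Because `%in%` never produces `NA`, the logical vector is safe to use directly, which is the case where `which()` adds nothing.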
    
  • 2020-12-24 07:44

    Both suggestions so far are good, but if you are going to do a lot of work with dates, it may be worth investing some time in the xts package:

    # Some sample data for 90 consecutive days 
    set.seed(1)
    testdf <- data.frame(mydate = seq(as.Date('2012-01-01'), 
                                      length.out=90, by = 'day'),
                         col1 = rnorm(90), col2 = rnorm(90),
                         col3 = rnorm(90))
    
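    # Convert the data to an xts object. The date column becomes the index,
    # so pass only the numeric columns as data (keeping it would coerce them to character)
    library(xts)
    testdfx <- xts(testdf[, -1], order.by = testdf$mydate)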
    
    # Take a random sample of dates
    testdfx[sample(index(testdfx), 5)]
    #                   col1        col2        col3
    # 2012-01-17 -0.01619026  0.71670748  1.44115771
    # 2012-01-29 -0.47815006  0.49418833 -0.01339952
    # 2012-02-05 -0.41499456  0.71266631  1.51974503
    # 2012-02-27 -1.04413463  0.01739562 -1.18645864
    # 2012-03-26  0.33295037 -0.03472603  0.27005490
    
    # Get specific dates
    testdfx[c('2012-01-05', '2012-01-09')]
    #                 col1      col2       col3
    # 2012-01-05 0.3295078  1.586833  0.5210227
    # 2012-01-09 0.5757814 -1.224613 -0.4302118
    

    You can also get dates from another vector.

    # Get dates from another vector
    lookup = c("2012-01-12", "2012-01-31", "2012-03-05", "2012-03-19")
    testdfx[lookup]
    #                   col1        col2       col3
    # 2012-01-12  0.38984324  0.04211587  0.4020118
    # 2012-01-31  1.35867955 -0.50595746 -0.1643758
    # 2012-03-05 -0.74327321 -1.48746031  1.1629646
    # 2012-03-19  0.07434132 -0.14439960  0.3747244
    

    The xts package also gives you intelligent subsetting options based on date strings. For instance, testdfx["2012-03"] returns all the data from March 2012; testdfx["2012"] returns the data for the whole year; testdfx["/2012-02-15"] returns the data from the start of the dataset through February 15; and testdfx["2012-02-15/"] returns the data from February 15 to the end of the dataset.
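
    The range forms described above can be tried on the 90-day sample data. This is a sketch that assumes the xts package is installed; the variable names `march`, `whole_year`, `to_feb15`, and `from_feb15` are made up for illustration:

    ```r
    library(xts)  # assumes xts (and its dependency zoo) is installed

    # Same sample data as in the answer: 90 consecutive days starting 2012-01-01
    set.seed(1)
    testdf <- data.frame(mydate = seq(as.Date("2012-01-01"), length.out = 90, by = "day"),
                         col1 = rnorm(90), col2 = rnorm(90), col3 = rnorm(90))

    # Numeric columns as data, dates as the index
    testdfx <- xts(testdf[, -1], order.by = testdf$mydate)

    march      <- testdfx["2012-03"]      # every row in March 2012
    whole_year <- testdfx["2012"]         # every row in 2012
    to_feb15   <- testdfx["/2012-02-15"]  # start of data through February 15
    from_feb15 <- testdfx["2012-02-15/"]  # February 15 through end of data

    # Row counts per subset, to check each range covers what you expect
    print(c(march = nrow(march), whole_year = nrow(whole_year),
            to_feb15 = nrow(to_feb15), from_feb15 = nrow(from_feb15)))
    ```

    Both range forms include February 15 itself, so the last two subsets overlap by one row.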
