Selecting the earliest date from a given dataset in R

Submitted by 孤者浪人 on 2019-12-01 11:57:37

Question


I have a data set with many rows; a few of them are shown below. I need to keep only the row with the earliest SORT_DT, with all the other variables left unchanged.

        CUST_NO ID_NO SYMBOL  AUTO_CREATE_DT     CLASS_TYPE    SORT_DT
1         107   10120      1    2014-05-12             G/L  2015-01-09
2         107   10120      1    2014-05-12             G/L  2015-11-10
3         107   10120      1    2014-05-12             G/L  2014-06-18
4         107   10120      1    2014-05-12             G/L  2014-05-12
5         107   10120      1    2014-05-12             G/L  2015-07-10
6         107   10120      1    2014-05-12             G/L  2015-10-09
7         107   10120      1    2014-05-12             G/L  2016-04-08
8         107   10120      1    2014-05-12             G/L  2016-01-08
9         107   10120      1    2014-05-12             G/L  2016-12-22
10        107   10120      1    2014-05-12             G/L  2017-01-13
11        107   10120      1    2014-05-12             G/L  2016-07-08
12        107   10120      1    2014-05-12             G/L  2017-04-14
13        107   10120      1    2014-05-12             G/L  2017-04-17
14        107   10120      1    2014-05-12             G/L  2016-08-31
15        107   10120      1    2014-05-12             G/L  2015-04-10
16        107   10120      1    2014-05-12             G/L  2016-12-22

I need the output to look like this:

      CUST_NO   ID_NO      SYMBOL  AUTO_CREATE_DT     CLASS_TYPE    SORT_DT
1         107     10120      1    2014-05-12             G/L     2014-05-12

Please let me know if anyone has a solution for this.

I have also added a second dataset, this time with two different CUST_NO values:

library(data.table)  # provides fread()
df <- fread("CUST_NO ID_NO SYMBOL  AUTO_CREATE_DT     CLASS_TYPE    SORT_DT
         107   10120      1    2014-05-12             G/L  2015-01-09
        107   10120      1    2014-05-12             G/L  2015-11-10
        107   10120      1    2014-05-12             G/L  2014-06-18
        107   10120      1    2014-05-12             G/L  2014-05-13
        107   10120      1    2014-05-12             G/L  2015-07-10
        107   10120      1    2014-05-12             G/L  2015-10-09
        107   10120      1    2014-05-12             G/L  2016-04-08
        107   10120      1    2014-05-12             G/L  2016-01-08
        107   10120      1    2014-05-12             G/L  2016-12-22
        107   10120      1    2014-05-12             G/L  2017-01-13
        107   10120      1    2014-05-12             G/L  2016-07-08
        108   10120      1    2014-05-12             G/L  2017-04-14
        108   10120      1    2014-05-12             G/L  2017-04-17
        108   10120      1    2014-05-12             G/L  2016-08-31
        108   10120      1    2014-05-12             G/L  2015-04-10
        108   10120      1    2014-05-12             G/L  2016-12-22")

For this dataset the output should contain one row per CUST_NO, as below:

  CUST_NO   ID_NO      SYMBOL  AUTO_CREATE_DT     CLASS_TYPE    SORT_DT
1         107     10120      1    2014-05-12             G/L     2014-05-13
2         108     10120      1    2014-05-12             G/L     2015-04-10    

Answer 1:


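Try this. The dot on the right-hand side of the formula stands for all the other columns, so the minimum SORT_DT is taken within each combination of their values:

aggregate(SORT_DT ~ ., data = df, FUN = min)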

Output:

  CUST_NO ID_NO SYMBOL AUTO_CREATE_DT CLASS_TYPE    SORT_DT
1     107 10120      1     2014-05-12        G/L 2014-05-13
2     108 10120      1     2014-05-12        G/L 2015-04-10
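
One caveat worth checking: if SORT_DT is stored as character rather than as a date class, min compares text. That happens to give the right answer for YYYY-MM-DD strings like the ones in the question, but not for other layouts. The sketch below illustrates this in base R; the day/month/year values are hypothetical and are not part of the original data.

```r
# ISO 8601 strings (YYYY-MM-DD): text order matches date order.
iso <- c("2015-01-09", "2014-06-18", "2014-05-13")
min_iso_text <- min(iso)            # character comparison
min_iso_date <- min(as.Date(iso))   # date comparison, same answer

# Hypothetical day/month/year strings: text order no longer matches date order.
dmy <- c("09/01/2015", "18/06/2014", "13/05/2014")
min_dmy_text <- min(dmy)                                # picks "09/01/2015", a 2015 date
min_dmy_date <- min(as.Date(dmy, format = "%d/%m/%Y"))  # picks the real earliest date

print(min_dmy_text)
print(min_dmy_date)
```

Converting the column with as.Date before aggregating avoids relying on the string layout.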



Answer 2:


Try aggregate, naming the grouping columns explicitly:

res <- aggregate(SORT_DT ~ CUST_NO + ID_NO + SYMBOL + AUTO_CREATE_DT + CLASS_TYPE, data = df, FUN = min)
res
  CUST_NO ID_NO SYMBOL AUTO_CREATE_DT CLASS_TYPE    SORT_DT
1     107 10120      1     2014-05-12        G/L 2014-05-13
2     108 10120      1     2014-05-12        G/L 2015-04-10
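
Both answers group by every column except SORT_DT. If the other columns could differ within a customer and the whole earliest row is wanted, a different base R technique (sort, then keep the first row per group) may fit better. This is a minimal sketch, not the method from the answers; it assumes CUST_NO alone defines a group, uses only a subset of the question's rows, and builds the data with read.table instead of fread so that it needs no extra packages. The name earliest is made up for illustration.

```r
# Subset of the question's second dataset, read with base R.
df <- read.table(text = "CUST_NO ID_NO SYMBOL AUTO_CREATE_DT CLASS_TYPE SORT_DT
107 10120 1 2014-05-12 G/L 2015-01-09
107 10120 1 2014-05-12 G/L 2014-06-18
107 10120 1 2014-05-12 G/L 2014-05-13
108 10120 1 2014-05-12 G/L 2017-04-14
108 10120 1 2014-05-12 G/L 2015-04-10
108 10120 1 2014-05-12 G/L 2016-12-22",
                 header = TRUE, stringsAsFactors = FALSE)

# Compare real dates rather than strings.
df$SORT_DT <- as.Date(df$SORT_DT)

# Sort by customer, then by date, so each customer's earliest row comes first.
sorted <- df[order(df$CUST_NO, df$SORT_DT), ]

# Keep only the first row seen for each CUST_NO (assumed grouping key).
earliest <- sorted[!duplicated(sorted$CUST_NO), ]
rownames(earliest) <- NULL

print(earliest)
```

Unlike aggregate, this keeps each selected row exactly as it appears in the data, including any columns that are not part of the grouping key.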


Source: https://stackoverflow.com/questions/46105926/selecting-the-earliest-date-from-a-given-dataset-in-r
