Drop observations once condition is met by multiple variables

允我心安 提交于 2021-02-10 23:27:29

问题


I have the following data and used one of the existing answered questions to solve my data problem but could not get what I want. Here is what I have in my data

  • Amt1 is populated when the Evt_type is Fee
  • Amt2 is populated when the Evt_type is REF1/REF2
  • I don't want to display any observations after the last Flag='Y'
  • If there is no Flag='Y' then I want all the observations for that id (e.g. id=102)
  • I want to display if the next row for that id is a Fee followed by REF1/REF2 after flag='Y' (e.g. id=101) However I don't want if there is no REF1/REF2 (e.g.id=103)

Have:

       id   Date        Evt_Type   Flag   Amt1   Amt2
      101  2/2/2019      Fee              5
      101  2/3/2019      REF1      Y             5
      101  2/4/2019      Fee              10
      101  2/6/2019      REF2      Y             10
      101  2/7/2019      Fee               4
      101  2/8/2019      REF1
      102  2/2/2019      Fee              25
      102  2/2/2019      REF1      N      25
      103  2/3/2019      Fee              10
      103  2/4/2019      REF1      Y             10
      103  2/5/2019      Fee              10

Want:

      id   Date        Evt_Type   Flag   Amt1   Amt2
     101  2/2/2019      Fee              5
     101  2/3/2019      REF1      Y             5
     101  2/4/2019      Fee              10
     101  2/6/2019      REF2      Y             10
     101  2/7/2019      Fee               4
     101  2/8/2019      REF1
     102  2/2/2019      Fee              25
     102  2/2/2019      REF1      N      25
     103  2/4/2019      REF1      Y             10
     103  2/5/2019      Fee              10

I tried the following

data want;
    set have;
    by id Date;
    drop count;

    if (first.id or first.date) and FLAG='Y' then
        do;
            retain count;
            count=1;
            output;
            return;
        end;

    if count=1 and ((first.id or first.date) and Flag ne 'Y') then
        do;
            retain count;
            delete;
            return;
        end;
    output;
run;

Any help is appreciated.

Thanks


回答1:


A technique known as DOW loop can perform a computation that measures a group in some way and then, in a second loop, apply that computation to members of the group.

The DOW relies on a SET statement inside the loop. In this case the computation is 'what row in the group is the last one having flag="Y".

data want;
  * DOW loop, contains computation;

  _max_n_with_Y = 1e12;

  do _n_ = 1 by 1 until (last.id);
    set have;
    by id;
    if flag='Y' then _max_n_with_Y = _n_;
  end;

  * Follow up loop, applies computation;
  do _n_ = 1 to _n_;
    set have;
    if _n_ <= _max_n_with_Y then OUTPUT;
  end;
  drop _:;
run;



回答2:


Here is one way

data have;
input id $ Date : mmddyy10. Evt_Type $ Flag $ Amt1 Amt2;
format Date mmddyy10.;
infile datalines dsd missover;
datalines;
101,2/2/2019,Fee,,5,
101,2/3/2019,REF1,Y,,5
101,2/4/2019,Fee,,10,
101,2/6/2019,REF2,Y,,10
101,2/7/2019,Fee,,4,
102,2/2/2019,Fee,,25,
102,2/2/2019,REF1,N,25,
;

data want;
   do _N_ = 1 by 1 until (last.id);
      set have;
      by id;
      if flag = "Y" then _iorc_ = _N_;
   end;

   do _N_ = 1 to _N_;
      set have;
      if _N_ le _iorc_ then output;
   end;
   _iorc_=1e7;
run;


来源:https://stackoverflow.com/questions/60268829/drop-observations-once-condition-is-met-by-multiple-variables

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!