SAS/PROC SQL - remove ALL observations in a BY group if it contains duplicates (not just remove the duplicates)

早过忘川 submitted on 2019-12-25 04:31:14

Question


I am new to SAS and I am trying to remove groups if they fulfil two conditions. I currently have this data set:

ID  ID_2  ID_3
A   1     1
A   1     1
A   1     1
A   2     0
A   2     1
B   3     0
B   3     0

I am grouping by ID then by ID_2.

I want to remove ALL entries in a BY group when (1) its rows are duplicates across all three variables (I don't just want to remove the duplicates; I want to remove the entire group) AND (2) this duplication involves the value 1 in ID_3 across all rows of the BY group.

In other words, the outcome I want is:

ID  ID_2  ID_3
A   2     0
A   2     1
B   3     0
B   3     0

I have spent at least 5 hours on this and I have tried various methods:

  • first. and last. (these do not guarantee that all observations in the BY group match)

  • nodup (this only removes the duplicates; I want to remove the first row of the group as well)

  • lag (again, the first row of the group stays, which is not what I want)
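The first-row problem with first./last. and lag comes from deciding row by row: a single pass reaches the first row of a group before it knows whether the whole group should be removed. A common SAS workaround (not taken from this thread) is the double DoW loop: read each (ID, ID_2) group once to collect counts, then read the same rows again and output them only if the group is kept. This is a minimal sketch, assuming the data sit in a dataset named HAVE and the result goes to WANT; those names and the helper variables N_ONES, N_ROWS, and DROP_GROUP are hypothetical.

```sas
/* Sample data from the question; HAVE and WANT are hypothetical dataset names */
data have;
    input id $ id_2 id_3;
    datalines;
A 1 1
A 1 1
A 1 1
A 2 0
A 2 1
B 3 0
B 3 0
;
run;

/* The BY statement inside the first loop needs the data grouped by ID and ID_2 */
proc sort data=have;
    by id id_2;
run;

data want;
    /* Pass 1: read the whole (ID, ID_2) group. After the loop, N_ROWS holds
       the number of rows in the group and N_ONES the number with ID_3 = 1. */
    n_ones = 0;
    do n_rows = 1 by 1 until (last.id_2);
        set have;
        by id id_2;
        n_ones = n_ones + (id_3 = 1);
    end;

    /* Flag the group if it has more than one row and every row has ID_3 = 1.
       ID and ID_2 are the BY keys, so ID_3 = 1 in every row also means every
       row is identical across all three variables. */
    drop_group = (n_rows > 1 and n_ones = n_rows);

    /* Pass 2: re-read exactly the same rows with a second SET and output
       all of them (first row included) unless the group was flagged */
    do _n_ = 1 to n_rows;
        set have;
        if not drop_group then output;
    end;

    drop n_ones n_rows drop_group;
run;
```

On the sample data this should leave A 2 0, A 2 1, B 3 0 and B 3 0 in WANT. Because the second loop runs exactly N_ROWS times, both SET statements stay in step from one group to the next.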

I am open to using proc sql as well. I would really appreciate any input; thank you in advance!
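Since PROC SQL is an option, here is a minimal sketch of the same group-level rule, again assuming hypothetical datasets HAVE and WANT. It relies on PROC SQL remerging: with SELECT * plus GROUP BY and a summary function in HAVING, the group statistics are attached back to every row, so each (ID, ID_2) group is kept or dropped as a whole. SUM(ID_3 = 1) counts the rows where the comparison is true.

```sas
/* Sample data from the question; HAVE and WANT are hypothetical dataset names */
data have;
    input id $ id_2 id_3;
    datalines;
A 1 1
A 1 1
A 1 1
A 2 0
A 2 1
B 3 0
B 3 0
;
run;

proc sql;
    create table want as
        select *
        from have
        group by id, id_2
        /* Drop a group when it has more than one row and ID_3 = 1 in all of them */
        having not (count(*) > 1 and sum(id_3 = 1) = count(*));
quit;
```

No PROC SORT is needed because GROUP BY does the grouping. The row order within each group in WANT is not guaranteed, so sort afterwards if it matters.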


Answer 1:


I believe this will accomplish what you want. The logic could be tweaked to be a little clearer, I guess, but it worked when I tested it.

data x;
    input id $ id_2 id_3;
cards;
A 1 1
A 1 1
A 1 1
A 2 0
A 2 1
B 3 0
B 3 0
;
run;

* I realize the data are already sorted, but I think it is better
* not to assume they are.;
proc sort data=x;
    by id id_2 id_3;
run;

* It is helpful to create a dataset for the duplicates as well as the 
* unduplicated observations.;
data nodups
     dups
     ;

    set x;
    by id id_2 id_3;

    /* When FIRST.ID_3 and LAST.ID_3 are both true, there is only
       one obs in the group, so keep it. */
    if first.id_3 and last.id_3
        then output nodups;

    /* Otherwise, we know we have more than one obs. According to
       the OP, we keep them, too, unless ID_3 = 1. */
    else do;
        if id_3 = 1
            then output dups;
            else output nodups;
    end;

run;
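The step above classifies rows by the full (ID, ID_2, ID_3) combination, which gives the desired result for the sample data. For a BY group that mixes values, it can differ from condition (2) in the question (ID_3 = 1 in every row of the group). A quick check uses hypothetical rows C 4 1, C 4 1 and C 4 0 that are not in the question, with the same classification rules written compactly; the dataset names MIXED, MIXED_NODUPS and MIXED_DUPS are also hypothetical.

```sas
/* Hypothetical test group: duplicated ID_3 = 1 rows plus one ID_3 = 0 row */
data mixed;
    input id $ id_2 id_3;
    datalines;
C 4 1
C 4 1
C 4 0
;
run;

proc sort data=mixed;
    by id id_2 id_3;
run;

/* Same row-level rules as the answer: single (ID, ID_2, ID_3) rows are kept,
   repeated ones are removed only when ID_3 = 1 */
data mixed_nodups mixed_dups;
    set mixed;
    by id id_2 id_3;
    if first.id_3 and last.id_3 then output mixed_nodups;
    else if id_3 = 1 then output mixed_dups;
    else output mixed_nodups;
run;
```

Here the two C 4 1 rows go to MIXED_DUPS and C 4 0 stays in MIXED_NODUPS, so only part of the C 4 group is removed. Under a strict reading of the question, the whole C 4 group would be kept because not every row has ID_3 = 1.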


Source: https://stackoverflow.com/questions/40541843/sas-proc-sql-remove-all-observations-in-by-group-as-long-as-there-are-duplicat
