问题
Here is a sample dataset. The scenario below represents a list of account transaction history:
- Two customers A and B;
- Customer A has 5 accounts 1, 2, 3, 4, 5 at Bank BA and Bank UB;
- Customer B has 2 accounts 1, 6 at Bank WF;
Note that Bank WF has issued an account ID to B that's same as A has at Bank BA (which could be true in reality).
data have; input Customer $ Account_id Bank_id $ Year; datalines; A 1 BA . A 2 UB . A 3 UB 2012 A 3 UB . A 4 UB . A 4 UB 2013 A 5 UB . B 1 WF 2014 B 1 WF . B 6 WF . ;
Some of the account open dates are missing, but for the same account, the open date should be the same. What I'm trying to do is to replace the missing date of an account with its previous/later value, if there's any; otherwise, keep it null.
data want;
input Customer $ Account_id Bank_id $ Year;
datalines;
A 1 BA .
A 2 UB .
A 3 UB 2012
A 3 UB 2012
A 4 UB 2013
A 4 UB 2013
A 5 UB .
B 1 WF 2014
B 1 WF 2014
B 6 WF .
;
I looked at this example: How to write first non-missing value to first missing observations which was helpful, but I can't tweak it to work for my case, which has multiple groups.
回答1:
You can do this in a single proc sql step :
proc sql ;
create table want2 as
select a.Customer, a.Account_id, a.Bank_id, max(a.Year,b.Year) as Year
from have a
left join
(select Customer, Account_id, Bank_id, max(Year) as Year
from have
group by Customer, Account_id, Bank_id) b on a.Customer = b.Customer
and a.Account_id = b.Account_id
and a.Bank_id = b.Bank_id ;
quit ;
回答2:
This should do it:
proc sort data=have;
by Customer
Bank_id
Account_id
descending Year;
run;
data want;
set have;
by Customer Bank_id Account_id;
retain year_tmp (.);
if not last.Account_id and Year ne . then year_tmp=Year;
else if Year = . then Year = year_tmp;
if last.Account_id then year_tmp = .;
drop year_tmp;
run;
What we did was to declare a retain variable, that is, one that keeps its value from one row to the next. Then using the last flag, we either store the existing Year in that retain variable (or attribute the already stored value to a missing Year). Finally we reset the retain variable at the last row of each Account_id.
来源:https://stackoverflow.com/questions/38491049/replace-missing-value-with-previous-or-next-observation-based-on-different-grou