Question
I have the following problem:
I want to fill missing values with PROC EXPAND by simply taking the value from the next data row.
My data looks like this:
date;index;
29.Jun09;-1693
30.Jun09;-1692
01.Jul09;-1691
02.Jul09;-1690
03.Jul09;-1689
04.Jul09;.
05.Jul09;.
06.Jul09;-1688
07.Jul09;-1687
08.Jul09;-1686
09.Jul09;-1685
10.Jul09;-1684
11.Jul09;.
12.Jul09;.
13.Jul09;-1683
As you can see, the index is missing for some dates. I want to get the following:
date;index;
29.Jun09;-1693
30.Jun09;-1692
01.Jul09;-1691
02.Jul09;-1690
03.Jul09;-1689
04.Jul09;-1688
05.Jul09;-1688
06.Jul09;-1688
07.Jul09;-1687
08.Jul09;-1686
09.Jul09;-1685
10.Jul09;-1684
11.Jul09;-1683
12.Jul09;-1683
13.Jul09;-1683
As you can see, the values for the missing dates were taken from the next row (04.Jul09 and 05.Jul09 got the value from 06.Jul09; 11.Jul09 and 12.Jul09 got the value from 13.Jul09).
PROC EXPAND seemed to be the right approach, so I started with this code:
PROC EXPAND DATA=DUMMY
OUT=WORK.DUMMY_TS
FROM = DAY
ALIGN = BEGINNING
METHOD = STEP
OBSERVED = (BEGINNING, BEGINNING);
ID date;
CONVERT index /;
RUN;
QUIT;
This fills the gaps, but with the value from the previous row. No matter what I set for ALIGN or OBSERVED, and even if I sort the data in descending order, I cannot get the behavior I want.
If you know how to do this correctly, I would be grateful for a hint. Pointers to good papers on PROC EXPAND are appreciated as well.
Thanks for your help and kind regards,
Stephan
Answer 1:
I don't know PROC EXPAND well, but this can be done in a few steps.
Read the dataset and create a new variable that holds the observation number (_N_).
data have;
    set have;
    pos = _n_;
run;
Sort this dataset by this new variable, in descending order.
proc sort data=have;
    by descending pos;
run;
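Use RETAIN to fill the missing values from the "next" row (after the descending sort the order is reversed, so the next row in the original order is now the previous row). A plain LAG would not fill runs of two or more consecutive missing values, which is why RETAIN is used here.
data want;
    set have (rename=(index=index_old));
    retain index;    /* INDEX keeps its value from row to row */
    /* overwrite INDEX only when the current row has a value */
    if not missing(index_old) then index = index_old;
    drop index_old;
run;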
Sort back into the original order if needed.
proc sort data=want;
    by pos;
run;
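If you prefer to skip the POS helper variable and the first sort, the same "carry the next value backward" idea can be sketched as a single DATA step that reads HAVE from the last observation to the first with SET POINT=. This is a minimal sketch, assuming HAVE holds the question's data (re-keyed here with DATE9. dates) in ascending date order; the names WANT_REV and NEXT_INDEX are hypothetical. The result comes out in reverse order, so one PROC SORT restores the date order. DATALINES4 is used because the data lines contain semicolons.

```sas
/* Sample data from the question, dates re-keyed as DATE9. values */
data have;
   infile datalines dsd dlm=';';
   input date :date9. index;
   format date date9.;
   datalines4;
29JUN2009;-1693
30JUN2009;-1692
01JUL2009;-1691
02JUL2009;-1690
03JUL2009;-1689
04JUL2009;.
05JUL2009;.
06JUL2009;-1688
07JUL2009;-1687
08JUL2009;-1686
09JUL2009;-1685
10JUL2009;-1684
11JUL2009;.
12JUL2009;.
13JUL2009;-1683
;;;;

/* Read HAVE backwards; NEXT_INDEX holds the most recent non-missing
   value seen, which is the "next" value in ascending date order.   */
data want_rev;
   next_index = .;
   do p = nobs to 1 by -1;
      set have point=p nobs=nobs;
      if missing(index) then index = next_index;  /* fill the gap  */
      else next_index = index;                    /* remember it   */
      output;
   end;
   stop;   /* required with POINT=: end of file is never reached */
   drop next_index;
run;

/* Restore ascending date order */
proc sort data=want_rev out=want;
   by date;
run;
```

Missing values at the very end of the data (with no later observation) stay missing, which matches the "take the next row" rule.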
Answer 2:
I'm no PROC EXPAND expert, but this is what I came up with: create LEADs for the longest run of missing values (2 in this data), then COALESCE them with INDEX into a new variable (POCB).
data index;
   infile cards dsd dlm=';';
   input date:date11. index;
   format date date11.;
   cards4;
29.Jun09;-1693
30.Jun09;-1692
01.Jul09;-1691
02.Jul09;-1690
03.Jul09;-1689
04.Jul09;.
05.Jul09;.
06.Jul09;-1688
07.Jul09;-1687
08.Jul09;-1686
09.Jul09;-1685
10.Jul09;-1684
11.Jul09;.
12.Jul09;.
13.Jul09;-1683
;;;;
   run;
proc print;
   run;
PROC EXPAND DATA=index OUT=index2 method=none;
   ID date;
   convert index=lead1 / transform=(lead 1);
   CONVERT index=lead2 / transform=(lead 2);
   RUN;
   QUIT;
proc print; 
   run;
data index3;
   set index2;
   pocb = coalesce(index,lead1,lead2);
   run;
proc print;
   run;
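Here is a version modified to work for any reasonable gap size. It first finds the longest run of missing values, then generates one LEAD conversion for each position in that run.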
data index;
   infile cards dsd dlm=';';
   input date:date11. index;
   format date date11.;
   cards4;
27.Jun09;
28.Jun09;
29.Jun09;-1693
30.Jun09;-1692
01.Jul09;-1691
02.Jul09;-1690
03.Jul09;-1689
04.Jul09;.
05.Jul09;.
06.Jul09;-1688
07.Jul09;-1687
08.Jul09;-1686
09.Jul09;-1685
10.Jul09;-1684
11.Jul09;.
12.Jul09;.
13.Jul09;-1683
14.Jul09;
15.Jul09;
16.Jul09;
17.Jul09;-1694
;;;;
   run;
proc print;
   run;
/* find the largest gap */
data gapsize(keep=n);
   set index;
   by index notsorted;
   if missing(index) then do;
      if first.index then n=0;
      n+1;
      if last.index then output;
      end;
   run;
proc summary data=gapsize;
   output out=maxgap(drop=_:) max(n)=maxgap;
   run;
/* Generate one CONVERT statement per LEAD */
filename FT67F001 temp;
data _null_;
   file FT67F001;
   set maxgap;
   do i = 1 to maxgap;
      put 'Convert index=lead' i ' / transform=(lead ' i ');';
      end;
   stop;
   run;
proc expand data=index out=index2 method=none;
   id date;
   %inc ft67f001;
   run;
   quit;
data index3;
   set index2;
   pocb = coalesce(index,of lead:);
   drop lead:;
   run;
proc print;
   run;
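A forward-only alternative that needs neither PROC EXPAND nor any sorting is the "double DOW-loop" pattern: the first loop reads ahead until it finds a non-missing INDEX, and a second SET statement (which keeps its own read pointer) re-reads the same rows and fills the gaps with that value. This is a sketch, assuming the extended test data from this answer (re-keyed with DATE9. dates) sorted by date; the helper names _COUNT, _J and _NEXT are hypothetical. Like the generated LEADs, it handles gaps of any length without first measuring the largest gap.

```sas
/* Extended test data from this answer, dates re-keyed as DATE9. */
data have;
   infile datalines dsd dlm=';';
   input date :date9. index;
   format date date9.;
   datalines4;
27JUN2009;.
28JUN2009;.
29JUN2009;-1693
30JUN2009;-1692
01JUL2009;-1691
02JUL2009;-1690
03JUL2009;-1689
04JUL2009;.
05JUL2009;.
06JUL2009;-1688
07JUL2009;-1687
08JUL2009;-1686
09JUL2009;-1685
10JUL2009;-1684
11JUL2009;.
12JUL2009;.
13JUL2009;-1683
14JUL2009;.
15JUL2009;.
16JUL2009;.
17JUL2009;-1694
;;;;

data want;
   /* Pass 1: read ahead until a non-missing INDEX or the end of HAVE */
   do _count = 1 by 1 until (not missing(index) or _last);
      set have end=_last;
   end;
   _next = index;                      /* value that closes this run */
   /* Pass 2: the second SET re-reads the same _COUNT rows */
   do _j = 1 to _count;
      set have;
      index = coalesce(index, _next);  /* fill only the gaps */
      output;
   end;
   drop _count _j _next;
run;

proc print data=want;
run;
```

As with the other approaches, a trailing run of missing values with no later observation would stay missing, because _NEXT is then itself missing.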
Source: https://stackoverflow.com/questions/35207593/use-sas-proc-expand-for-filling-missing-values