SAS: Are variables set to missing at every iteration of a data step?

走远了吗. Submitted on 2020-02-03 07:56:27

Question


I always thought that variables are set to missing at every iteration of the data step. However, in the following code, it looks like the value the variable gets on the first iteration is retained. I can't understand why this happens.

data one;
input x $ y;
datalines;
a 10
a 13
a 14
b 9
;
run;

data two;
input z;
datalines;
45
;
run;

data test;
if _n_ = 1 then set two; /* when _n_=2 the PDV assigns missing values, right ? */
set one;
run;
proc print;
run; 

The output is:

   z      x     y  
   45     a    10
   45     a    13
   45     a    14
   45     b     9

I was expecting to get this:

   z      x     y  
   45     a    10
   .      a    13
   .      a    14
   .      b     9

Answer 1:


SAS does not reset to missing the PDV values of variables read by a SET, MERGE, MODIFY, or UPDATE statement. Because z is read by a SET statement, SAS does not reset it, even though that statement executes only on the first iteration:

if _n_ = 1 then set two;

http://support.sas.com/documentation/cdl/en/lrcon/65287/HTML/default/viewer.htm#p08a4x7h9mkwqvn16jg3xqwfxful.htm

Read point 5 under "The Execution Phase" in the link above.

http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001290590.htm

http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000961108.htm




Answer 2:


SAS sets a flag for each variable in the PDV that specifies what happens to it when the data step returns to the top of the loop. The flag indicates either that the variable will be reset to missing or that it will not be reset (and will retain its current value).

By default, the flag indicates that a variable should be reset. It is usually set to 'retain value' in one of two ways.

  • First, if a variable appears in a RETAIN statement, or is the accumulator on the left side of a sum statement (x+1;), the flag is set for that variable.
  • Second, if a variable is read by a SET, MERGE, MODIFY, or UPDATE statement, the flag is set for that variable.
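
A minimal sketch of the first case, contrasting a plain assignment with a RETAIN statement and a sum statement. The data set name demo and the variables a, r and n are hypothetical names chosen for illustration.

```sas
data demo;
  input x;
  retain r;            /* RETAIN statement: r is never reset to missing   */
  if _n_ = 1 then do;
    a = 100;           /* plain assignment: a is reset on each iteration  */
    r = 100;           /* assigned once, then carried forward             */
  end;
  n + 1;               /* sum statement: n is retained and starts at 0    */
  datalines;
1
2
3
;
run;

proc print data=demo;
run;
```

In the listing, a should be 100 only on the first row and missing afterwards, while r stays at 100 and n counts 1, 2, 3.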

In this case, your variable z is present on a set statement, so it is automatically retained.
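
If the output expected in the question is what you actually want, one possible approach is to clear z explicitly on every iteration after the first. This is only a sketch: it reuses the data sets one and two from the question, and test_reset is a hypothetical name.

```sas
data test_reset;
  if _n_ = 1 then set two;   /* z is read once, on the first iteration  */
  else call missing(z);      /* explicitly clear the retained value     */
  set one;
run;

proc print data=test_reset;
run;
```

CALL MISSING is used here because SAS will not reset z on its own; the variable keeps its 'retain' behavior no matter how often the SET statement actually executes.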

Here's another good example of this working.

data test1;
  do x=1 to 5;
    y=2;
    output;
  end;
run;

data test2;
  do x=6 to 10;
    output;
  end;
run;

data test3;
  set test1 test2;
  if x=7 then y=4;
run;

Here, y is set to missing after the last record of test1 is read, because at the end of a BY group or data set SAS sets all variables to missing once. However, y is still automatically retained; that flag never changes. So when I assign y=4 on the x=7 record, that 4 is carried forward through all the remaining records. As a result, x=6 has a missing y, but x=7 through x=10 have y=4.
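
To check this, the result can simply be printed. The values in the comment are what the explanation above predicts for test3.

```sas
proc print data=test3;
run;

/* Predicted values:
   x = 1 to 5   ->  y = 2
   x = 6        ->  y = .
   x = 7 to 10  ->  y = 4   */
```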

But wait, you say. My variables x and y are also read by a SET statement, and they're not automatically retained. They get reset each time the data step reads from the data set.

Nope. They get set to a new value, yes, but they're never set to missing. This matters in a few cases, notably many-to-one merges, which work much like the example above but with BY groups: the "one" record is merged to all of the "many" records, not because it is read multiple times, but because it is read once and then not reset to missing (i.e., it is retained). This is why a many-to-one merge is a bit dangerous if you're not aware of this:

data test1;
  do x=1 to 5;
    z=0;
    output;
  end;
run;

data test2;
  do x=1 to 5;
    do y=1 to 3;
      output;
    end;
  end;
run;

data testMerge;
  merge test1 test2;
  by x;
  if y=2 then z=1;
run;

Notice that z=1 on both the y=2 and the y=3 records, even though I didn't ask for that! Oops! That's because z was read in from test1 once, on the first record of each x BY group, and then not re-read after that, just retained.
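
One way to sidestep this, sketched here as an assumption rather than as the only fix, is to leave the merged variable untouched and write the result to a new variable. The names testMerge_safe and z_flag are hypothetical.

```sas
data testMerge_safe;
  merge test1 test2;
  by x;
  /* z_flag is created by assignment, so it is recomputed on every row
     and nothing carries over from the previous record.                */
  if y = 2 then z_flag = 1;
  else z_flag = z;
run;
```

Because z itself is never overwritten, it keeps the value read from test1 for the whole BY group, and z_flag is 1 only on the y=2 records.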



Source: https://stackoverflow.com/questions/27477387/sas-are-variables-set-to-missing-at-every-iteration-of-a-data-step
