Question
I always thought that variables are set to missing at every iteration of the data step. However, in the following code, it looks like the value the variable gets on the first iteration is retained. I can't understand why this happens.
data one;
input x $ y;
datalines;
a 10
a 13
a 14
b 9
;
run;
data two;
input z;
datalines;
45
;
run;
data test;
if _n_ = 1 then set two; /* when _n_=2 the PDV assigns missing values, right ? */
set one;
run;
proc print;
run;
The outcome is
z x y
45 a 10
45 a 13
45 a 14
45 b 9
I was expecting to get this
z x y
45 a 10
. a 13
. a 14
. b 9
Answer 1:
SAS does not reset values in the PDV to missing for variables read by SET, MERGE, MODIFY, or UPDATE statements. Because z is read by a SET statement, SAS does not reset it:
if _n_ = 1 then set two;
http://support.sas.com/documentation/cdl/en/lrcon/65287/HTML/default/viewer.htm#p08a4x7h9mkwqvn16jg3xqwfxful.htm
Read - The Execution Phase - Pointer 5
http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001290590.htm
http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000961108.htm
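If the output the question expected is actually what is wanted (z only on the first observation), one possible sketch is to clear z explicitly on later iterations. It reuses the data sets one and two from the question; the output name test_expected is made up for illustration.

```sas
data test_expected;
  /* z is read once, on the first iteration only */
  if _n_ = 1 then set two;
  /* on later iterations, overwrite the retained value with missing */
  else call missing(z);
  set one;
run;

proc print data=test_expected;
run;
```

Because z is automatically retained, it has to be cleared by hand; nothing in the data step will reset it otherwise.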
Answer 2:
SAS sets a flag for each variable in the PDV, which specifies what will happen to it when the data step returns to the beginning of the loop. This flag indicates that either a variable will be reset to missing, or it will not be reset to missing (and will retain its current value).
By default, this flag indicates that a variable should be reset. This flag usually is set to 'retain value' in one of two ways.
- First, if a variable is present in a RETAIN statement, or has the SUM operator used with it on the left side (x+1;), the flag is set for that variable.
- Second, if a variable is present on a set, merge, modify, or update statement, the flag is set for that variable.
In this case, your variable z is present on a set statement, so it is automatically retained.
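A minimal sketch of the first case, for comparison. The data set flags and the variables kept, count, and plain are made up for illustration. kept is named in a RETAIN statement, count is the target of a sum statement, and plain is an ordinary assigned variable that keeps the default "reset" flag.

```sas
data flags;
  input y;
  retain kept;        /* flag set: kept is not reset to missing */
  count + 1;          /* sum statement: count is retained and starts at 0 */
  if _n_ = 1 then do;
    kept  = y;        /* survives into later iterations */
    plain = y;        /* reset to missing at the top of iteration 2 */
  end;
  datalines;
10
13
14
;
run;

proc print data=flags;
run;
```

Here kept should be 10 on every observation and count should run 1, 2, 3, while plain is 10 on the first observation and missing on the others.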
Here's another good example of this working.
data test1;
do x=1 to 5;
y=2;
output;
end;
run;
data test2;
do x=6 to 10;
output;
end;
run;
data test3;
set test1 test2;
if x=7 then y=4;
run;
Here, y is set to missing after the last record of test1 is read; that's because at the end of a BY group or data set, SAS sets all variables to missing once. However, y is still automatically retained; that flag isn't something that can change. So when I set y=4 on the x=7 record, that 4 is retained from then on, because test2 has no y to overwrite it. So x=6 has a missing y, but x=7 through x=10 have y=4.
But wait, you say. My variables x and y are also present on a set statement, and they're not automatically retained. They get re-set each time the data step reads from the dataset.
Nope. They get set to a new value, yes, but they're not reset to missing at the start of each iteration. This has special relevance in a few cases, such as many-to-one merges, which basically work like the above but with BY groups: the "one" record is merged to all of the "many" records not because it is read in multiple times, but because it is read in once and then not reset to missing (i.e., it is retained). This is why a many-to-one merge is a bit dangerous if you're not aware of this:
data test1;
do x=1 to 5;
z=0;
output;
end;
run;
data test2;
do x=1 to 5;
do y=1 to 3;
output;
end;
end;
run;
data testMerge;
merge test1 test2;
by x;
if y=2 then z=1;
run;
Notice that z=1 is true for the y=2 and the y=3 records, even though I didn't ask for that! Oops! That's because z was read in from test1 once, on the first record of each x BY group, and then not re-read after that, just retained.
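One way to sidestep this, sketched under the assumption that only the y=2 records were meant to be flagged, is to leave the retained variable alone and assign into a new variable that is recomputed on every iteration. The names testMergeFixed and z_flag are made up for illustration; test1 and test2 are the data sets created above.

```sas
data testMergeFixed;
  merge test1 test2;
  by x;
  /* z itself is never modified, so its retained value stays 0 */
  z_flag = z;
  /* z_flag is reassigned from z each iteration, so the 1 does not carry over */
  if y = 2 then z_flag = 1;
run;
```

With this version z_flag should be 1 only on the y=2 records, since the value carried forward from the previous record is overwritten before the condition is tested.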
Source: https://stackoverflow.com/questions/27477387/sas-are-variables-set-to-missing-at-every-iteration-of-a-data-step