问题
I am working in SAS trying to create a conditional probability table.
The current structure of the table is: 5 columns x 10 rows --> the value in each cell is binary. Current Data Table
col1 col2 col3 col4 col5
1 0 1 0 0
0 0 0 1 1
0 0 0 0 0
1 0 0 0 0
1 0 0 0 1
0 1 0 0 0
0 1 0 1 0
1 1 1 1 0
1 0 1 0 1
1 0 1 0 0
I would like to create a table with the conditional probability for every column vs every other column. Ideal Output
--- col1 col2 col3 col4 col5
col1 1.0 0.3 1.0 0.3 0.7
col2 0.2 1.0 0.3 0.7 0.0
col3 0.7 0.3 1.0 0.3 0.3
col4 0.2 0.7 0.3 1.0 0.3
col5 0.3 0.0 0.3 0.3 1.0
This is a much simpler version of the actual problem I am working on (100s of rows & millions of columns, so I'd ideally have a solution which could adjust based on the size of the table).
I've been working with the array and do loop, but haven't been able to get very far.
My current code looks like this (not close to complete):
data ideal_output;
set binary_table;
array obs(10,5);
array output(5,5);
do i=1 to 5;
do j=1 to 5;
do k=1 to 10;
do l=1 to 10;
output(m,n) = sum(obs(k,i)*obs(l,j))/sum(obs(k,i));
end;end;end;end;
run;
回答1:
You have the right sort of idea - the tricky part is loading all your variables into the appropriate arrays. If your full dataset is too large to fit into memory you may need to process one subset of it at a time.
data have;
/*Set length 3 for binary vars to save a bit of memory later*/
length col1-col5 3;
input col1-col5;
cards;
1 0 1 0 0
0 0 0 1 1
0 0 0 0 0
1 0 0 0 0
1 0 0 0 1
0 1 0 0 0
0 1 0 1 0
1 1 1 1 0
1 0 1 0 1
1 0 1 0 0
;
run;
%let NCOLS = 5;
%let NOBS = 10;
data want;
if 0 then set have;
array obs[&NOBS,&NCOLS];
array p[&NCOLS];
array col[&NCOLS];
/*Use a DOW-loop to populate the 2-d array*/
do _n_ = 1 by 1 until (eof);
set have end = eof;
do i = 1 to &NCOLS;
obs[_n_,i] = col[i];
end;
end;
do i=1 to &NCOLS;
do j=1 to &NCOLS;
x = 0;
y = 0;
do k=1 to &NOBS;
x + obs[k,i]*obs[k,j];
y + obs[k,j];
end;
p[j] = x / y;
end;
output;
end;
keep p1-p5;
run;
回答2:
You can probably do something equivalent with a summarization proc. It will be a bit messy as you'll have to do some transposing probably and get rid of the '0' rows, but this will start you off perhaps?
proc tabulate data=have out=want;
class col1-col5;
tables (col1-col5),(col1-col5)*colpctn/printmiss misstext='0';
run;
data want_fortran;
set want;
if sum(of col1-col5) = 2;
run;
Then you can use which columns of col1-col5 are populated to generate column/row names and transpose the dataset.
来源:https://stackoverflow.com/questions/39500028/conditional-probability-table-in-sas