Conditional Probability Table in SAS

Posted by 假如想象 on 2019-12-12 02:21:28

Question


I am working in SAS trying to create a conditional probability table.

The current structure of the table is 5 columns x 10 rows, and the value in each cell is binary.

Current data table:

col1    col2    col3    col4    col5
1   0   1   0   0
0   0   0   1   1
0   0   0   0   0
1   0   0   0   0
1   0   0   0   1
0   1   0   0   0
0   1   0   1   0
1   1   1   1   0
1   0   1   0   1
1   0   1   0   0

I would like to create a table with the conditional probability for every column vs every other column.

Ideal output:

---     col1    col2    col3    col4    col5
col1    1.0     0.3     1.0     0.3     0.7
col2    0.2     1.0     0.3     0.7     0.0
col3    0.7     0.3     1.0     0.3     0.3
col4    0.2     0.7     0.3     1.0     0.3
col5    0.3     0.0     0.3     0.3     1.0

This is a much simpler version of the actual problem I am working on (100s of rows & millions of columns, so I'd ideally have a solution which could adjust based on the size of the table).

I've been working with the array and do loop, but haven't been able to get very far.

My current code looks like this (not close to complete):

data ideal_output;
    set binary_table;
    array obs(10,5);
    array output(5,5);
    do i=1 to 5;
        do j=1 to 5;
            do k=1 to 10;
                do l=1 to 10;
        output(m,n) = sum(obs(k,i)*obs(l,j))/sum(obs(k,i));
    end;end;end;end;
run;

Answer 1:


You have the right sort of idea - the tricky part is loading all your variables into the appropriate arrays. If your full dataset is too large to fit into memory you may need to process one subset of it at a time.
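For reference, the quantity in each cell of the requested table can be read off the sample data as follows. The symbols are notation introduced here for illustration, not taken from the question: $x_{ki}$ is the value of column $i$ in row $k$, $n$ is the number of rows, and $P_{ij}$ is the entry in output row $i$, output column $j$.

```latex
\[
P_{ij} \;=\; \Pr(\mathrm{col}_i = 1 \mid \mathrm{col}_j = 1)
       \;=\; \frac{\sum_{k=1}^{n} x_{ki}\, x_{kj}}{\sum_{k=1}^{n} x_{kj}}
\]
```

For example, col1 and col2 are both 1 only in row 8, and col2 is 1 in three rows (6, 7 and 8), so $P_{12} = 1/3 \approx 0.3$, which matches the requested output. The denominator is zero when a column contains no 1s, so that case needs separate handling.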

data have;
/*Set length 3 for binary vars to save a bit of memory later*/
length col1-col5 3;
input col1-col5;
cards;
1   0   1   0   0
0   0   0   1   1
0   0   0   0   0
1   0   0   0   0
1   0   0   0   1
0   1   0   0   0
0   1   0   1   0
1   1   1   1   0
1   0   1   0   1
1   0   1   0   0
;
run;

%let NCOLS = 5;
%let NOBS = 10;

data want;
    if 0 then set have;
    array obs[&NOBS,&NCOLS];
    array p[&NCOLS];
    array col[&NCOLS];

    /*Use a DOW-loop to populate the 2-d array*/
    do _n_ = 1 by 1 until (eof);
        set have end = eof;
        do i = 1 to &NCOLS;
            obs[_n_,i] = col[i];
        end;
    end;

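    /*Row i of the output holds P(col[i]=1 | col[j]=1) for j = 1..NCOLS*/
    do i=1 to &NCOLS;
        do j=1 to &NCOLS;
            x = 0; /*number of rows where col[i] and col[j] are both 1*/
            y = 0; /*number of rows where col[j] is 1*/
            do k=1 to &NOBS;
                x + obs[k,i]*obs[k,j];
                y + obs[k,j];
            end;
            /*Leave p[j] missing if col[j] is never 1, to avoid dividing by zero*/
            if y > 0 then p[j] = x / y;
            else p[j] = .;
        end;
        output;
    end;
    keep p1-p&NCOLS;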
run;
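The question asks for something that adjusts to the size of the table, while the step above relies on two hard-coded %let values. One possible way to derive them from the data set itself is sketched below. It assumes the input is WORK.HAVE, that every variable in it is one of the binary columns, and that the columns are named col1-colN as the array statement expects. The TRIMMED option needs SAS 9.3 or later.

```sas
/* Sketch only: replaces the two %let statements.                  */
/* Assumes WORK.HAVE contains nothing but the binary columns,      */
/* named col1-colN.                                                */
proc sql noprint;
    select nobs, nvar
        into :NOBS trimmed, :NCOLS trimmed
        from dictionary.tables
        where libname = 'WORK' and memname = 'HAVE';
quit;

/* Write the derived sizes to the log as a quick check */
%put NOTE: NOBS=&NOBS NCOLS=&NCOLS;
```

If the data set also holds other variables (an ID column, say), NVAR will overcount, and the column count would have to come from a filtered query on dictionary.columns instead.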



Answer 2:


You can probably do something equivalent with a summarization procedure. It will be a bit messy, since you will probably have to transpose the result and get rid of the '0' rows, but this may be a starting point:

proc tabulate data=have out=want;
  class col1-col5;
  tables (col1-col5),(col1-col5)*colpctn/printmiss misstext='0';
run;

data want_fortran;
  set want;
  if sum(of col1-col5) = 2;
run;

You can then use which of col1-col5 are populated in each row to generate the row and column names, and transpose the dataset.



Source: https://stackoverflow.com/questions/39500028/conditional-probability-table-in-sas
