Pandas crosstab matrix dot nansum

ぃ、小莉子 submitted on 2019-12-24 02:10:36

Question


I'm looking for help creating a sub-dataframe from an existing dataframe using an np.nansum-like function. I want to convert this table into a matrix of non-null column sums:

    dan ste bob
t1  na  2   na
t2  2   na  1
t3  2   1   na
t4  1   na  2
t5  na  1   2
t6  2   1   na
t7  1   na  2

For example, when 'dan' is not null (rows t2, t3, t4, t6, t7), the sum of 'ste' is 2 and the sum of 'bob' is 5. When 'ste' is not null, the sum of 'dan' is 4.

    dan ste bob
dan 0   2   5
ste 4   0   2
bob 4   1   0

Any ideas?

Thanks in advance!

I ended up using a modified version of matt's function below:

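def nansum_matrix_create(df):
    rows = []
    for col in df.columns:
        # Keep rows where `col` is non-zero. Note that NaN != 0 is True,
        # so this only drops missing entries if they are stored as 0.
        col_sums = df[df[col] != 0].sum()
        rows.append(col_sums)

    return pd.DataFrame(rows, columns=df.columns, index=df.columns)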
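
Because `df[col] != 0` is also True for NaN, the function above filters rows only when missing values are stored as 0, and its diagonal holds each column's own sum rather than 0. Below is a minimal sketch of a NaN-aware variant, assuming the missing entries are real NaN values as in the answers. The name `nansum_matrix_nan_aware` is hypothetical and not part of the original post.

```python
import numpy as np
import pandas as pd

def nansum_matrix_nan_aware(df):
    # Hypothetical helper: row i holds the sums of every column over the
    # rows where column i is present (not NaN).
    cols = list(df.columns)
    values = np.zeros((len(cols), len(cols)))
    for i, col in enumerate(cols):
        present = df[col].notna()
        # Series.sum() skips NaN by default, like np.nansum.
        values[i] = df.loc[present, cols].sum().to_numpy()
    # Zero the diagonal to match the desired output in the question.
    np.fill_diagonal(values, 0.0)
    return pd.DataFrame(values, index=cols, columns=cols)

# Data from the question, with 'na' represented as NaN.
df = pd.DataFrame({
    "dan": [np.nan, 2, 2, 1, np.nan, 2, 1],
    "ste": [2, np.nan, 1, np.nan, 1, 1, np.nan],
    "bob": [np.nan, 1, np.nan, 2, 2, np.nan, 2],
})

result = nansum_matrix_nan_aware(df)
print(result)
```

The mask uses `notna()` instead of `!= 0`, so it does not depend on how missing values were filled.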

Answer 1:


Assuming your dataframe doesn't have a large number of columns, this function should do what you want and perform reasonably well. It loops over the columns, so there may be a more performant or elegant solution out there.

import numpy as np
import pandas as pd

# Initialise the dataframe from the question (pd.np was removed in pandas 2.0)
df = {"dan": [np.nan, 2, 2, 1, np.nan, 2, 1],
      "ste": [2, np.nan, 1, np.nan, 1, 1, np.nan],
      "bob": [np.nan, 1, np.nan, 2, 2, np.nan, 2]}
df = pd.DataFrame(df)[["dan", "ste", "bob"]]

def matrix_create(df):
    rows = []
    for col in df.columns:
        # Rows where `col` has a value
        present = df[col].notnull()
        subvals = []
        for subcol in df.columns:
            if subcol == col:
                # The diagonal is defined as 0
                subvals.append(0)
            else:
                # Sum of `subcol` over those rows (sum() skips NaN)
                subvals.append(df.loc[present, subcol].sum())

        rows.append(subvals)

    return pd.DataFrame(rows, columns=df.columns, index=df.columns)

matrix_create(df)



Answer 2:


  1. Use pd.DataFrame.notnull to find where the non-null values are.
  2. Then use pd.DataFrame.dot to get the crosstab.
  3. Finally, use np.eye to zero out the diagonal.

import numpy as np
df.notnull().T.dot(df.fillna(0)) * (1 - np.eye(df.shape[1]))

     dan  ste  bob
dan  0.0  2.0  5.0
ste  4.0  0.0  2.0
bob  4.0  1.0  0.0
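
The product works because entry (i, j) of the transposed mask times the filled frame is the sum over rows t of mask[t, i] * filled[t, j], which is the sum of column j over the rows where column i is present. Below is a self-contained sketch on the question's data. The intermediate names `mask`, `filled`, `crosstab` and `off_diagonal` are introduced here only for illustration, and the explicit `astype(int)` is an addition for readability (the boolean mask also works directly).

```python
import numpy as np
import pandas as pd

# Data from the question, with 'na' represented as NaN.
df = pd.DataFrame({
    "dan": [np.nan, 2, 2, 1, np.nan, 2, 1],
    "ste": [2, np.nan, 1, np.nan, 1, 1, np.nan],
    "bob": [np.nan, 1, np.nan, 2, 2, np.nan, 2],
})

mask = df.notnull().astype(int)   # 1 where a value is present, 0 where it is NaN
filled = df.fillna(0)             # NaN contributes nothing to any sum
crosstab = mask.T.dot(filled)     # (i, j): sum of column j over rows where column i is present

# 1 everywhere except the diagonal, so multiplying zeroes out each column's own sum.
off_diagonal = 1 - np.eye(df.shape[1])
result = crosstab * off_diagonal
print(result)
```

Before the diagonal is masked, `crosstab` holds each column's own total on the diagonal, which is why the final multiplication is needed to match the desired output.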

Note:
I ran this first to make sure all my values were numeric. With errors='coerce', anything that cannot be parsed as a number becomes NaN.

df = df.apply(pd.to_numeric, errors='coerce')
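
Here is a small sketch of what that coercion does, assuming the table was loaded with the literal string "na" for missing entries. The question does not show how the data was read, so the `raw` frame below is a made-up illustration.

```python
import pandas as pd

# Hypothetical input: every cell is a string, and missing entries are the text "na".
raw = pd.DataFrame({
    "dan": ["na", "2", "2"],
    "ste": ["2", "na", "1"],
})

# Parse each column as numbers; unparseable cells such as "na" become NaN.
clean = raw.apply(pd.to_numeric, errors="coerce")
print(clean.dtypes)
print(clean)
```

After this step, `notnull()` and `fillna(0)` treat the former "na" cells as missing, which is what the one-liner relies on.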


Source: https://stackoverflow.com/questions/46869129/pandas-crosstab-matrix-dot-nansum
