Converting a pandas crosstab into a stacked dataframe (a regular table)

Posted by 半世苍凉 on 2020-01-02 09:41:27

Question


Given a pandas crosstab, how do you convert that into a stacked dataframe?

Assume you start with a stacked dataframe and convert it into a crosstab. Now I would like to convert the crosstab back into the original stacked dataframe. I searched for an existing question that addresses this requirement but could not find one that matches exactly. If I have missed one, please point to it in the comments.

I would like to document the best practice here. So, thank you for your support.

I know that pandas.DataFrame.stack() is probably the best approach, but one needs to be careful about the "level" that stacking is applied to.
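
To illustrate the point about "level", here is a minimal sketch on toy data. The data and the extra "Group" column are assumptions made up for this example (they are not part of the question); the second column level exists only so that the `level` argument has something to choose between.

```python
import pandas as pd

# Hypothetical toy data; "Group" is added only to get two column levels.
df = pd.DataFrame({
    "ID":    [1, 1, 2, 2],
    "Group": ["x", "y", "x", "y"],
    "Label": ["a", "b", "a", "a"],
})

# Single-level columns: stack() moves Label into the row index,
# giving a Series indexed by (ID, Label).
simple = pd.crosstab(df["ID"], df["Label"])
stacked_simple = simple.stack()
print(list(stacked_simple.index.names))

# Two-level columns: the level argument decides which level is moved
# into the row index; the other level stays as the columns.
nested = pd.crosstab(df["ID"], [df["Group"], df["Label"]])
by_label = nested.stack(level="Label")
by_group = nested.stack(level="Group")
print(list(by_label.index.names), by_label.columns.name)
print(list(by_group.index.names), by_group.columns.name)
```

With a plain two-variable crosstab like the one in this question there is only one column level, so the default `stack()` is enough.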

Input: Crosstab:


    Label   a   b   c   d   r
    ID                  
    1       0   1   0   0   0
    2       1   1   0   1   1
    3       1   0   0   0   1
    4       1   0   0   1   0
    6       1   0   0   0   0
    7       0   0   1   0   0
    8       1   0   1   0   0
    9       0   1   0   0   0

Output: Stacked DataFrame:


        ID  Label
    0   1   b
    1   2   a
    2   2   b
    3   2   d
    4   2   r
    5   3   a
    6   3   r
    7   4   a
    8   4   d
    9   6   a
    10  7   c
    11  8   a
    12  8   c
    13  9   b

Step-by-step Explanation:

First, let's make a function that creates our data. Note that it generates the stacked dataframe randomly, so your output may differ from the example shown above.

Helper Function: Make the Stacked And Crosstab DataFrames

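import numpy as np
import pandas as pd

# Make stacked dataframe
def _create_df():
    """
    Build a random stacked (long-format) dataframe of unique (ID, Label)
    pairs. This dataframe will be used to create a crosstab.
    """
    B = np.array(list('abracadabra'))  # pool of labels
    A = np.arange(len(B))              # pool of integer IDs
    # Draw 20 random (a, b) index pairs, each index in 1..9
    AB = [(np.random.randint(1, 10), np.random.randint(1, 10))
          for _ in range(20)]
    AB = np.unique(np.array(AB), axis=0)  # drop duplicate index pairs
    # Map indices to (ID, Label) and de-duplicate again, since several
    # positions in B hold the same letter. Note: IDs end up as strings.
    AB = np.unique(np.array(list(zip(A[AB[:,0]], B[AB[:,1]]))), axis=0)
    AB_df = pd.DataFrame({'ID': AB[:,0], 'Label': AB[:,1]})
    return AB_df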

original_stacked_df = _create_df()

# Make crosstab
crosstab_df = pd.crosstab(original_stacked_df['ID'], 
                          original_stacked_df['Label']).reindex()

What to expect?

The goal is a function that regenerates the stacked dataframe from the crosstab. I will post my own solution in the answer section. If you can suggest something better, that would be great.

Other References:

  • Closest Stack Overflow discussion: pandas stacking a dataframe
  • Misleading Stack Overflow question title: change pandas crossstab dataframe into plain table format

Answer 1:


You can just use stack, where df is the crosstab:

df[df.astype(bool)].stack().dropna().reset_index().drop(columns=0)
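
As a minimal sketch, the one-liner above can be tried on the sample crosstab from the question. The table is typed in by hand here, and the explicit `.dropna()` is an assumption added so that the result does not depend on whether `stack()` drops NaN rows in the installed pandas version.

```python
import pandas as pd

# Sample crosstab from the question, typed in by hand
crosstab_df = pd.DataFrame(
    [[0, 1, 0, 0, 0],
     [1, 1, 0, 1, 1],
     [1, 0, 0, 0, 1],
     [1, 0, 0, 1, 0],
     [1, 0, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [1, 0, 1, 0, 0],
     [0, 1, 0, 0, 0]],
    index=pd.Index([1, 2, 3, 4, 6, 7, 8, 9], name="ID"),
    columns=pd.Index(list("abcdr"), name="Label"),
)

stacked = (
    crosstab_df[crosstab_df.astype(bool)]  # zero cells become NaN
    .stack()                               # Series indexed by (ID, Label)
    .dropna()                              # discard the NaN (zero) cells
    .reset_index()                         # columns: ID, Label, 0
    .drop(columns=0)                       # the count column is no longer needed
)
print(stacked)
```

With the sample table this yields the 14 (ID, Label) rows listed in the question, starting with (1, b).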



Answer 2:


The following produces the desired outcome.

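def crosstab2stacked(crosstab):
    # Long format: one row per (ID, Label) pair, with the count in column 0
    stacked = crosstab.stack().reset_index()
    # Keep only the pairs that actually occur, then drop the count column
    stacked = stacked[stacked[0] != 0].drop(columns=[0])
    # Renumber rows 0..n-1 so the result lines up with the original frame
    return stacked.reset_index(drop=True)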

# Make original dataframe
original_stacked_df = _create_df()
# Make crosstab dataframe
crosstab_df = pd.crosstab(original_stacked_df['ID'], 
                          original_stacked_df['Label']).reindex()
# Reconstruct stacked dataframe
recon_stacked_df = crosstab2stacked(crosstab = crosstab_df)

Check if original == reconstructed:

np.all(original_stacked_df.to_numpy() == recon_stacked_df.to_numpy())

Output: True
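
Because `_create_df()` is random, the check above gives a different table on every run. The following is a rough, deterministic sketch of the same round trip, assuming the stacked frame is copied by hand from the expected output in the question. The variable names are made up for this example.

```python
import pandas as pd

# Stacked frame copied from the expected output in the question
original = pd.DataFrame({
    "ID":    [1, 2, 2, 2, 2, 3, 3, 4, 4, 6, 7, 8, 8, 9],
    "Label": ["b", "a", "b", "d", "r", "a", "r",
              "a", "d", "a", "c", "a", "c", "b"],
})

# Forward: stacked -> crosstab
crosstab = pd.crosstab(original["ID"], original["Label"])

# Backward: crosstab -> stacked, same idea as crosstab2stacked
recon = crosstab.stack().reset_index()           # columns: ID, Label, 0
recon = recon[recon[0] != 0].drop(columns=0)     # keep non-zero cells only
recon = recon.reset_index(drop=True)             # renumber rows 0..n-1

# Compare row by row as plain tuples
same = (list(recon.itertuples(index=False, name=None))
        == list(original.itertuples(index=False, name=None)))
print(same)
```

The comparison relies on the original frame already being sorted by ID and then Label, which is the order `stack()` produces from a crosstab. It also assumes every count is 0 or 1, as in the question; repeated (ID, Label) pairs would be collapsed into a single row.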



Source: https://stackoverflow.com/questions/57583269/converting-a-pandas-crosstab-into-a-stacked-dataframe-a-regular-table
