Efficient way to unnest (explode) multiple list columns in a pandas DataFrame

天涯浪人 2020-11-27 14:43

I am reading multiple JSON objects into one DataFrame. The problem is that some of the columns contain lists. Also, the data is very large, and because of that I cannot use the ava…
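For illustration, here is a minimal sketch of the situation described. It assumes the JSON objects arrive one per line (JSON Lines). The field names and values are hypothetical and were chosen to match the usage example in the answer. Fields that hold JSON arrays end up as `object` columns whose cells are plain Python lists:

```python
import json

import pandas as pd

# Hypothetical input: one JSON object per line; fields B..E hold JSON arrays
raw = """\
{"A": "x1", "B": ["v1", "v2"], "C": ["c1", "c2"], "D": ["d1", "d2"], "E": ["e1", "e2"]}
{"A": "x2", "B": ["v3", "v4"], "C": ["c3", "c4"], "D": ["d3", "d4"], "E": ["e3", "e4"]}
"""

# Parse each non-empty line and build one DataFrame from the list of dicts
records = [json.loads(line) for line in raw.splitlines() if line.strip()]
df = pd.DataFrame(records)

print(df.dtypes)             # the list columns are stored with dtype `object`
print(type(df.loc[0, "B"]))  # each cell is a Python list
```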

4 answers
  •  旧巷少年郎
    2020-11-27 15:24

    import numpy as np
    import pandas as pd

    def explode(df, lst_cols, fill_value=''):
        # make sure `lst_cols` is a list
        if lst_cols and not isinstance(lst_cols, list):
            lst_cols = [lst_cols]
        # all columns except `lst_cols`
        idx_cols = df.columns.difference(lst_cols)
    
        # calculate lengths of lists; every column in `lst_cols` is assumed
        # to hold lists of the same length within each row
        lens = df[lst_cols[0]].str.len()
    
        # repeat each non-list value once per list element, then flatten
        # every list column into a single array
        res = pd.DataFrame({
            col: np.repeat(df[col].values, lens)
            for col in idx_cols
        }).assign(**{col: np.concatenate(df[col].values) for col in lst_cols})
    
        if (lens == 0).any():
            # at least one list is empty: np.repeat dropped those rows, so
            # add them back and put `fill_value` in the list columns
            # (DataFrame.append was removed in pandas 2.0; use pd.concat)
            res = pd.concat([res, df.loc[lens == 0, idx_cols]]) \
                .fillna(fill_value)
    
        return res.loc[:, df.columns]
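
The function is built on two operations. `np.repeat` copies each scalar value once per list element, and `np.concatenate` flattens a column of lists into one array. The standalone illustration below uses made-up toy data and is not part of the original answer:

```python
import numpy as np
import pandas as pd

# Toy frame: one scalar column and one list column (hypothetical data)
df = pd.DataFrame({"A": ["x1", "x2"], "B": [["v1", "v2"], ["v3"]]})

# Number of elements in each row's list: 2 and 1
lens = df["B"].str.len()

# Repeat each scalar value lens[i] times: x1, x1, x2
a_flat = np.repeat(df["A"].to_numpy(), lens.to_numpy())

# Flatten the lists in row order: v1, v2, v3
b_flat = np.concatenate(df["B"].to_numpy())

flat = pd.DataFrame({"A": a_flat, "B": b_flat})
print(flat)
```

Both arrays come out with the same length because the repeat counts equal the list lengths. The repeated scalars therefore line up row by row with the flattened list elements.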
    

    Usage:

    In [82]: explode(df, lst_cols=list('BCDE'))
    Out[82]:
        A   B   C   D   E
    0  x1  v1  c1  d1  e1
    1  x1  v2  c2  d2  e2
    2  x2  v3  c3  d3  e3
    3  x2  v4  c4  d4  e4
    4  x3  v5  c5  d5  e5
    5  x3  v6  c6  d6  e6
    6  x4  v7  c7  d7  e7
    7  x4  v8  c8  d8  e8
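
On pandas 1.3 or later, the built-in `DataFrame.explode` accepts a list of columns, provided the lists in each row have matching lengths. The sketch below rebuilds an input frame inferred from the output above. That frame is hypothetical, since the original `df` is not shown. Note two differences from the function above: empty lists become `NaN` rather than `fill_value`, and the original index is repeated unless `ignore_index=True` is passed.

```python
import pandas as pd

def pairs(prefix):
    # Hypothetical helper: [[p1, p2], [p3, p4], [p5, p6], [p7, p8]]
    return [[f"{prefix}{2 * i - 1}", f"{prefix}{2 * i}"] for i in range(1, 5)]

# Input inferred from the output above: four rows, two-element lists in B..E
df = pd.DataFrame({
    "A": [f"x{i}" for i in range(1, 5)],
    **{col: pairs(p) for col, p in zip("BCDE", "vcde")},
})

# Explode all list columns at once (needs pandas >= 1.3);
# ignore_index=True renumbers the rows 0..7 as in the output above
out = df.explode(list("BCDE"), ignore_index=True)
print(out)
```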
    
