Collapsing rows in a Pandas dataframe if all rows have only one value in their columns

太阳男子 2020-12-03 23:50

I have the following DataFrame:

         col1  |  col2   | col3   | col4   | col5  | col6
    0    -     |   15.0  |  -     |  -     |   -   |  -
    1    -     |   -            


        
2 Answers
  • 2020-12-04 00:00

    Option 0
    Super Simple

    pd.concat([pd.Series(df[c].dropna().values, name=c) for c in df], axis=1)
    
       col1  col2  col3   col4   col5 col6
    0  ABC1  15.0  24RA  Large  345.0   US
    

    Can we handle more than one value per column?
    Sure we can!

    df.loc[2, 'col3'] = 'Test'
    
       col1  col2  col3   col4   col5 col6
    0  ABC1  15.0  Test  Large  345.0   US
    1   NaN   NaN  24RA    NaN    NaN  NaN
    

    Option 1
    Generalized solution using np.where like a surgeon

    v = df.values
    # Positions of the non-null cells (use `v != '-'` if the gaps are literal '-' strings)
    i, j = np.where(pd.notna(v))
    
    s = pd.Series(v[i, j], df.columns[j])
    
    c = s.groupby(level=0).cumcount()
    s.index = [c, s.index]
    s.unstack(fill_value='-')  # <-- don't fill to get NaN
    
       col1  col2  col3   col4 col5 col6
    0  ABC1  15.0  24RA  Large  345   US
    

    df.loc[2, 'col3'] = 'Test'
    
    v = df.values
    # Positions of the non-null cells (use `v != '-'` if the gaps are literal '-' strings)
    i, j = np.where(pd.notna(v))
    
    s = pd.Series(v[i, j], df.columns[j])
    
    c = s.groupby(level=0).cumcount()
    s.index = [c, s.index]
    s.unstack(fill_value='-')  # <-- don't fill to get NaN
    
       col1  col2  col3   col4 col5 col6
    0  ABC1  15.0  Test  Large  345   US
    1     -     -  24RA      -    -    -
    

    Option 2
    mask to make nulls then stack to get rid of them

    Or we could have

    # Works whether the gaps are '-' strings or NaN;
    # if they are already NaN you can skip the `.mask(df == '-')`
    s = df.mask(df == '-').stack().reset_index(0, drop=True)
    c = s.groupby(level=0).cumcount()
    s.index = [c, s.index]
    s.unstack(fill_value='-')
    
       col1  col2  col3   col4 col5 col6
    0  ABC1  15.0  Test  Large  345   US
    1     -     -  24RA      -    -    -
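
For anyone who wants to try the cumcount/unstack idea end to end, here is a minimal sketch. The question's frame is cut off above, so the three-column `df` below is a made-up stand-in, and `collapse` is a hypothetical helper name rather than anything from pandas. It assumes the gaps are NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical input: the real frame is truncated in the question,
# so this one-value-per-row layout is an assumption.
df = pd.DataFrame({
    "col1": [np.nan, "ABC1", np.nan],
    "col2": [15.0, np.nan, np.nan],
    "col3": [np.nan, np.nan, "24RA"],
})


def collapse(frame, fill_value=None):
    """Push the non-null values of each column up to the top rows."""
    v = frame.to_numpy()
    # Row/column positions of every non-null cell, in row-major order.
    i, j = np.where(pd.notna(v))
    # Long format: one entry per non-null cell, labelled by its column name.
    s = pd.Series(v[i, j], index=frame.columns[j])
    # Running counter within each column: 0 for its first value, 1 for the next...
    c = s.groupby(level=0).cumcount()
    s.index = pd.MultiIndex.from_arrays([c.to_numpy(), s.index])
    # Pivot the column names back out; the counter becomes the new row index.
    out = s.unstack(fill_value=fill_value)
    # unstack sorts the column labels, so restore the original order.
    return out[[col for col in frame.columns if col in out.columns]]


collapsed = collapse(df)
print(collapsed)

# A second value in col3 produces a second row, padded with '-'.
df2 = df.copy()
df2.loc[0, "col3"] = "Test"
collapsed_two = collapse(df2, fill_value="-")
print(collapsed_two)
```

The result is object dtype because the values pass through a single mixed-type Series; call `.infer_objects()` on it if you need numeric columns back.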
    
  • 2020-12-04 00:07

    You can use max, but you first need to convert the null values in the string-valued columns (which is a bit ugly, unfortunately):

    >>> df = pd.DataFrame({'col1':[np.nan, "ABC1"], 'col2':[15.0, np.nan]})
    
    >>> df.apply(lambda c: c.fillna('') if c.dtype is np.dtype('O') else c).max()
    col1    ABC1
    col2      15
    dtype: object
    

    You could also use a combination of backfill and forward fill to fill in the gaps. This can be useful if you only want to apply it to some of your columns:

    >>> df.apply(lambda c: c.bfill().ffill())
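
A rough sketch of the fill approach, both for a subset of columns and for the whole frame. The column list `cols` is just an illustrative choice, and the final `drop_duplicates()` step is my own addition; it only collapses to one row when each column holds a single distinct value.

```python
import numpy as np
import pandas as pd

# Same two-column frame as in the example above.
df = pd.DataFrame({"col1": [np.nan, "ABC1"], "col2": [15.0, np.nan]})

# Fill only selected columns; `cols` is a hypothetical subset.
cols = ["col1"]
partial = df.copy()
partial[cols] = partial[cols].bfill().ffill()

# Fill every column, then drop the now-identical rows to collapse the frame.
# Assumes at most one distinct non-null value per column.
collapsed = df.bfill().ffill().drop_duplicates()

print(partial)
print(collapsed)
```

If a column can hold more than one distinct value, the filled rows will differ and `drop_duplicates()` will keep several of them, so this is not a general replacement for the cumcount approach.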
    