How to merge/combine columns in pandas?

日久生厌 2020-12-09 18:35

I have an example dataframe with 4 columns:

data = {'A': ['a', 'b', 'c', 'd', 'e', 'f'],
    'B': [42, 52, np.nan, np.nan, np.nan, np.nan],
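
The snippet above is cut off on the page. A reconstruction that is consistent with the outputs shown in the answers might look like the sketch below. The values for columns C and D are an assumption, inferred from those outputs and from the variant in the first answer, not copied from the original post.

```python
import numpy as np
import pandas as pd

# Columns C and D are assumed: inferred from the answers' outputs, where
# each row holds exactly one non-NaN value across B, C and D.
data = {'A': ['a', 'b', 'c', 'd', 'e', 'f'],
        'B': [42, 52, np.nan, np.nan, np.nan, np.nan],
        'C': [np.nan, np.nan, 31, 2, np.nan, np.nan],
        'D': [np.nan, np.nan, np.nan, np.nan, 62, 70]}
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])
print(df)
```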


        
4 Answers
  • 2020-12-09 19:03

    Use columns.difference to get all column names except A, then take the row-wise sum or max:

    cols = df.columns.difference(['A'])
    df['E'] = df[cols].sum(axis=1).astype(int)
    # df['E'] = df[cols].max(axis=1).astype(int)
    df = df.drop(cols, axis=1)
    print (df)
       A   E
    0  a  42
    1  b  52
    2  c  31
    3  d   2
    4  e  62
    5  f  70
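
    sum and max only agree while each row has a single non-NaN value. A minimal sketch of where they diverge, using a small made-up frame (not the question's data) in which row 0 has two values:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 has values in both B and D.
df = pd.DataFrame({'A': ['a', 'b', 'c'],
                   'B': [42, np.nan, np.nan],
                   'C': [np.nan, 31, np.nan],
                   'D': [10, np.nan, 62]})

cols = df.columns.difference(['A'])
summed = df[cols].sum(axis=1).astype(int)    # NaN is skipped, values are added
largest = df[cols].max(axis=1).astype(int)   # NaN is skipped, largest value wins

print(summed.tolist())    # [52, 31, 62]
print(largest.tolist())   # [42, 31, 62]
```

    Note that for a row that is entirely NaN, sum returns 0 while max returns NaN, so the astype(int) call would fail in the max variant.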
    

    If a row can contain multiple values, join them into a string instead:

    data = {'A': ['a', 'b', 'c', 'd', 'e', 'f'],
        'B': [42, 52, np.nan, np.nan, np.nan, np.nan],  
        'C': [np.nan, np.nan, 31, 2, np.nan, np.nan],
        'D': [10, np.nan, np.nan, np.nan, 62, 70]}
    df = pd.DataFrame(data, columns = ['A', 'B', 'C', 'D'])
    
    print (df)
       A     B     C     D
    0  a  42.0   NaN  10.0
    1  b  52.0   NaN   NaN
    2  c   NaN  31.0   NaN
    3  d   NaN   2.0   NaN
    4  e   NaN   NaN  62.0
    5  f   NaN   NaN  70.0
    
    cols = df.columns.difference(['A'])
    df['E'] = df[cols].apply(lambda x: ', '.join(x.dropna().astype(int).astype(str)), axis=1)
    df = df.drop(cols, axis=1)
    print (df)
       A       E
    0  a  42, 10
    1  b      52
    2  c      31
    3  d       2
    4  e      62
    5  f      70
    
  • 2020-12-09 19:06

    Option 1
    Using assign and drop

    In [644]: cols = ['B', 'C', 'D']
    
    In [645]: df.assign(E=df[cols].sum(axis=1)).drop(columns=cols)
    Out[645]:
       A     E
    0  a  42.0
    1  b  52.0
    2  c  31.0
    3  d   2.0
    4  e  62.0
    5  f  70.0
    

    Option 2
    Using assignment and drop

    In [648]: df['E'] = df[cols].sum(axis=1)
    
    In [649]: df = df.drop(columns=cols)
    
    In [650]: df
    Out[650]:
       A     E
    0  a  42.0
    1  b  52.0
    2  c  31.0
    3  d   2.0
    4  e  62.0
    5  f  70.0
    

    Option 3 (lately, the one I like best)
    Using groupby

    In [660]: df.groupby(np.where(df.columns == 'A', 'A', 'E'), axis=1).first()  # or sum, max, min
    Out[660]:
       A     E
    0  a  42.0
    1  b  52.0
    2  c  31.0
    3  d   2.0
    4  e  62.0
    5  f  70.0
    
    In [661]: df.columns == 'A'
    Out[661]: array([ True, False, False, False], dtype=bool)
    
    In [662]: np.where(df.columns == 'A', 'A', 'E')
    Out[662]:
    array(['A', 'E', 'E', 'E'],
          dtype='|S1')
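
    On recent pandas versions, groupby(..., axis=1) is deprecated (since 2.1), so Option 3 may warn or fail there. One possible workaround, sketched below, is to transpose, group the rows, and transpose back. The frame is a reconstruction of the question's data, and the transposed result comes back with object dtype, so a cast may be needed afterwards.

```python
import numpy as np
import pandas as pd

# Reconstructed sample data (assumed, see the answers' outputs).
df = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'B': [42, 52, np.nan, np.nan, np.nan, np.nan],
                   'C': [np.nan, np.nan, 31, 2, np.nan, np.nan],
                   'D': [np.nan, np.nan, np.nan, np.nan, 62, 70]})

# One label per original column: 'A' stays 'A', everything else becomes 'E'.
labels = np.where(df.columns == 'A', 'A', 'E')

# Transpose so columns become rows, group those rows by label, take the
# first non-null value per group, then transpose back.
out = df.T.groupby(labels).first().T
out['E'] = out['E'].astype(int)   # transposing left object dtype behind
print(out)
```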
    
  • 2020-12-09 19:09

    You can also use ffill with iloc:

    df['E'] = df.iloc[:, 1:].ffill(axis=1).iloc[:, -1].astype(int)
    df = df.iloc[:, [0, -1]]
    
    print(df)
    
       A   E
    0  a  42
    1  b  52
    2  c  31
    3  d   2
    4  e  62
    5  f  70
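
    The ffill trick works because filling forward along each row carries the last non-null value into the final column. A small sketch with made-up data (row 0 has two values) shows that ffill picks the last non-null value per row, while the mirror-image bfill picks the first:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 has values in both B and D.
df = pd.DataFrame({'A': ['a', 'b', 'c'],
                   'B': [42, np.nan, np.nan],
                   'C': [np.nan, 31, np.nan],
                   'D': [10, np.nan, 62]})

values = df.iloc[:, 1:]                                  # every column except A
last_non_null = values.ffill(axis=1).iloc[:, -1]         # fill rightwards, read last column
first_non_null = values.bfill(axis=1).iloc[:, 0]         # fill leftwards, read first column

print(last_non_null.astype(int).tolist())    # [10, 31, 62]
print(first_non_null.astype(int).tolist())   # [42, 31, 62]
```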
    
  • 2020-12-09 19:27

    The question as written asks how to merge/combine rather than sum, so I'm posting this for readers who land here looking for help coalescing columns with combine_first, which can be a bit tricky.

    df2 = pd.concat([df["A"], 
                 df["B"].combine_first(df["C"]).combine_first(df["D"])], 
                axis=1)
    df2.rename(columns={"B":"E"}, inplace=True)
       A     E
    0  a  42.0
    1  b  52.0
    2  c  31.0
    3  d   2.0
    4  e  62.0
    5  f  70.0
    

    What's so tricky about that? In this case, nothing. But suppose you were pulling the B, C and D values from different dataframes that all contain the labels a, b, c, d, e, f, though not necessarily in the same order. combine_first() aligns on the index, so you'd need to add a set_index() call to each of your df references.

    df2 = pd.concat([df.set_index("A", drop=False)["A"], 
                 df.set_index("A")["B"]\
                 .combine_first(df.set_index("A")["C"])\
                 .combine_first(df.set_index("A")["D"]).astype(int)], 
                axis=1).reset_index(drop=True)
    df2.rename(columns={"B":"E"}, inplace=True)
    
       A   E
    0  a  42
    1  b  52
    2  c  31
    3  d   2
    4  e  62
    5  f  70
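
    To make the alignment issue concrete, here is a rough sketch with three made-up frames (left, mid and right are hypothetical names, not from the question) that hold the same labels in different row orders:

```python
import numpy as np
import pandas as pd

# Hypothetical frames: same labels in column A, different row order.
left = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [42, np.nan, np.nan]})
mid = pd.DataFrame({'A': ['c', 'a', 'b'], 'C': [np.nan, np.nan, 31]})
right = pd.DataFrame({'A': ['b', 'c', 'a'], 'D': [np.nan, 62, np.nan]})

# Without set_index, combine_first aligns on the default 0..n-1 index,
# so values end up in the wrong positions relative to left['A'].
naive = left['B'].combine_first(mid['C']).combine_first(right['D'])

# Indexing by A first makes combine_first align on the labels instead.
e = (left.set_index('A')['B']
     .combine_first(mid.set_index('A')['C'])
     .combine_first(right.set_index('A')['D'])
     .astype(int))
result = e.rename_axis('A').rename('E').reset_index()

print(naive.tolist())   # [42.0, 62.0, 31.0], i.e. b and c are swapped
print(result)
```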
    