Pandas - Creating Difference Matrix from Data Frame

臣服心动 2020-12-06 03:23

I'm trying to create a matrix that shows the pairwise differences between the rows of a pandas DataFrame.

import pandas as pd

data = {'Country':['GB','JP','US'


        
3 Answers
  • 2020-12-06 03:37

    I tried to improve on Divakar's comment:

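    import numpy as np
    # Work on the NumPy array: newer pandas versions do not support ufunc.outer on a Series
    v = df['Values'].to_numpy()
    a = np.column_stack([df['Country'], np.subtract.outer(-v, -v)])
    
    df = pd.DataFrame(a, columns=['Country'] + df['Country'].tolist())
    print(df)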
      Country    GB    JP    US
    0      GB     0 -30.7 -14.5
    1      JP  30.7     0  16.2
    2      US  14.5 -16.2     0
    
  • 2020-12-06 03:41

    This is a standard use case for numpy's broadcasting:

    df['Values'].values - df['Values'].values[:, None]
    Out: 
    array([[  0. , -30.7, -14.5],
           [ 30.7,   0. ,  16.2],
           [ 14.5, -16.2,   0. ]])
    

    We access the underlying NumPy array with the values attribute, and [:, None] adds a new axis that turns it into a column, so the subtraction broadcasts into a two-dimensional result.
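
To make the role of the extra axis concrete, here is a minimal sketch of the shapes involved. The sample numbers are an assumption: the question's data is truncated, so these values were picked only because they reproduce the differences shown in the output.

```python
import numpy as np

# Assumed sample values (not taken from the original question)
v = np.array([20.2, -10.5, 5.7])

col = v[:, None]   # shape (3, 1): the same values laid out as a column
diff = v - col     # shape (3,) broadcast against (3, 1) gives shape (3, 3)

# diff[i, j] holds v[j] - v[i]
print(v.shape, col.shape, diff.shape)
print(diff.round(1))
```

Each row i of the result is the whole array minus its i-th element, which is why the diagonal is zero and the matrix is antisymmetric.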

    You can concatenate this with the original Country column:

    arr = df['Values'].values - df['Values'].values[:, None]
    pd.concat((df['Country'], pd.DataFrame(arr, columns=df['Country'])), axis=1)
    Out: 
      Country    GB    JP    US
    0      GB   0.0 -30.7 -14.5
    1      JP  30.7   0.0  16.2
    2      US  14.5 -16.2   0.0
    

    The array can also be generated with the following, thanks to @Divakar:

    v = df['Values'].to_numpy()  # ufunc.outer needs a plain array in newer pandas
    arr = np.subtract.outer(v, v).T
    

    Here we call .outer on the subtract ufunc, which applies the subtraction to every pair of elements from its two inputs.
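
As a quick sanity check, the sketch below compares the .outer result with the broadcasting expression on a plain NumPy array. The sample values are an assumption chosen to match the differences in the printed output, since the question's data is truncated.

```python
import numpy as np

# Assumed sample values (not taken from the original question)
v = np.array([20.2, -10.5, 5.7])

# subtract.outer(v, v)[i, j] is v[i] - v[j]; the transpose gives v[j] - v[i]
outer = np.subtract.outer(v, v).T

# Broadcasting computes v[j] - v[i] directly
broadcast = v - v[:, None]

print(np.array_equal(outer, broadcast))
```

Without the transpose, the .outer result would have the opposite sign in every off-diagonal cell.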

  • 2020-12-06 03:47

    Option 1

    from itertools import product
    import pandas as pd

    # All ordered pairs of countries (l1 = row label, l2 = column label)
    DF = pd.DataFrame(list(product(df.Country, df.Country)), columns=['l1', 'l2'])
    # Look up values by country without overwriting the original df
    values = df.set_index('Country')['Values']
    DF['v1'] = DF.l1.map(values)
    DF['v2'] = DF.l2.map(values)
    DF['DIFF'] = DF['v2'] - DF['v1']
    DF.pivot(index='l1', columns='l2', values='DIFF').fillna(0).rename_axis(index=None, columns=None)
    Out[94]: 
          GB    JP    US
    GB   0.0 -30.7 -14.5
    JP  30.7   0.0  16.2
    US  14.5 -16.2   0.0
    

    Option 2 using apply

    # For each value x, subtract it from the whole column to get one row of the matrix
    A = df['Values'].apply(lambda x: df['Values'] - x)
    A.columns = df.Country
    A['Country'] = df.Country
    
    
    A
    Out[124]: 
    Country    GB    JP    US Country
    0         0.0 -30.7 -14.5      GB
    1        30.7   0.0  16.2      JP
    2        14.5 -16.2   0.0      US
    