Pandas - Creating Difference Matrix from Data Frame

臣服心动 2020-12-06 03:23

I'm trying to create a matrix to show the differences between the rows in a Pandas data frame.

import pandas as pd

data = {'Country':['GB','JP','US' ...

3 Answers
  •  猫巷女王i
    2020-12-06 03:41

    This is a standard use case for NumPy's broadcasting:

    df['Values'].values - df['Values'].values[:, None]
    Out: 
    array([[  0. , -30.7, -14.5],
           [ 30.7,   0. ,  16.2],
           [ 14.5, -16.2,   0. ]])
    

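    We access the underlying NumPy array with the values attribute, and [:, None] adds a new axis, turning the array into a column. Subtracting that column from the original one-dimensional array broadcasts both to a two-dimensional result in which entry (i, j) is Values[j] - Values[i].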
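
To make the shapes explicit, here is a minimal sketch of the broadcasting step. The Values numbers are illustrative assumptions (the question's data is truncated above), picked so that the differences match the output shown; the names row and col are only for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: the Values numbers are assumed, not taken from the question.
df = pd.DataFrame({'Country': ['GB', 'JP', 'US'],
                   'Values': [20.2, -10.5, 5.7]})

row = df['Values'].values           # shape (3,): behaves like a single row
col = df['Values'].values[:, None]  # shape (3, 1): a single column

# Broadcasting expands both operands to shape (3, 3),
# so arr[i, j] == Values[j] - Values[i].
arr = row - col

print(row.shape, col.shape, arr.shape)
print(arr.round(1))
```

The result is antisymmetric (arr equals -arr.T), so swapping the operands simply flips the sign of every difference.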

    You can concatenate this with the original Country column:

    arr = df['Values'].values - df['Values'].values[:, None]
    pd.concat((df['Country'], pd.DataFrame(arr, columns=df['Country'])), axis=1)
    Out: 
      Country    GB    JP    US
    0      GB   0.0 -30.7 -14.5
    1      JP  30.7   0.0  16.2
    2      US  14.5 -16.2   0.0
    

    The array can also be generated with the following (after import numpy as np), thanks to @Divakar:

    arr = np.subtract.outer(*[df.Values]*2).T
    

    Here we call .outer on the subtract ufunc, which applies the subtraction to every pair of elements from its two inputs. The transpose makes the sign convention match the broadcasting version.
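
As a quick check, the sketch below compares the outer-product form with the broadcasting form. The Values numbers are illustrative assumptions, not the question's data. It also converts the Series to a NumPy array with to_numpy() before calling .outer, on the assumption that some pandas versions do not accept a Series there.

```python
import numpy as np
import pandas as pd

# Illustrative data (assumed values, not from the question).
df = pd.DataFrame({'Country': ['GB', 'JP', 'US'],
                   'Values': [20.2, -10.5, 5.7]})

vals = df['Values'].to_numpy()

# np.subtract.outer(vals, vals)[i, j] == vals[i] - vals[j];
# transposing gives vals[j] - vals[i].
outer = np.subtract.outer(vals, vals).T

# Broadcasting version for comparison.
broadcast = vals - vals[:, None]

print(np.allclose(outer, broadcast))
```

Both forms compute the same pairwise differences, so the choice between them is mostly a matter of readability.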
