I'm trying to create a matrix to show the differences between the rows in a Pandas data frame.
import pandas as pd
data = {'Country':['GB','JP','US
I tried to improve on Divakar's comment:
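import numpy as np
vals = df['Values'].to_numpy()  # plain array; a Series may not be accepted by ufunc.outer in newer pandas
# outer(-v, -v)[i, j] = v[j] - v[i]
a = np.column_stack([df['Country'], np.subtract.outer(-vals, -vals)])
df = pd.DataFrame(a, columns=['Country'] + df['Country'].tolist())
print(df)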
  Country    GB    JP    US
0      GB     0 -30.7 -14.5
1      JP  30.7     0  16.2
2      US  14.5 -16.2     0
This is a standard use case for numpy's broadcasting:
df['Values'].values - df['Values'].values[:, None]
Out:
array([[  0. , -30.7, -14.5],
       [ 30.7,   0. ,  16.2],
       [ 14.5, -16.2,   0. ]])
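We access the underlying NumPy array with the values attribute. Indexing with [:, None]
adds a new axis, turning the shape (3,) array into a (3, 1) column. Subtracting that column
from the original row broadcasts both to (3, 3), so element [i, j] is Values[j] - Values[i].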
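To make the shape arithmetic concrete, here is a minimal sketch. The Values are placeholders picked so that the pairwise differences match the output shown in this answer; they are an assumption, not the asker's actual data.

```python
import numpy as np
import pandas as pd

# Placeholder data (assumption): values chosen so the pairwise
# differences match the output shown in this answer.
df = pd.DataFrame({'Country': ['GB', 'JP', 'US'],
                   'Values': [94.5, 63.8, 80.0]})

row = df['Values'].to_numpy()   # shape (3,)
col = row[:, None]              # shape (3, 1): one value per row

# Broadcasting stretches (3,) against (3, 1) to (3, 3);
# element [i, j] is Values[j] - Values[i].
diff = row - col

print(row.shape, col.shape, diff.shape)
print(np.round(diff, 1))
```

The diagonal is zero because each value is subtracted from itself, and the matrix is antisymmetric: diff[i, j] is the negative of diff[j, i].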
You can concatenate this with the original Country column:
arr = df['Values'].values - df['Values'].values[:, None]
pd.concat((df['Country'], pd.DataFrame(arr, columns=df['Country'])), axis=1)
Out:
  Country    GB    JP    US
0      GB   0.0 -30.7 -14.5
1      JP  30.7   0.0  16.2
2      US  14.5 -16.2   0.0
The array can also be generated with the following, thanks to @Divakar:
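vals = df['Values'].to_numpy()  # plain array; a Series may not be accepted by ufunc.outer in newer pandas
arr = np.subtract.outer(vals, vals).T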
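Here we call .outer on the subtract ufunc, which applies the subtraction to every
pair of elements from its two inputs. The .T transposes the result so that it matches the broadcasting version.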
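As a quick check, the sketch below compares the outer form against the broadcasting form. It uses placeholder values (an assumption, not the asker's data) and converts the column to a NumPy array first, since passing a Series directly to a ufunc's outer method may not work in every pandas version.

```python
import numpy as np
import pandas as pd

# Placeholder data (assumption), not the original values
df = pd.DataFrame({'Country': ['GB', 'JP', 'US'],
                   'Values': [94.5, 63.8, 80.0]})
vals = df['Values'].to_numpy()

# outer(a, b)[i, j] = a[i] - b[j], so transpose to get vals[j] - vals[i]
via_outer = np.subtract.outer(vals, vals).T
via_broadcast = vals - vals[:, None]

same = np.array_equal(via_outer, via_broadcast)
print(same)
```

Both forms compute the same subtractions element by element, so which one to use is mostly a matter of readability.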
Option 1
from itertools import product
import pandas as pd
# All ordered pairs of countries
DF = pd.DataFrame(list(product(df.Country, df.Country)), columns=['l1', 'l2'])
# Look up each country's value without overwriting df
values = df.set_index('Country')['Values']
DF['v1'] = DF.l1.map(values)
DF['v2'] = DF.l2.map(values)
DF['DIFF'] = DF['v2'] - DF['v1']
# Keyword arguments to rename_axis drop the 'l1'/'l2' axis labels
DF.pivot(index='l1', columns='l2', values='DIFF').fillna(0).rename_axis(index=None, columns=None)
Out[94]:
      GB    JP    US
GB   0.0 -30.7 -14.5
JP  30.7   0.0  16.2
US  14.5 -16.2   0.0
Option 2
Using apply:
# For each value x, subtract it from the whole column: row i holds Values - Values[i]
A = df['Values'].apply(lambda x: df['Values'] - x)
A.columns = df.Country
A['Country'] = df.Country
A
Out[124]:
Country    GB    JP    US Country
0         0.0 -30.7 -14.5      GB
1        30.7   0.0  16.2      JP
2        14.5 -16.2   0.0      US
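If you need this more than once, the result can be wrapped in a small helper that labels both axes. This is only a sketch: pairwise_diff is a hypothetical name, the sample values are placeholders, and it uses NumPy broadcasting internally rather than product or apply.

```python
import pandas as pd

def pairwise_diff(df, label_col='Country', value_col='Values'):
    """Return a square DataFrame where cell [r, c] = value[c] - value[r]."""
    vals = df[value_col].to_numpy()
    labels = df[label_col].to_numpy()
    # (n,) minus (n, 1) broadcasts to an (n, n) matrix of differences
    return pd.DataFrame(vals - vals[:, None], index=labels, columns=labels)

# Placeholder data (assumption), not the original values
df = pd.DataFrame({'Country': ['GB', 'JP', 'US'],
                   'Values': [94.5, 63.8, 80.0]})
matrix = pairwise_diff(df)
print(matrix.round(1))
```

With the countries on both axes, a single difference can be looked up directly, for example matrix.loc['GB', 'JP'].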