Question
I have two DataFrames. The first lists the items of each ID, grouped into three columns:
df1 =
ID  As       Hs       Ts
A   A_1      A_6      A_7
B   B_1
C   C_1               C_10
D   D_1
E   E_1,E_2  E_5      E_4
F            F_1,F_4
The second holds a score for each pair of items:
df2 =
ID1 1 ID2 2 SCORE
A A_1 B B_1 1
A A_6 B B_1 0.5
A A_7 B B_1 0.3
A A_1 C C_1 1
A A_6 C C_1 0.4
A A_7 C C_1 0.3
A A_1 C C_10 0.3
A A_6 C C_10 0.5
A A_7 C C_10 0.3
A A_1 D D_1 1
A A_6 D D_1 0.2
A A_7 D D_1 0.3
A A_1 E E_1 1
A A_6 E E_1 0.5
A A_7 E E_1 0.4
A A_1 E E_2 0.8
A A_6 E E_2 0.2
A A_7 E E_2 0.5
A A_1 E E_5 0.3
A A_6 E E_5 0.3
A A_7 E E_5 0.6
A A_1 E E_4 0.1
A A_6 E E_4 0.4
A A_7 E E_4 0.6
A A_1 F F_1 0.3
A A_6 F F_1 0.3
A A_7 F F_1 0.6
A A_1 F F_4 0.1
A A_6 F F_4 0.4
A A_7 F F_4 0.6
B B_1 C C_1 0.6
B B_1 C C_10 0.1
B B_1 D D_1 0.4
B B_1 E E_1 0.6
B B_1 E E_2 0.2
B B_1 E E_5 0.3
B B_1 E E_4 0.6
B B_1 F F_1 0.4
B B_1 F F_4 0.9
C C_1 D D_1 0.8
C C_1 E E_1 0.6
C C_1 E E_2 0.4
C C_1 E E_4 0.3
C C_1 E E_5 0.2
C C_1 F F_1 0.3
C C_1 F F_4 0.4
C C_10 D D_1 0.2
C C_10 E E_1 0.3
C C_10 E E_2 0.4
C C_10 E E_5 0.3
C C_10 E E_4 0.4
C C_10 F F_1 0.3
C C_10 F F_4 0.2
D D_1 F F_4 1
D D_1 E E_2 0.5
D D_1 E E_5 0.3
D D_1 E E_4 0.2
D D_1 F F_1 0.5
D D_1 F F_4 0.2
E E_1 F F_1 0.9
E E_1 F F_4 0.2
E E_2 F F_1 0.3
E E_2 F F_4 0.2
E E_5 F F_1 0.5
E E_5 F F_4 0.3
E E_4 F F_1 0.6
E E_4 F F_4 0.3
My desired output matrix is:
As Hs Ts
A_1 B_1 C_1 D_1 E_1 E_2 A_6 E_5 F_1 F_4 A_7 C_10 E_4
As A_1 1 1 1 1 0.8 0.3 0.3 0.1 0.3 0.1
B_1 1 0.6 0.4 0.6 0.2 0.5 0.3 0.4 0.9 0.3 0.1 0.6
C_1 1 0.6 0.8 0.6 0.4 0.4 0.2 0.3 0.4 0.3 0.3
D_1 1 0.4 0.8 1 0.5 0.2 0.3 0.5 0.2 0.3 0.2 0.2
E_1 1 0.6 0.6 1 0.5 0.2 0.4 0.3
E_2 0.8 0.2 0.4 1 0.2 0.2 0.5 0.4
Hs A_6 0.5 0.4 0.2 0.5 0.2 0.3 0.3 0.4 0.5 0.4
E_5 0.3 0.3 0.2 0.3 0.3 0.6 0.3
F_1 0.3 0.4 0.3 0.5 0.9 0.3 0.3 0.6 0.3 0.6
F_4 0.1 0.9 0.4 0.2 0.2 0.2 0.4 0.6 0.2 0.3
Ts A_7 0.3 0.3 0.3 0.4 0.5 0.6 0.6 0.6 0.3 0.6
C_10 0.3 0.1 0.5 0.3 0.4
E_4 0.1 0.6 0.3 0.2 0.4 0.6 0.4
Pairs that have no score should be empty in the output matrix.
Should I try pd.crosstab, df.pivot_table, or groupby and unstack?
How can I achieve the desired output? Any suggestion would be appreciated. Thank you.
Answer 1:
Here is one possible solution. The difficulty is sorting the data in the order you want. I have used a smaller example:
import pandas as pd
import numpy as np
idx ="""
grp id
As A_1
As B_1
As C_1
As D_1
As E_1
As E_2
Hs A_6
Hs E_5
Hs F_1
Hs F_4
Ts A_7
Ts C_10
Ts E_4
"""
data="""
ID1 1 ID2 2 SCORE
A A_1 B B_1 1
A F_1 B B_1 1
A A_6 B E_2 0.5
A A_7 B B_1 0.3
A A_1 C C_1 1
A A_6 C C_1 0.4
A A_7 C E_5 0.3
A A_1 C C_10 0.3
A A_6 C C_10 0.5
A A_7 C C_10 0.3
A A_1 D D_1 1
A A_6 D D_1 0.2
A A_7 D D_1 0.3
A A_7 E E_4 0.6
A A_1 F E_1 0.3
A E_5 F F_1 0.3
A A_7 F F_1 0.6
A A_1 F F_4 0.1
A A_6 F F_4 0.4
"""
from io import StringIO  # pd.compat.StringIO no longer exists in current pandas
df = pd.read_csv(StringIO(data), sep=r'\s+')
ix = pd.read_csv(StringIO(idx), sep=r'\s+')
# keep only the two item columns ('1', '2') and SCORE
df = df.drop(['ID1', 'ID2'], axis=1)
df1 = df.copy(deep=True)
# swap columns '1' and '2' in the copy and stack it under df, so that every
# pair appears in both directions and the final table is symmetric
df1.columns = ['2', '1', 'SCORE']
df = pd.concat([df, df1], sort=False)  # DataFrame.append was removed in pandas 2.0
# attach the group name (As, Hs, Ts) to each item in column '1' and build a "(grp, id)" label
df = pd.merge(df, ix, left_on='1', right_on='id')
df['1'] = '(' + df['grp'].map(str) + ', ' + df['1'].map(str) + ')'
df.drop(['grp', 'id'], axis=1, inplace=True)
# same for column '2'
df = pd.merge(df, ix, left_on='2', right_on='id')
df['2'] = '(' + df['grp'].map(str) + ', ' + df['2'].map(str) + ')'
df.drop(['grp', 'id'], axis=1, inplace=True)
# group by the two labels and unstack to build the cross table; missing pairs become ' '
df = df.groupby(['1', '2']).SCORE.max().unstack().fillna(' ')
print(df)
Result (middle columns elided by pandas):
2 (As, A_1) (As, B_1) (As, C_1) ... (Ts, A_7) (Ts, C_10) (Ts, E_4)
1 ...
(As, A_1) 1 1 ... 0.3
(As, B_1) 1 ... 0.3
(As, C_1) 1 ...
(As, D_1) 1 ... 0.3
(As, E_1) 0.3 ...
(As, E_2) ...
(Hs, A_6) 0.4 ... 0.5
(Hs, E_5) ... 0.3
(Hs, F_1) 1 ... 0.6
(Hs, F_4) 0.1 ...
(Ts, A_7) 0.3 ... 0.3 0.6
(Ts, C_10) 0.3 ... 0.3
(Ts, E_4) ... 0.6
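The `idx` lookup in the code above is typed by hand. As a minimal sketch, it could probably be derived from the question's df1 instead. This assumes that df1 stores multiple items as comma-separated strings and that empty cells are NaN; the name `lookup` is only illustrative.

```python
import numpy as np
import pandas as pd

# df1 as shown in the question; empty cells are assumed to be NaN
df1 = pd.DataFrame({
    "ID": ["A", "B", "C", "D", "E", "F"],
    "As": ["A_1", "B_1", "C_1", "D_1", "E_1,E_2", np.nan],
    "Hs": ["A_6", np.nan, np.nan, np.nan, "E_5", "F_1,F_4"],
    "Ts": ["A_7", np.nan, "C_10", np.nan, "E_4", np.nan],
})

# Long format: one row per (ID, group) cell, with the empty cells dropped
lookup = (
    df1.melt(id_vars=["ID"], var_name="grp", value_name="id")
       .dropna(subset=["id"])
)

# Split "E_1,E_2" into a list, then give each list element its own row
lookup["id"] = lookup["id"].str.split(",")
lookup = lookup.explode("id")[["grp", "id"]].reset_index(drop=True)

print(lookup)
```

The resulting frame has the same `grp` and `id` columns as `ix`, in the same As, Hs, Ts order, so it could be passed to the merges in place of the hand-written table.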
Another solution, using a MultiIndex for the rows and a two-level header for the columns:
from io import StringIO  # repeated so this snippet runs on its own, given data and idx from above
df = pd.read_csv(StringIO(data), sep=r'\s+')
ix = pd.read_csv(StringIO(idx), sep=r'\s+')
df = df.drop(['ID1', 'ID2'], axis=1)
df1 = df.copy(deep=True)
df1.columns = ['2', '1', 'SCORE']
# desired column order within each group
As = ['A_1', 'B_1', 'C_1', 'D_1', 'E_1', 'E_2']
Hs = ['A_6', 'E_5', 'F_1', 'F_4']
Ts = ['A_7', 'C_10', 'E_4']
df = pd.concat([df, df1], sort=False)  # DataFrame.append was removed in pandas 2.0
# the first merge adds the group of column '1' (it becomes grp_x after the second merge),
# the second merge adds the group of column '2' (grp_y)
df = pd.merge(df, ix, left_on='1', right_on='id')
df.drop(['id'], axis=1, inplace=True)
df = pd.merge(df, ix, left_on='2', right_on='id')
df.drop(['id'], axis=1, inplace=True)
df = df.groupby(['grp_x', '1', '2']).SCORE.max().unstack().fillna(' ')
df = df[As + Hs + Ts]
# top header level: one group label per column
header = ['As'] * len(As) + ['Hs'] * len(Hs) + ['Ts'] * len(Ts)
df.columns = pd.MultiIndex.from_tuples(list(zip(header, df.columns)))
print(df)
result:
As Hs Ts
A_1 B_1 C_1 D_1 E_1 E_2 A_6 E_5 F_1 F_4 A_7 C_10 E_4
grp_x 1
As A_1 1 1 1 0.3 0.1 0.3
B_1 1 1 0.3
C_1 1 0.4
D_1 1 0.2 0.3
E_1 0.3
E_2 0.5
Hs A_6 0.4 0.2 0.5 0.4 0.5
E_5 0.3 0.3
F_1 1 0.3 0.6
F_4 0.1 0.4
Ts A_7 0.3 0.3 0.3 0.6 0.3 0.6
C_10 0.3 0.5 0.3
E_4 0.6
With your sample data, the result is:
As Hs Ts
A_1 B_1 C_1 D_1 E_1 E_2 A_6 E_5 F_1 F_4 A_7 C_10 E_4
grp_x 1
As A_1 1 1 1 1 0.8 0.3 0.3 0.1 0.3 0.1
B_1 1 0.6 0.4 0.6 0.2 0.5 0.3 0.4 0.9 0.3 0.1 0.6
C_1 1 0.6 0.8 0.6 0.4 0.4 0.2 0.3 0.4 0.3 0.3
D_1 1 0.4 0.8 0.5 0.2 0.3 0.5 1 0.3 0.2 0.2
E_1 1 0.6 0.6 0.5 0.9 0.2 0.4 0.3
E_2 0.8 0.2 0.4 0.5 0.2 0.3 0.2 0.5 0.4
Hs A_6 0.5 0.4 0.2 0.5 0.2 0.3 0.3 0.4 0.5 0.4
E_5 0.3 0.3 0.2 0.3 0.3 0.5 0.3 0.6 0.3
F_1 0.3 0.4 0.3 0.5 0.9 0.3 0.3 0.5 0.6 0.3 0.6
F_4 0.1 0.9 0.4 1 0.2 0.2 0.4 0.3 0.6 0.2 0.3
Ts A_7 0.3 0.3 0.3 0.4 0.5 0.6 0.6 0.6 0.3 0.6
C_10 0.3 0.1 0.2 0.3 0.4 0.5 0.3 0.3 0.2 0.3 0.4
E_4 0.1 0.6 0.3 0.2 0.4 0.6 0.3 0.6 0.4
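The question also asks about df.pivot_table. As a rough sketch of the same mirror-then-pivot idea, the example below uses a toy subset of the pairs and takes the As/Hs/Ts order from the lookup table via reindex rather than from hard-coded lists. The column names `a` and `b` and the variable names are illustrative only, not taken from the original data.

```python
import pandas as pd

# Toy subset of the question's pairs (columns renamed for readability)
pairs = pd.DataFrame({
    "a": ["A_1", "A_6", "A_7", "B_1"],
    "b": ["B_1", "B_1", "B_1", "C_10"],
    "SCORE": [1.0, 0.5, 0.3, 0.1],
})

# id -> group lookup, listed in the desired display order
ix = pd.DataFrame({
    "grp": ["As", "As", "Hs", "Ts", "Ts"],
    "id": ["A_1", "B_1", "A_6", "A_7", "C_10"],
})

# Add the mirrored pairs so the matrix comes out symmetric
mirrored = pairs.rename(columns={"a": "b", "b": "a"})
both = pd.concat([pairs, mirrored], ignore_index=True)

# Pivot to a square table; pairs without a score stay NaN
mat = both.pivot_table(index="a", columns="b", values="SCORE", aggfunc="max")

# Put rows and columns in lookup order, then label them with (grp, id)
mat = mat.reindex(index=ix["id"], columns=ix["id"])
order = pd.MultiIndex.from_frame(ix)
mat.index = order
mat.columns = order

# Show missing pairs as empty cells without changing the numeric dtype
print(mat.to_string(na_rep=""))
```

Keeping the missing pairs as NaN and blanking them only at print time leaves the matrix numeric, which may be more convenient than fillna(' ') if the scores are used in later calculations.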
Source: https://stackoverflow.com/questions/56001138/create-a-matrix-with-two-dataframe-pandas