Question
I have a DataFrame that looks like this
        some feature  another feature  label
sample
0                ...              ...    ...
and I'd like to get a dataframe with multiindexed columns like this
           features       label
sample  some   another
0       ...    ...        ...
From the API it's not clear to me how to use from_arrays(), from_product(), from_tuples() or from_frame() correctly. The solution must not depend on string parsing of the feature columns (some feature, another feature). The label column is always the last column, and its column name, label, may be used. How can I get what I want?
Answer 1:
From the API it's not clear to me how to use
from_arrays(), from_product(), from_tuples() or from_frame() correctly.
These constructors are mainly used to build a new MultiIndex that is independent of the original column names.
So if you need a completely new MultiIndex, build it from lists or arrays, for example:
a = ['a','a','b']
b = ['x','y','z']
df.columns = pd.MultiIndex.from_arrays([a,b])
print(df)
        a     b
        x  y  z
sample
0       2  3  5
1       4  5  7
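As a rough comparison of the four constructors named in the question, here is a minimal sketch. The variable names and the level names top and bottom are only illustrative and do not come from the original post. The first three calls build the same three column pairs, while from_product() builds every combination of its inputs, so it only fits when the full Cartesian product is wanted.

```python
import pandas as pd

top = ["a", "a", "b"]
bottom = ["x", "y", "z"]

# One list per level, all of the same length.
from_arrays = pd.MultiIndex.from_arrays([top, bottom])

# One (top, bottom) tuple per column.
from_tuples = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y"), ("b", "z")])

# One DataFrame column per level; the column names become the level names.
from_frame = pd.MultiIndex.from_frame(pd.DataFrame({"top": top, "bottom": bottom}))

# Every combination of the given values (Cartesian product).
from_product = pd.MultiIndex.from_product([["a", "b"], ["x", "y"]])

print(list(from_arrays))
print(list(from_product))
```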
EDIT1: If you want to put all columns except the last one under a common parent level:
a = ['parent'] * (len(df.columns) - 1) + ['label']
b = df.columns[:-1].tolist() + ['val']
df.columns = pd.MultiIndex.from_arrays([a,b])
print(df)
          parent           label
       feature a feature b   val
sample
0              2         3     5
1              4         5     7
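The same idea can be wrapped in a small helper that relies only on column position, with no string parsing. This is a sketch under assumptions: the function name group_features, its parameters, and the sample values are hypothetical, and it uses from_tuples() in place of from_arrays(), which should give an equivalent result.

```python
import pandas as pd


def group_features(df, parent="features", last_child=""):
    """Return a copy of df with two-level columns.

    Every column except the last goes under `parent`; the last column
    keeps its own name on the top level and gets `last_child` below it.
    (Hypothetical helper, not part of pandas.)
    """
    *features, last = df.columns
    tuples = [(parent, name) for name in features] + [(last, last_child)]
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(tuples)
    return out


# Made-up sample data shaped like the frame in the question.
df = pd.DataFrame(
    {"some feature": [2, 4], "another feature": [3, 5], "label": [5, 7]},
    index=pd.Index([0, 1], name="sample"),
)

out = group_features(df)
print(out)

# Selecting the parent returns only the feature columns.
print(out["features"])
```

Copying the frame first leaves the original columns untouched, which is convenient when the flat names are still needed elsewhere.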
Splitting the column names is also possible, but any column without the separator gets NaN in the second level. That is because MultiIndex and non-MultiIndex columns cannot be combined (actually they can, but then the MultiIndex columns are stored as tuples):
print(df)
        feature_a  feature_b  label
sample
0               2          3      5
1               4          5      7
# split on the underscore; without a separator str.split splits on whitespace only
df.columns = df.columns.str.split('_', expand=True)
print(df)
       feature    label
             a  b   NaN
sample
0            2  3     5
1            4  5     7
So it is better to first move every column without the separator into the Index/MultiIndex with DataFrame.set_index:
df = df.set_index('label')
df.columns = df.columns.str.split('_', expand=True)
print(df)
      feature
            a  b
label
5           2  3
7           4  5
To keep the original index as well, pass the append=True parameter:
df = df.set_index('label', append=True)
df.columns = df.columns.str.split('_', expand=True)
print(df)
             feature
                   a  b
sample label
0      5           2  3
1      7           4  5
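The split-based variants above can be reproduced end to end with the sketch below. It assumes the underscore is the separator and uses made-up sample values; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"feature_a": [2, 4], "feature_b": [3, 5], "label": [5, 7]},
    index=pd.Index([0, 1], name="sample"),
)

# Splitting every column: 'label' has no separator, so its second level is missing.
split_all = df.columns.str.split("_", expand=True)
print(split_all.get_level_values(1).isna().tolist())

# Moving 'label' into the index first avoids the missing second level.
out = df.set_index("label", append=True)
out.columns = out.columns.str.split("_", expand=True)
print(out)
```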
Source: https://stackoverflow.com/questions/61229699/how-can-i-summarize-several-pandas-dataframe-columns-into-a-parent-column-name