I think there is a conceptual bug in the way a MultiIndex is created on a DataFrame slice. Consider the following code:
import cufflinks as cf
df=cf.datagen.li
You need remove_unused_levels, which is new functionality in pandas 0.20.0; you can also check the docs:
new_df.columns.remove_unused_levels()
Sample:
import numpy as np
import pandas as pd

np.random.seed(23)
cols = pd.MultiIndex.from_tuples([('Iter1','a'), ('Iter1','b'),
                                  ('Iter2','c'), ('Iter2','d'),
                                  ('Iter3','e'), ('Iter3','f')])
idx = pd.date_range('2015-01-01', periods=5)
df = pd.DataFrame(np.random.rand(5,6), columns=cols, index=idx)
print (df)
               Iter1               Iter2               Iter3
                   a         b         c         d         e         f
2015-01-01 0.517298 0.946963 0.765460 0.282396 0.221045 0.686222
2015-01-02 0.167139 0.392442 0.618052 0.411930 0.002465 0.884032
2015-01-03 0.884948 0.300410 0.589582 0.978427 0.845094 0.065075
2015-01-04 0.294744 0.287934 0.822466 0.626183 0.110478 0.000529
2015-01-05 0.942166 0.141501 0.421597 0.346489 0.869785 0.428602
new_df = df[['Iter1','Iter2']].copy()
print (new_df)
               Iter1               Iter2
                   a         b         c         d
2015-01-01 0.517298 0.946963 0.765460 0.282396
2015-01-02 0.167139 0.392442 0.618052 0.411930
2015-01-03 0.884948 0.300410 0.589582 0.978427
2015-01-04 0.294744 0.287934 0.822466 0.626183
2015-01-05 0.942166 0.141501 0.421597 0.346489
print (new_df.columns)
MultiIndex(levels=[['Iter1', 'Iter2', 'Iter3'], ['a', 'b', 'c', 'd', 'e', 'f']],
           labels=[[0, 0, 1, 1], [0, 1, 2, 3]])
print (new_df.columns.remove_unused_levels())
MultiIndex(levels=[['Iter1', 'Iter2'], ['a', 'b', 'c', 'd']],
           labels=[[0, 0, 1, 1], [0, 1, 2, 3]])
new_df.columns = new_df.columns.remove_unused_levels()
print (new_df.columns)
MultiIndex(levels=[['Iter1', 'Iter2'], ['a', 'b', 'c', 'd']],
           labels=[[0, 0, 1, 1], [0, 1, 2, 3]])
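
If you want to check the effect programmatically instead of reading the printed index, something like the following minimal sketch should work. It reuses the sample columns from above; the variable names levels_before, levels_after, trimmed and same_columns are only illustrative and not part of pandas.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_tuples([('Iter1', 'a'), ('Iter1', 'b'),
                                  ('Iter2', 'c'), ('Iter2', 'd'),
                                  ('Iter3', 'e'), ('Iter3', 'f')])
idx = pd.date_range('2015-01-01', periods=5)
df = pd.DataFrame(np.random.rand(5, 6), columns=cols, index=idx)

new_df = df[['Iter1', 'Iter2']].copy()

# Levels of the sliced frame still list the values that are no longer used.
levels_before = [list(level) for level in new_df.columns.levels]

# remove_unused_levels returns a new MultiIndex; it does not modify in place.
trimmed = new_df.columns.remove_unused_levels()
levels_after = [list(level) for level in trimmed.levels]

# The trimmed index still labels exactly the same columns.
same_columns = trimmed.equals(new_df.columns)

new_df.columns = trimmed
```

Note that remove_unused_levels only changes the stored levels, not which columns exist, so assigning the result back is safe. In newer pandas versions the printed MultiIndex looks different, because the labels attribute was later renamed to codes.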