Question
I have a multiindex dataframe like this:
import pandas as pd
import numpy as np

df = pd.DataFrame({'ind1': list('aaaaaaaaabbbbbbbbb'),
                   'ind2': list('cccdddeeecccdddeee'),
                   'ind3': list(range(3)) * 6,
                   'val1': list(range(100, 118)),
                   'val2': list(range(70, 88))})
df_mult = df.set_index(['ind1', 'ind2', 'ind3'])
                val1  val2
ind1 ind2 ind3
a    c    0      100    70
          1      101    71
          2      102    72
     d    0      103    73
          1      104    74
          2      105    75
     e    0      106    76
          1      107    77
          2      108    78
b    c    0      109    79
          1      110    80
          2      111    81
     d    0      112    82
          1      113    83
          2      114    84
     e    0      115    85
          1      116    86
          2      117    87
I can now select a subset of it using .loc like this:
df_subs = df_mult.loc[pd.IndexSlice['a', ['c', 'd'], :], :]
which gives the expected
                val1  val2
ind1 ind2 ind3
a    c    0      100    70
          1      101    71
          2      102    72
     d    0      103    73
          1      104    74
          2      105    75
When I print
df_subs.index
I get
MultiIndex(levels=[[u'a', u'b'], [u'c', u'd', u'e'], [0, 1, 2]],
labels=[[0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]],
names=[u'ind1', u'ind2', u'ind3'])
Why is there still b in level 0 and not just a?
This could become an issue if I want to use the elements of the index for something else. Then
df_subs.index.levels[0]
gives me
Index([u'a', u'b'], dtype='object', name=u'ind1')
however,
df_subs.index.get_level_values('ind1').unique()
gives me
Index([u'a'], dtype='object', name=u'ind1')
which looks inconsistent to me.
Is this a bug or intended behavior?
Answer 1:
There's a discussion of this behavior on GitHub.
In short, the levels you see are not computed from the values actually present in the MultiIndex: unused levels persist through indexing once the MultiIndex has been set up. This allows the level indexes to be shared between all views and copies of a MultiIndex, which saves memory; i.e., df_mult and df_subs share the same underlying level indexes in memory.
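A minimal sketch of what that means in practice, reusing the frame from the question. It assumes a pandas version where the integer positions are exposed as `MultiIndex.codes` (0.24 or later; older versions called them `labels`, as in the output shown in the question). The variable names `parent_levels`, `subset_levels` and `used_codes` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'ind1': list('aaaaaaaaabbbbbbbbb'),
                   'ind2': list('cccdddeeecccdddeee'),
                   'ind3': list(range(3)) * 6,
                   'val1': list(range(100, 118)),
                   'val2': list(range(70, 88))})
df_mult = df.set_index(['ind1', 'ind2', 'ind3'])
df_subs = df_mult.loc[pd.IndexSlice['a', ['c', 'd'], :], :]

# Levels are the stored vocabulary of labels; slicing leaves them as they were.
parent_levels = [list(level) for level in df_mult.index.levels]
subset_levels = [list(level) for level in df_subs.index.levels]

# Codes are per-row integer positions into the levels (assumes pandas >= 0.24).
# Only these reflect which labels actually occur in the subset.
used_codes = sorted(set(df_subs.index.codes[0].tolist()))

print(subset_levels[0])  # level 0 still lists both 'a' and 'b'
print(used_codes)        # but only position 0 ('a') is referenced
```

This is also why `get_level_values('ind1').unique()` returns only `a`: it is built from the codes of the rows that are present, whereas `.levels[0]` returns the stored vocabulary.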
If you have a case for which you want to recompute the levels to get rid of the unused ones and create a new MultiIndex, you can use MultiIndex.remove_unused_levels().
In your case:
>>> df_subs.index.remove_unused_levels().levels[0]
Index(['a'], dtype='object', name='ind1')
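If the trimmed levels should stick to the DataFrame rather than just be inspected, one option is to assign the result back to the index. The sketch below is an assumption about how you might want to use it, not the only way: it builds the same frame as in the question via `MultiIndex.from_product` for brevity, and `df_clean` is a hypothetical name.

```python
import pandas as pd

# Same data as in the question, built directly with a MultiIndex.
idx = pd.MultiIndex.from_product([['a', 'b'], ['c', 'd', 'e'], [0, 1, 2]],
                                 names=['ind1', 'ind2', 'ind3'])
df_mult = pd.DataFrame({'val1': range(100, 118), 'val2': range(70, 88)},
                       index=idx)
df_subs = df_mult.loc[pd.IndexSlice['a', ['c', 'd'], :], :]

# remove_unused_levels() returns a new MultiIndex; it does not modify in place,
# so the result has to be assigned back. Work on a copy to leave df_subs as is.
df_clean = df_subs.copy()
df_clean.index = df_clean.index.remove_unused_levels()

print(df_subs.index.levels[0])   # still contains 'a' and 'b'
print(df_clean.index.levels[0])  # contains only 'a'
```

The rows and their index values are unchanged; only the stored level vocabulary shrinks.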
Source: https://stackoverflow.com/questions/46624457/why-do-i-see-all-original-index-elements-in-a-sliced-dataframe