Question
Given two arrays:
x
[('010_628', '2543677'), ('010_228', '2543677'), ('015_634', '2543677')]
y
array([['me', 10228955],
['me', 10228955],
['me', 10228955]], dtype=object)
Currently, this code gets me a dataframe with a flat index of tuples:
df = pd.DataFrame(x, index=y, columns=['pm_code', 'sec_pm'])
df
                pm_code   sec_pm
(me, 10228955)  010_628  2543677
(me, 10228955)  010_228  2543677
(me, 10228955)  015_634  2543677
How can I instead create a MultiIndex dataframe that looks like this?
                 pm_code   sec_pm
state site_no
me    10228955   010_628  2543677
                 010_228  2543677
                 015_634  2543677
I've tried using pd.MultiIndex.from_tuples but I'm not able to get this right. Thanks for the help.
Appendix: Performance Comparisons
Small
# unutbu #1
%timeit pd.DataFrame(x, index=pd.MultiIndex.from_arrays(y.T), columns=['pm_code', 'sec_pm'])
1000 loops, best of 3: 1.25 ms per loop
# unutbu #2
%timeit pd.DataFrame(x, index=pd.MultiIndex.from_tuples(y.tolist()), columns=['pm_code', 'sec_pm'])
1000 loops, best of 3: 1.47 ms per loop
# piRSquared
%timeit pd.DataFrame(x, index=y.T.tolist(), columns=['pm_code', 'sec_pm'])
1000 loops, best of 3: 1.41 ms per loop
# Andrew L
%timeit pd.DataFrame(x, index=[y[:,0], y[:,1]], columns=['pm_code', 'sec_pm'])
1000 loops, best of 3: 1.29 ms per loop
Large
x2 = np.repeat(x, 10000, 0)
y2 = np.repeat(y, 10000, 0)
# unutbu #1
%timeit pd.DataFrame(x2, index=pd.MultiIndex.from_arrays(y2.T), columns=['pm_code', 'sec_pm'])
100 loops, best of 3: 17.3 ms per loop
# unutbu #2
%timeit pd.DataFrame(x2, index=pd.MultiIndex.from_tuples(y2.tolist()), columns=['pm_code', 'sec_pm'])
10 loops, best of 3: 30.5 ms per loop
# piRSquared
%timeit pd.DataFrame(x2, index=y2.T.tolist(), columns=['pm_code', 'sec_pm'])
10 loops, best of 3: 37.2 ms per loop
# Andrew L
%timeit pd.DataFrame(x2, index=[y2[:,0], y2[:,1]], columns=['pm_code', 'sec_pm'])
100 loops, best of 3: 22 ms per loop
Data from this question.
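The %timeit lines above only run inside IPython. Outside it, a comparable measurement can be sketched with the standard timeit module. This is a rough harness, not a reproduction of the numbers above: the labels in the candidates dict and the repeat counts are my own choices, and absolute timings will vary with the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

x = [('010_628', '2543677'), ('010_228', '2543677'), ('015_634', '2543677')]
y = np.array([['me', 10228955]] * 3, dtype=object)

# Scale both inputs up the same way as the "Large" case
x2 = np.repeat(x, 10000, 0)
y2 = np.repeat(y, 10000, 0)
cols = ['pm_code', 'sec_pm']

# One zero-argument callable per approach from the answers
candidates = {
    'from_arrays': lambda: pd.DataFrame(
        x2, index=pd.MultiIndex.from_arrays(y2.T), columns=cols),
    'from_tuples': lambda: pd.DataFrame(
        x2, index=pd.MultiIndex.from_tuples(y2.tolist()), columns=cols),
    'list of lists': lambda: pd.DataFrame(
        x2, index=y2.T.tolist(), columns=cols),
    'sliced columns': lambda: pd.DataFrame(
        x2, index=[y2[:, 0], y2[:, 1]], columns=cols),
}

# Best of 3 repeats, 10 calls each, reported in milliseconds per call
timings = {}
for name, build in candidates.items():
    best = min(timeit.repeat(build, number=10, repeat=3))
    timings[name] = best / 10 * 1e3
    print(f'{name}: {timings[name]:.2f} ms per call')
```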
Answer 1:
You could use pd.MultiIndex.from_arrays(y.T):
In [53]: pd.DataFrame(x, index=pd.MultiIndex.from_arrays(y.T), columns=['pm_code', 'sec_pm'])
Out[53]:
            pm_code   sec_pm
me 10228955 010_628  2543677
   10228955 010_228  2543677
   10228955 015_634  2543677
or pd.MultiIndex.from_tuples(y.tolist()):
In [54]: pd.DataFrame(x, index=pd.MultiIndex.from_tuples(y.tolist()), columns=['pm_code', 'sec_pm'])
Out[54]:
            pm_code   sec_pm
me 10228955 010_628  2543677
   10228955 010_228  2543677
   10228955 015_634  2543677
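Neither call above names the index levels, while the desired output in the question shows state and site_no. A minimal sketch, assuming those two level names are what is wanted, passes them through the names parameter of from_arrays:

```python
import numpy as np
import pandas as pd

x = [('010_628', '2543677'), ('010_228', '2543677'), ('015_634', '2543677')]
y = np.array([['me', 10228955]] * 3, dtype=object)

# y.T has one row per index level, which is the layout from_arrays expects
index = pd.MultiIndex.from_arrays(y.T, names=['state', 'site_no'])
df = pd.DataFrame(x, index=index, columns=['pm_code', 'sec_pm'])
print(df)
```

pd.MultiIndex.from_tuples accepts the same names argument, so the second variant can be labelled the same way.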
Answer 2:
You can also slice your array by column and pass the slices to index as a list:
df = pd.DataFrame(x, index=[y[:,0], y[:,1]], columns=['pm_code', 'sec_pm'])
df
            pm_code   sec_pm
me 10228955 010_628  2543677
   10228955 010_228  2543677
   10228955 015_634  2543677
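Passing a list of arrays leaves the levels unnamed. One way to label them afterwards, assuming the state and site_no names from the question, is rename_axis:

```python
import numpy as np
import pandas as pd

x = [('010_628', '2543677'), ('010_228', '2543677'), ('015_634', '2543677')]
y = np.array([['me', 10228955]] * 3, dtype=object)

# Each column of y becomes one level of the index
df = pd.DataFrame(x, index=[y[:, 0], y[:, 1]], columns=['pm_code', 'sec_pm'])

# rename_axis with a list sets the names of the MultiIndex levels
df = df.rename_axis(['state', 'site_no'])
print(df.index.names)
```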
Answer 3:
Option 1
If you pass a list of array-like objects as the index, the constructor knows what to do with it.
pd.DataFrame(x, index=y.T.tolist(), columns=['pm_code', 'sec_pm'])
            pm_code   sec_pm
me 10228955 010_628  2543677
   10228955 010_228  2543677
   10228955 015_634  2543677
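As a quick sanity check, the four constructions from the answers can be compared side by side. This sketch only verifies that each one yields a two-level MultiIndex with the same entries. The dictionary keys are labels chosen here for illustration.

```python
import numpy as np
import pandas as pd

x = [('010_628', '2543677'), ('010_228', '2543677'), ('015_634', '2543677')]
y = np.array([['me', 10228955]] * 3, dtype=object)
cols = ['pm_code', 'sec_pm']

frames = {
    'from_arrays': pd.DataFrame(
        x, index=pd.MultiIndex.from_arrays(y.T), columns=cols),
    'from_tuples': pd.DataFrame(
        x, index=pd.MultiIndex.from_tuples(y.tolist()), columns=cols),
    'list of lists': pd.DataFrame(x, index=y.T.tolist(), columns=cols),
    'sliced columns': pd.DataFrame(x, index=[y[:, 0], y[:, 1]], columns=cols),
}

expected = [('me', 10228955)] * 3
for name, frame in frames.items():
    # Every variant should give a two-level MultiIndex with the same entries
    assert isinstance(frame.index, pd.MultiIndex), name
    assert frame.index.nlevels == 2, name
    assert frame.index.tolist() == expected, name
```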
Source: https://stackoverflow.com/questions/45946507/create-multiindexed-dataframe-through-constructor