“Pandorable” way to return index in dataframe slicing

Question

Is there a pandorable way to get only the index in dataframe slicing? In other words, is there a better way to write the following code:

df.loc[df['A'] > 5].index

Thanks!

Answer 1:

Yes. It is better to filter only the index values, rather than filtering the whole DataFrame and then selecting its index:

# filter the index directly with the boolean mask
df.index[df['A'] > 5]

# filter the whole DataFrame first, then take its index
df[df['A'] > 5].index
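To see that the two spellings select the same labels, here is a minimal sketch on a small made-up DataFrame (the values and the string labels are illustrative assumptions, not data from the question):

```python
import pandas as pd

# Hypothetical toy data with a non-default index so the labels are easy to spot.
df = pd.DataFrame({'A': [3, 8, 6, 1, 9]}, index=list('abcde'))

mask = df['A'] > 5            # boolean Series aligned with df.index

via_index = df.index[mask]    # filters only the index
via_frame = df[mask].index    # filters every column, then takes the index

print(via_index.tolist())     # ['b', 'c', 'e']
print(via_index.equals(via_frame))
```

Both expressions return a pandas Index; the first one simply skips building the intermediate filtered DataFrame.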

The two approaches also differ in performance:

import numpy as np
import pandas as pd

np.random.seed(1245)
df = pd.DataFrame({'A': np.random.randint(10, size=1000)})
print(df)

In [40]: %timeit df.index[df['A'] >5]
208 µs ± 11.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [41]: %timeit df[df['A'] >5].index
428 µs ± 6.42 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [42]: %timeit df.loc[df['A'] >5].index
466 µs ± 40.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

If performance is important, use NumPy: convert the index and the column to NumPy arrays with .values. Converting only the index helps a little; converting the column as well gives the large speedup:

In [43]: %timeit df.index.values[df['A'] >5]
157 µs ± 8.71 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [44]: %timeit df.index.values[df['A'].values >5]
8.91 µs ± 196 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
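A minimal sketch of the NumPy variant on made-up data follows. It uses to_numpy(), which newer pandas versions offer alongside .values; the toy values and integer labels are assumptions for illustration only:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the integer labels are arbitrary.
df = pd.DataFrame({'A': [3, 8, 6, 1, 9]}, index=[10, 20, 30, 40, 50])

labels = df.index.to_numpy()        # plain ndarray of index labels
mask = df['A'].to_numpy() > 5       # plain boolean ndarray, no pandas alignment

idx = labels[mask]
print(idx.tolist())                 # [20, 30, 50]
print(type(idx).__name__)           # ndarray
```

Note that the result is a NumPy array rather than a pandas Index, so wrap it in pd.Index(idx) if Index methods are needed afterwards.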

Source: https://stackoverflow.com/questions/52531974/pandorable-way-to-return-index-in-dataframe-slicing

Tags

python

pandas

indexing
