Question
Is there a pandorable way to get only the index when slicing a DataFrame? In other words, is there a better way to write the following code:
df.loc[df['A'] > 5].index
Thanks!
Answer 1:
Yes, it is better to filter only the index values, rather than filtering the whole DataFrame and then selecting its index:
# Filter only the index
df.index[df['A'] > 5]
# Filter the whole DataFrame, then take its index
df[df['A'] > 5].index
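As a quick check that these forms select the same labels, here is a minimal sketch on a small made-up frame. The sample data and the names `mask`, `via_index`, `via_frame` and `via_loc` are hypothetical and only serve the illustration:

```python
import pandas as pd

# Hypothetical sample data with a non-default index, for illustration only
df = pd.DataFrame({'A': [3, 7, 9, 1, 6]}, index=list('abcde'))

mask = df['A'] > 5            # boolean Series aligned with df.index

via_index = df.index[mask]    # filters only the index
via_frame = df[mask].index    # filters the whole DataFrame, then takes its index
via_loc = df.loc[mask].index  # same result, going through .loc

print(list(via_index))              # ['b', 'c', 'e']
print(via_index.equals(via_frame))  # True
print(via_index.equals(via_loc))    # True
```

All three return an Index; only the amount of intermediate work differs.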
The difference shows up in performance too:
import numpy as np
import pandas as pd

np.random.seed(1245)
df = pd.DataFrame({'A': np.random.randint(10, size=1000)})
print(df)
In [40]: %timeit df.index[df['A'] >5]
208 µs ± 11.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [41]: %timeit df[df['A'] >5].index
428 µs ± 6.42 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [42]: %timeit df.loc[df['A'] >5].index
466 µs ± 40.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
If performance matters, use NumPy: convert the index, and ideally the column as well, to NumPy arrays with .values:
In [43]: %timeit df.index.values[df['A'] >5]
157 µs ± 8.71 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [44]: %timeit df.index.values[df['A'].values >5]
8.91 µs ± 196 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
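The NumPy variant can be sketched as a standalone script to confirm that it picks the same labels as the pandas-only version. This is a minimal sketch, not a benchmark. It uses `.to_numpy()` where the answer uses `.values` (my substitution, available in pandas 0.24 and later), and the names `labels` and `expected` are made up for illustration:

```python
import numpy as np
import pandas as pd

np.random.seed(1245)
df = pd.DataFrame({'A': np.random.randint(10, size=1000)})

# Both the index and the mask are plain NumPy arrays here,
# so the selection bypasses pandas indexing entirely.
labels = df.index.to_numpy()[df['A'].to_numpy() > 5]

# Pandas-only version for comparison
expected = df.index[df['A'] > 5]

print(type(labels).__name__)                        # ndarray
print(np.array_equal(labels, expected.to_numpy()))  # True
```

Note that the result is a NumPy array rather than an Index. Wrap it in `pd.Index(labels)` if Index methods are needed afterwards.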
Source: https://stackoverflow.com/questions/52531974/pandorable-way-to-return-index-in-dataframe-slicing