Pandas: Replacement for .ix

Submitted by 牧云@^-^@ on 2020-01-10 04:53:26

Question


Given the update to pandas 0.20.0 and the deprecation of .ix, I am wondering what the most efficient way is to get the same result using the remaining .loc and .iloc. I just answered this question, but the second option (not using .ix) seems inefficient and verbose.

Snippet:

print(df.iloc[df.loc[df['cap'].astype(float) > 35].index, :-1])

Is this the proper way to go when combining conditional filtering with index-position filtering?


Answer 1:


You can do everything in a single .loc call: slice the relevant index by position to get the labels you need, then pass those labels to .loc.

df.loc[
    df['cap'].astype(float) > 35,
    df.columns[:-1]
]
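As a minimal sketch of why this helps, the example below uses a small made-up frame (the column names other than 'cap' and the index labels are assumptions for illustration) whose index labels are not 0, 1, 2. The single .loc call works regardless of the index, whereas the snippet in the question passes labels to .iloc, which only works when labels happen to coincide with positions.

```python
import pandas as pd

# Hypothetical data: 'cap' is stored as strings and is the last column.
# The index labels (10, 20, 30) deliberately differ from positions (0, 1, 2).
df = pd.DataFrame(
    {"name": ["a", "b", "c"], "size": [1, 2, 3], "cap": ["30", "40", "50"]},
    index=[10, 20, 30],
)

# Boolean mask for the rows, positional slice of the column labels for the columns.
result = df.loc[df["cap"].astype(float) > 35, df.columns[:-1]]
print(result)
```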



Answer 2:


Generally, you would prefer to avoid chained indexing in pandas (though, strictly speaking, you're actually using two different indexing methods). You can't reliably modify your dataframe this way (details in the docs), and the docs cite performance as another reason (indexing once vs. twice).

The performance cost is usually insignificant (or rather, unlikely to be a bottleneck in your code), and it does not even show up in the following example:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.uniform(size=(100000, 10)), columns=list('abcdefghij'))
# Get the columns at positions 2:5 where the value in 'a' is greater than 0.5
# (i.e. Boolean mask along axis 0, position slice along axis 1)

# Deprecated .ix method (removed entirely in pandas 1.0)
%timeit df.ix[df['a'] > 0.5,2:5]
100 loops, best of 3: 2.14 ms per loop

# Boolean, then position
%timeit df.loc[df['a'] > 0.5,].iloc[:,2:5]
100 loops, best of 3: 2.14 ms per loop

# Position, then Boolean
%timeit df.iloc[:,2:5].loc[df['a'] > 0.5,]
1000 loops, best of 3: 1.75 ms per loop

# .loc
%timeit df.loc[df['a'] > 0.5, df.columns[2:5]]
100 loops, best of 3: 2.64 ms per loop

# .iloc
%timeit df.iloc[np.where(df['a'] > 0.5)[0],2:5]
100 loops, best of 3: 9.91 ms per loop
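The %timeit lines above need IPython, and the .ix variant no longer runs on pandas 1.0 or later. A rough sketch of the same comparison with the standard-library timeit module, covering only the non-.ix variants, could look like this. The seed, the repeat counts, and the variant names are arbitrary choices made for this illustration, and the timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
df = pd.DataFrame(rng.uniform(size=(100_000, 10)), columns=list("abcdefghij"))

# Each variant selects the columns at positions 2:5 for rows where 'a' > 0.5.
variants = {
    "Boolean, then position": lambda: df.loc[df["a"] > 0.5].iloc[:, 2:5],
    "Position, then Boolean": lambda: df.iloc[:, 2:5].loc[df["a"] > 0.5],
    "single .loc": lambda: df.loc[df["a"] > 0.5, df.columns[2:5]],
    "single .iloc": lambda: df.iloc[np.where((df["a"] > 0.5).to_numpy())[0], 2:5],
}

# Run each variant once so the results can be compared for equality.
results = {name: fn() for name, fn in variants.items()}

for name, fn in variants.items():
    # Best of 3 repeats, 10 calls each, reported per call.
    seconds = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```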

Bottom line: if you really want to avoid .ix and you don't intend to modify values in your dataframe, just go with chained indexing. If you do need to modify values, use the 'proper' but arguably messier way: either .iloc with np.where(), or .loc with positional slices of df.index or df.columns.
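To make the difference concrete, here is a small sketch on made-up data (the frame, its labels, and the variable names are assumptions for illustration). The chained assignment writes to a temporary copy and leaves the frame untouched, while the two single-indexer forms modify it. pandas normally warns about the chained form; the warning is silenced here only to keep the output quiet.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [0.1, 0.9, 0.7], "b": [1.0, 2.0, 3.0], "c": [4.0, 5.0, 6.0]},
    index=["x", "y", "z"],
)
mask = df["a"] > 0.5

# Chained indexing: .loc with a Boolean mask returns a copy, so the
# assignment through .iloc never reaches the original frame.
chained = df.copy()
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    chained.loc[mask].iloc[:, 1:3] = 0.0

# Single .loc: convert the column positions to labels first.
via_loc = df.copy()
via_loc.loc[mask, via_loc.columns[1:3]] = 0.0

# Single .iloc: convert the Boolean mask to row positions first.
via_iloc = df.copy()
rows = np.where(mask.to_numpy())[0]
via_iloc.iloc[rows, 1:3] = 0.0

print(chained)
print(via_loc)
```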




Answer 3:


How about breaking this into two indexing steps:

df[df['cap'].astype(float) > 35].iloc[:,:-1]

or, assuming 'cap' is the last column, even:

df[df['cap'].astype(float) > 35].drop(columns='cap')
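A quick check on made-up data (the 'name' column and the values are assumptions) suggests the two forms agree when 'cap' is the last column. The sketch passes the column to drop by keyword, since recent pandas versions no longer accept the axis as a positional argument.

```python
import pandas as pd

# Hypothetical frame with 'cap' as the last column, stored as strings.
df = pd.DataFrame({"name": ["a", "b", "c"], "cap": ["30", "40", "50"]})

filtered = df[df["cap"].astype(float) > 35]

by_position = filtered.iloc[:, :-1]       # drop the last column by position
by_label = filtered.drop(columns="cap")   # drop the same column by name

print(by_position.equals(by_label))
```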


Source: https://stackoverflow.com/questions/43838999/pandas-replacement-for-ix
