Finding the index of the first element (e.g. “True”) in a Series/column

Submitted by 孤街醉人 on 2021-01-21 07:13:51

Question


How do I find the index of an element (e.g. "True") in a Series or a column?

For example, I have a column in which I want to identify the first instance where an event occurs. So I write:

Variable = df["Force"] < event

This creates a boolean Series that is False until the first point where the condition becomes True. How do I then find the index of that data point?

Is there a better way?


Answer 1:


Use idxmax to find the first instance of the maximum value. In this case, True is the maximum value.

df['Force'].lt(event).idxmax()

Consider the sample df:

import pandas as pd

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))
df

   Force
a      5
b      4
c      3
d      2
e      1

The first instance of Force being less than 3 is at index 'd'.

df['Force'].lt(3).idxmax()
'd'

Be aware that if no value of Force is less than 3, the maximum is False, so idxmax returns the first index label ('a' here) even though no row matches.
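A minimal sketch of guarding against that case: check mask.any() before calling idxmax and return None when nothing matches. The helper name first_true_label is hypothetical, chosen for illustration.

```python
import pandas as pd

def first_true_label(mask: pd.Series):
    """Return the index label of the first True in a boolean Series, or None.

    Hypothetical helper: idxmax alone would return the first label when no
    value is True, so we check mask.any() first.
    """
    if not mask.any():
        return None
    return mask.idxmax()

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))
print(first_true_label(df['Force'].lt(3)))  # d
print(first_true_label(df['Force'].lt(0)))  # None (plain idxmax would give 'a')
```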

Also consider the alternative, argmax:

df.Force.lt(3).values.argmax()
3

It returns the integer position of the first occurrence of the maximum value. You can then use this position to look up the corresponding index label:

df.index[df.Force.lt(3).values.argmax()]
'd'

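Also note that when this answer was written, Series.argmax was a deprecated alias for idxmax; since pandas 1.0 it returns the integer position, so the .values step is no longer needed.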
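A short sketch of calling argmax directly on the boolean Series, assuming pandas 1.0 or later (where Series.argmax returns a position rather than a label). The same caveat as idxmax applies: if nothing is True, the result is position 0.

```python
import pandas as pd

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))
mask = df['Force'].lt(3)

# Assumes pandas >= 1.0: Series.argmax returns an integer position, not a label
pos = mask.argmax()
print(pos)            # 3
print(df.index[pos])  # d
```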




Answer 2:


You can also combine where with first_valid_index:

df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])
df.Force.where(df.Force < 3).first_valid_index()
3

where replaces the values that do not meet the condition with NaN by default. first_valid_index then returns the index label of the first non-NaN value, or None if every value was replaced.
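The sketch below makes the intermediate step visible: it prints the masked Series produced by where, then shows that first_valid_index returns None (rather than raising) when no value satisfies the condition.

```python
import pandas as pd

df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])

# where() keeps values that satisfy the condition and sets the rest to NaN
masked = df.Force.where(df.Force < 3)
print(masked.tolist())             # [nan, nan, nan, 2.0, 1.0]
print(masked.first_valid_index())  # 3

# If nothing satisfies the condition, every value is NaN and the result is None
print(df.Force.where(df.Force < 0).first_valid_index())  # None
```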


Alternatively, select the subset you are interested in (here, the rows where v == 1, i.e. where v is True) and take the first entry of its index:

df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])
v = (df["Force"] < 3)
v[v == 1].index[0]
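Because v is already boolean, it can index itself directly (v[v]), which is equivalent to v[v == 1]. A small sketch that also guards against the IndexError that .index[0] raises when nothing matches:

```python
import pandas as pd

df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])
v = df["Force"] < 3

# Boolean indexing keeps only the rows where v is True
matches = v[v].index
print(list(matches))  # [3, 4]

# matches[0] would raise IndexError if nothing matched, so check the length first
first = matches[0] if len(matches) else None
print(first)          # 3
```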

Bonus: if you need the index of the first appearance of each distinct value, you can use drop_duplicates, which keeps the first occurrence by default:

df = pd.DataFrame([["yello"], ["yello"], ["blue"], ["red"],  ["blue"], ["red"]], columns=["Force"])  
df.Force.drop_duplicates().reset_index()
    index   Force
0   0       yello
1   2       blue
2   3       red

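With a little more work, you can turn this into a dictionary that maps each value to the index of its first appearance: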

df.Force.drop_duplicates().reset_index().set_index("Force").to_dict()["index"]
{'blue': 2, 'red': 3, 'yello': 0}
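A shorter way to build the same mapping, offered as an alternative sketch rather than the answer's own method: drop_duplicates already keeps the original index labels, so zipping the values with the index is enough.

```python
import pandas as pd

df = pd.DataFrame([["yello"], ["yello"], ["blue"], ["red"], ["blue"], ["red"]], columns=["Force"])

# drop_duplicates keeps the first occurrence of each value along with its index label
first = df.Force.drop_duplicates()

# Pair each value with the label where it first appeared
first_index = dict(zip(first, first.index))
print(first_index)  # {'yello': 0, 'blue': 2, 'red': 3}
```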



Answer 3:


Below is a pure-Python (non-pandas) solution that I find easy to adapt:

import pandas as pd

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))

next(idx for idx, x in zip(df.index, df.Force) if x < 3)  # d

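It works by advancing a generator expression to its first matching element; iteration stops there, so the rest of the column is never examined. If nothing matches, next raises StopIteration.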
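To avoid the StopIteration when nothing matches, next accepts a default as its second argument. Note that the generator expression then needs its own parentheses. A minimal sketch using None as the default:

```python
import pandas as pd

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))

# The second argument to next() is returned instead of raising StopIteration
first = next((idx for idx, x in zip(df.index, df.Force) if x < 3), None)
print(first)    # d

missing = next((idx for idx, x in zip(df.index, df.Force) if x < 0), None)
print(missing)  # None
```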

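In the benchmark below, the pandas approaches come out slower. Keep in mind that with random integers in [0, 100000) and n = 99900, a matching value appears on average after roughly a thousand rows, so the generator stops early, while the pandas methods evaluate the condition over all 100,000 rows: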

import numpy as np

df = pd.DataFrame(dict(Force=np.random.randint(0, 100000, 100000)))

n = 99900

%timeit df['Force'].lt(n).idxmin()  # first label where Force >= n
# 1000 loops, best of 3: 1.57 ms per loop

%timeit df.Force.where(df.Force > n).first_valid_index()
# 100 loops, best of 3: 1.61 ms per loop

%timeit next(idx for idx, x in zip(df.index, df.Force) if x > n)
# 10000 loops, best of 3: 100 µs per loop
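To check how much the result depends on where the match falls, here is a sketch of the opposite case, under the assumption of a hypothetical column whose only matching value is the last row. It uses gt(n).idxmax() so that all three approaches test the same condition (Force > n). No timings are shown because they depend on the machine and library versions; with the match at the very end, you would expect the generator's advantage to shrink or reverse, but run it to see.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical worst case for the generator: only the last row exceeds n
n = 99_900
values = np.zeros(100_000, dtype=np.int64)
values[-1] = n + 1
df = pd.DataFrame(dict(Force=values))

def with_idxmax():
    return df['Force'].gt(n).idxmax()

def with_where():
    return df.Force.where(df.Force > n).first_valid_index()

def with_generator():
    return next(idx for idx, x in zip(df.index, df.Force) if x > n)

# All three approaches should agree on the answer before timing them
assert with_idxmax() == with_where() == with_generator() == 99_999

for fn in (with_idxmax, with_where, with_generator):
    seconds = timeit.timeit(fn, number=20) / 20
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```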


Source: https://stackoverflow.com/questions/48634271/finding-the-index-of-the-first-element-e-g-true-from-a-series-column
