Question
How do I find the index of an element (e.g. "True") in a series or a column?
For example, I have a column in which I want to identify the first instance where an event occurs. So I write it as
Variable = df["Force"] < event
This creates a boolean Series that is False until the first instance where it becomes True. How do I then find the index of that data point?
Is there a better way?
Answer 1:
Use idxmax to find the first instance of the maximum value. In this case, True is the maximum value.
df['Force'].lt(event).idxmax()
Consider the sample df:
df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))
df
   Force
a      5
b      4
c      3
d      2
e      1
The first instance of Force being less than 3 is at index 'd'.
df['Force'].lt(3).idxmax()
'd'
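Be aware that if no value of Force is less than 3, every entry of the boolean Series is False, so the maximum is False and idxmax returns the first index label ('a' here) rather than signalling that nothing matched. Check with .any() first if a match is not guaranteed.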
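A minimal sketch of guarding against the no-match case. The helper name first_true_label is hypothetical (it is not part of pandas), and returning None for "no match" is an assumption about what the caller wants:

```python
import pandas as pd

def first_true_label(mask):
    """Return the index label of the first True in a boolean Series, or None."""
    # idxmax alone would return the first label when every value is False,
    # so confirm that at least one True exists before calling it.
    if not mask.any():
        return None
    return mask.idxmax()

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))
found = first_true_label(df['Force'].lt(3))    # a match exists
missing = first_true_label(df['Force'].lt(0))  # no value is below 0
```

Here found is 'd' and missing is None, whereas df['Force'].lt(0).idxmax() on its own would give 'a'.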
Also consider the alternative, argmax:
df.Force.lt(3).values.argmax()
3
It returns the position of the first instance of the maximal value. You can then use this to find the corresponding index value:
df.index[df.Force.lt(3).values.argmax()]
'd'
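Also, at the time of writing argmax was planned to become a positional Series method; in pandas 1.0 and later, Series.argmax returns the integer position directly, so .values is no longer required.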
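As a sketch of the positional route, assuming pandas 1.0 or later (where Series.argmax returns an integer position rather than a label), the same lookup can be written without going through .values:

```python
import pandas as pd

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))
mask = df['Force'].lt(3)

pos = mask.argmax()    # integer position of the first True
label = df.index[pos]  # index label at that position
```

The same caveat applies as with idxmax: if the mask contains no True, pos is 0.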
Answer 2:
You can also try first_valid_index with where.
df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])
df.Force.where(df.Force < 3).first_valid_index()
3
By default, where replaces the values that do not meet the condition with np.nan. first_valid_index then returns the index of the first non-NaN value in the resulting series.
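One property of this approach is that first_valid_index returns None when nothing satisfies the condition, instead of silently pointing at the first row. A short sketch (the threshold 0 is only chosen to force the no-match case):

```python
import pandas as pd

df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])

# At least one value is below 3, so a label comes back.
hit = df.Force.where(df.Force < 3).first_valid_index()

# No value is below 0, so every entry becomes NaN and the result is None.
miss = df.Force.where(df.Force < 0).first_valid_index()
```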
Alternatively, select the subset of items you are interested in, here the entries where v is True (v == 1), and then take the first item of its index.
df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])
v = (df["Force"] < 3)
v[v == 1].index[0]
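Note that index[0] raises an IndexError when nothing matches. A minimal sketch that falls back to None instead; the helper name first_match is hypothetical and None as the sentinel is an assumption:

```python
import pandas as pd

df = pd.DataFrame([[5], [4], [3], [2], [1]], columns=["Force"])

def first_match(mask):
    """Return the first index label where mask is True, or None if there is none."""
    matches = mask[mask].index  # labels of the rows where the condition holds
    return matches[0] if len(matches) else None

first = first_match(df["Force"] < 3)
none_found = first_match(df["Force"] < 0)
```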
Bonus: if you need the index of the first appearance of each distinct value, you can use drop_duplicates.
df = pd.DataFrame([["yello"], ["yello"], ["blue"], ["red"], ["blue"], ["red"]], columns=["Force"])
df.Force.drop_duplicates().reset_index()
   index  Force
0      0  yello
1      2   blue
2      3    red
With a little more work, this becomes a dictionary mapping each value to its first index:
df.Force.drop_duplicates().reset_index().set_index("Force").to_dict()["index"]
{'blue': 2, 'red': 3, 'yello': 0}
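The same mapping can be sketched a little more directly by zipping the deduplicated values with their index labels. This relies on drop_duplicates keeping the first occurrence of each value, which is its default (keep='first'):

```python
import pandas as pd

df = pd.DataFrame(
    [["yello"], ["yello"], ["blue"], ["red"], ["blue"], ["red"]],
    columns=["Force"],
)

# drop_duplicates keeps the first occurrence and preserves its original index label.
firsts = df.Force.drop_duplicates()

# Map each distinct value to the index label of its first appearance.
first_index = dict(zip(firsts.values, firsts.index))
```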
Answer 3:
Below is a non-pandas solution which I find easy to adapt:
import pandas as pd
df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))
next(idx for idx, x in zip(df.index, df.Force) if x < 3) # d
It works by iterating to the first result of a generator expression.
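If no element satisfies the condition, next raises StopIteration. A minimal sketch that passes a default to next instead; the helper name first_label_where and its parameters are hypothetical:

```python
import pandas as pd

def first_label_where(series, predicate, default=None):
    """Return the label of the first element satisfying predicate, else default."""
    pairs = zip(series.index, series)
    # The second argument to next() is returned when the generator is exhausted.
    return next((idx for idx, x in pairs if predicate(x)), default)

df = pd.DataFrame(dict(Force=[5, 4, 3, 2, 1]), list('abcde'))
found = first_label_where(df.Force, lambda x: x < 3)
missing = first_label_where(df.Force, lambda x: x < 0)
```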
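Pandas appears to perform poorly in comparison, at least when the first match occurs early in the column; the generator stops at the first hit, whereas the pandas expressions process every row:
import numpy as np
df = pd.DataFrame(dict(Force=np.random.randint(0, 100000, 100000)))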
n = 99900
%timeit df['Force'].gt(n).idxmax()
# 1000 loops, best of 3: 1.57 ms per loop
%timeit df.Force.where(df.Force > n).first_valid_index()
# 100 loops, best of 3: 1.61 ms per loop
%timeit next(idx for idx, x in zip(df.index, df.Force) if x > n)
# 10000 loops, best of 3: 100 µs per loop
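To rerun the comparison outside IPython, here is a sketch using the standard timeit module. The fixed seed, the function names and the repeat count are assumptions made for repeatability, and no timings are claimed here since they depend on the machine and on where the first match falls:

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed (an assumption, for repeatable data)
df = pd.DataFrame(dict(Force=rng.integers(0, 100000, 100000)))
n = 99900

def via_idxmax():
    return df['Force'].gt(n).idxmax()

def via_where():
    return df.Force.where(df.Force > n).first_valid_index()

def via_next():
    return next(idx for idx, x in zip(df.index, df.Force) if x > n)

funcs = (via_idxmax, via_where, via_next)

# All three should agree on the label of the first value greater than n.
results = {f.__name__: int(f()) for f in funcs}

# Total seconds for 20 calls of each approach.
timings = {f.__name__: timeit.timeit(f, number=20) for f in funcs}
```

Raising n so that the first match falls near the end of the column, or does not occur at all, is a useful check before concluding that the generator is always faster.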
Source: https://stackoverflow.com/questions/48634271/finding-the-index-of-the-first-element-e-g-true-from-a-series-column