Question
Suppose I have a simple pandas DataFrame df like so:
|    | name  | car |
|----|-------|-----|
| 0 | 'bob' | 'b' |
| 1 | 'bob' | 'c' |
| 2 | 'fox' | 'b' |
| 3 | 'fox' | 'c' |
| 4 | 'cox' | 'b' |
| 5 | 'cox' | 'c' |
| 6 | 'jo' | 'b' |
| 7 | 'jo' | 'c' |
| 8 | 'bob' | 'b' |
| 9 | 'bob' | 'c' |
| 10 | 'bob' | 'b' |
| 11 | 'bob' | 'c' |
| 12 | 'rob' | 'b' |
| 13 | 'rob' | 'c' |
I would like to find the row indices of a specific pattern that spans both columns. In my real application, the above dataframe has a few thousand rows and I have a few thousand dataframes so performance is not important. The pattern, say, that I am interested in is:
| 'bob' | 'b' |
| 'bob' | 'c' |
Hence, using the above example, my desired output would be:
out_idx = [0,1,8,9,10,11]
Typically, of course, for one pattern, one would do something like df.loc[(df.name == 'bob') & (df.car == 'b')], but I am not sure how to do it when I am looking for a specific multivariate pattern over multiple columns. I.e., I am looking for something like the following (which I am pretty sure is not correct): df.loc[(df.name == 'bob') & (df.car == 'b') & (df.car == 'c')].
Help much appreciated. Thx!
Answer 1:
Use boolean indexing with Series.isin in place of the second and third conditions:
df1 = df[(df.name == 'bob') & df.car.isin(['b','c'])]
print(df1)
name car
0 bob b
1 bob c
8 bob b
9 bob c
10 bob b
11 bob c
If you need the index values:
out_idx = df.index[(df.name == 'bob') & df.car.isin(['b','c'])]
Or:
out_idx = df[(df.name == 'bob') & df.car.isin(['b','c'])].index
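For reference, here is a minimal self-contained sketch of the isin approach on the question's sample data. The way the frame is constructed, the `names` and `mask` variables, and the final `.tolist()` call are assumptions added for illustration. Bracket access (`df["name"]`) is used in place of attribute access, which is equivalent for these columns.

```python
import pandas as pd

# Rebuild the example frame from the question (default integer index 0..13).
names = ["bob", "bob", "fox", "fox", "cox", "cox", "jo", "jo",
         "bob", "bob", "bob", "bob", "rob", "rob"]
df = pd.DataFrame({"name": names, "car": ["b", "c"] * 7})

# Row-wise mask: name is 'bob' AND car is one of the pattern's values.
mask = (df["name"] == "bob") & df["car"].isin(["b", "c"])

# Index labels of the matching rows, converted to a plain Python list.
out_idx = df.index[mask].tolist()
print(out_idx)  # [0, 1, 8, 9, 10, 11]
```

Note that this mask tests each row on its own. It does not check that a 'b' row is immediately followed by a 'c' row, which happens to hold everywhere in the sample data.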
Your solution also works if you replace the second & with | (bitwise OR) and wrap the two car conditions in one extra pair of parentheses, since & binds more tightly than |:
df1 = df[(df.name == 'bob') & ((df.car == 'b') | (df.car == 'c'))]
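To illustrate why the original attempt with two & conditions on the same column matches nothing, and that the | form agrees with isin, here is a small sketch. The shortened six-row frame and the mask names are made up for this example.

```python
import pandas as pd

# A shortened, hypothetical version of the question's data.
df = pd.DataFrame({
    "name": ["bob", "bob", "fox", "fox", "bob", "rob"],
    "car": ["b", "c", "b", "c", "b", "c"],
})

is_bob = df["name"] == "bob"

# AND on the same column: a cell cannot equal 'b' and 'c' at once,
# so this mask is False for every row.
and_mask = is_bob & (df["car"] == "b") & (df["car"] == "c")

# OR on the same column, grouped in its own parentheses, matches either value.
or_mask = is_bob & ((df["car"] == "b") | (df["car"] == "c"))

# isin expresses the same OR more compactly.
isin_mask = is_bob & df["car"].isin(["b", "c"])

print(and_mask.any())              # False
print(or_mask.equals(isin_mask))   # True
print(df.index[or_mask].tolist())  # [0, 1, 4]
```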
Source: https://stackoverflow.com/questions/58195572/find-all-indices-instances-of-all-repeating-patterns-across-columns-and-rows-of