Count by condition applied to the same column in Pandas

帅比萌擦擦* 提交于 2021-02-10 06:14:09

问题


This is my data frame.

acc_index    veh_count    veh_type
001             1            1
002             2            1
002             2            2
003             2            1
003             2            2
004             1            1
005             2            1
005             2            3
006             1            2
007             2            1
007             2            2
008             2            1
008             2            1
009             3            1
009             3            1
009             3            2

acc_index is unique for each accident

veh_count shows how many vehicles are involved in one accident

veh_type shows the type of vehicles involved in an accident (1=bicycle, 2=car, 3=bus).

What I want to do is to count the number of accidents between cars and bicycles (so, where veh_type=1 and veh_type=9 for the same acc_index), even if there were more cars or bicycles involved, I still want to count it as one accident. How can I do that?

I tried to do it with the code below, but I get the count of all accidents involving cars or bikes, and I want to get only the ones between them.

df[(df['veh_count'] >=2) & (df.veh_type.isin(['1','2']))].groupby(['acc_index', 'veh_count', 'veh_type']).count()

I want to get something like this below, but also with the whole dataframe, and not only total sum.

acc_index    veh_count    veh_type     count
002             2            1           
002             2            2
                           count         1
003             2            1
003             2            2
                           count         1
007             2            1
007             2            2
                           count         1
009             3            1
009             3            1
009             3            2
                           count         1
                        total_count      4

If you have a better solution/idea, I would appreciate.


回答1:


IIUC, you can check veh_type for those of interest and groupby:

(df.assign(car=df.veh_type.eq(1),
          bike=df.veh_type.eq(2))  # change 2 to correct type
   [['acc_index','car','bike']]
   .groupby('acc_index')
   .any()
   .all(1).sum()
)

Output:

4

Update: If you want all the the rows:

s = (df.assign(car=df.veh_type.eq(1),
          bike=df.veh_type.eq(2))  # change 2 to correct type
   [['acc_index','car','bike']]
   .groupby('acc_index')
   .any()
   .all(1)
)

df[df['acc_index'].map(s)]

Output:

    acc_index  veh_count  veh_type
1           2          2         1
2           2          2         2
3           3          2         1
4           3          2         2
9           7          2         1
10          7          2         2
13          9          3         1
14          9          3         1
15          9          3         2


来源:https://stackoverflow.com/questions/60286060/count-by-condition-applied-to-the-same-column-in-pandas

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!