Add a unique identifier in a new column until a condition is met on another column

Submitted by 妖精的绣舞 on 2019-12-11 06:38:33

Question


I have a dask dataframe with npartitions=8. Here is a snapshot of the data:

      id1    id2    Page_nbr    record_type
      St1    Sc1    3           START
      Sc1    St1    5           ADD
      Sc1    St1    9           OTHER
      Sc2    St2    34          START
      Sc2    St2    45          DURATION
      Sc2    St2    65          END
      Sc3    Sc3    4           START

I want to add a group_id column after record_type that assigns a unique ID based on record_type: every row gets the same group_id until the next row with record_type=START. The output should look like this:

      id1    id2    Page_nbr    record_type    group_id
      St1    Sc1    3           START          1
      Sc1    St1    5           ADD            1
      Sc1    St1    9           OTHER          1
      Sc2    St2    34          START          2
      Sc2    St2    45          DURATION       2
      Sc2    St2    65          END            2
      Sc3    Sc3    4           START          3

The group_id can be any unique number. Since the dataframe is huge, iterating over rows is probably not the best option. Is there a Pythonic way to do this?


Answer 1:


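Take the "record_type" column, compare it to "START", and compute the cumulative sum of the result. Each START row adds 1 to the running total, so every row receives the number of the most recent START:

# eq('START') gives True on START rows; cumsum() turns those flags into a running count
ddf['group_id'] = ddf['record_type'].eq('START').cumsum()
ddf.compute()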

   id1  id2  Page_nbr record_type  group_id
0  St1  Sc1         3       START         1
1  Sc1  St1         5         ADD         1
2  Sc1  St1         9       OTHER         1
3  Sc2  St2        34       START         2
4  Sc2  St2        45    DURATION         2
5  Sc2  St2        65         END         2
6  Sc3  Sc3         4       START         3
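
To see why the comparison plus cumulative sum yields the group IDs, here is a minimal plain-Python sketch of the same logic. It is not dask code and is not meant for large data; the names record_types, is_start and group_ids are illustrative and do not come from the original post.

```python
from itertools import accumulate

# Sample record_type values from the question, in row order.
record_types = ["START", "ADD", "OTHER", "START", "DURATION", "END", "START"]

# One flag per row: does this row open a new group?
is_start = [rt == "START" for rt in record_types]

# Running total of the flags (True counts as 1), mirroring what cumsum() does.
group_ids = list(accumulate(int(flag) for flag in is_start))

print(group_ids)  # [1, 1, 1, 2, 2, 2, 3]
```

One consequence of this approach: any rows that appear before the first START get group_id 0, since no flag has been counted yet.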


Source: https://stackoverflow.com/questions/54876003/add-a-unique-identifier-in-a-new-column-until-a-condition-met-on-another-column
