Count Number of Rows Between Two Dates by ID in a Pandas GroupBy DataFrame

一个人的身影 2020-12-10 08:46

I have the following test DataFrame:

import random
from datetime import timedelta
import pandas as pd
import datetime

#create test range of dates
rng=pd.dat         


        
3 Answers
  •  生来不讨喜
    2020-12-10 09:08

    My usual approach for these problems is to pivot and think in terms of events changing an accumulator. Every new "stdt" we see adds +1 to the count; every "enddt" we see adds -1. (It adds the -1 the next day, at least if I'm interpreting "between" the way you are. Some days I think we should ban the word as too ambiguous.)

    In other words, if we turn your frame into something like

    >>> df.head()
        cid  jid  change       date
    0     1  100       1 2015-01-06
    1     1  101       1 2015-01-07
    21    1  100      -1 2015-01-16
    22    1  101      -1 2015-01-17
    17    1  117       1 2015-03-01
    

    then what we want is simply the cumulative sum of change (after suitable regrouping). For example, something like

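    # Move each end date forward one day so a job still counts on its enddt.
    df["enddt"] += timedelta(days=1)
    # Long format: one row per event, labelled "stdt" or "enddt".
    df = pd.melt(df, id_vars=["cid", "jid"], var_name="change", value_name="date")
    # A start adds one open job, an end removes one.
    df["change"] = df["change"].map({"stdt": 1, "enddt": -1})
    # sort_values replaces DataFrame.sort, which no longer exists in pandas.
    df = df.sort_values(["cid", "date"])
    
    # Net change per cid per day, then a running total within each cid.
    df = df.groupby(["cid", "date"], as_index=False)["change"].sum()
    df["count"] = df.groupby("cid")["change"].cumsum()
    
    # Daily calendar spanning every event date across all cids.
    new_time = pd.date_range(df.date.min(), df.date.max())
    
    df_parts = []
    for cid, group in df.groupby("cid"):
        full_count = group[["date", "count"]].set_index("date")
        full_count = full_count.reindex(new_time)
        # Carry the last count forward; days before the first event are 0.
        full_count = full_count.ffill().fillna(0).astype(int)
        full_count["cid"] = cid
        df_parts.append(full_count)
    
    df_new = pd.concat(df_parts)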
    

    which gives me something like

    >>> df_new.head(15)
                count  cid
    2015-01-03      0    1
    2015-01-04      0    1
    2015-01-05      0    1
    2015-01-06      1    1
    2015-01-07      2    1
    2015-01-08      2    1
    2015-01-09      2    1
    2015-01-10      2    1
    2015-01-11      2    1
    2015-01-12      2    1
    2015-01-13      2    1
    2015-01-14      2    1
    2015-01-15      2    1
    2015-01-16      1    1
    2015-01-17      0    1
    

    There may be off-by-one differences with regard to your expectations, and you may have different ideas about how to handle multiple overlapping jids in the same time window (here they count as 2), but the basic idea of working with the events should prove useful even if you have to tweak the details.
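
To make the off-by-one point concrete, here is a minimal self-contained sketch of the same event idea on made-up data. The sample rows, the helper name `daily_open_counts`, and its `inclusive_end` flag are all assumptions for illustration, not part of the question's frame. It also swaps the per-cid loop for an `unstack`, which is one possible alternative rather than a required change.

```python
from datetime import timedelta

import pandas as pd

# Made-up sample data; the column names cid, jid, stdt, enddt follow the answer.
jobs = pd.DataFrame(
    {
        "cid": [1, 1, 2],
        "jid": [100, 101, 200],
        "stdt": pd.to_datetime(["2015-01-06", "2015-01-07", "2015-01-03"]),
        "enddt": pd.to_datetime(["2015-01-15", "2015-01-16", "2015-01-04"]),
    }
)


def daily_open_counts(frame, inclusive_end=True):
    """Hypothetical helper: number of open jobs per cid per calendar day."""
    frame = frame.copy()
    if inclusive_end:
        # The -1 lands the day after enddt, so enddt itself still counts.
        frame["enddt"] = frame["enddt"] + timedelta(days=1)
    events = frame.melt(id_vars=["cid", "jid"], var_name="change", value_name="date")
    events["change"] = events["change"].map({"stdt": 1, "enddt": -1})
    # Net change per cid per day, then a running total within each cid.
    net = events.groupby(["cid", "date"])["change"].sum()
    running = net.groupby(level="cid").cumsum()
    # One column per cid on a shared daily calendar.
    calendar = pd.date_range(events["date"].min(), events["date"].max())
    wide = running.unstack("cid").reindex(calendar)
    # Carry counts forward; days before a cid's first event are 0.
    return wide.ffill().fillna(0).astype(int)


inclusive = daily_open_counts(jobs, inclusive_end=True)
exclusive = daily_open_counts(jobs, inclusive_end=False)
print(inclusive.loc["2015-01-14":"2015-01-16", 1].tolist())  # [2, 2, 1]
print(exclusive.loc["2015-01-14":"2015-01-16", 1].tolist())  # [2, 1, 0]
```

With the end date treated as inclusive, job 100 (ending 2015-01-15) still counts on the 15th. With the shift turned off, it drops out on that day. Which reading of "between" is right depends on the data, so the flag is only there to make the choice explicit.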
