Count Number of Rows Between Two Dates BY ID in a Pandas GroupBy Dataframe

后端未结

关注

 3  1815

一个人的身影 2020-12-10 08:46

I have the following test DataFrame:

import random
from datetime import timedelta
import pandas as pd
import datetime

#create test range of dates
rng=pd.dat


      
      
        
          3条回答        

        
                    
            
            
                         
                
              
              
                
                   天涯浪人
                                             
                
                
                (楼主)
            
              
              
                2020-12-10 09:01
              

            
            
                        
Here is a solution I came up with (this will loop through the permutations of unique cid's and date range getting your counts):

from itertools import product
df_new_date=pd.DataFrame(list(product(df.cid.unique(),pd.date_range(df.stdt.min(), df.enddt.max()))),columns=['cid','newdate'])
df_new_date['cnt']=df_new_date.apply(lambda row:df[(df['cid']==row['cid'])&(df['stdt']<=row['newdate'])&(df['enddt']>=row['newdate'])]['jid'].count(),axis=1)

>>> df_new_date.head(20) 
    cid    newdate  cnt
0     1 2015-07-01    0
1     1 2015-07-02    0
2     1 2015-07-03    0
3     1 2015-07-04    0
4     1 2015-07-05    0
5     1 2015-07-06    1
6     1 2015-07-07    1
7     1 2015-07-08    1
8     1 2015-07-09    1
9     1 2015-07-10    1
10    1 2015-07-11    2
11    1 2015-07-12    3
12    1 2015-07-13    3
13    1 2015-07-14    2
14    1 2015-07-15    3
15    1 2015-07-16    3
16    1 2015-07-17    3
17    1 2015-07-18    3
18    1 2015-07-19    2
19    1 2015-07-20    1


You could then drop the zeros if you don't want them.  I don't think this will be much better than your original solution, however.

I would like to suggest you use the following improvement on the loop provided by the @DSM solution:

df_parts=[]
for cid in df.cid.unique():
    full_count=df[(df.cid==cid)][['cid','date','count']].set_index("date").asfreq("D", method='ffill')[['cid','count']].reset_index()
    df_parts.append(full_count[full_count['count']!=0])

df_new = pd.concat(df_parts)

>>> df_new
         date  cid  count
0  2015-07-06    1      1
1  2015-07-07    1      1
2  2015-07-08    1      1
3  2015-07-09    1      1
4  2015-07-10    1      1
5  2015-07-11    1      2
6  2015-07-12    1      3
7  2015-07-13    1      3
8  2015-07-14    1      2
9  2015-07-15    1      3
10 2015-07-16    1      3
11 2015-07-17    1      3
12 2015-07-18    1      3
13 2015-07-19    1      2
14 2015-07-20    1      1
15 2015-07-21    1      1
16 2015-07-22    1      1
0  2015-07-01    2      1
1  2015-07-02    2      1
2  2015-07-03    2      1
3  2015-07-04    2      1
4  2015-07-05    2      1
5  2015-07-06    2      1
6  2015-07-07    2      2
7  2015-07-08    2      2
8  2015-07-09    2      2
9  2015-07-10    2      3
10 2015-07-11    2      3
11 2015-07-12    2      4
12 2015-07-13    2      4
13 2015-07-14    2      5
14 2015-07-15    2      4
15 2015-07-16    2      4
16 2015-07-17    2      3
17 2015-07-18    2      2
18 2015-07-19    2      2
19 2015-07-20    2      1
20 2015-07-21    2      1


Only real improvement over what @DSM provided is that this will avoid requiring the creation of a groubby object for the loop and this will also get you all the min stdt and max enddt per cid number with no zero values. 
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它3个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复