Pandas: fill one column with count of # of obs between occurrences in a 2nd column

拈花ヽ惹草 posted on 2021-02-18 21:57:21

Question


Say I have the following DataFrame which has a 0/1 entry depending on whether something happened/didn't happen within a certain month.

import numpy as np
import pandas as pd

Y = [0,0,1,1,0,0,0,0,1,1,1]
X = pd.date_range(start = "2010", freq = "MS", periods = len(Y))

df = pd.DataFrame({'R': Y},index = X)



            R
2010-01-01  0
2010-02-01  0
2010-03-01  1
2010-04-01  1
2010-05-01  0
2010-06-01  0
2010-07-01  0
2010-08-01  0
2010-09-01  1
2010-10-01  1
2010-11-01  1

What I want is to create a 2nd column that lists the # of months until the next occurrence of a 1.

That is, I need:

            R  F
2010-01-01  0  2
2010-02-01  0  1
2010-03-01  1  0
2010-04-01  1  0
2010-05-01  0  4
2010-06-01  0  3
2010-07-01  0  2
2010-08-01  0  1
2010-09-01  1  0
2010-10-01  1  0
2010-11-01  1  0

What I've tried: I haven't gotten far, but I'm able to fill the first stretch, up to the first 1:

A = list(df.index)
T = df[df['R']==1]

a = df.index[0]
b = T.index[0]
c = A.index(b) - A.index(a)

df.loc[a:b, 'F'] = np.linspace(c,0,c+1)

            R    F
2010-01-01  0  2.0
2010-02-01  0  1.0
2010-03-01  1  0.0
2010-04-01  1  NaN
2010-05-01  0  NaN
2010-06-01  0  NaN
2010-07-01  0  NaN
2010-08-01  0  NaN
2010-09-01  1  NaN
2010-10-01  1  NaN
2010-11-01  1  NaN

EDIT: It probably would have been better to provide an original example that spanned multiple years.

Y = [0,0,1,1,0,0,0,0,1,1,1,0,0,1,1,1,0,1,1,1]
X = pd.date_range(start = "2010", freq = "MS", periods = len(Y))

df = pd.DataFrame({'R': Y},index = X)

Answer 1:


Here is my way

s=df.R.cumsum()
df.loc[df.R==0,'F']=s.groupby(s).cumcount(ascending=False)+1
df['F']=df['F'].fillna(0)

df
Out[12]: 
            R    F
2010-01-01  0  2.0
2010-02-01  0  1.0
2010-03-01  1  0.0
2010-04-01  1  0.0
2010-05-01  0  4.0
2010-06-01  0  3.0
2010-07-01  0  2.0
2010-08-01  0  1.0
2010-09-01  1  0.0
2010-10-01  1  0.0
2010-11-01  1  0.0
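
To see why grouping on the cumulative sum works, it can help to look at the intermediate labels. Below is a minimal sketch on the multi-year data from the question's EDIT. The names `labels` and `countdown` are mine, not the answerer's, and the final step uses `where` instead of `loc` plus `fillna`, which is a small variation rather than the answer's exact code.

```python
import pandas as pd

Y = [0,0,1,1,0,0,0,0,1,1,1,0,0,1,1,1,0,1,1,1]
X = pd.date_range(start="2010", freq="MS", periods=len(Y))
df = pd.DataFrame({'R': Y}, index=X)

# cumsum only increases on a 1, so each 1 and the 0s that follow it share a label
labels = df['R'].cumsum()

# position counted from the end of each label group, starting at 1
countdown = labels.groupby(labels).cumcount(ascending=False) + 1

# keep the countdown on the 0 rows only; rows with R == 1 get 0
df['F'] = countdown.where(df['R'].eq(0), 0)
```

Note that zeros after the final 1 would be counted down to the end of the frame rather than marked as missing, since there is no next 1 for them.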



Answer 2:


Create a series containing your dates, mask this series when your R series is not equal to 1, bfill, and subtract!


u = df.index.to_series()

ii = u.where(df.R.eq(1)).bfill()

12 * (ii.dt.year - u.dt.year) + (ii.dt.month - u.dt.month)

2010-01-01    2
2010-02-01    1
2010-03-01    0
2010-04-01    0
2010-05-01    4
2010-06-01    3
2010-07-01    2
2010-08-01    1
2010-09-01    0
2010-10-01    0
2010-11-01    0
Freq: MS, dtype: int64
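
The same mask-and-bfill idea can also be applied to row positions instead of dates, which counts observations rather than calendar months (the two agree here because the index has exactly one row per month). A sketch under that assumption; `pos` and `nxt` are illustrative names:

```python
import numpy as np
import pandas as pd

Y = [0,0,1,1,0,0,0,0,1,1,1]
X = pd.date_range(start="2010", freq="MS", periods=len(Y))
df = pd.DataFrame({'R': Y}, index=X)

# integer position of every row
pos = pd.Series(np.arange(len(df)), index=df.index)

# position of the next row where R == 1 (NaN if there is none)
nxt = pos.where(df['R'].eq(1)).bfill()

# rows to go until that 1; left as float so trailing 0s can hold NaN
df['F'] = nxt - pos
```

If the data always ends on a 1, as in the question, `df['F'].astype(int)` gives integer counts.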



Answer 3:


Here is a way that worked for me. It is not as elegant as @user3483203's answer, but it does the job.

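df = df.reset_index()  # integer index, so j + 1 means "the next row"
df['F'] = 0
for i in df.index:
    j = i
    # walk forward until the next 1, stopping at the end of the frame
    while j < len(df) and df.loc[j, 'R'] == 0:
        df.loc[i, 'F'] += 1
        j += 1
df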
Out[39]: 
        index  R  F
0  2010-01-01  0  2
1  2010-02-01  0  1
2  2010-03-01  1  0
3  2010-04-01  1  0
4  2010-05-01  0  4
5  2010-06-01  0  3
6  2010-07-01  0  2
7  2010-08-01  0  1
8  2010-09-01  1  0
9  2010-10-01  1  0
10 2010-11-01  1  0
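
The nested loop rescans the following rows for every row. If an explicit loop is preferred, one backward pass gives the same numbers in linear time. This is a sketch, not part of the original answer, and the function name `rows_to_next_one` is hypothetical:

```python
import pandas as pd

def rows_to_next_one(r):
    """Rows until the next 1 in r, computed in a single pass from the end."""
    out = [None] * len(r)
    steps = None  # distance to the nearest 1 seen so far; None = no 1 yet
    for k in range(len(r) - 1, -1, -1):
        if r[k] == 1:
            steps = 0
        elif steps is not None:
            steps += 1
        out[k] = steps
    return out

Y = [0,0,1,1,0,0,0,0,1,1,1]
X = pd.date_range(start="2010", freq="MS", periods=len(Y))
df = pd.DataFrame({'R': Y}, index=X)

df['F'] = rows_to_next_one(df['R'].to_numpy())
```

Rows after the last 1 are left as `None`, since no next occurrence exists for them.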




Answer 4:


My take

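# label each run of 0s as one group and every 1 as its own group
s = (df.R.diff().ne(0) | df.R.eq(1)).cumsum()
# count down within runs of 0s; test the value (not the run length) so a lone 0 still gets 1
df.R.groupby(s).transform(lambda g: np.arange(len(g),0,-1) if g.iloc[0]==0 else 0)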

2010-01-01    2
2010-02-01    1
2010-03-01    0
2010-04-01    0
2010-05-01    4
2010-06-01    3
2010-07-01    2
2010-08-01    1
2010-09-01    0
2010-10-01    0
2010-11-01    0
Freq: MS, Name: R, dtype: int64


Source: https://stackoverflow.com/questions/57431667/pandas-fill-one-column-with-count-of-of-obs-between-occurrences-in-a-2nd-colu
