问题
I have the following abridged dataframe:
{'end': {0: 1995, 1: 1997, 2: 1999, 3: 2001, 4: 2003, 5: 2005, 6: 2007, 7: 2013, 8: 2014, 9: 1995, 10: 2007, 11: 2013, 12: 2014, 13: 1989,
14: 1991, 15: 1993, 16: 1995, 17: 1997, 18: 1999, 19: 2001, 20: 2003,
21: 2005, 22: 2007, 23: 2013, 24: 2014, 25: 1985, 26: 1987, 27: 1989, 28: 1991, 29: 1993}, 'idthomas': {0: 136, 1: 136, 2: 136, 3: 136, 4:136,
5: 136, 6: 136, 7: 136, 8: 136, 9: 172, 10: 172, 11: 172, 12: 172, 13: 174, 14: 174, 15: 174, 16: 174, 17: 174, 18: 174, 19: 174, 20: 174, 21: 174, 22: 174, 23: 174, 24: 174, 25: 179, 26: 179, 27: 179, 28: 179,
29: 179}, 'start': {0: 1993, 1: 1995, 2: 1997, 3: 1999, 4: 2001, 5: 2003, 6: 2005, 7: 2007, 8: 2013, 9: 1993, 10: 2001, 11: 2007, 12: 2013, 13: 1987, 14: 1989, 15: 1991, 16: 1993, 17: 1995, 18: 1997, 19: 1999, 20: 2001, 21: 2003, 22: 2005, 23: 2007, 24: 2013, 25: 1983, 26: 1985, 27: 1987, 28: 1989, 29: 1991}}
df_oddyears.head()
end start idthomas
0 1995 1993 136
1 1997 1995 136
2 1999 1997 136
3 2001 1999 136
4 2003 2001 136
5 2005 2003 136
6 2007 2005 136
7 2013 2007 136
8 2014 2013 136
9 1995 1993 172
10 2007 2001 172
11 2013 2007 172
12 2014 2013 172
13 1989 1987 174
14 1991 1989 174
It represents U.S. legislator congressional terms. There are some inconvenient irregularities: startand end dates indicate beginning and end of terms and will have a 2 or 6 year difference depending on if the legislator is serving in the house or senate. All legislators have a unique idthomas and can switch from house to senate if they choose. Sometimes a legislator is not re-elected, this causes a gap in their service. Looking at idthomas == 172 you can see a gap between end == 1995 and start == 2001.
I need to calculate the years of active-accumulated public service for every year from the beginning of the legislators service until the end of the legislators service, even years included. In a next step I will merge this df with another df along years, hence I need both even and odd years of active service.
This is what I had developed before seeing deeper into the problem:
df_oddyears['end']=df_oddyears['end'].map(lambda x: str(x)[:-6])
df_oddyears['start']=df_oddyears['start'].map(lambda x: str(x[:-6]))
df_oddyears['end'] = df_oddyears['end'].astype('int')
df_oddyears['start'] = df_oddyears['start'].astype('int')
df_oddyears['end'] = df_oddyears['end'].clip_upper(2014)
df_oddyears['term'] = df_oddyears.end - df_oddyears.start
df_oddyears['years_exp']=df_oddyears.groupby(['id.thomas']).term.cumsum()
df_oddyears.rename(columns={'id.thomas':'idthomas'},inplace=True)
df_oddyears.head()
end start idthomas term years_exp
0 1995 1993 136 2 2
1 1997 1995 136 2 4
2 1999 1997 136 2 6
3 2001 1999 136 2 8
4 2003 2001 136 2 10
5 2005 2003 136 2 12
6 2007 2005 136 2 14
7 2013 2007 136 6 20
8 2014 2013 136 1 21
9 1995 1993 172 2 2
10 2007 2001 172 6 8
11 2013 2007 172 6 14
12 2014 2013 172 1 15
{'end': {0: 1995, 1: 1997, 2: 1999, 3: 2001, 4: 2003, 5: 2005, 6: 2007,
7: 2013, 8: 2014, 9: 1995, 10: 2007, 11: 2013, 12: 2014, 13: 1989,
14: 1991, 15: 1993, 16: 1995, 17: 1997, 18: 1999, 19: 2001, 20: 2003,
21: 2005, 22: 2007, 23: 2013, 24: 2014, 25: 1985, 26: 1987, 27: 1989,
28: 1991, 29: 1993}, 'idthomas': {0: 136, 1: 136, 2: 136, 3: 136,
4: 136, 5: 136, 6: 136, 7: 136, 8: 136, 9: 172, 10: 172, 11: 172, 12: 172, 13: 174, 14: 174, 15: 174, 16: 174, 17: 174, 18: 174, 19: 174,
20: 174, 21: 174, 22: 174, 23: 174, 24: 174, 25: 179, 26: 179, 27: 179, 28: 179, 29: 179},'start': {0: 1993, 1: 1995, 2: 1997, 3: 1999, 4: 2001, 5: 2003, 6: 2005, 7: 2007, 8: 2013, 9: 1993, 10: 2001, 11: 2007, 12: 2013, 13: 1987, 14: 1989, 15: 1991, 16: 1993, 17: 1995, 18: 1997, 19: 1999, 20: 2001, 21: 2003, 22: 2005, 23: 2007, 24: 2013, 25: 1983, 26: 1985, 27: 1987, 28: 1989, 29: 1991},'term': {0: 2, 1: 2, 2: 2, 3: 2, 4: 2, 5: 2, 6: 2, 7: 6, 8: 1, 9: 2, 10: 6, 11: 6, 12: 1, 13: 2, 14: 2, 15: 2, 16: 2, 17: 2, 18: 2, 19: 2, 20: 2, 21: 2, 22: 2, 23: 6,
24: 1, 25: 2, 26: 2, 27: 2, 28: 2, 29: 2},'years_exp': {0: 2, 1: 4,
2: 6, 3: 8, 4: 10, 5: 12, 6: 14, 7: 20, 8: 21, 9: 2, 10: 8, 11: 14,
12: 15, 13: 2, 14: 4, 15: 6, 16: 8, 17: 10, 18: 12, 19: 14, 20: 16,
21: 18, 22: 20, 23: 26, 24: 27, 25: 2, 26: 4, 27: 6, 28: 8, 29: 10}}
Then I df=df_oddyears.drop(['start', 'term'], axis=1, inplace=False) and implement the following code from here
final_year = 2014
df= pd.DataFrame([(year, id_, n)
for id_, end, years_exp in df.groupby('idthomas').first().itertuples()
for n, year in enumerate(range(end, final_year + 1), years_exp)],
columns=['end', 'idthomas', 'years_exp'])
df.head()
end idthomas years_exp
673 1995 172 2
674 1996 172 3
675 1997 172 4
676 1998 172 5
677 1999 172 6
678 2000 172 7
679 2001 172 8
680 2002 172 9
681 2003 172 10
This is very close to what I want in that it enables me to concatenate to another df on end while maintaining the total years_exp. Unfortunately, I failed to recognize the problem of intermittent service when posting my original question; such that, years_expdoes not take into account gaps in public service. This is (the first in a list of) project(s) today. If anyone has questions or suggestions or critiques, they are all welcome.
My desired end result would be the following:
end idthomas years_exp
0 1994 136 1
1 1995 136 2
2 1996 136 3
3 1997 136 4
4 1998 136 5
5 1999 136 6
6 2000 136 7
7 2001 136 8
8 2002 136 9
9 2003 136 10
10 2004 136 11
11 2005 136 12
12 2006 136 13
13 2007 136 14
14 2008 136 15
15 2009 136 16
16 2010 136 17
17 2011 136 18
18 2012 136 19
19 2013 136 20
20 2014 136 21
21 1994 172 1
22 1995 172 2
23 2001 172 2
24 2002 172 3
25 2003 172 4
26 2004 172 5
27 2005 172 6
28 2006 172 7
29 2007 172 8
30 2008 172 9
31 2009 172 10
32 2010 172 11
33 2011 172 12
34 2012 172 13
35 2013 172 14
36 2014 172 15
来源:https://stackoverflow.com/questions/36661846/pandas-calculating-columns-conditioned-on-the-values-of-2-other-columns