Question
Imagine a pandas DataFrame given by
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'location': [1, 2, 3, 1, 2],
    'date': [pd.to_datetime('01-01-{}'.format(year)) for year in [2015, 2016, 2015, 2017, 2018]]
}).set_index('id')
which looks like this:
    location       date
id
1          1 2015-01-01
1          2 2016-01-01
1          3 2015-01-01
2          1 2017-01-01
2          2 2018-01-01
Now I want to create a column for each year represented in the date column, holding the number of occurrences of that year per id. The resulting DataFrame should look like this:
    location       date  2015  2016  2017  2018
id
1          1 2015-01-01     2     1     0     0
1          2 2016-01-01     2     1     0     0
1          3 2015-01-01     2     1     0     0
2          1 2017-01-01     0     0     1     1
2          2 2018-01-01     0     0     1     1
I imagine this could be done with groupby(...).transform, but I can't figure out the best solution.
My own solution was:
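# Temporary helper column holding the year of each date
df['year'] = df['date'].dt.year
df = pd.merge(
    df,
    # Count dates per (id, year); missing combinations become 0
    pd.pivot_table(df, values='date', index='id', columns='year', aggfunc='count').fillna(0).astype(int),
    left_index=True, right_index=True).drop('year', axis=1)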
Answer 1:
get_dummies
# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
df.join(pd.get_dummies(df.date.dt.year).groupby(level=0).sum())
         date  location  2015  2016  2017  2018
id
1  2015-01-01         1     2     1     0     0
1  2016-01-01         2     2     1     0     0
1  2015-01-01         3     2     1     0     0
2  2017-01-01         1     0     0     1     1
2  2018-01-01         2     0     0     1     1
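To see why the get_dummies approach works, it may help to split the one-liner into its intermediate steps. The sketch below rebuilds the question's df and uses illustrative variable names (dummies, counts, result) that are not part of the original answer. It uses groupby(level=0).sum() for the per-id sum, which is the form recent pandas versions require.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'location': [1, 2, 3, 1, 2],
    'date': pd.to_datetime(['2015-01-01', '2016-01-01', '2015-01-01',
                            '2017-01-01', '2018-01-01']),
}).set_index('id')

# One indicator column per year, one row per original row (index is still id)
dummies = pd.get_dummies(df['date'].dt.year)

# Summing the indicators within each id turns them into per-id counts
counts = dummies.groupby(level=0).sum()

# Joining on the index broadcasts each id's counts to all of its rows
result = df.join(counts)
print(result)
```

Because counts has one row per id, the join repeats that row for every original row with the same id, which is the layout the question asks for.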
factorize
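import numpy as np

# Integer codes and unique labels for the ids (rows) and years (columns)
i, r = pd.factorize(df.index)
j, c = pd.factorize(df.date.dt.year)
# Count matrix: one row per id, one column per year
b = np.zeros((len(r), len(c)), dtype=np.int64)
# Unbuffered add, so repeated (id, year) pairs are each counted
np.add.at(b, (i, j), 1)
df.join(pd.DataFrame(b, r, c).rename_axis('id'))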
         date  location  2015  2016  2017  2018
id
1  2015-01-01         1     2     1     0     0
1  2016-01-01         2     2     1     0     0
1  2015-01-01         3     2     1     0     0
2  2017-01-01         1     0     0     1     1
2  2018-01-01         2     0     0     1     1
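The factorize approach depends on np.add.at rather than plain fancy-index assignment, and the difference is easy to miss. The following is a minimal sketch using standalone arrays (ids and years are illustrative stand-ins for the index and the year values) that compares the two on the question's data.

```python
import numpy as np
import pandas as pd

ids = np.array([1, 1, 1, 2, 2])
years = np.array([2015, 2016, 2015, 2017, 2018])

# factorize returns integer codes plus the unique values in order of appearance
i, r = pd.factorize(ids)      # codes [0, 0, 0, 1, 1], uniques [1, 2]
j, c = pd.factorize(years)    # codes [0, 1, 0, 2, 3], uniques [2015, 2016, 2017, 2018]

# Unbuffered add: the repeated position (0, 0) is incremented twice
counts = np.zeros((len(r), len(c)), dtype=np.int64)
np.add.at(counts, (i, j), 1)

# Buffered fancy indexing: the repeated position (0, 0) is incremented only once
buffered = np.zeros_like(counts)
buffered[i, j] += 1

print(counts)
print(buffered)
```

With buffered indexing, id 1 would show a count of 1 for 2015 instead of 2, so np.add.at is what makes the counts correct when an (id, year) pair repeats.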
Answer 2:
Create a helper DataFrame by grouping on id and year with groupby, counting with size and reshaping with unstack, then join it to the original df:
df1 = df.join(df.groupby(['id', df['date'].dt.year]).size().unstack(fill_value=0), on='id')
print(df1)
    location       date  2015  2016  2017  2018
id
1          1 2015-01-01     2     1     0     0
1          2 2016-01-01     2     1     0     0
1          3 2015-01-01     2     1     0     0
2          1 2017-01-01     0     0     1     1
2          2 2018-01-01     0     0     1     1
Detail:
print(df.groupby(['id', df['date'].dt.year]).size().unstack(fill_value=0))
date  2015  2016  2017  2018
id
1        2     1     0     0
2        0     0     1     1
Another solution with crosstab:
df1 = df.join(pd.crosstab(df.index, df['date'].dt.year), on='id')
print(pd.crosstab(df.index, df['date'].dt.year))
date   2015  2016  2017  2018
row_0
1         2     1     0     0
2         0     0     1     1
Source: https://stackoverflow.com/questions/52256120/count-occurences-for-each-year-in-pandas-dataframe-based-on-subgroup