Question
I've got a DataFrame with the metadata for a newspaper article in each row. I'd like to group these into monthly chunks, then count the values of one column (called type):
monthly_articles = articles.groupby(pd.Grouper(freq="M"))
monthly_articles = monthly_articles["type"].value_counts().unstack()
This works fine with an annual group but fails when I try to group by month:
ValueError: operands could not be broadcast together with shape (141,) (139,)
I think this is because there are some month groups in which there are no articles. If I iterate the groups and print value_counts on each group:
for name, group in monthly_articles:
    print(name, group["type"].value_counts())
I get empty series in the groups for Jan and Feb of 2006:
2005-12-31 00:00:00 positive 1
Name: type, dtype: int64
2006-01-31 00:00:00 Series([], Name: type, dtype: int64)
2006-02-28 00:00:00 Series([], Name: type, dtype: int64)
2006-03-31 00:00:00 negative 6
positive 5
neutral 1
Name: type, dtype: int64
2006-04-30 00:00:00 negative 11
positive 6
neutral 3
Name: type, dtype: int64
How can I ignore the empty groups when using value_counts()? I've tried dropna=False without success. I think this is the same issue as this question.
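The suspicion that months without articles still produce groups can be checked on a small frame. The sketch below uses made-up dates and the "MS" (month start) alias rather than the "M" used above, an assumption made only because "MS" is accepted by both older and newer pandas releases; the group labels therefore fall on the first of each month instead of the last.

```python
import pandas as pd

# Hypothetical data: one article in December 2005, none in January or
# February 2006, and two in March 2006.
articles = pd.DataFrame(
    {"type": ["positive", "negative", "neutral"]},
    index=pd.to_datetime(["2005-12-10", "2006-03-03", "2006-03-20"]),
)

# A time-based Grouper creates one bin per month between the first and
# last timestamp, including months that contain no rows.
grouped = articles.groupby(pd.Grouper(freq="MS"))
sizes = grouped.size()
print(sizes)

# The bins with a size of zero are the empty groups.
empty_months = sizes[sizes == 0].index
print(empty_months)
```

The two zero-sized bins correspond to the empty Series seen when iterating over the groups.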
Answer 1:
It would help to include a sample of the data; otherwise it is hard to pinpoint the problem. From your code snippet, it seems that the type data for some months is null. You can use the apply function on the grouped object and then call unstack. Here is sample code that works for me, using randomly generated data:
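import numpy as np
import pandas as pd

# Build 150 days of randomly chosen article types
s = pd.Series(['positive', 'negtive', 'neutral'], index=[0, 1, 2])
atype = s.loc[np.random.randint(3, size=(150,))]
df = pd.DataFrame(dict(atype=atype.values), index=pd.date_range('2017-01-01', periods=150))
# Count the types within each month, then pivot the types into columns
gp = df.groupby(pd.Grouper(freq='M'))
dfx = gp.apply(lambda g: g['atype'].value_counts()).unstack()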
In [75]: dfx
Out[75]:
            negtive  neutral  positive
2017-01-31       13        9         9
2017-02-28       11       11         6
2017-03-31       12        6        13
2017-04-30        8       12        10
2017-05-31        9       10        11
In case there are null values:
In [76]: df.loc['2017-02-01':'2017-04-01', 'atype'] = np.nan
...: gp = df.groupby(pd.Grouper(freq='M'))
...: dfx = gp.apply(lambda g: g['atype'].value_counts()).unstack()
...:
In [77]: dfx
Out[77]:
            negtive  neutral  positive
2017-01-31       13        9         9
2017-04-30        8       12         9
2017-05-31        9       10        11
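As a possible alternative (a sketch, not something from the original answer), the month and the type column can be grouped together and counted with size(), so value_counts is never called on an empty group. The sample data and the use of to_period("M") in place of pd.Grouper are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: three articles in December 2005, none in January or
# February 2006, and two in March 2006.
articles = pd.DataFrame(
    {"type": ["positive", "negative", "positive", "neutral", "negative"]},
    index=pd.to_datetime(
        ["2005-12-05", "2005-12-20", "2005-12-28", "2006-03-02", "2006-03-15"]
    ),
)

# Group by calendar month and by the "type" column at the same time.
# Only (month, type) pairs that actually occur produce a row, so months
# without any articles never appear in the result.
month = articles.index.to_period("M")
counts = articles.groupby([month, "type"]).size().unstack(fill_value=0)
print(counts)
```

Here fill_value=0 fills in types that are missing from a month that does have articles, while months with no articles at all are simply absent from the index.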
Thanks.
Source: https://stackoverflow.com/questions/45803984/how-can-i-ignore-empty-series-when-using-value-counts-on-a-pandas-groupby