How can I ignore empty series when using value_counts on a Pandas groupby?

Submitted by 百般思念 on 2019-12-10 21:49:14

Question


I've got a DataFrame with the metadata for a newspaper article in each row. I'd like to group these into monthly chunks, then count the values of one column (called type):

monthly_articles = articles.groupby(pd.Grouper(freq="M"))
monthly_counts = monthly_articles["type"].value_counts().unstack()

This works fine when grouping by year but fails when I group by month:

ValueError: operands could not be broadcast together with shape (141,) (139,)

I think this is because some months contain no articles. If I iterate over the groups and print value_counts for each one:

for name, group in monthly_articles:
    print(name, group["type"].value_counts())

I get empty Series for the January and February 2006 groups:

2005-12-31 00:00:00 positive    1
Name: type, dtype: int64
2006-01-31 00:00:00 Series([], Name: type, dtype: int64)
2006-02-28 00:00:00 Series([], Name: type, dtype: int64)
2006-03-31 00:00:00 negative    6
positive    5
neutral     1
Name: type, dtype: int64
2006-04-30 00:00:00 negative    11
positive     6
neutral      3
Name: type, dtype: int64
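The loop in the question can be turned into a filter that skips empty groups explicitly. This is a minimal sketch, assuming articles has a DatetimeIndex and a type column; the sample rows below are made up for illustration (with no articles in January or February 2006). It uses freq="MS" (month-start labels) rather than "M", because recent pandas deprecates the "M" alias in favour of "ME". Counts are collected only for non-empty months and then assembled into one table, with 0 for types that did not occur in a given month.

```python
import pandas as pd

# Hypothetical sample data: one row per article, indexed by publication date.
# January and February 2006 deliberately have no articles.
articles = pd.DataFrame(
    {"type": ["positive", "negative", "negative", "positive", "neutral"]},
    index=pd.to_datetime(
        ["2005-12-20", "2006-03-02", "2006-03-15", "2006-03-28", "2006-04-05"]
    ),
)

# "MS" = month-start bins; chosen to avoid the "M" alias deprecated in pandas 2.2
monthly_articles = articles.groupby(pd.Grouper(freq="MS"))

# Count types per month, skipping any month whose group is empty
counts = {
    name: group["type"].value_counts()
    for name, group in monthly_articles
    if not group.empty
}

# One row per non-empty month, one column per type; absent types become 0
monthly_counts = pd.DataFrame(counts).T.fillna(0).astype(int)
print(monthly_counts)
```

Because the empty months never enter the dictionary, nothing has to be broadcast against them when the table is built.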

How can I ignore the empty groups when using value_counts()?

I've tried dropna=False without success. I think this is the same issue as this question.


Answer 1:


It would help if you posted a sample of your data; otherwise it is hard to pinpoint the problem. From your code snippet, it seems that the type data for some months is null. You can call apply on the grouped object and then call unstack. Here is sample code that works for me, using randomly generated data:

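import numpy as np
import pandas as pd

# Random labels, so your counts will differ from the output below
s = pd.Series(['positive', 'negative', 'neutral'], index=[0, 1, 2])
atype = s.loc[np.random.randint(3, size=(150,))]

# 150 consecutive days starting 2017-01-01, one label per day
df = pd.DataFrame(dict(atype=atype.values), index=pd.date_range('2017-01-01', periods=150))

# Count labels inside each month (freq='M' = month end; pandas >= 2.2 prefers 'ME'),
# then pivot the labels into columns
gp = df.groupby(pd.Grouper(freq='M'))
dfx = gp.apply(lambda g: g['atype'].value_counts()).unstack()

In [75]: dfx
Out[75]: 
            negative  neutral  positive
2017-01-31        13        9         9
2017-02-28        11       11         6
2017-03-31        12        6        13
2017-04-30         8       12        10
2017-05-31         9       10        11

If some months contain only null values, value_counts returns an empty Series for them, and those months are simply dropped from the result:

In [76]: df.loc['2017-02-01':'2017-04-01', 'atype'] = np.nan
    ...: gp = df.groupby(pd.Grouper(freq='M'))
    ...: dfx = gp.apply(lambda g: g['atype'].value_counts()).unstack()
    ...: 

In [77]: dfx
Out[77]: 
            negative  neutral  positive
2017-01-31        13        9         9
2017-04-30         8       12         9
2017-05-31         9       10        11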
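Another option, offered as an alternative rather than as the answer's method, is to group by monthly periods instead of a time Grouper. Converting the index with to_period("M") only produces months that actually contain rows, so there are no empty groups to skip in the first place. A minimal sketch, using made-up sample data shaped like the question's articles DataFrame (a DatetimeIndex plus a type column):

```python
import pandas as pd

# Hypothetical sample data: one row per article, indexed by publication date
articles = pd.DataFrame(
    {"type": ["positive", "negative", "negative", "positive", "neutral"]},
    index=pd.to_datetime(
        ["2005-12-20", "2006-03-02", "2006-03-15", "2006-03-28", "2006-04-05"]
    ),
)

# Group by (month period, type); only months that have rows appear as keys
months = articles.index.to_period("M")
monthly_counts = (
    articles.groupby([months, "type"])
    .size()
    .unstack(fill_value=0)  # one column per type, 0 where a type is absent
)
print(monthly_counts)
```

The resulting index holds Period objects rather than month-end timestamps; monthly_counts.index.to_timestamp() converts them back if timestamps are needed.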

Thanks.



Source: https://stackoverflow.com/questions/45803984/how-can-i-ignore-empty-series-when-using-value-counts-on-a-pandas-groupby
