pandas-groupby

When is it appropriate to use df.value_counts() vs df.groupby('…').count()?

Submitted by 孤街浪徒 on 2019-11-26 12:27:53
Question: I've heard that in pandas there are often multiple ways to do the same thing, but I was wondering: if I'm trying to group data by a value within a specific column and count the number of items with that value, when does it make sense to use df.groupby('colA').count() and when does it make sense to use df['colA'].value_counts()? Answer 1: There is a difference. value_counts returns: "The resulting object will be in descending order so that the first element is the most frequently-occurring element."
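The core difference can be sketched with a small made-up frame (colA/colB are illustrative, not from the question): value_counts returns a Series of frequencies sorted descending, while groupby().count() counts non-null entries in each remaining column.

```python
import pandas as pd

df = pd.DataFrame({"colA": ["x", "y", "x", "x", "y"],
                   "colB": [1, 2, None, 4, 5]})

# value_counts: a Series of frequencies, sorted so the most
# common value comes first
vc = df["colA"].value_counts()

# groupby().count(): a DataFrame indexed by colA, counting
# non-null entries per remaining column
gb = df.groupby("colA").count()
```

Note that count() skips NaN values, so the two can disagree when other columns contain missing data; groupby().size() counts rows regardless of NaNs.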

Pandas, groupby and count

Submitted by 狂风中的少年 on 2019-11-26 11:30:58
Question: I have a dataframe like this: >>> df = pd.DataFrame({'user_id':['a','a','s','s','s'], 'session':[4,5,4,5,5], 'revenue':[-1,0,1,2,1]}) >>> df revenue session user_id 0 -1 4 a 1 0 5 a 2 1 4 s 3 2 5 s 4 1 5 s Each value of session and revenue represents a kind of type, and I want to count the number of each kind; say, the number of rows with revenue=-1 and session=4 for user_id=a is 1. I found that simply calling count() after groupby() can't output the result I want. >>> df
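One way to get per-combination counts from the question's frame (a sketch, not necessarily the accepted answer) is to group by all three columns and use size(), which counts rows per combination:

```python
import pandas as pd

df = pd.DataFrame({'user_id': ['a', 'a', 's', 's', 's'],
                   'session': [4, 5, 4, 5, 5],
                   'revenue': [-1, 0, 1, 2, 1]})

# size() counts rows per (user_id, session, revenue) combination;
# unlike count(), it does not skip rows containing NaN
counts = df.groupby(['user_id', 'session', 'revenue']).size()
```

The result is a Series with a three-level MultiIndex; reset_index(name='n') turns it back into a flat frame if needed.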

Multiple aggregations of the same column using pandas GroupBy.agg()

Submitted by 不打扰是莪最后的温柔 on 2019-11-26 11:16:22
Given the following (totally overkill) data frame example import pandas as pd import numpy as np import datetime as dt df = pd.DataFrame({ "date" : [dt.date(2012, x, 1) for x in range(1, 11)], "returns" : 0.05 * np.random.randn(10), "dummy" : np.repeat(1, 10) }) is there an existing built-in way to apply two different aggregating functions to the same column, without having to call agg multiple times? The syntactically wrong, but intuitively right, way to do it would be: # Assume `function1` and `function2` are defined for aggregating. df.groupby("dummy").agg({"returns": function1, "returns": function2}) Obviously
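The dict with duplicate keys fails because the second "returns" key overwrites the first. Passing a list of functions works, as does named aggregation (available in pandas 0.25+). A sketch using the built-in "mean" and "std" in place of the question's function1/function2:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"dummy": np.repeat(1, 10),
                   "returns": 0.05 * np.random.randn(10)})

# Pass a list of functions to apply several aggregations
# to the same column in one agg() call
out = df.groupby("dummy")["returns"].agg(["mean", "std"])

# Named aggregation (pandas >= 0.25) also controls the
# output column names directly
out2 = df.groupby("dummy").agg(ret_mean=("returns", "mean"),
                               ret_std=("returns", "std"))
```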

group by pandas dataframe and select latest in each group

Submitted by 你说的曾经没有我的故事 on 2019-11-26 10:56:00
Question: How do I group values of a pandas dataframe and select the latest (by date) row from each group? For example, given a dataframe sorted by date: id product date 0 220 6647 2014-09-01 1 220 6647 2014-09-03 2 220 6647 2014-10-16 3 826 3380 2014-11-11 4 826 3380 2014-12-09 5 826 3380 2015-05-19 6 901 4555 2014-09-01 7 901 4555 2014-10-05 8 901 4555 2014-11-01 grouping by id or product, and selecting the latest gives: id product date 2 220 6647 2014-10-16 5 826 3380 2015-05-19 8 901 4555 2014-11-01 Answer 1:
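One common pattern for this (a sketch, not necessarily the accepted answer): sort by date and take the last row of each group with tail(1), which keeps all columns:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [220, 220, 220, 826, 826, 826, 901, 901, 901],
    "product": [6647, 6647, 6647, 3380, 3380, 3380, 4555, 4555, 4555],
    "date": pd.to_datetime([
        "2014-09-01", "2014-09-03", "2014-10-16",
        "2014-11-11", "2014-12-09", "2015-05-19",
        "2014-09-01", "2014-10-05", "2014-11-01"]),
})

# Sort so "last" means "latest", then keep one row per group
latest = df.sort_values("date").groupby("id").tail(1)
```

If the frame is already sorted by date, as in the question, the sort_values step is redundant but harmless.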

Aggregation in pandas

Submitted by 喜欢而已 on 2019-11-26 10:34:59
How to perform aggregation with pandas? No DataFrame after aggregation! What happened? How to aggregate mainly string columns (to lists, tuples, or strings with a separator)? How to aggregate counts? How to create a new column filled with aggregated values? I've seen these recurring questions asking about various facets of the pandas aggregation functionality. Most of the information regarding aggregation and its various use cases today is fragmented across dozens of badly worded, unsearchable posts. The aim here is to collate some of the more important points for posterity. This Q/A is meant to be
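As a taste of the string-aggregation cases listed above, a minimal sketch (the group/word columns are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"group": ["a", "a", "b"],
                   "word": ["x", "y", "z"]})

# Aggregate a string column to lists per group...
as_list = df.groupby("group")["word"].agg(list)

# ...or to a single joined string with a separator
joined = df.groupby("group")["word"].agg(", ".join)
```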

why does pandas rolling use single dimension ndarray

Submitted by 孤街浪徒 on 2019-11-26 09:29:08
Question: I was motivated to use the pandas rolling feature to perform a rolling multi-factor regression (this question is NOT about rolling multi-factor regression). I expected that I'd be able to use apply after df.rolling(2), take the resulting pd.DataFrame, extract the ndarray with .values, and perform the requisite matrix multiplication. It didn't work out that way. Here is what I found: import pandas as pd import numpy as np np.random.seed([3,1415]) df = pd.DataFrame(np.random.rand(5, 2).round(2
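The behavior the question ran into can be demonstrated directly: rolling().apply invokes the function once per column with a 1-D window, never with a 2-D slice of the whole frame. A sketch that records the shapes the function receives (raw=True passes plain ndarrays instead of Series):

```python
import numpy as np
import pandas as pd

np.random.seed([3, 1415])
df = pd.DataFrame(np.random.rand(5, 2).round(2), columns=["a", "b"])

seen_shapes = []

def record(window):
    # window is a 1-D array: one column's values for one window
    seen_shapes.append(window.shape)
    return window.sum()

out = df.rolling(2).apply(record, raw=True)
```

Because each call only sees one column, cross-column operations such as matrix multiplication cannot be done inside rolling().apply; a manual loop over window positions (or a different tool) is needed for that.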

Groupby value counts on the dataframe pandas

Submitted by 随声附和 on 2019-11-26 08:09:33
Question: I have the following dataframe: df = pd.DataFrame([ (1, 1, 'term1'), (1, 2, 'term2'), (1, 1, 'term1'), (1, 1, 'term2'), (2, 2, 'term3'), (2, 3, 'term1'), (2, 2, 'term1') ], columns=['id', 'group', 'term']) I want to group it by id and group and calculate the number of each term for that (id, group) pair. So in the end I am going to get something like this: I was able to achieve what I want by looping over all the rows with df.iterrows() and creating a new dataframe, but this
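A common non-looping approach (a sketch; the expected output is a table of term counts per (id, group) pair): value_counts inside the groupby, then unstack the terms into columns:

```python
import pandas as pd

df = pd.DataFrame([
    (1, 1, 'term1'), (1, 2, 'term2'), (1, 1, 'term1'),
    (1, 1, 'term2'), (2, 2, 'term3'), (2, 3, 'term1'),
    (2, 2, 'term1')], columns=['id', 'group', 'term'])

# Count each term per (id, group) pair, then pivot the terms
# into columns, filling absent combinations with 0
counts = (df.groupby(['id', 'group'])['term']
            .value_counts()
            .unstack(fill_value=0))
```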

How to move pandas data from index to column after multiple groupby

Submitted by 狂风中的少年 on 2019-11-26 06:37:53
Question: I have the following pandas dataframe: dfalph.head() token year uses books 386 xanthos 1830 3 3 387 xanthos 1840 1 1 388 xanthos 1840 2 2 389 xanthos 1868 2 2 390 xanthos 1875 1 1 I aggregate the rows with duplicate token and year like so: dfalph = dfalph[['token','year','uses','books']].groupby(['token', 'year']).agg([np.sum]) dfalph.columns = dfalph.columns.droplevel(1) dfalph.head() uses books token year xanthos 1830 3 3 1840 3 3 1867 2 2 1868 2 2 1875 1 1 Instead of having
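One way to keep token and year as regular columns instead of index levels is as_index=False, which avoids both the MultiIndex and the droplevel step; a sketch using a reduced version of the data:

```python
import pandas as pd

df = pd.DataFrame({"token": ["xanthos"] * 4,
                   "year": [1830, 1840, 1840, 1868],
                   "uses": [3, 1, 2, 2],
                   "books": [3, 1, 2, 2]})

# as_index=False keeps the grouping keys as ordinary columns,
# so no reset_index() or droplevel() is needed afterwards
out = df.groupby(["token", "year"], as_index=False)[["uses", "books"]].sum()
```

An equivalent after the fact is dfalph.reset_index(), which moves existing index levels back into columns.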

Keep other columns when doing groupby

Submitted by 我与影子孤独终老i on 2019-11-26 03:22:52
Question: I'm using groupby on a pandas dataframe to drop all rows that don't have the minimum of a specific column. Something like this: df1 = df.groupby("item", as_index=False)["diff"].min() However, if I have more than those two columns, the other columns (e.g. otherstuff in my example) get dropped. Can I keep those columns using groupby, or am I going to have to find a different way to drop the rows? My data looks like: item diff otherstuff 0 1 2 1 1 1 1 2 2 1 3 7 3 2 -1 0 4 2 1 3 5 2 4 9 6
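A common way to keep all columns (a sketch of the transform idiom, not necessarily the accepted answer): broadcast each group's minimum back over its rows and filter with a boolean mask, so no columns are lost. The frame below reconstructs the first six rows of the question's data:

```python
import pandas as pd

df = pd.DataFrame({"item": [1, 1, 1, 2, 2, 2],
                   "diff": [2, 1, 3, -1, 1, 4],
                   "otherstuff": [1, 2, 7, 0, 3, 9]})

# transform("min") returns a Series aligned to the original rows,
# repeating each group's minimum, so it can be compared row-wise
df1 = df[df["diff"] == df.groupby("item")["diff"].transform("min")]
```

Note that ties keep every row achieving the minimum; use idxmin via df.loc[df.groupby("item")["diff"].idxmin()] to keep exactly one row per group.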

Count unique values with pandas per groups [duplicate]

Submitted by 拥有回忆 on 2019-11-26 03:03:49
Question: This question already has an answer here: Pandas count(distinct) equivalent (6 answers) I need to count unique ID values in every domain. I have data ID, domain 123, 'vk.com' 123, 'vk.com' 123, 'twitter.com' 456, 'vk.com' 456, 'facebook.com' 456, 'vk.com' 456, 'google.com' 789, 'twitter.com' 789, 'vk.com' I tried df.groupby(['domain', 'ID']).count() but I want to get domain, count vk.com 3 twitter.com 2 facebook.com 1 google.com 1 Answer 1: You need nunique: df = df.groupby(
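The nunique answer can be sketched end-to-end with the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [123, 123, 123, 456, 456, 456, 456, 789, 789],
    "domain": ['vk.com', 'vk.com', 'twitter.com', 'vk.com',
               'facebook.com', 'vk.com', 'google.com',
               'twitter.com', 'vk.com'],
})

# nunique counts distinct IDs per domain; count() would count
# rows, so repeated (ID, domain) pairs would inflate the result
out = df.groupby('domain')['ID'].nunique()
```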