pandas-groupby

Counting observations after grouping by dates in pandas, when dates are non-unique

梦想的初衷 submitted on 2019-12-10 18:15:27

Question: What is the best way to count observations by date in a Pandas DataFrame when the timestamps are non-unique?

    df = pd.DataFrame({'User': ['A', 'B', 'C'] * 40,
                       'Value': np.random.randn(120),
                       'Time': [np.random.choice(pd.date_range(datetime.datetime(2013, 1, 1, 0, 0, 0),
                                                               datetime.datetime(2013, 1, 3, 0, 0, 0),
                                                               freq='H'))
                                for i in range(120)]})

Ideally, the output would provide the number of observations per day (or some other higher-order unit of time). This could then be used to plot the activity over time.
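
One straightforward approach (a minimal sketch of mine, not taken from the thread) is to group on the timestamps truncated to calendar days and count group sizes; pd.Grouper handles the truncation:

```python
import datetime

import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    'User': ['A', 'B', 'C'] * 40,
    'Value': np.random.randn(120),
    'Time': [np.random.choice(pd.date_range(datetime.datetime(2013, 1, 1),
                                            datetime.datetime(2013, 1, 3),
                                            freq='H'))
             for i in range(120)],
})

# Non-unique timestamps are fine: every row on the same calendar day
# falls into the same group, and size() counts them.
counts = df.groupby(pd.Grouper(key='Time', freq='D')).size()
print(counts)
```

Changing freq to 'W' or 'M' gives weekly or monthly counts, which covers the "other higher-order unit of time" part of the question.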

Fill dataframe with NaN when multiple days' data is missing

时光总嘲笑我的痴心妄想 submitted on 2019-12-10 17:48:45

Question: I have a pandas dataframe which I interpolate to get a daily dataframe. The original dataframe looks like this:

                   col_1      vals
    2017-10-01  0.000000  0.112869
    2017-10-02  0.017143  0.112869
    2017-10-12  0.003750  0.117274
    2017-10-14  0.000000  0.161556
    2017-10-17  0.000000  0.116264

In the interpolated dataframe, I want to change data values to NaN where the gap in dates exceeds 5 days. E.g. in the dataframe above, the gap between 2017-10-02 and 2017-10-12 exceeds 5 days, therefore the interpolated values inside that gap should become NaN.
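
One way to do this (my own sketch, assuming the observed endpoint dates themselves should be kept): upsample and interpolate, then blank every daily row whose nearest observed neighbours are more than 5 days apart.

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime(['2017-10-01', '2017-10-02', '2017-10-12',
                      '2017-10-14', '2017-10-17'])
df = pd.DataFrame({'col_1': [0.000000, 0.017143, 0.003750, 0.000000, 0.000000],
                   'vals':  [0.112869, 0.112869, 0.117274, 0.161556, 0.116264]},
                  index=idx)

daily = df.resample('D').interpolate()

# For each daily timestamp, locate the closest observed date on either side.
obs = pd.Series(df.index, index=df.index)
prev_obs = obs.reindex(daily.index, method='ffill')
next_obs = obs.reindex(daily.index, method='bfill')

# Days strictly inside a gap wider than 5 days get NaN; on observed days
# prev_obs equals next_obs, so those rows are untouched.
daily.loc[(next_obs - prev_obs) > pd.Timedelta(days=5)] = np.nan
print(daily)
```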

Custom pandas groupby on a list of intervals

倖福魔咒の submitted on 2019-12-10 17:41:58

Question: I have a dataframe df:

         A    B
    0   28  abc
    1   29  def
    2   30  hij
    3   31  hij
    4   32  abc
    5   28  abc
    6   28  abc
    7   29  def
    8   30  hij
    9   28  abc
    10  29  klm
    11  30  nop
    12  28  abc
    13  29  xyz

    df.dtypes
    A    object    # A is a string column as well
    B    object
    dtype: object

I want to use the values from this list to groupby:

    i = np.array([3, 5, 6, 9, 12, 14])

Basically, all rows in df with index 0, 1, 2 are in the first group, rows with index 3, 4 are in the second group, rows with index 5 are in the third group, and so on. My end
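
One way to produce those groups (my own sketch, not from the thread) is to convert the cut points into per-row labels with np.searchsorted, which counts how many cut points lie at or below each row index:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['28', '29', '30', '31', '32', '28', '28',
          '29', '30', '28', '29', '30', '28', '29'],
    'B': ['abc', 'def', 'hij', 'hij', 'abc', 'abc', 'abc',
          'def', 'hij', 'abc', 'klm', 'nop', 'abc', 'xyz'],
})
i = np.array([3, 5, 6, 9, 12, 14])

# Rows 0-2 get label 0, rows 3-4 label 1, row 5 label 2, and so on:
# each cut point opens a new group.
labels = np.searchsorted(i, df.index, side='right')

for key, grp in df.groupby(labels):
    print(key, grp.index.tolist())
```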

How do I pivot one DataFrame column to a truth table with columns based on another DataFrame?

為{幸葍}努か submitted on 2019-12-10 16:37:21

Question: I have one df with a user_id and a category. I'd like to transform this to a truth table for whether or not that user has at least one entry for that category. However, the final table should also include columns for all categories that appear in df_list, which may not appear at all in df. Right now I create the truth table with a groupby + size and then check if any columns are missing, and then manually set those columns to False, but I was wondering if there was a way to accomplish
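
A shorter route (a sketch; the sample data and the assumption that df_list boils down to a list of category names are mine):

```python
import pandas as pd

# Hypothetical stand-ins for the question's data.
df = pd.DataFrame({'user_id': [1, 1, 2], 'category': ['a', 'b', 'a']})
all_categories = ['a', 'b', 'c']   # from df_list; 'c' never appears in df

truth = (pd.crosstab(df['user_id'], df['category'])
           .gt(0)                                     # at least one entry?
           .reindex(columns=all_categories, fill_value=False))
print(truth)
```

The reindex with fill_value=False replaces the manual "check which columns are missing and set them to False" step in a single call.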

How to sort Pandas DataFrame both by MultiIndex and by value?

北慕城南 submitted on 2019-12-10 14:53:29

Question: Sample data:

    mdf = pd.DataFrame([[1, 2, 50], [1, 2, 20],
                        [1, 5, 10], [2, 8, 80],
                        [2, 5, 65], [2, 8, 10]],
                       columns=['src', 'dst', 'n'])
    mdf
       src  dst   n
    0    1    2  50
    1    1    2  20
    2    1    5  10
    3    2    8  80
    4    2    5  65
    5    2    8  10

groupby() gives a two-level multi-index:

    test = mdf.groupby(['src', 'dst'])['n'].agg(['sum', 'count'])
    test
             sum  count
    src dst
    1   2     70      2
        5     10      1
    2   5     65      1
        8     90      2

Question: how to sort this DataFrame by src ascending and then by sum descending? I'm a beginner with pandas, learned about sort_index() and sort_values
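
A version-safe sketch (mine, not from the thread): lift the index levels back into columns, sort on both keys, then restore the MultiIndex.

```python
import pandas as pd

mdf = pd.DataFrame([[1, 2, 50], [1, 2, 20], [1, 5, 10],
                    [2, 8, 80], [2, 5, 65], [2, 8, 10]],
                   columns=['src', 'dst', 'n'])
test = mdf.groupby(['src', 'dst'])['n'].agg(['sum', 'count'])

# Works on any pandas version: round-trip through regular columns.
result = (test.reset_index()
              .sort_values(['src', 'sum'], ascending=[True, False])
              .set_index(['src', 'dst']))
print(result)

# On pandas 0.23+, sort_values(by=...) may mix index level names with
# column names, so the round trip is avoidable:
# result = test.sort_values(['src', 'sum'], ascending=[True, False])
```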

Sum a separate column based on the range of the dataframe between values in other columns after groupby

北慕城南 submitted on 2019-12-10 10:47:44

Question: I have a dataframe as below:

    id  Supply  days  days_180
     1      30     0       180
     1     100   183       363
     1      80   250       430
     2       5     0       180
     2       5    10       190
     3       5     0       180
     3      30   100       280
     3      30   150       330
     3      30   200       380
     3      30   280       460
     3      50   310       490

I want to sum 'Supply' where days are between 'days' and 'days + 180' for each row. This needs to be done for each group after groupby('id'). The expected output is as below:

    id  Supply  days  days_180  use
     1      30     0       180   30
     1     100   183       363  180
     1      80   250       430   80
     2       5     0       180   10
     2       5    10       190   10
     3       5     0       180   65
     3      30   100       280  120
     3      30   150
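
A direct sketch of that reading (mine; O(n²) per group, fine for small groups): for every row, sum Supply over rows of the same id whose days fall in [days, days_180].

```python
import pandas as pd

df = pd.DataFrame({
    'id':     [1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3],
    'Supply': [30, 100, 80, 5, 5, 5, 30, 30, 30, 30, 50],
    'days':   [0, 183, 250, 0, 10, 0, 100, 150, 200, 280, 310],
})
df['days_180'] = df['days'] + 180

def window_sum(g):
    # For each row, add up Supply on rows whose 'days' lies inside
    # the row's own [days, days_180] window (inclusive on both ends).
    return g.apply(
        lambda row: g.loc[g['days'].between(row['days'], row['days_180']),
                          'Supply'].sum(),
        axis=1)

df['use'] = df.groupby('id', group_keys=False).apply(window_sum)
print(df)
```

Note that under this reading the row (id=2, days=10) comes out as 5, while the question's expected output shows 10, so the intended window convention may differ slightly from what the excerpt states.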

Pandas groupby each column and add new column for each group

杀马特。学长 韩版系。学妹 submitted on 2019-12-10 09:40:41

Question: I have a data frame like this:

    lvl1 = ['l1A', 'l1A', 'l1B', 'l1C', 'l1D']
    lvl2 = ['l2A', 'l2A', 'l2A', 'l26', 'l27']
    wgt = [.2, .3, .15, .05, .3]
    lvls = [lvl1, lvl2]
    df = pd.DataFrame(wgt, lvls).reset_index()
    df.columns = ['lvl' + str(i) for i in range(1, 3)] + ['wgt']
    df
      lvl1 lvl2   wgt
    0  l1A  l2A  0.20
    1  l1A  l2A  0.30
    2  l1B  l2A  0.15
    3  l1C  l26  0.05
    4  l1D  l27  0.30

I want to get the average weight at each level and add them as a separate column to this data frame.

    pd.concat([df, df.groupby('lvl1').transform('mean').add
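
A compact sketch (mine, sidestepping the concat in the excerpt): one transform('mean') per level column, each assigned back under its own name.

```python
import pandas as pd

df = pd.DataFrame({'lvl1': ['l1A', 'l1A', 'l1B', 'l1C', 'l1D'],
                   'lvl2': ['l2A', 'l2A', 'l2A', 'l26', 'l27'],
                   'wgt':  [.2, .3, .15, .05, .3]})

# transform('mean') broadcasts each group's mean back to every row of the
# group, so the result can be assigned straight back as a new column.
for col in ['lvl1', 'lvl2']:
    df['avg_wgt_' + col] = df.groupby(col)['wgt'].transform('mean')
print(df)
```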

Pandas and groupby: count the number of matches in two different columns

℡╲_俬逩灬. submitted on 2019-12-10 09:34:24

Question: I would like to count the number of matches after a groupby in a pandas dataframe.

    claim  event  material1  material2
    A      X      M1         M2
    A      X      M2         M3
    A      X      M3         M0
    A      X      M4         M4
    A      Y      M5         M5
    A      Y      M6         M0
    B      Z      M7         M0
    B      Z      M8         M0

First, I group by the pair claim, event, and for each of these groups I want to count the number of matches between the columns material1 and material2. For the groupby, I have

    grouped = df.groupby(['claim', 'event'])

but then I don't know how to compare the two new columns. It should return the
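
If "match" means material1 equals material2 on the same row (an assumption; the excerpt cuts off before the expected output), the comparison can be computed once and summed per group:

```python
import pandas as pd

df = pd.DataFrame({
    'claim':     ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B'],
    'event':     ['X', 'X', 'X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'material1': ['M1', 'M2', 'M3', 'M4', 'M5', 'M6', 'M7', 'M8'],
    'material2': ['M2', 'M3', 'M0', 'M4', 'M5', 'M0', 'M0', 'M0'],
})

# Row-wise equality gives booleans; summing within each (claim, event)
# group counts the True values, i.e. the matching rows.
matches = ((df['material1'] == df['material2'])
           .groupby([df['claim'], df['event']]).sum())
print(matches)   # (A, X) -> 1, (A, Y) -> 1, (B, Z) -> 0
```

If instead "match" means values shared between the two columns anywhere within a group, a per-group set intersection via apply would be needed.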

Groupby, transpose and append in Pandas?

梦想的初衷 submitted on 2019-12-09 08:59:11

Question: I have a dataframe (shown as an image in the original post) in which each user has 10 records. Now, I want to create a dataframe which looks like this:

    userid  name1  name2  ...  name10

which means I need to invert every 10 records of the column name and append them to a new dataframe. So, how do I do it? Is there any way I can do it in Pandas?

Answer 1: groupby('userid'), then reset_index within each group to enumerate consistently across groups. Then unstack to get columns.

    df.groupby('userid')['name'].apply(lambda df: df.reset
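
A plausible completion of the answer's snippet (the excerpt cuts off mid-call), with hypothetical input data and the column renaming the target layout implies:

```python
import pandas as pd

# Stand-in for the question's frame: two users, three records each
# (the real data has 10 records per user).
df = pd.DataFrame({'userid': [1, 1, 1, 2, 2, 2],
                   'name':   ['a', 'b', 'c', 'd', 'e', 'f']})

# reset_index(drop=True) renumbers each group 0..n-1, so unstack()
# turns that per-group position into the columns.
wide = (df.groupby('userid')['name']
          .apply(lambda s: s.reset_index(drop=True))
          .unstack())
wide.columns = ['name' + str(c + 1) for c in wide.columns]
print(wide)
```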

Pandas GroupBy.agg() throws TypeError: aggregate() missing 1 required positional argument: 'arg'

早过忘川 submitted on 2019-12-09 03:16:46

Question: I'm trying to create multiple aggregations of the same field. I'm working in pandas, in Python 3.7. The syntax seems pretty straightforward based on the documentation: https://pandas-docs.github.io/pandas-docs-travis/user_guide/groupby.html#named-aggregation I do not see why I'm getting the error below. Could someone please point out the issue and tell me how to fix it?

code:

    qt_dy.groupby('date').agg(std_qty=('qty', 'std'), mean_qty=('qty', 'mean'),)

error:

    --------------------------------------
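
That TypeError is what pandas before 0.25 raises for this call: named aggregation was only added in pandas 0.25, so the fix is to upgrade or fall back to agg plus rename. A small sketch (the DataFrame here is a hypothetical stand-in for qt_dy):

```python
import pandas as pd

df = pd.DataFrame({'date': ['d1', 'd1', 'd2', 'd2'],
                   'qty':  [1.0, 3.0, 5.0, 9.0]})

# pandas >= 0.25: named aggregation, exactly as in the question.
out = df.groupby('date').agg(std_qty=('qty', 'std'),
                             mean_qty=('qty', 'mean'))

# Older pandas: aggregate first, then rename the result columns.
legacy = (df.groupby('date')['qty'].agg(['std', 'mean'])
            .rename(columns={'std': 'std_qty', 'mean': 'mean_qty'}))
print(out)
```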