pandas-groupby

Grouping odd and even days

Submitted by 怎甘沉沦 on 2019-12-24 05:26:27
Question: I have a pandas dataframe as the following:

    data
    Out[8]:
                              value1
    Date
    2015-03-31 09:53:53.800      NaN
    2015-03-31 10:28:54.700     1.34
    2015-03-31 10:34:35.720      NaN
    2015-03-31 10:36:53.540     1.26
    2015-04-01 11:37:11.620     1.44
    2015-04-01 11:39:30.520      NaN
    2015-04-01 11:50:25.620     1.76
    2015-04-02 11:50:30.620     1.38
    2015-04-02 12:31:20.220     1.76
    2015-04-02 12:37:43.940     2.36
    2015-04-03 12:38:45.820     1.46
    2015-04-03 12:41:56.680     2.26
    2015-04-04 13:04:50.740     1.16
    2015-04-05 12:38:45.820     1.46
    2015-04-05 12:41:56.680     2
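
The question body is cut off before the actual ask, but going by the title, one plausible reading is grouping rows by whether the calendar day is odd or even. A minimal sketch under that assumption (the value1 column comes from the question; the odd/even labelling is mine):

    import numpy as np
    import pandas as pd

    # Small frame mirroring the question's layout
    idx = pd.to_datetime(['2015-03-31 10:28:54.700', '2015-04-01 11:37:11.620',
                          '2015-04-02 11:50:30.620', '2015-04-03 12:38:45.820'])
    df = pd.DataFrame({'value1': [1.34, 1.44, 1.38, 1.46]}, index=idx)
    df.index.name = 'Date'

    # Label each row by the parity of its day of month, then aggregate
    parity = np.where(df.index.day % 2 == 1, 'odd', 'even')
    print(df.groupby(parity)['value1'].mean())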

Finding most common values with Pandas GroupBy and value_counts

Submitted by 隐身守侯 on 2019-12-24 04:20:30
Question: I am working with two columns in a table.

    +-------------+------------------------------------------------+
    | Area Name   | Code Description                               |
    +-------------+------------------------------------------------+
    | N Hollywood | VIOLATION OF RESTRAINING ORDER                 |
    | N Hollywood | CRIMINAL THREATS - NO WEAPON DISPLAYED         |
    | N Hollywood | CRIMINAL THREATS - NO WEAPON DISPLAYED         |
    | N Hollywood | ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT |
    | Southeast   | ASSAULT WITH DEADLY
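
The question is truncated, but the title asks for the most common value per group via groupby and value_counts. A minimal sketch, assuming the goal is the top Code Description for each Area Name:

    import pandas as pd

    df = pd.DataFrame({
        'Area Name': ['N Hollywood', 'N Hollywood', 'N Hollywood', 'Southeast'],
        'Code Description': ['VIOLATION OF RESTRAINING ORDER',
                             'CRIMINAL THREATS - NO WEAPON DISPLAYED',
                             'CRIMINAL THREATS - NO WEAPON DISPLAYED',
                             'ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT'],
    })

    # value_counts() sorts descending, so the first index entry per group
    # is that group's most frequent description
    top = (df.groupby('Area Name')['Code Description']
             .agg(lambda s: s.value_counts().index[0]))
    print(top)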

Splitting Column Lists in Pandas DataFrame

Submitted by 删除回忆录丶 on 2019-12-24 02:14:37
Question: I'm looking for a good way to solve the following problem. My current fix is not particularly clean, and I'm hoping to learn from your insight. Suppose I have a pandas DataFrame whose entries look like this:

    >>> df = pd.DataFrame(index=[1,2,3], columns=['Color','Texture','IsGlass'])
    >>> df['Color'] = [np.nan, ['Red','Blue'], ['Blue', 'Green', 'Purple']]
    >>> df['Texture'] = [['Rough'], np.nan, ['Silky', 'Shiny', 'Fuzzy']]
    >>> df['IsGlass'] = [1, 0, 1]
    >>> df
            Color                      Texture  IsGlass
    1         NaN                    ['Rough']        1
    2  ['Red',
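
The body is cut off before the desired output, but a common way to split list-valued columns into one row per element is DataFrame.explode (available since pandas 0.25). A sketch, assuming a long-format result is the goal:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(index=[1, 2, 3], columns=['Color', 'Texture', 'IsGlass'])
    df['Color'] = [np.nan, ['Red', 'Blue'], ['Blue', 'Green', 'Purple']]
    df['Texture'] = [['Rough'], np.nan, ['Silky', 'Shiny', 'Fuzzy']]
    df['IsGlass'] = [1, 0, 1]

    # explode() emits one row per list element; NaN entries pass through
    print(df.explode('Color').explode('Texture'))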

Grouping Pandas dataframe across rows

Submitted by 送分小仙女□ on 2019-12-24 00:46:29
Question: I have a csv like this:

    client1,client2,client3,client4,client5,client6,amount
    ,,,Comp1,,,4.475000
    ,,,Comp2,,,16.305584
    ,,,Comp3,,,4.050000
    Comp2,Comp1,,Comp4,,,21.000000
    ,,,Comp4,,,30.000000
    ,Comp1,,Comp2,,,5.137500
    ,,,Comp3,,,52.650000
    ,,,Comp1,,,2.650000
    Comp3,,,Comp3,,,29.000000
    Comp5,,,Comp2,,,20.809000
    Comp5,,,Comp2,,,15.100000
    Comp5,,,Comp2,,,52.404000

After reading it into a pandas dataframe, df, I wanted to aggregate in two steps. Step 1: first, I sum the amount:

    client1 client2 client3
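
The question is cut off mid-step, but the stated first step sums the amount over the client columns. One way to do that is to melt the six client columns into long form, drop the empty slots, and sum per client; a sketch (the hand-built frame stands in for the CSV so the example is self-contained):

    import pandas as pd

    # A few rows in the shape of the CSV (None where a client slot is empty)
    df = pd.DataFrame({
        'client1': [None, 'Comp2', 'Comp5'],
        'client2': [None, 'Comp1', None],
        'client3': [None, None, None],
        'client4': ['Comp1', 'Comp4', 'Comp2'],
        'client5': [None, None, None],
        'client6': [None, None, None],
        'amount':  [4.475, 21.0, 20.809],
    })

    # Melt the six client columns into one, drop the empties,
    # then sum the amount per client
    long = df.melt(id_vars='amount',
                   value_vars=[f'client{i}' for i in range(1, 7)],
                   value_name='client').dropna(subset=['client'])
    print(long.groupby('client')['amount'].sum())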

Splitting a dataframe into multiple 5-second dataframes in Python

Submitted by 强颜欢笑 on 2019-12-24 00:37:37
Question: I have a relatively big dataset that I want to split into multiple dataframes in Python based on a column containing a datetime object. The values in the column (that I want to split the dataframe by) are given in the following format:

    2015-11-01 00:00:05

You may assume the dataframe looks like this. How can I split the dataframe into 5-second intervals in the following way: 1st dataframe 2015-11-01 00:00:00 - 2015-11-01 00:00:05, 2nd dataframe 2015-11-01 00:00:05 - 2015-11-01 00:00:10, and
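
The question is cut short, but splitting a frame into consecutive 5-second buckets is exactly what pd.Grouper (or resample) handles. A sketch, assuming the datetime column is named timestamp (the real column name is not shown above):

    import pandas as pd

    df = pd.DataFrame({
        'timestamp': pd.to_datetime(['2015-11-01 00:00:01', '2015-11-01 00:00:04',
                                     '2015-11-01 00:00:07', '2015-11-01 00:00:12']),
        'value': [1, 2, 3, 4],
    })

    # Each group is one 5-second window; collect them as separate frames
    frames = [g for _, g in df.groupby(pd.Grouper(key='timestamp', freq='5s'))]
    for g in frames:
        print(g, end='\n\n')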

How to do cumsum based on a time condition - resample pandas?

Submitted by 会有一股神秘感。 on 2019-12-23 23:20:00
Question: I have a dataframe like the one shown below:

    df = pd.DataFrame({
        'subject_id': [1, 1, 1, 1, 1, 1],
        'time_1': ['2173-04-03 10:00:00', '2173-04-03 10:15:00', '2173-04-03 10:30:00',
                   '2173-04-03 10:45:00', '2173-04-03 11:05:00', '2173-04-03 11:15:00'],
        'val': [5, 6, 5, 6, 6, 6]
    })

I would like to find the total duration of a value appearing in sequence. The example below will help you understand. From the above screenshot, you can see that 6 occurs in sequence from 10:45 to 23:59 whereas other values (it could be any
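
The question is truncated, but the stated goal — total duration of a value appearing in sequence — can be attacked by tagging consecutive runs with the shift/cumsum trick and measuring each run's time span. A sketch under that reading:

    import pandas as pd

    df = pd.DataFrame({
        'subject_id': [1, 1, 1, 1, 1, 1],
        'time_1': pd.to_datetime(['2173-04-03 10:00:00', '2173-04-03 10:15:00',
                                  '2173-04-03 10:30:00', '2173-04-03 10:45:00',
                                  '2173-04-03 11:05:00', '2173-04-03 11:15:00']),
        'val': [5, 6, 5, 6, 6, 6],
    })

    # A new run starts whenever val changes; cumsum numbers the runs
    run_id = (df['val'] != df['val'].shift()).cumsum().rename('run')

    # Span of each run: last timestamp minus first
    spans = df.groupby([run_id, df['val']])['time_1'].agg(lambda t: t.max() - t.min())
    print(spans)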

DataError: No numeric types using mean aggregate function but not sum?

Submitted by 末鹿安然 on 2019-12-23 19:36:45
Question: I was wondering if someone could help explain the behaviour below when using agg().

    import numpy as np
    import pandas as pd
    import string

Initialise the data frame:

    df = pd.DataFrame(data=[list(string.ascii_lowercase)[0:5]*2,
                            list(range(1, 11)), list(range(11, 21))]).T
    df.columns = ['g', 'c1', 'c2']
    df.sort_values(['g']).head(5)

       g c1  c2
    0  a  1  11
    5  a  6  16
    1  b  2  12
    6  b  7  17
    2  c  3  13

As an example I am summing and averaging across c1 and c2 while doing a group by on g. No-data-error scenario:

    f = { 'c1' :
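
The question is cut off before the failing case, but the usual cause of "DataError: No numeric types to aggregate" in this setup is dtype: building the frame through .T leaves c1 and c2 as object columns, and sum() happens to work on objects while mean() refuses. Converting to numeric first is the standard fix; a sketch:

    import pandas as pd
    import string

    df = pd.DataFrame(data=[list(string.ascii_lowercase)[0:5] * 2,
                            list(range(1, 11)), list(range(11, 21))]).T
    df.columns = ['g', 'c1', 'c2']
    print(df.dtypes)  # g, c1, c2 all come out as object

    # Cast the value columns to numeric, then both aggregations behave
    df[['c1', 'c2']] = df[['c1', 'c2']].apply(pd.to_numeric)
    print(df.groupby('g').agg({'c1': 'sum', 'c2': 'mean'}))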

pandas groupby - custom function

Submitted by 血红的双手。 on 2019-12-23 16:51:45
Question: I have the following dataframe, to which I apply groupby and sum():

    d = {'col1': ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
         'col2': [1, 2, 3, 4, 5, 6, np.nan, np.nan, np.nan]}
    df = pd.DataFrame(data=d)
    df.groupby("col1").sum()

This results in the following:

          col2
    col1
    A      6.0
    B     15.0
    C      0.0

I want C to show NaN instead of 0, since all of the values for C are NaN. How can I accomplish this? apply() with a lambda function? Any help would be appreciated.

Answer 1: Thanks to @piRSquared, @Alollz, and @anky_91:
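
The answer text is cut off above, but the commonly suggested fix (possibly what the credited answers proposed) is the min_count argument to sum, which returns NaN when a group has fewer than min_count valid values:

    import numpy as np
    import pandas as pd

    d = {'col1': ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
         'col2': [1, 2, 3, 4, 5, 6, np.nan, np.nan, np.nan]}
    df = pd.DataFrame(data=d)

    # min_count=1 requires at least one non-NaN value per group;
    # otherwise the group's sum is NaN instead of 0
    print(df.groupby("col1").sum(min_count=1))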

pandas groupby apply on multiple columns to generate a new column

Submitted by 北战南征 on 2019-12-23 16:22:22
Question: I'd like to generate a new column in a pandas dataframe using groupby-apply. For example, I have a dataframe:

    df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['A', 'B', 'A', 'B'], 'C': [0, 0, 1, 1]})

and try to generate a new column 'D' by groupby-apply. This works:

    df = df.assign(D=df.groupby('B').C.apply(lambda x: x - x.mean()))

as (I think) it returns a series with the same index as the dataframe:

    In [4]: df.groupby('B').C.apply(lambda x: x - x.mean())
    Out[4]:
    0   -0.5
    1   -0.5
    2    0.5
    3    0.5
    Name: C, dtype: float64
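
The question is truncated before the multi-column case, but if the aim is the same de-meaning over several columns at once, transform keeps the original row index, so its result assigns straight back; a sketch under that assumption (the new column names D and E are mine):

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['A', 'B', 'A', 'B'], 'C': [0, 0, 1, 1]})

    # transform applies the function per column within each group and
    # returns a frame aligned with df's original index
    df[['D', 'E']] = df.groupby('B')[['A', 'C']].transform(lambda x: x - x.mean())
    print(df)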

pandas group by remove outliers

Submitted by 三世轮回 on 2019-12-23 14:03:45
Question: I want to remove outliers based on the 99th-percentile value, group-wise.

    import pandas as pd
    df = pd.DataFrame({'Group': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                       'count': [1.1, 11.2, 1.1, 3.3, 3.40, 3.3, 100.0]})

In the output I want to remove 11.2 from group A and 100 from group B, so the final dataset will contain only 5 observations:

    wantdf = pd.DataFrame({'Group': ['A', 'A', 'B', 'B', 'B'],
                           'count': [1.1, 1.1, 3.3, 3.40, 3.3]})

I have tried this, but I'm not getting the desired results:

    df[df.groupby("Group")['count
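
The attempted line is cut off above, but one way to get the wanted frame is a per-group 99th-percentile threshold via transform, then a boolean filter; a sketch:

    import pandas as pd

    df = pd.DataFrame({'Group': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                       'count': [1.1, 11.2, 1.1, 3.3, 3.40, 3.3, 100.0]})

    # Threshold per group: the group's 99th percentile, broadcast to its rows
    q99 = df.groupby('Group')['count'].transform(lambda x: x.quantile(0.99))

    # Keep only rows strictly below their group's threshold
    print(df[df['count'] < q99].reset_index(drop=True))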