pandas-groupby

Grouping odd and even days

Submitted by 怎甘沉沦 on 2019-12-24 05:26:27
Question: I have a pandas dataframe as the following:

    data
    Out[8]:
                              value1
    Date
    2015-03-31 09:53:53.800      NaN
    2015-03-31 10:28:54.700     1.34
    2015-03-31 10:34:35.720      NaN
    2015-03-31 10:36:53.540     1.26
    2015-04-01 11:37:11.620     1.44
    2015-04-01 11:39:30.520      NaN
    2015-04-01 11:50:25.620     1.76
    2015-04-02 11:50:30.620     1.38
    2015-04-02 12:31:20.220     1.76
    2015-04-02 12:37:43.940     2.36
    2015-04-03 12:38:45.820     1.46
    2015-04-03 12:41:56.680     2.26
    2015-04-04 13:04:50.740     1.16
    2015-04-05 12:38:45.820     1.46
    2015-04-05 12:41:56.680     2
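
The question body is cut off before the actual ask, but going by the title, one plausible reading is grouping rows by whether the calendar day is odd or even. A minimal sketch under that assumption (the value1 column comes from the question; the odd/even labelling is mine):

    import numpy as np
    import pandas as pd

    # Small frame mirroring the question's layout
    idx = pd.to_datetime(['2015-03-31 10:28:54.700', '2015-04-01 11:37:11.620',
                          '2015-04-02 11:50:30.620', '2015-04-03 12:38:45.820'])
    df = pd.DataFrame({'value1': [1.34, 1.44, 1.38, 1.46]}, index=idx)
    df.index.name = 'Date'

    # Label each row by the parity of its day of month, then aggregate
    parity = np.where(df.index.day % 2 == 1, 'odd', 'even')
    print(df.groupby(parity)['value1'].mean())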

Finding most common values with Pandas GroupBy and value_counts

Submitted by 隐身守侯 on 2019-12-24 04:20:30
Question: I am working with two columns in a table.

    +-------------+------------------------------------------------+
    | Area Name   | Code Description                               |
    +-------------+------------------------------------------------+
    | N Hollywood | VIOLATION OF RESTRAINING ORDER                 |
    | N Hollywood | CRIMINAL THREATS - NO WEAPON DISPLAYED         |
    | N Hollywood | CRIMINAL THREATS - NO WEAPON DISPLAYED         |
    | N Hollywood | ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT |
    | Southeast   | ASSAULT WITH DEADLY
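
The question is truncated, but the title asks for the most common value per group via groupby and value_counts. A minimal sketch, assuming the goal is the top Code Description for each Area Name:

    import pandas as pd

    df = pd.DataFrame({
        'Area Name': ['N Hollywood', 'N Hollywood', 'N Hollywood', 'Southeast'],
        'Code Description': ['VIOLATION OF RESTRAINING ORDER',
                             'CRIMINAL THREATS - NO WEAPON DISPLAYED',
                             'CRIMINAL THREATS - NO WEAPON DISPLAYED',
                             'ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT'],
    })

    # value_counts() sorts descending, so the first index entry per group
    # is that group's most frequent description
    top = (df.groupby('Area Name')['Code Description']
             .agg(lambda s: s.value_counts().index[0]))
    print(top)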

Splitting Column Lists in Pandas DataFrame

Submitted by 删除回忆录丶 on 2019-12-24 02:14:37
Question: I'm looking for a good way to solve the following problem. My current fix is not particularly clean, and I'm hoping to learn from your insight. Suppose I have a pandas DataFrame whose entries look like this:

    >>> df = pd.DataFrame(index=[1,2,3], columns=['Color','Texture','IsGlass'])
    >>> df['Color'] = [np.nan, ['Red','Blue'], ['Blue', 'Green', 'Purple']]
    >>> df['Texture'] = [['Rough'], np.nan, ['Silky', 'Shiny', 'Fuzzy']]
    >>> df['IsGlass'] = [1, 0, 1]
    >>> df
            Color                      Texture  IsGlass
    1         NaN                    ['Rough']        1
    2  ['Red',
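
The body is cut off before the desired output, but a common way to split list-valued columns into one row per element is DataFrame.explode (available since pandas 0.25). A sketch, assuming a long-format result is the goal:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(index=[1, 2, 3], columns=['Color', 'Texture', 'IsGlass'])
    df['Color'] = [np.nan, ['Red', 'Blue'], ['Blue', 'Green', 'Purple']]
    df['Texture'] = [['Rough'], np.nan, ['Silky', 'Shiny', 'Fuzzy']]
    df['IsGlass'] = [1, 0, 1]

    # explode() emits one row per list element; NaN entries pass through
    print(df.explode('Color').explode('Texture'))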

Grouping Pandas dataframe across rows

Submitted by 送分小仙女□ on 2019-12-24 00:46:29
Question: I have a csv like this:

    client1,client2,client3,client4,client5,client6,amount
    ,,,Comp1,,,4.475000
    ,,,Comp2,,,16.305584
    ,,,Comp3,,,4.050000
    Comp2,Comp1,,Comp4,,,21.000000
    ,,,Comp4,,,30.000000
    ,Comp1,,Comp2,,,5.137500
    ,,,Comp3,,,52.650000
    ,,,Comp1,,,2.650000
    Comp3,,,Comp3,,,29.000000
    Comp5,,,Comp2,,,20.809000
    Comp5,,,Comp2,,,15.100000
    Comp5,,,Comp2,,,52.404000

After reading it into a pandas dataframe, df, I wanted to aggregate in two steps. Step 1: first, I sum the amount:

    client1 client2 client3
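
The question is cut off mid-step, but the stated first step sums the amount over the client columns. One way to do that is to melt the six client columns into long form, drop the empty slots, and sum per client; a sketch (the hand-built frame stands in for the CSV so the example is self-contained):

    import pandas as pd

    # A few rows in the shape of the CSV (None where a client slot is empty)
    df = pd.DataFrame({
        'client1': [None, 'Comp2', 'Comp5'],
        'client2': [None, 'Comp1', None],
        'client3': [None, None, None],
        'client4': ['Comp1', 'Comp4', 'Comp2'],
        'client5': [None, None, None],
        'client6': [None, None, None],
        'amount':  [4.475, 21.0, 20.809],
    })

    # Melt the six client columns into one, drop the empties,
    # then sum the amount per client
    long = df.melt(id_vars='amount',
                   value_vars=[f'client{i}' for i in range(1, 7)],
                   value_name='client').dropna(subset=['client'])
    print(long.groupby('client')['amount'].sum())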

Splitting a dataframe into multiple 5-second dataframes in Python

Submitted by 强颜欢笑 on 2019-12-24 00:37:37
Question: I have a relatively big dataset that I want to split into multiple dataframes in Python based on a column containing a datetime object. The values in the column (that I want to split the dataframe by) are given in the following format:

    2015-11-01 00:00:05

You may assume the dataframe looks like this. How can I split the dataframe into 5-second intervals in the following way: 1st dataframe 2015-11-01 00:00:00 - 2015-11-01 00:00:05, 2nd dataframe 2015-11-01 00:00:05 - 2015-11-01 00:00:10, and
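
The question is cut short, but splitting a frame into consecutive 5-second buckets is exactly what pd.Grouper (or resample) handles. A sketch, assuming the datetime column is named timestamp (the real column name is not shown above):

    import pandas as pd

    df = pd.DataFrame({
        'timestamp': pd.to_datetime(['2015-11-01 00:00:01', '2015-11-01 00:00:04',
                                     '2015-11-01 00:00:07', '2015-11-01 00:00:12']),
        'value': [1, 2, 3, 4],
    })

    # Each group is one 5-second window; collect them as separate frames
    frames = [g for _, g in df.groupby(pd.Grouper(key='timestamp', freq='5s'))]
    for g in frames:
        print(g, end='\n\n')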

How to do cumsum based on a time condition - resample pandas?

Submitted by 会有一股神秘感。 on 2019-12-23 23:20:00
Question: I have a dataframe like the one shown below:

    df = pd.DataFrame({
        'subject_id': [1, 1, 1, 1, 1, 1],
        'time_1': ['2173-04-03 10:00:00', '2173-04-03 10:15:00', '2173-04-03 10:30:00',
                   '2173-04-03 10:45:00', '2173-04-03 11:05:00', '2173-04-03 11:15:00'],
        'val': [5, 6, 5, 6, 6, 6]
    })

I would like to find the total duration of a value appearing in sequence. The example below will help you understand. From the above screenshot, you can see that 6 occurs in sequence from 10:45 to 23:59 whereas other values (it could be any
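
The question is truncated, but the stated goal — total duration of a value appearing in sequence — can be attacked by tagging consecutive runs with the shift/cumsum trick and measuring each run's time span. A sketch under that reading:

    import pandas as pd

    df = pd.DataFrame({
        'subject_id': [1, 1, 1, 1, 1, 1],
        'time_1': pd.to_datetime(['2173-04-03 10:00:00', '2173-04-03 10:15:00',
                                  '2173-04-03 10:30:00', '2173-04-03 10:45:00',
                                  '2173-04-03 11:05:00', '2173-04-03 11:15:00']),
        'val': [5, 6, 5, 6, 6, 6],
    })

    # A new run starts whenever val changes; cumsum numbers the runs
    run_id = (df['val'] != df['val'].shift()).cumsum().rename('run')

    # Span of each run: last timestamp minus first
    spans = df.groupby([run_id, df['val']])['time_1'].agg(lambda t: t.max() - t.min())
    print(spans)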

DataError: No numeric types using mean aggregate function but not sum?

Submitted by 末鹿安然 on 2019-12-23 19:36:45
Question: I was wondering if someone could help explain the behaviour below when using agg().

    import numpy as np
    import pandas as pd
    import string

Initialise the data frame:

    df = pd.DataFrame(data=[list(string.ascii_lowercase)[0:5]*2,
                            list(range(1, 11)), list(range(11, 21))]).T
    df.columns = ['g', 'c1', 'c2']
    df.sort_values(['g']).head(5)

       g c1  c2
    0  a  1  11
    5  a  6  16
    1  b  2  12
    6  b  7  17
    2  c  3  13

As an example I am summing and averaging across c1 and c2 while doing a group by on g. No-data-error scenario:

    f = { 'c1' :
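
The question is cut off before the failing case, but the usual cause of "DataError: No numeric types to aggregate" in this setup is dtype: building the frame through .T leaves c1 and c2 as object columns, and sum() happens to work on objects while mean() refuses. Converting to numeric first is the standard fix; a sketch:

    import pandas as pd
    import string

    df = pd.DataFrame(data=[list(string.ascii_lowercase)[0:5] * 2,
                            list(range(1, 11)), list(range(11, 21))]).T
    df.columns = ['g', 'c1', 'c2']
    print(df.dtypes)  # g, c1, c2 all come out as object

    # Cast the value columns to numeric, then both aggregations behave
    df[['c1', 'c2']] = df[['c1', 'c2']].apply(pd.to_numeric)
    print(df.groupby('g').agg({'c1': 'sum', 'c2': 'mean'}))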

pandas groupby - custom function

Submitted by 血红的双手。 on 2019-12-23 16:51:45
Question: I have the following dataframe, to which I apply groupby and sum():

    d = {'col1': ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
         'col2': [1, 2, 3, 4, 5, 6, np.nan, np.nan, np.nan]}
    df = pd.DataFrame(data=d)
    df.groupby("col1").sum()

This results in the following:

          col2
    col1
    A      6.0
    B     15.0
    C      0.0

I want C to show NaN instead of 0, since all of the values for C are NaN. How can I accomplish this? apply() with a lambda function? Any help would be appreciated.

Answer 1: Thanks to @piRSquared, @Alollz, and @anky_91:
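
The answer text is cut off above, but the commonly suggested fix (possibly what the credited answers proposed) is the min_count argument to sum, which returns NaN when a group has fewer than min_count valid values:

    import numpy as np
    import pandas as pd

    d = {'col1': ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
         'col2': [1, 2, 3, 4, 5, 6, np.nan, np.nan, np.nan]}
    df = pd.DataFrame(data=d)

    # min_count=1 requires at least one non-NaN value per group;
    # otherwise the group's sum is NaN instead of 0
    print(df.groupby("col1").sum(min_count=1))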

pandas groupby apply on multiple columns to generate a new column

Submitted by 北战南征 on 2019-12-23 16:22:22
Question: I'd like to generate a new column in a pandas dataframe using groupby-apply. For example, I have a dataframe:

    df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['A', 'B', 'A', 'B'], 'C': [0, 0, 1, 1]})

and try to generate a new column 'D' by groupby-apply. This works:

    df = df.assign(D=df.groupby('B').C.apply(lambda x: x - x.mean()))

as (I think) it returns a series with the same index as the dataframe:

    In [4]: df.groupby('B').C.apply(lambda x: x - x.mean())
    Out[4]:
    0   -0.5
    1   -0.5
    2    0.5
    3    0.5
    Name: C, dtype: float64
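
The question is truncated before the multi-column case, but if the aim is the same de-meaning over several columns at once, transform keeps the original row index, so its result assigns straight back; a sketch under that assumption (the new column names D and E are mine):

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['A', 'B', 'A', 'B'], 'C': [0, 0, 1, 1]})

    # transform applies the function per column within each group and
    # returns a frame aligned with df's original index
    df[['D', 'E']] = df.groupby('B')[['A', 'C']].transform(lambda x: x - x.mean())
    print(df)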

pandas group by remove outliers

Submitted by 三世轮回 on 2019-12-23 14:03:45
Question: I want to remove outliers based on the 99th-percentile value, group-wise.

    import pandas as pd
    df = pd.DataFrame({'Group': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                       'count': [1.1, 11.2, 1.1, 3.3, 3.40, 3.3, 100.0]})

In the output I want to remove 11.2 from group A and 100 from group B, so the final dataset will contain only 5 observations:

    wantdf = pd.DataFrame({'Group': ['A', 'A', 'B', 'B', 'B'],
                           'count': [1.1, 1.1, 3.3, 3.40, 3.3]})

I have tried this, but I'm not getting the desired results:

    df[df.groupby("Group")['count
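
The attempted line is cut off above, but one way to get the wanted frame is a per-group 99th-percentile threshold via transform, then a boolean filter; a sketch:

    import pandas as pd

    df = pd.DataFrame({'Group': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                       'count': [1.1, 11.2, 1.1, 3.3, 3.40, 3.3, 100.0]})

    # Threshold per group: the group's 99th percentile, broadcast to its rows
    q99 = df.groupby('Group')['count'].transform(lambda x: x.quantile(0.99))

    # Keep only rows strictly below their group's threshold
    print(df[df['count'] < q99].reset_index(drop=True))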