pandas-apply

Pandas condition on DataFrame returns TypeError: '>' not supported between instances of 'str' and 'int'

旧时模样 submitted on 2021-02-05 06:40:24
Question: I'm working on a DataFrame using pandas and I need to add a new column based on some conditions. My DataFrame is:

    discount  tax  total  subtotal  productid
    3         0    20     13        002
    10        3    106    94        003
    46.49     6    21     20        004

I need to add a new column named Class to the DataFrame, subject to the following condition: if discount > 20 and total > 100 and tax == 0, then Class should be 1; otherwise it should be 0.

Here's what I have tried:

    def conditions(s):
        if (s['discount'] > 20) and (s['tax'] == 0)
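The TypeError in the title suggests that at least one of the compared columns holds strings rather than numbers. The sketch below is one possible approach, not the asker's code: it assumes all numeric-looking columns were loaded as strings, converts them with `pd.to_numeric`, and then builds the Class column from a vectorized boolean mask instead of a row-wise function.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; storing the values as strings is an
# assumption made here to reproduce the 'str' vs 'int' comparison problem.
df = pd.DataFrame({
    'discount': ['3', '10', '46.49'],
    'tax': ['0', '3', '6'],
    'total': ['20', '106', '21'],
    'subtotal': ['13', '94', '20'],
    'productid': ['002', '003', '004'],
})

# Convert the numeric-looking columns; unparseable values become NaN.
num_cols = ['discount', 'tax', 'total', 'subtotal']
df[num_cols] = df[num_cols].apply(pd.to_numeric, errors='coerce')

# Element-wise condition: use & with parentheses, not the Python `and` keyword.
mask = (df['discount'] > 20) & (df['total'] > 100) & (df['tax'] == 0)
df['Class'] = np.where(mask, 1, 0)
```

With the three sample rows, no row satisfies all three conditions at once, so Class is 0 throughout.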

pandas - iterate over rows and calculate - faster

非 Y 不嫁゛ submitted on 2020-01-06 07:24:11
Question: I already have a solution, but it is very slow (13 minutes for 800 rows). Here is an example of the DataFrame:

    import pandas as pd
    d = {'col1': [20,23,40,41,48,49,50,50], 'col2': [39,32,42,50,63,68,68,69]}
    df = pd.DataFrame(data=d)
    df

In a new column, I want to calculate how many of the previous values (for example, three) of col2 are greater than or equal to the row's value of col1. I also continue the first rows. This is my slow code:

    start_at_nr = 3  # row at which to start calculating
    df[
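A vectorized version of the calculation described above might look like the following. This is a sketch under stated assumptions, not the asker's truncated code: "previous values" is taken to mean the three rows before the current one (the current row excluded), the first rows simply count whatever earlier rows exist, and the names `window` and `count_prev` are hypothetical.

```python
import pandas as pd

d = {'col1': [20, 23, 40, 41, 48, 49, 50, 50],
     'col2': [39, 32, 42, 50, 63, 68, 68, 69]}
df = pd.DataFrame(data=d)

window = 3  # number of previous rows to inspect (assumed meaning of "three")

# shift(k) aligns col2 from k rows earlier with the current row's col1.
# Missing values at the top compare as False, so early rows count fewer values.
df['count_prev'] = sum(
    (df['col2'].shift(k) >= df['col1']).astype(int)
    for k in range(1, window + 1)
)
```

Each shift is a single column-wide comparison, so the work is a handful of vectorized operations rather than a Python loop over rows.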

Apply function to create string with multiple columns as argument

蓝咒 submitted on 2019-12-24 00:38:50
Question: I have a DataFrame like this:

       name    size  type     av_size_type
    0  John    23    Qapra'   22
    1  Dan     21    nuk'neH  12
    2  Monica  12    kahless  15

I want to create a new column with a sentence, like this:

       name    size  type     av_size_type  sentence
    0  John    23    Qapra'   22            "John has size 23, above the average of Qapra' type (22)"
    1  Dan     21    nuk'neH  12            "Dan has size 21, above the average of nuk'neH type (12)"
    2  Monica  12    kahless  15            "Monica has size 12, above the average of
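One way to build such a column is a row-wise `apply` with an f-string. The sketch below assumes the column names shown in the question and uses the fixed wording from the sample output; whether "above" should change when the size is below the average is not visible in the excerpt, so the template leaves it as is.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Dan', 'Monica'],
    'size': [23, 21, 12],
    'type': ["Qapra'", "nuk'neH", 'kahless'],
    'av_size_type': [22, 12, 15],
})

# Use bracket access: row.name and row.size are Series attributes,
# not the column values.
df['sentence'] = df.apply(
    lambda row: (
        f"{row['name']} has size {row['size']}, "
        f"above the average of {row['type']} type ({row['av_size_type']})"
    ),
    axis=1,
)
```

Bracket access matters here because two of the column names, name and size, collide with attributes that every pandas Series already has.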

pandas groupby apply on multiple columns to generate a new column

北战南征 submitted on 2019-12-23 16:22:22
Question: I would like to generate a new column in a pandas DataFrame using groupby-apply. For example, I have a DataFrame:

    df = pd.DataFrame({'A':[1,2,3,4],'B':['A','B','A','B'],'C':[0,0,1,1]})

and I try to generate a new column 'D' by groupby-apply. This works:

    df = df.assign(D=df.groupby('B').C.apply(lambda x: x - x.mean()))

as (I think) it returns a Series with the same index as the DataFrame:

    In [4]: df.groupby('B').C.apply(lambda x: x - x.mean())
    Out[4]:
    0   -0.5
    1   -0.5
    2    0.5
    3    0.5
    Name: C, dtype: float64
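For a per-group computation whose result should line up with the original rows, `transform` is an alternative to `apply`. It is shown here as a swapped-in technique, not as the asker's method: `transform` returns a result indexed like the input, so the assignment does not depend on how `apply` shapes its output.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ['A', 'B', 'A', 'B'],
                   'C': [0, 0, 1, 1]})

# Broadcast each group's mean of C back to the rows of that group,
# then subtract it from C to get the within-group deviation.
group_mean = df.groupby('B')['C'].transform('mean')
df['D'] = df['C'] - group_mean
```

For this frame the result matches the output shown in the question: -0.5, -0.5, 0.5, 0.5.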

Groupby on columns with overlapping groups

强颜欢笑 submitted on 2019-12-11 16:12:37
Question: Continuing from my previous question. This produces a DataFrame with 81 columns, filled with random numbers:

    import pandas as pd
    import itertools
    import numpy as np

    col = "A,B,C".split(',')
    col1 = "1,2,3,4,5,6,7,8,9".split(',')
    col2 = "E,F,G".split(',')
    all_dims = [col, col1, col2]
    all_keys = ['.'.join(i) for i in itertools.product(*all_dims)]
    rng = pd.date_range(end=pd.Timestamp.today().date(), periods=12, freq='M')
    df = pd.DataFrame(np.random.randint(0, 1000, size=(len(rng), len(all_keys