dataframe

Replace values based on multiple conditions with groupby mean in Pandas

此生再无相见时 submitted on 2021-02-11 09:38:54
Question: Say I have a dataframe as follows:

    df = pd.DataFrame({'date': pd.date_range(start='2013-01-01', periods=6, freq='M'),
                       'value': [3, 3.5, -5, 2, 7, 6.8],
                       'type': ['a', 'a', 'a', 'b', 'b', 'b']})
    df['pct'] = df.groupby(['type'])['value'].pct_change()

Output:

            date  value type       pct
    0 2013-01-31    3.0    a       NaN
    1 2013-02-28    3.5    a  0.166667
    2 2013-03-31   -5.0    a -2.428571
    3 2013-04-30    2.0    b       NaN
    4 2013-05-31    7.0    b  2.500000
    5 2013-06-30    6.8    b -0.028571

I want to replace the pct values that are bigger than 0.2 or
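The excerpt is cut off after "bigger than 0.2 or", so the second condition is unknown. The sketch below is one possible reading, not the asker's confirmed requirement: it assumes a hypothetical lower bound of -0.2 (the `LOWER` constant) and assumes the replacement value is the per-`type` mean of all non-missing `pct` values. The dates are written out explicitly instead of using `pd.date_range`.

```python
import pandas as pd

# Same data as in the question, with the month-end dates spelled out.
df = pd.DataFrame({
    "date": pd.to_datetime(["2013-01-31", "2013-02-28", "2013-03-31",
                            "2013-04-30", "2013-05-31", "2013-06-30"]),
    "value": [3, 3.5, -5, 2, 7, 6.8],
    "type": ["a", "a", "a", "b", "b", "b"],
})
df["pct"] = df.groupby(["type"])["value"].pct_change()

UPPER = 0.2    # stated in the question
LOWER = -0.2   # ASSUMPTION: the question text is truncated before this value

# Per-group mean of pct, broadcast back to the original row positions.
group_mean = df.groupby("type")["pct"].transform("mean")

# Rows whose pct falls outside the assumed range; NaN compares as False.
out_of_range = (df["pct"] > UPPER) | (df["pct"] < LOWER)

# Replace only the flagged rows, leaving NaN and in-range values untouched.
df["pct"] = df["pct"].mask(out_of_range, group_mean)
```

`transform("mean")` is used rather than `agg` because it returns a Series aligned with the original index, which lets `mask` substitute values row by row.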

How to create a pandas DataFrame column based on the existence of values in a subset of columns, by row?

风流意气都作罢 submitted on 2021-02-11 08:47:26
Question: I have a pandas DataFrame as follows:

    import pandas as pd
    data1 = {"column1": ["A", "B", "C", "D", "E", "F", "G"],
             "column2": [338, 519, 871, 1731, 2693, 2963, 3379],
             "column3": [5, 1, 8, 3, 731, 189, 9],
             "columnA": [5, 0, 75, 150, 0, 0, 0],
             "columnB": [0, 32, 0, 96, 0, 51, 0],
             "columnC": [0, 42, 0, 42, 0, 42, 42]}
    df = pd.DataFrame(data1)
    df

Output:

      column1  column2  column3  columnA  columnB  columnC
    0       A      338        5        5        0        0
    1       B      519        1        0       32       42
    2       C      871        8       75        0        0
    3       D     1731        3      150       96       42
    4       E     2693      731        0        0        0
    5       F
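The excerpt stops before the desired output is shown, so the following is only a sketch of one plausible interpretation of the title: flag each row in which at least one of `columnA`, `columnB`, `columnC` holds a non-zero value. The column name `has_value` and the choice of "non-zero" as the meaning of "a value exists" are assumptions, not taken from the question.

```python
import pandas as pd

data1 = {"column1": ["A", "B", "C", "D", "E", "F", "G"],
         "column2": [338, 519, 871, 1731, 2693, 2963, 3379],
         "column3": [5, 1, 8, 3, 731, 189, 9],
         "columnA": [5, 0, 75, 150, 0, 0, 0],
         "columnB": [0, 32, 0, 96, 0, 51, 0],
         "columnC": [0, 42, 0, 42, 0, 42, 42]}
df = pd.DataFrame(data1)

# The subset of columns to inspect, row by row.
subset = ["columnA", "columnB", "columnC"]

# ASSUMPTION: a value "exists" when it is non-zero.
# ne(0) gives a boolean frame; any(axis=1) collapses it to one flag per row.
df["has_value"] = df[subset].ne(0).any(axis=1)
```

Only row E has zeros in all three columns, so it is the only row flagged `False`.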

use elements in a list for dataframe names

你离开我真会死。 submitted on 2021-02-11 08:37:38
Question: I have a list like this:

    network = ['facebook', 'organic', 'instagram']

And I create 3 dataframes, facebook_count, organic_count and instagram_count, one for each type of network:

    facebook_count = df[df.isin(['facebook installs'])]
    organic_count = df[df.isin(['organic installs'])]
    instagram_count = df[df.isin(['instagram installs'])]

Is there a way to write a loop that creates these 3 dataframes at once? I wrote something like this:

    for i in range(len(network)+1): network[i]+'_count' = df[df
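One common way to get "a DataFrame per list element" is to store the frames in a dictionary keyed by the generated name, since Python cannot assign to an expression such as `network[i]+'_count'`. The sketch below assumes a hypothetical one-column `df` (the column name `source` and its rows are invented for illustration) and reuses the asker's own `df[df.isin([...])]` expression unchanged.

```python
import pandas as pd

network = ["facebook", "organic", "instagram"]

# HYPOTHETICAL stand-in for the asker's df; the real one is not shown.
df = pd.DataFrame({"source": ["facebook installs", "organic installs",
                              "instagram installs", "facebook installs"]})

# Build one frame per network name and key it by "<name>_count".
counts = {
    f"{name}_count": df[df.isin([f"{name} installs"])]
    for name in network
}

# Individual frames are looked up by key instead of by variable name.
facebook_count = counts["facebook_count"]
```

Note that indexing with a boolean DataFrame keeps the original shape and puts NaN in the cells that do not match, which is what the asker's expression does too. Iterating over `network` directly also avoids the `range(len(network)+1)` index, which would run one position past the end of the list.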

How to load data in weka Instances from a spark dataframe

那年仲夏 submitted on 2021-02-11 08:27:20
Question: I have a Spark DataFrame. Now I want to do some processing using Weka. Therefore, I want to load the data into Weka Instances from the DataFrame and finally return the data as a DataFrame. As the structure and data types of the two differ, I am wondering whether anybody can help me with the conversion. The code snippet may look like below.

    val df: DataFrame = data
    val data: Instances = process(df)

Source: https://stackoverflow.com/questions/58160584/how-to-load-data-in-weka-instances-from-a-spark-dataframe

Check multiple columns data format and append results to one column in Pandas

风流意气都作罢 submitted on 2021-02-11 07:14:10
Question: Given a toy dataset as follows:

       id    room   area           situation
    0   1   A-102  world  under construction
    1   2     NaN     24  under construction
    2   3    B309    NaN                 NaN
    3   4   C·102     25    under decoration
    4   5  E_1089  hello    under decoration
    5   6      27    NaN          under plan
    6   7      27    NaN                 NaN

I need to check three columns, room, area and situation, based on the following conditions: (1) if the room name contains anything other than digits, letters and - (NaNs are also considered invalid), then return incorrect room name in the check column; (2) if area is not number or
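Only condition (1) is fully stated in the excerpt, so the sketch below covers that rule alone and leaves conditions on `area` and `situation` out. The helper name `room_is_valid` is hypothetical, and reading "number, alphabet, -" as ASCII digits, ASCII letters and the hyphen is an assumption.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "room": ["A-102", np.nan, "B309", "C·102", "E_1089", 27, 27],
    "area": ["world", 24, np.nan, 25, "hello", np.nan, np.nan],
    "situation": ["under construction", "under construction", np.nan,
                  "under decoration", "under decoration", "under plan", np.nan],
})

# ASSUMPTION: valid room names use only ASCII letters, digits and "-".
ROOM_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def room_is_valid(value) -> bool:
    """Return True when the room name matches the assumed pattern."""
    if pd.isna(value):
        return False  # condition (1): NaN counts as invalid
    return ROOM_PATTERN.fullmatch(str(value)) is not None


valid = df["room"].map(room_is_valid)

# Leave the check column empty for valid rows, label the invalid ones.
df["check"] = valid.map({True: np.nan, False: "incorrect room name"})
```

With this data the rows with id 2 (NaN), 4 (contains `·`) and 5 (contains `_`) are labelled; further rules could append their own messages to the same column.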

Difference between two dates in Pandas DataFrame

℡╲_俬逩灬. submitted on 2021-02-11 07:10:57
Question: I have many columns in a data frame, and I have to find the time difference between two columns named in_time and out_time and put it in a new column in the same data frame. The time format looks like this: 2015-09-25T01:45:34.372Z. I am using a Pandas DataFrame. I want to do something like this:

    df.days = df.out_time - df.in_time

I have many columns, and I have to add one more column named days and put the differences there.

Answer 1: You need to convert the strings to datetime dtype, you can then
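The answer is cut off, but its first step (convert the strings to datetime dtype) can be illustrated with a short sketch. The sample rows are invented; only the timestamp format comes from the question. Taking whole days via `.dt.days` is an assumption about what the `days` column should hold.

```python
import pandas as pd

# HYPOTHETICAL sample rows in the timestamp format quoted in the question.
df = pd.DataFrame({
    "in_time": ["2015-09-25T01:45:34.372Z", "2015-09-25T01:45:34.372Z"],
    "out_time": ["2015-09-27T03:45:34.372Z", "2015-09-25T13:45:34.372Z"],
})

# Parse the ISO 8601 strings; the trailing "Z" yields timezone-aware UTC values.
df["in_time"] = pd.to_datetime(df["in_time"])
df["out_time"] = pd.to_datetime(df["out_time"])

# Subtracting two datetime columns gives a timedelta column.
delta = df["out_time"] - df["in_time"]

# ASSUMPTION: "days" means whole elapsed days; use bracket assignment
# so that a real column is created.
df["days"] = delta.dt.days
```

Bracket assignment (`df["days"] = ...`) is used because assigning to `df.days` for a column that does not exist yet only sets an attribute on the object and does not add a column. If fractional days are wanted, `delta.dt.total_seconds() / 86400` is an alternative.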
