dataframe | 易学教程

Replace values based on multiple conditions with groupby mean in Pandas

Question: Say I have a dataframe as follows:

    df = pd.DataFrame({'date': pd.date_range(start='2013-01-01', periods=6, freq='M'),
                       'value': [3, 3.5, -5, 2, 7, 6.8],
                       'type': ['a', 'a', 'a', 'b', 'b', 'b']})
    df['pct'] = df.groupby(['type'])['value'].pct_change()

Output:

            date  value type       pct
    0 2013-01-31    3.0    a       NaN
    1 2013-02-28    3.5    a  0.166667
    2 2013-03-31   -5.0    a -2.428571
    3 2013-04-30    2.0    b       NaN
    4 2013-05-31    7.0    b  2.500000
    5 2013-06-30    6.8    b -0.028571

I want to replace the pct values that are bigger than 0.2 or …
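The replacement named in the title might be sketched roughly as follows. The excerpt is cut off before the second condition, so the lower bound of -0.2, the use of the per-type mean of pct as the replacement value, and the pct_clean column name are all assumptions made for illustration. The date column is left out because it plays no part in the calculation.

```python
import pandas as pd

df = pd.DataFrame({
    "value": [3, 3.5, -5, 2, 7, 6.8],
    "type": ["a", "a", "a", "b", "b", "b"],
})
df["pct"] = df.groupby("type")["value"].pct_change()

# ASSUMPTION: the second condition is "smaller than -0.2"; the excerpt
# is truncated before it is stated.
out_of_range = (df["pct"] > 0.2) | (df["pct"] < -0.2)

# Per-type mean of pct, aligned to the original rows (NaN is skipped).
# ASSUMPTION: this mean is the intended replacement value.
group_mean = df.groupby("type")["pct"].transform("mean")

# Replace only the flagged rows; NaN rows compare False and stay NaN.
df["pct_clean"] = df["pct"].mask(out_of_range, group_mean)
```

Note that the group mean here still includes the out-of-range values themselves; if they should be excluded, the mean could be computed on `df["pct"].mask(out_of_range)` instead.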

How to create a pandas DataFrame column based on the existence of values in a subset of columns, by row?

Question: I have a pandas DataFrame as follows:

    import pandas as pd

    data1 = {"column1": ["A", "B", "C", "D", "E", "F", "G"],
             "column2": [338, 519, 871, 1731, 2693, 2963, 3379],
             "column3": [5, 1, 8, 3, 731, 189, 9],
             "columnA": [5, 0, 75, 150, 0, 0, 0],
             "columnB": [0, 32, 0, 96, 0, 51, 0],
             "columnC": [0, 42, 0, 42, 0, 42, 42]}
    df = pd.DataFrame(data1)
    df

    >>>
      column1  column2  column3  columnA  columnB  columnC
    0       A      338        5        5        0        0
    1       B      519        1        0       32       42
    2       C      871        8       75        0        0
    3       D     1731        3      150       96       42
    4       E     2693      731        0        0        0
    5       F …
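A row-wise existence check over a subset of columns could look something like the sketch below. The excerpt ends before the desired output is shown, so treating "existence" as "non-zero", choosing columnA through columnC as the subset, and the has_value column name are assumptions.

```python
import pandas as pd

data1 = {"column1": ["A", "B", "C", "D", "E", "F", "G"],
         "column2": [338, 519, 871, 1731, 2693, 2963, 3379],
         "column3": [5, 1, 8, 3, 731, 189, 9],
         "columnA": [5, 0, 75, 150, 0, 0, 0],
         "columnB": [0, 32, 0, 96, 0, 51, 0],
         "columnC": [0, 42, 0, 42, 0, 42, 42]}
df = pd.DataFrame(data1)

# ASSUMPTION: these three columns are the subset of interest.
subset = ["columnA", "columnB", "columnC"]

# ASSUMPTION: a value "exists" when it is non-zero.
# ne(0) gives a boolean frame; any(axis=1) reduces it row by row.
df["has_value"] = df[subset].ne(0).any(axis=1)
```

With this data, only row E (all three columns zero) ends up False.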

use elements in a list for dataframe names

Question: I have a list like this:

    network = ['facebook', 'organic', 'instagram']

And I create 3 dataframes, facebook_count, organic_count and instagram_count, one for each type of network:

    facebook_count = df[df.isin(['facebook installs'])]
    organic_count = df[df.isin(['organic installs'])]
    instagram_count = df[df.isin(['instagram installs'])]

Is there a way to write a loop that creates these 3 dataframes at once? I wrote something like this:

    for i in range(len(network)+1):
        network[i]+'_count' = df[df …
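Python cannot assign to an expression such as `network[i]+'_count'`, so a common alternative is to keep the frames in a dictionary keyed by the generated names. The sketch below takes that approach; the sample df, its source column, and the counts variable name are hypothetical, since the excerpt does not show the real data.

```python
import pandas as pd

network = ["facebook", "organic", "instagram"]

# Hypothetical sample data; the real df is not shown in the excerpt.
df = pd.DataFrame({
    "source": ["facebook installs", "organic installs",
               "instagram installs", "facebook installs"],
})

# One filtered frame per network, keyed by the name the question wanted.
# df[df.isin([...])] keeps the frame's shape and puts NaN in every cell
# that does not match, exactly as in the question's three manual lines.
counts = {
    f"{name}_count": df[df.isin([f"{name} installs"])]
    for name in network
}

facebook_count = counts["facebook_count"]
```

Looking frames up by key avoids creating variables dynamically, and iterating over `counts.items()` handles all three at once.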

How to load data in weka Instances from a spark dataframe

Question: I have a Spark DataFrame. Now I want to do some processing using Weka. Therefore, I want to load the data into Weka Instances from the DataFrame and finally return the data as a DataFrame. As both the structure and the data types are different, I am wondering whether anybody can help me with the conversion. The code snippet may look like the one below.

    val df: DataFrame = data
    val data: Instances = process(df)

Source: https://stackoverflow.com/questions/58160584/how-to-load-data-in-weka-instances-from-a-spark-dataframe

Check multiple columns data format and append results to one column in Pandas

Question: Given a toy dataset as follows:

       id    room   area           situation
    0   1   A-102  world  under construction
    1   2     NaN     24  under construction
    2   3    B309    NaN                 NaN
    3   4   C·102     25    under decoration
    4   5  E_1089  hello    under decoration
    5   6      27    NaN          under plan
    6   7      27    NaN                 NaN

I need to check three columns, room, area and situation, based on the following conditions: (1) if the room name contains anything other than digits, letters, or - (NaNs are also considered invalid), then return "incorrect room name" in the check column; (2) if area is not number or …
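Condition (1) alone could be sketched as below; condition (2) is cut off in the excerpt and is deliberately not guessed at. The regular expression is one reading of "number, alphabet, -", and writing an empty string for rows that pass is an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "room": ["A-102", np.nan, "B309", "C·102", "E_1089", 27, 27],
})

# Condition (1): a room name is valid only if it is present and made up
# entirely of ASCII letters, digits, and hyphens.
# notna() is checked separately because astype(str) can turn NaN into
# the literal text "nan", which would otherwise pass the pattern.
room_ok = df["room"].notna() & df["room"].astype(str).str.fullmatch(
    r"[A-Za-z0-9-]+", na=False
)

# ASSUMPTION: rows that pass get an empty string in the check column.
df["check"] = np.where(room_ok, "", "incorrect room name")
```

Here the rows with NaN, C·102, and E_1089 are flagged, while A-102, B309, and the two numeric 27 entries pass.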

Difference between two dates in Pandas DataFrame

Question: I have a data frame with many columns, and I have to find the time difference between two columns named in_time and out_time and put it in a new column in the same data frame. The time format looks like this: 2015-09-25T01:45:34.372Z. I am using a Pandas DataFrame. I want to do something like this:

    df.days = df.out_time - df.in_time

I have many columns, and I have to add one more column named days and put the differences there.

Answer 1: You need to convert the strings to datetime dtype, you can then …
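The conversion the answer starts to describe might look like the sketch below. The answer is truncated, so the subtraction and the `.dt.days` step are an assumed continuation based on what the question asks for, and the sample rows are hypothetical apart from the timestamp format quoted in the question.

```python
import pandas as pd

# Hypothetical sample rows in the format quoted in the question.
df = pd.DataFrame({
    "in_time": ["2015-09-25T01:45:34.372Z", "2015-09-26T10:00:00.000Z"],
    "out_time": ["2015-09-28T03:45:34.372Z", "2015-09-26T22:30:00.000Z"],
})

# Parse the ISO 8601 strings into timezone-aware datetimes.
df["in_time"] = pd.to_datetime(df["in_time"])
df["out_time"] = pd.to_datetime(df["out_time"])

# Subtracting two datetime columns yields a timedelta column.
delta = df["out_time"] - df["in_time"]

# Use bracket assignment: `df.days = ...` sets an attribute on the
# object and does not create a new column.
df["days"] = delta.dt.days
```

`.dt.days` keeps only the whole-day component, so a 12.5-hour stay counts as 0 days; `delta.dt.total_seconds() / 86400` would give fractional days instead.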
