dataframe

Pandas: Divide column data by number if row of next column contains certain value

生来就可爱ヽ(ⅴ<●) · Posted on 2021-02-09 11:08:05

Question: I have a dataframe that consists of three columns:

```
qty  unit_of_measure  qty_cal
3    nodes            nan
4    nodes            nan
5    nodes            nan
6    cores            nan
7    nodes            nan
10   cores            nan
3    nodes            nan
```

I would like to add a condition to populate qty_cal. The condition is: if unit_of_measure is equal to "nodes", copy the row's qty value into qty_cal; if it is "cores", divide the qty value by 16 and put the result in qty_cal. The code I have tried is:

```python
if ppn_df['unit_of_measure'] == 'Nodes':
    ppn_df['qty']
elif ppn_df['unit_of_measure'] =='Cores'
```
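A vectorized way to express this condition is `numpy.where`; the row-by-row `if`/`elif` in the question cannot work, because comparing a whole column to a string yields a Series, not a single boolean. A minimal sketch, assuming the lowercase "nodes"/"cores" labels shown in the sample data:

```python
import numpy as np
import pandas as pd

ppn_df = pd.DataFrame({
    'qty': [3, 4, 5, 6, 7, 10, 3],
    'unit_of_measure': ['nodes', 'nodes', 'nodes', 'cores', 'nodes', 'cores', 'nodes'],
})

# Where the unit is "cores", divide qty by 16; otherwise copy qty as-is
ppn_df['qty_cal'] = np.where(ppn_df['unit_of_measure'] == 'cores',
                             ppn_df['qty'] / 16,
                             ppn_df['qty'])
```

`np.where` evaluates the whole column at once, which is why no explicit loop or `if` statement is needed.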

Groupby and resample timeseries so date ranges are consistent

萝らか妹 · Posted on 2021-02-09 10:55:23

Question: I have a dataframe which is basically several time series stacked on top of one another. Each time series has a unique label (group), and they cover different date ranges.

```python
date = pd.to_datetime(pd.Series(['2010-01-01', '2010-01-02', '2010-01-03',
                                 '2010-01-06', '2010-01-01', '2010-01-03']))
group = [1, 1, 1, 1, 2, 2]
value = [1, 2, 3, 4, 5, 6]
df = pd.DataFrame({'date': date, 'group': group, 'value': value})
df
```

```
        date  group  value
0 2010-01-01      1      1
1 2010-01-02      1      2
2 2010-01-03      1      3
3 2010-01-06      1      4
4 2010-01-01
```
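One way to make the date ranges consistent (a sketch, not the only approach): build a single shared daily `date_range` spanning all groups, then reindex each group's series onto it, so every group covers the same dates and missing days become NaN:

```python
import pandas as pd

date = pd.to_datetime(pd.Series(['2010-01-01', '2010-01-02', '2010-01-03',
                                 '2010-01-06', '2010-01-01', '2010-01-03']))
group = [1, 1, 1, 1, 2, 2]
value = [1, 2, 3, 4, 5, 6]
df = pd.DataFrame({'date': date, 'group': group, 'value': value})

# One shared daily index spanning the min and max date across all groups
full_range = pd.date_range(df['date'].min(), df['date'].max(),
                           freq='D', name='date')

# Reindex each group's series onto the shared range; missing dates become NaN
out = (df.set_index('date')
         .groupby('group')['value']
         .apply(lambda s: s.reindex(full_range))
         .reset_index())
```

The result has one row per (group, date) pair over the full range: 2 groups × 6 days = 12 rows here.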

pandas: Select dataframe columns based on another dataframe's columns

蓝咒 · Posted on 2021-02-08 21:22:53

Question: I'm trying to subset a pandas dataframe based on the columns of another, similar dataframe. I can do this easily in R:

```r
df1 <- data.frame(A=1:5, B=6:10, C=11:15)
df2 <- data.frame(A=1:5, B=6:10)

# Select columns in df1 that exist in df2
df1[df1 %in% df2]
#   A  B
# 1 1  6
# 2 2  7
# 3 3  8
# 4 4  9
# 5 5 10

# Select columns in df1 that do not exist in df2
df1[!(df1 %in% df2)]
#    C
# 1 11
# 2 12
# 3 13
# 4 14
# 5 15
```

How can I do that with the pandas dataframes below?

```python
df1 = pd.DataFrame({'A': [1,2,3,4,5],'B': [6,7,8,9,10],'C': [11
```
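In pandas, the closest equivalents are the `Index.intersection` and `Index.difference` methods on the two frames' column indexes; a sketch using the data from the question:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                    'B': [6, 7, 8, 9, 10],
                    'C': [11, 12, 13, 14, 15]})
df2 = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                    'B': [6, 7, 8, 9, 10]})

# Columns of df1 that also exist in df2
shared = df1[df1.columns.intersection(df2.columns)]

# Columns of df1 that do not exist in df2
only_df1 = df1[df1.columns.difference(df2.columns)]
```

Note this matches on column names only, whereas R's `%in%` on data.frames compares column contents; for name-based selection the two agree on this example.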

Python pandas dataframe fill NaN with other Series

岁酱吖の · Posted on 2021-02-08 16:59:22

Question: I want to fill NaN values in a DataFrame (df) column (var4) based on a control table (fillna_mean) that holds the column mean per level of var1; in the dataframe I want them to match on var1. I have tried doing this with fillna, but I can't get it to work all the way. How do I do this in a smart way, using df.var1 as the key to match against fillna_mean.var1?

```python
df = pd.DataFrame({'var1' : list('a' * 3) + list('b' * 2) + list('c' * 4) + list('d' * 3)
                  ,'var2' : [i for i in range(12)]
                  ,'var3' : list(np.random
```
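The usual pattern is to turn the control table into a `var1 -> mean` lookup Series, `map` it onto `df.var1`, and pass the result to `fillna`. The question's definitions of `var4` and `fillna_mean` are cut off, so the values below are hypothetical stand-ins:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': list('a' * 3) + list('b' * 2) + list('c' * 4) + list('d' * 3),
                   'var2': range(12),
                   'var4': [1.0, np.nan, 3.0, np.nan, 5.0, 6.0,
                            np.nan, 8.0, 9.0, np.nan, 11.0, 12.0]})

# Hypothetical control table: one mean per var1 level
fillna_mean = pd.DataFrame({'var1': list('abcd'),
                            'var4': [10.0, 20.0, 30.0, 40.0]})

# Build the lookup, then fill each NaN with the mean for that row's var1
lookup = fillna_mean.set_index('var1')['var4']
df['var4'] = df['var4'].fillna(df['var1'].map(lookup))
```

`map(lookup)` produces a Series aligned to `df`'s index, which is exactly the shape `fillna` needs for element-wise filling.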

match / find rows based on multiple required values in a single row in R

非 Y 不嫁゛ · Posted on 2021-02-08 15:24:09

Question: This must be a duplicate, but I can't find it, so here goes. I have a data.frame with two columns: one contains a group and the other contains a criterion. A group can contain many different criteria, but only one per row. I want to identify groups that contain three specific criteria (which will appear on different rows). In my case, I want to identify all groups that contain the criteria "I", "E", and "C". Groups may contain any number and combination of these and several other letters.

```r
test <-
```
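The question is about R, but the matching logic itself (keep the groups whose set of criteria contains all three required letters) can be sketched in pandas. The `test` data.frame in the question is cut off, so the sample data below is hypothetical:

```python
import pandas as pd

# Hypothetical sample standing in for the truncated `test` data.frame:
# group 1 has I, E, C; group 2 lacks C; group 3 has all three plus an extra
test = pd.DataFrame({
    'group': [1, 1, 1, 2, 2, 3, 3, 3, 3],
    'criterion': ['I', 'E', 'C', 'I', 'E', 'I', 'E', 'C', 'X'],
})

required = {'I', 'E', 'C'}

# A group matches when the set of its criteria contains all required letters
has_all = test.groupby('group')['criterion'].apply(lambda s: required.issubset(set(s)))
matching_groups = has_all[has_all].index.tolist()
```

The set-containment test ignores extra letters and duplicates, which matches the "any number and combination" requirement.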

How to sort a column with Date and time values in Spark?

假如想象 · Posted on 2021-02-08 15:12:08

Question: Note: I have this as a Dataframe in Spark. These date/time values constitute a single column in the Dataframe.

Input:

```
04-NOV-16 03.36.13.000000000 PM
06-NOV-15 03.42.21.000000000 PM
05-NOV-15 03.32.05.000000000 PM
06-NOV-15 03.32.14.000000000 AM
```

Expected output:

```
05-NOV-15 03.32.05.000000000 PM
06-NOV-15 03.32.14.000000000 AM
06-NOV-15 03.42.21.000000000 PM
04-NOV-16 03.36.13.000000000 PM
```

Answer 1: As this format is not standard, you need to use the unix_timestamp function to parse the string and
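The answer's idea is to parse each string with Spark's `unix_timestamp` and a matching format (roughly `dd-MMM-yy hh.mm.ss a` once the 9-digit fraction is stripped) and sort on the parsed value. The same parsing logic, sketched in plain Python so the ordering can be checked without a Spark session:

```python
from datetime import datetime

rows = ['04-NOV-16 03.36.13.000000000 PM',
        '06-NOV-15 03.42.21.000000000 PM',
        '05-NOV-15 03.32.05.000000000 PM',
        '06-NOV-15 03.32.14.000000000 AM']

def to_dt(s):
    # Drop the 9-digit fractional seconds, then parse the 12-hour timestamp
    head, ampm = s.rsplit(' ', 1)
    head = head.rsplit('.', 1)[0]          # e.g. '04-NOV-16 03.36.13'
    return datetime.strptime(f'{head} {ampm}', '%d-%b-%y %I.%M.%S %p')

result = sorted(rows, key=to_dt)
```

In Spark itself the same effect comes from `orderBy` on the `unix_timestamp(...)` expression rather than a Python sort.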

Pandas Split column into multiple columns by multiple string delimiters

霸气de小男生 · Posted on 2021-02-08 15:00:27

Question: I have a dataframe:

```
id  info
1   Name: John Age: 12 Sex: Male
2   Name: Sara Age: 22 Sex: Female
3   Name: Mac Donald Age: 32 Sex: Male
```

I'm looking to split the info column into three columns so that the final output is:

```
id  Name        Age  Sex
1   John        12   Male
2   Sara        22   Female
3   Mac Donald  32   Male
```

I tried using the pandas split function:

```python
df[['Name','Age','Sex']] = df.info.split(['Name'])
```

I might have to do this multiple times to get the desired output. Is there a better way to achieve this? PS: The info column
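Since every row follows the pattern `Name: ... Age: ... Sex: ...`, one sketch is a single `str.extract` with named capture groups, which avoids chaining several splits (note also `df['info']` rather than `df.info`, because `info` is a built-in DataFrame method):

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'info': ['Name: John Age: 12 Sex: Male',
             'Name: Sara Age: 22 Sex: Female',
             'Name: Mac Donald Age: 32 Sex: Male'],
})

# Lazy (.*?) lets multi-word names like "Mac Donald" stop at " Age:"
pattern = r'Name:\s*(?P<Name>.*?)\s*Age:\s*(?P<Age>\d+)\s*Sex:\s*(?P<Sex>\w+)'
df[['Name', 'Age', 'Sex']] = df['info'].str.extract(pattern)
```

The extracted values are strings; `df['Age'].astype(int)` converts the ages to integers if needed.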

Create a dataframe from arrays python

社会主义新天地 · Posted on 2021-02-08 14:16:09

Question: I'm trying to construct a dataframe (I'm using the Pandas library) from some arrays and one matrix. In particular, if I have two arrays like this:

```
A = [A, B, C]
B = [D, E, F]
```

and one matrix like this:

```
1 2 2
3 3 3
4 4 4
```

can I create a dataframe like this?

```
  A B C
D 1 2 2
E 3 3 3
F 4 4 4
```

Maybe it's a stupid question, but I'm very new to Python and Pandas. I have seen https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.html, but it only mentions 'columns'. I should read the matrix row for
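Assuming the first array holds the intended column labels and the second the row labels, the `index` and `columns` arguments of the `DataFrame` constructor do exactly this (the question's `A = [A, B, C]` is not valid Python, so the labels are quoted strings here):

```python
import pandas as pd

cols = ['A', 'B', 'C']   # column labels (the first array in the question)
rows = ['D', 'E', 'F']   # row labels (the second array)
matrix = [[1, 2, 2],
          [3, 3, 3],
          [4, 4, 4]]

# Each matrix row becomes a dataframe row, labelled D, E, F
df = pd.DataFrame(matrix, index=rows, columns=cols)
```

A NumPy 2-D array works just as well as the nested list for the `data` argument.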
