dataframe | 易学教程

Pandas Time Series DataFrame Missing Values

Question: I have a dataset of Total Sales from 2008-2015. I have an entry for each day, so I have created a pandas DataFrame with a DatetimeIndex and a column for sales. The problem is that I am missing data for most of 2010. These missing values are currently represented by 0.0, so they show up as zeros when I plot the DataFrame. I want to try to forecast values for 2016, possibly using an ARIMA model, so the first step I took was to perform a decomposition of this time series. Obviously if I ...
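One possible way to deal with the 0.0 placeholders before decomposing the series is to mark them as missing and interpolate across the gap. The sketch below runs on synthetic data: the column name `sales`, the exact dates of the gap, and the choice of time-based linear interpolation are all assumptions made for illustration, not details from the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the daily sales data (values are made up).
idx = pd.date_range("2008-01-01", "2015-12-31", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({"sales": rng.uniform(100, 200, len(idx))}, index=idx)

# Reproduce the problem: an assumed stretch of 2010 is recorded as 0.0.
df.loc["2010-02-01":"2010-11-30", "sales"] = 0.0

# Treat the 0.0 placeholders as missing rather than as real zero sales.
df["sales"] = df["sales"].replace(0.0, np.nan)

# Fill the gap by interpolating along the DatetimeIndex.
filled = df["sales"].interpolate(method="time")
```

A straight-line fill over several months removes any seasonality inside the gap, so it may be worth comparing against a fill based on the same calendar days in neighbouring years before fitting a model.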

python how do I perform the below operation in dataframe

Question: I have this DataFrame:
    df1 = pd.DataFrame({
        'Year': ["1A", "2A", "3A", "4A", "5A"],
        'Tval1': [1, 9, 8, 1, 6],
        'Tval2': [34, 56, 67, 78, 89]
    })
I want to change its layout so that the 2nd column is moved under the individual row.
Answer 1: The idea is to get the numbers from the Year column, set new column names after the Year column, and reshape with DataFrame.stack:
    df1['Year'] = df1['Year'].str.extract(r'(\d+)')
    df = df1.set_index('Year')
    # add letters by length of columns, working for 1 to 26 ...
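The approach described in the answer (extract the number, rename the value columns to letters, then stack) can be sketched as follows. The target layout is not visible in the excerpt, so the final frame with labels such as "1A", "1B" and a single value column named `Tval` is an assumption.

```python
import string

import pandas as pd

df1 = pd.DataFrame({
    "Year": ["1A", "2A", "3A", "4A", "5A"],
    "Tval1": [1, 9, 8, 1, 6],
    "Tval2": [34, 56, 67, 78, 89],
})

# Keep only the numeric part of Year ("1A" -> "1").
df1["Year"] = df1["Year"].str.extract(r"(\d+)", expand=False)
df = df1.set_index("Year")

# Rename the value columns to A, B, ... (works for up to 26 columns).
df.columns = list(string.ascii_uppercase[: len(df.columns)])

# stack() moves the columns into the rows: one (Year, letter) pair per value.
stacked = df.stack()

# Assumed target layout: a label such as "1A" or "1B" next to one value column.
out = pd.DataFrame({
    "Year": [f"{year}{letter}" for year, letter in stacked.index],
    "Tval": stacked.to_numpy(),
})
```

Here `out` has ten rows, with each original row's second value placed directly under its first.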

Find difference between 2 columns with Nulls using pandas

Question: I want to find the difference between 2 columns of type int in a pandas DataFrame. I am using Python 2.7. The columns are as below:
    >>> df
       INVOICED_QUANTITY  QUANTITY_SHIPPED
    0                 15               NaN
    1                 20               NaN
    2                  7               NaN
    3                  7               NaN
    4                  7               NaN
Now I want to subtract QUANTITY_SHIPPED from INVOICED_QUANTITY, and I do the below:
    >>> df['Diff'] = df['INVOICED_QUANTITY'] - df['QUANTITY_SHIPPED']
    >>> df
       INVOICED_QUANTITY  QUANTITY_SHIPPED  Diff
    0                 15               NaN   NaN
    1                 20               NaN   NaN
    2                  7               NaN   NaN
    3                  7               NaN   NaN
    4                  7               NaN   NaN
How do I take care of ...
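Plain subtraction propagates NaN, which is why every Diff above is NaN. One common option, sketched here under the assumption that a missing shipped quantity should count as zero, is `Series.sub` with `fill_value`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "INVOICED_QUANTITY": [15, 20, 7, 7, 7],
    "QUANTITY_SHIPPED": [np.nan] * 5,
})

# sub(..., fill_value=0) substitutes 0 for a value that is missing on one
# side only; if both sides are missing, the result is still NaN.
df["Diff"] = df["INVOICED_QUANTITY"].sub(df["QUANTITY_SHIPPED"], fill_value=0)
```

Whether zero is the right stand-in depends on what a missing shipment means in the data; if it means "unknown", leaving Diff as NaN may be the more honest result.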

Multiply columns of a dataframe by getting the column names from a list

Question: I have a dataframe with categorical as well as numerical columns.
    data = [['A', "India", 10, 20, 30, 15, "Cochin"],
            ['B', "India", 10, 20, 30, 40, "Chennai"],
            ['C', "India", 10, 20, 30, 15, "Chennai"]]
    df = pd.DataFrame(data, columns=['Product', 'Country', "2016 Total", "2017 Total", "2018 Total", "2019 Total", "Region"])

      Product Country  2016 Total  2017 Total  2018 Total  2019 Total   Region
    0       A   India          10          20          30          15   Cochin
    1       B   India          10          20          30          40  Chennai
    2       C   India          10          20          30          15  Chennai
I know what will be the names of ...
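The excerpt stops before saying what the columns are multiplied by, so the following is only a sketch of the general pattern: select the columns through a list of names and multiply them in one step. The list `cols` and the scalar `factor` are hypothetical.

```python
import pandas as pd

data = [["A", "India", 10, 20, 30, 15, "Cochin"],
        ["B", "India", 10, 20, 30, 40, "Chennai"],
        ["C", "India", 10, 20, 30, 15, "Chennai"]]
df = pd.DataFrame(data, columns=["Product", "Country", "2016 Total", "2017 Total",
                                 "2018 Total", "2019 Total", "Region"])

# Column names supplied as a list (assumed here to be the yearly totals).
cols = ["2016 Total", "2017 Total", "2018 Total", "2019 Total"]
factor = 2  # hypothetical multiplier

# Only the listed columns change; the categorical columns are left alone.
df[cols] = df[cols] * factor
```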

How to simply multiply two columns of a dataframe? [duplicate]

Question: My input is
    a <- c(1,2,3,4)
    b <- c(1,2,4,8)
    df <- data.frame(cbind(a,b))
My output should be
    a <- c(1,2,3,4)
    b <- c(1,2,4,8)
    c <- c(1,4,12,32)
    df <- data.frame(cbind(a,b,c))
Can I simply say df$a * df$b? Please help. I am getting an issue with duplication: the columns are being multiplied in matrix form, and there is also an issue with columns of different lengths.
Answer 1: In base R:
    df$c <- df$a * ...

Pandas GroupBy: How to get top n values based on a column

Question: Forgive me if this is a basic question, but I am new to pandas. I have a dataframe with a column A, and I would like to get the top n rows based on the count in column A. For instance, the raw data looks like
    A    B  C
    x   12  ere
    x   34  bfhg
    z    6  bgn
    z    8  rty
    y  567  hmmu,,u
    x  545  fghfgj
    x   44  zxcbv
Note that this is just a small sample of the data that I am actually working with. So if we look at column A, value x appears 4 times, z appears 2 times and y appears 1 time. How can I get the top n values ...
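One way to read this request is "keep the rows whose value in A is among the n most frequent values". A minimal sketch of that reading, using the sample rows above and an assumed n of 2:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["x", "x", "z", "z", "y", "x", "x"],
    "B": [12, 34, 6, 8, 567, 545, 44],
    "C": ["ere", "bfhg", "bgn", "rty", "hmmu,,u", "fghfgj", "zxcbv"],
})

n = 2  # assumed number of top values to keep

# value_counts() sorts by frequency in descending order,
# so the first n index entries are the n most frequent values.
counts = df["A"].value_counts()
top_values = counts.head(n).index

# Keep only the rows whose A value is one of the top n.
top_rows = df[df["A"].isin(top_values)]
```

With this sample, `top_values` holds x and z, and `top_rows` contains the six rows belonging to them.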

Define multiple values as missing in a data frame

Question: How do I define multiple values as missing in a data frame in R? Consider a data frame where two values, "888" and "999", represent missing data:
    df <- data.frame(age=c(50,30,27,888), insomnia=c("yes","no","no",999))
    df[df==888] <- NA
    df[df==999] <- NA
This solution takes one line of code per value representing missing data. Do you have a simpler solution for situations where the number of values representing missing data is high?
Answer 1: Here are three solutions:
    # 1. Data set
    df <- data ...

How to combine multiple data frame columns in R

Question: I have a .csv file with demographic data for my participants. The data are coded and downloaded from my study database (REDCap) in such a way that each race has its own separate column. That is, each participant has a value in each of these columns (1 if endorsed, 0 if unendorsed). It looks something like this:
    SubjID  Sex  Age  White  AA  Asian  Other
    001     F    62   0      1   0      0
    002     M    66   1      0   0      0
I have to use a roundabout way to get my demographic summary stats. There's gotta be a simpler way to do this. My ...

Pandas: Divide column data by number if row of next column contains certain value

Question: I have a dataframe that consists of three columns:
    qty  unit_of_measure  qty_cal
    3    nodes            nan
    4    nodes            nan
    5    nodes            nan
    6    cores            nan
    7    nodes            nan
    10   cores            nan
    3    nodes            nan
I would like to add a condition to populate qty_cal. If unit_of_measure is equal to "nodes", copy the row's qty value into qty_cal. If it is "cores", divide the qty value by 16 and put the result in qty_cal. The code I have tried is:
    if ppn_df['unit_of_measure'] == 'Nodes':
        ppn_df['qty']
    elif ppn_df['unit_of_measure'] == 'Cores' ...
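An `if` on a whole Series comparison raises a ValueError because the truth value of a Series is ambiguous, so the condition has to be applied element-wise instead. A minimal sketch using `numpy.select` on the sample rows above; lower-casing the unit before comparing is an assumption added to bridge the "nodes"/"Nodes" mismatch between the data and the attempted code.

```python
import numpy as np
import pandas as pd

ppn_df = pd.DataFrame({
    "qty": [3, 4, 5, 6, 7, 10, 3],
    "unit_of_measure": ["nodes", "nodes", "nodes", "cores",
                        "nodes", "cores", "nodes"],
})

# Normalise case so "Nodes" and "nodes" are treated the same (assumption).
unit = ppn_df["unit_of_measure"].str.lower()

# Element-wise choice: nodes keep qty, cores get qty / 16,
# and any other unit is left as NaN.
ppn_df["qty_cal"] = np.select(
    [unit == "nodes", unit == "cores"],
    [ppn_df["qty"], ppn_df["qty"] / 16],
    default=np.nan,
)
```

Using `np.select` rather than a two-way `np.where` keeps unexpected units visible as NaN instead of silently treating them as nodes.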