dataframe | 易学教程

How to filter dataframe with multiple conditions?

Question: I have this data frame that I'd like to subset (if possible, with dplyr or base R functions):

    df <- data.frame(x = c(1,1,1,2,2,2), y = c(30,10,8,10,18,5))

    x  y
    1 30
    1 10
    1  8
    2 10
    2 18
    2  5

Assuming x is a factor (so two conditions/levels), how can I subset/filter this data frame so that I keep only the df$y values greater than 15 where df$x == 1, and the df$y values greater than 5 where df$x == 2? This is what I'd like to get:

    df2 <- data.frame(x = c(1,2,2), y = c(30,10,18))

    x  y
    1 30
    2 10
    2

Replacing the missing values in pandas

Question: I have a pandas dataframe where missing values are indicated as -999.

    In [58]: df.head()
    Out[58]:
    EventId        A        B          C
    100000      0.91  124.711   2.666000
    100001   -999.00 -999.000  -0.202838
    100002   -999.00 -999.000  -0.202838
    100003   -999.00 -999.000  -0.202838

I want to replace the missing values (indicated by -999) with the mean of that column taken over the non-missing values. What is the best way to do this? Is there a pandas function that achieves this easily?

Answer 1:

    df2.replace(-999, np.nan,
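Since the excerpt cuts off mid-answer, here is a minimal sketch of one way to do what the question asks; it is not necessarily the continuation of the truncated answer. It assumes EventId is the index (the excerpt does not say) and uses only the four rows shown. The names `cleaned` and `filled` are illustrative.

```python
import numpy as np
import pandas as pd

# Toy data mirroring the rows shown in the question; -999 marks a missing value.
# Assumption: EventId is the index rather than an ordinary column.
df = pd.DataFrame(
    {
        "A": [0.91, -999.0, -999.0, -999.0],
        "B": [124.711, -999.0, -999.0, -999.0],
        "C": [2.666, -0.202838, -0.202838, -0.202838],
    },
    index=pd.Index([100000, 100001, 100002, 100003], name="EventId"),
)

# Step 1: turn the sentinel into NaN so pandas treats it as missing.
cleaned = df.replace(-999, np.nan)

# Step 2: DataFrame.mean() skips NaN by default, so each column mean is taken
# over the non-missing values only; fillna() matches the means to columns by label.
filled = cleaned.fillna(cleaned.mean())

print(filled)
```

Converting to NaN first matters: taking the mean while -999 is still in the column would pull the average far below the real values.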

Multiple logical comparisons in pandas df

Question: Given the following pandas df

    A B C D
    1 2 3 4
    2 2 3 4

I want to add a new column that is 1, 2 or 3 depending on:

    (A > B) && (B > C)  = 1
    (A < B) && (B < C)  = 2
    Else                = 3

What's the best way to do this?

Answer 1: You can use numpy.select to structure your multiple conditions. The final parameter represents the default value.

    conditions = [(df.A > df.B) & (df.B > df.C), (df.A < df.B) & (df.B < df.C)]
    values = [1, 2]
    df['E'] = np.select(conditions, values, 3)

There are several alternatives: nested
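The numpy.select snippet in the answer leaves out the imports and the data. A self-contained version built from the question's two rows might look like the sketch below; the column name 'E' comes from the answer, everything else is reconstructed from the table in the question.

```python
import numpy as np
import pandas as pd

# The two rows shown in the question.
df = pd.DataFrame({"A": [1, 2], "B": [2, 2], "C": [3, 3], "D": [4, 4]})

# Conditions are checked in order; the first one that holds picks the value.
conditions = [
    (df["A"] > df["B"]) & (df["B"] > df["C"]),  # strictly decreasing -> 1
    (df["A"] < df["B"]) & (df["B"] < df["C"]),  # strictly increasing -> 2
]
values = [1, 2]

# Rows matching neither condition receive the default, 3.
df["E"] = np.select(conditions, values, default=3)

print(df)
```

With this data the first row (1 < 2 < 3) gets 2, and the second row gets 3 because A equals B, so neither strict comparison holds.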

R ~ Vectorization of a user defined function

Question: I need to write a function that counts the number of working days (excluding weekends and a vector of other local bank holidays), but the problem I'm running into is more simply illustrated by just counting weekdays. Here is a function that gives the number of weekdays between two dates:

    removeWeekends <- function(end, start){
      range <- as.Date(start:end, "1970-01-01")
      range <- range[sapply(range, function(x){
        if(!chron::is.weekend(x)){
          return(TRUE)
        }else{
          return(FALSE

Change values in a column from a list

Question: I've got a dataframe whose index is 'Country'. I want to change the names of multiple countries, and I have the old/new values in a dictionary, like below. I tried splitting the values into a "from" list and a "to" list, and that didn't work either. The code doesn't raise an error, but the values in my dataframe haven't changed.

    import pandas as pd
    import numpy as np
    energy = (pd.read_excel('Energy Indicators.xls', skiprows=17, skip_footer=38))
    energy = (energy.drop(energy.columns[[0, 1]], axis=1))
    energy.columns
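The excerpt ends before the dictionary and the renaming attempt are shown, so the cause of the problem cannot be read off the question. As a hedged sketch, this is one way to rename index labels from an old-to-new dictionary with DataFrame.rename. The frame, the column name, the country names, and the `renames` mapping are all hypothetical stand-ins, not the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for the question's frame, which is really loaded from
# 'Energy Indicators.xls'; the labels and numbers here are made up.
energy = pd.DataFrame(
    {"Energy Supply": [1.0, 2.0, 3.0]},
    index=pd.Index(
        ["Republic of Korea", "Iran (Islamic Republic of)", "France"],
        name="Country",
    ),
)

# Hypothetical old -> new mapping, in the dictionary form the question describes.
renames = {
    "Republic of Korea": "South Korea",
    "Iran (Islamic Republic of)": "Iran",
}

# rename() returns a new DataFrame by default, so the result is assigned back.
# Index labels that are not keys in the mapping are left unchanged.
energy = energy.rename(index=renames)

print(energy.index.tolist())
```

Code that calls rename() or replace() without assigning the result runs without error yet leaves the original frame untouched, which would match the symptom described; whether that is what happened here is not visible in the excerpt.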

pandas dataframe- how to find words that repeat in each row [closed]

Question: I have a dataframe with a text column containing long string values. The text has been cleaned and contains only words, as shown in the example below.

    text
    =====
    This is the first row
    This is the second row
    third row this is the

I would like to get this content:

Assign event number based on Date of occurrence in R dataframe

Question: How can I assign an event number based on the date of occurrence, subject to the following conditions? If an event occurs on at least 3 consecutive days, assign an event number (e1, and so on) and mutate (join) it onto the original data frame. If the occurrence does not last 3 consecutive days, assign NA and mutate it onto the original data frame. How can I achieve this for the time series dts? The output data frame would look like dts_output (done manually).

    dts<-structure(list(Date = structure(c(16442,