dataframe

How to filter dataframe with multiple conditions?

我与影子孤独终老i submitted on 2021-02-04 05:57:07
Question: I have this dataframe that I'd like to subset (if possible, with dplyr or base R functions):

    df <- data.frame(x = c(1,1,1,2,2,2), y = c(30,10,8,10,18,5))

    x  y
    1 30
    1 10
    1  8
    2 10
    2 18
    2  5

Assuming x is a factor (so 2 conditions/levels), how can I subset/filter this dataframe so that I get only the df$y values that are greater than 15 for df$x == 1, and the df$y values that are greater than 5 for df$x == 2? This is what I'd like to get:

    df2 <- data.frame(x = c(1,2,2), y = c(30,10,18))

    x  y
    1 30
    2 10
    2

Replacing the missing values in pandas

送分小仙女□ submitted on 2021-02-02 09:59:38
Question: I have a pandas dataframe where missing values are indicated as -999.

    In [58]: df.head()
    Out[58]:
    EventId        A        B          C
    100000      0.91  124.711   2.666000
    100001   -999.00 -999.000  -0.202838
    100002   -999.00 -999.000  -0.202838
    100003   -999.00 -999.000  -0.202838

I want to replace the missing values (indicated by -999) with the mean of that column taken over non-missing values. What is the best way to do this? Is there a pandas function that can achieve this easily?

Answer 1: df2.replace(-999, np.nan,
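The answer above is cut off after `replace(-999, np.nan,`, so its remainder is unknown. As a minimal sketch of one common way to finish the job (not necessarily what the answer went on to say), the sentinel can be converted to NaN and then filled with the per-column mean. The frame below is a hypothetical reconstruction of the question's sample, and the names `cleaned` and `filled` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the layout shown in the question.
df = pd.DataFrame(
    {
        "A": [0.91, -999.0, -999.0, -999.0],
        "B": [124.711, -999.0, -999.0, -999.0],
        "C": [2.666, -0.202838, -0.202838, -0.202838],
    },
    index=pd.Index([100000, 100001, 100002, 100003], name="EventId"),
)

# Step 1: turn the -999 sentinel into NaN so pandas treats it as missing.
cleaned = df.replace(-999, np.nan)

# Step 2: DataFrame.mean() skips NaN by default, so each column mean is
# taken over the non-missing values only; fillna aligns it by column name.
filled = cleaned.fillna(cleaned.mean())

print(filled)
```

Doing the replacement first matters: if the -999 values were still present when the mean is taken, they would drag every column mean far below the real data.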

Multiple logical comparisons in pandas df

。_饼干妹妹 submitted on 2021-02-02 09:28:28
Question: If I have the following pandas df

    A B C D
    1 2 3 4
    2 2 3 4

and I want to add a new column that is 1, 2 or 3 depending on:

    (A > B) && (B > C) = 1
    (A < B) && (B < C) = 2
    Else = 3

what's the best way to do this?

Answer 1: You can use numpy.select to structure your multiple conditions. The final parameter represents the default value.

    conditions = [(df.A > df.B) & (df.B > df.C), (df.A < df.B) & (df.B < df.C)]
    values = [1, 2]
    df['E'] = np.select(conditions, values, 3)

There are several alternatives: nested
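For reference, here is the `numpy.select` approach from the answer as a self-contained script, run against the two-row frame from the question. Only the imports and the frame construction are added; the column name `E` follows the answer.

```python
import numpy as np
import pandas as pd

# The two rows shown in the question.
df = pd.DataFrame({"A": [1, 2], "B": [2, 2], "C": [3, 3], "D": [4, 4]})

# Conditions are checked in order; the first one that is True wins.
conditions = [
    (df.A > df.B) & (df.B > df.C),  # strictly decreasing A > B > C
    (df.A < df.B) & (df.B < df.C),  # strictly increasing A < B < C
]
values = [1, 2]

# The third argument is the default used when no condition matches.
df["E"] = np.select(conditions, values, 3)

print(df)  # E is 2 for the first row (1 < 2 < 3) and 3 for the second (A == B)
```

Note that each comparison is wrapped in parentheses: `&` binds more tightly than `>` and `<` in Python, so leaving them out changes the meaning of the expression.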

R ~ Vectorization of a user defined function

混江龙づ霸主 submitted on 2021-02-02 09:23:45
Question: I need to write a function that will count the number of working days (minus weekends, and a vector of other local bank holidays), but the problem I'm coming up against is more simply illustrated with just counting the number of weekdays. Here is a function that will give the number of weekdays between two dates:

    removeWeekends <- function(end, start){
      range <- as.Date(start:end, "1970-01-01")
      range<- range[sapply(range, function(x){
        if(!chron::is.weekend(x)){ return(TRUE) }else{ return(FALSE

Change values in a column from a list

柔情痞子 submitted on 2021-01-30 09:08:50
Question: I've got a dataframe with my index 'Country'. I want to change the names of multiple countries, and I have the old/new values in a dictionary, like below: I tried splitting the values into a "from" list and a "to" list, and that didn't work either. The code doesn't error, but the values in my dataframe haven't changed.

    import pandas as pd
    import numpy as np
    energy = (pd.read_excel('Energy Indicators.xls', skiprows=17, skip_footer=38))
    energy = (energy.drop(energy.columns[[0, 1]], axis=1))
    energy.columns
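The question's dictionary and the rest of its code are cut off, so the actual cause of the problem cannot be determined from this snippet. One frequent reason for "no error, but nothing changed" is that pandas methods such as `rename` and `replace` return a new object by default instead of modifying the frame in place. A minimal sketch under that assumption follows; the frame contents, the `renames` mapping and the column name `Energy Supply` are all hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical stand-in for the 'energy' frame, indexed by 'Country'.
energy = pd.DataFrame(
    {"Energy Supply": [1.0, 2.0, 3.0]},
    index=pd.Index(
        ["Republic of Korea", "United States of America", "France"],
        name="Country",
    ),
)

# Hypothetical old -> new mapping; labels not in the dict are left untouched.
renames = {
    "Republic of Korea": "South Korea",
    "United States of America": "United States",
}

# Case 1: the names live in the index. rename() returns a NEW frame,
# so the result has to be assigned to something.
renamed = energy.rename(index=renames)

# Case 2: the names live in an ordinary column. Series.replace() also
# returns a new Series, so it is written back to the column explicitly.
as_column = energy.reset_index()
as_column["Country"] = as_column["Country"].replace(renames)

print(renamed.index.tolist())
print(as_column["Country"].tolist())
```

In both cases the original `energy` frame is left unchanged unless the result is assigned back to it.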

pandas dataframe- how to find words that repeat in each row [closed]

China☆狼群 submitted on 2021-01-29 22:08:26
Question: I have a dataframe with a text column containing long string values. The text has been cleaned and has only words, as shown in the example below.

    text
    =====
    This is the first row
    This is the second row
    third row this is the

I would like to get this content:
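The desired output is cut off in this snippet, so what "repeat in each row" means is an assumption here. One plausible reading is "the words that appear in every row, ignoring case". A minimal sketch of that reading, using the three sample rows from the question (the names `word_sets`, `common` and `common_in_order` are made up for illustration):

```python
import pandas as pd

# The three sample rows from the question.
df = pd.DataFrame(
    {
        "text": [
            "This is the first row",
            "This is the second row",
            "third row this is the",
        ]
    }
)

# Lowercase each row, split on whitespace, and keep the distinct words per row.
word_sets = df["text"].str.lower().str.split().apply(set)

# Words present in every row are the intersection of all the per-row sets.
common = set.intersection(*word_sets)

# Sets are unordered, so list the shared words in first-row order for display.
common_in_order = [w for w in df["text"].iloc[0].lower().split() if w in common]

print(common_in_order)  # ['this', 'is', 'the', 'row']
```

If the intended result is instead words repeated within a single row, or a per-row count, a different approach would be needed.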

Assign event number based on date of occurrence in R dataframe

℡╲_俬逩灬. submitted on 2021-01-29 19:50:40
Question: How can I assign an event number based on the date of occurrence, satisfying the following conditions? If the event occurs for at least 3 consecutive days (or more), assign event number e1 and so on, and mutate (join) with the original data frame. If the occurrence does not last 3 consecutive days, assign NA and mutate with the original data frame. How can I achieve this for the time series dts? The output data frame would be like dts_output (done manually).

    dts<-structure(list(Date = structure(c(16442,