na

How to substitute NA by 0 in 20 columns?

亡梦爱人 submitted on 2019-12-01 06:28:52
I want to substitute NA by 0 in 20 columns. I found this approach for 2 columns, but I guess it is not optimal when the number of columns is 20. Is there a more compact alternative?
    mydata[,c("a", "c")] <- apply(mydata[,c("a","c")], 2, function(x){replace(x, is.na(x), 0)})
UPDATE: For simplicity, let's take this data with 8 columns and substitute NAs in columns b, c, e, f and d:
    a b  c  d  e  f g d
    1 NA NA 2  3  4 7 6
    2 g  3  NA 4  5 4 Y
    3 r  4  4  NA t 5 5
The result must be this one:
    a b  c  d  e  f g d
    1 0  0  2  3  4 7 6
    2 g  3  NA 4  5 4 Y
    3 r  4  4  0  t 5 5
We can use NAer from qdap to convert the NA

Getting last non na value across rows in a pandas dataframe

柔情痞子 submitted on 2019-12-01 06:01:30
I have a dataframe of shape (40, 500). Each row has numerical values up to some variable column number k, and all the entries after that are NaN. I am trying to get the value of the last non-NaN column in each row. Is there a way to do this without looping through all the rows of the dataframe? Sample dataframe:
    2016-06-02 7.080 7.079 7.079 7.079 7.079 7.079 nan nan nan
    2016-06-08 7.053 7.053 7.053 7.053 7.053 7.054 nan nan nan
    2016-06-09 7.061 7.061 7.060 7.060 7.060 7.060 nan nan nan
    2016-06-14 nan nan nan nan nan nan nan nan nan
    2016-06-15 7.066 7.066 7.066 7.066 nan nan nan
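One loop-free way to approach this, sketched under the assumption that a recent pandas is available, is to forward-fill along the columns and then read off the last column. The small frame below is a made-up, shortened version of the sample above, and `last_valid` is a name chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened stand-in for the (40, 500) frame in the question.
df = pd.DataFrame(
    [
        [7.080, 7.079, 7.079, np.nan, np.nan],
        [7.053, 7.053, 7.054, 7.054, np.nan],
        [np.nan, np.nan, np.nan, np.nan, np.nan],
        [7.066, 7.066, np.nan, np.nan, np.nan],
    ],
    index=pd.to_datetime(["2016-06-02", "2016-06-08", "2016-06-14", "2016-06-15"]),
)

# Forward-fill each row from left to right, so the final column carries
# the last non-NaN value of that row; all-NaN rows stay NaN.
last_valid = df.ffill(axis=1).iloc[:, -1]
print(last_valid)
```

A row that contains only NaN stays NaN in the result, so such rows can be dropped afterwards with `last_valid.dropna()` if they are not wanted.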

remove columns with NAs from all dataframes in list

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-01 05:16:14
I have a list made up of several data frames. I would like to remove all of the columns with NAs in each data frame. Note that the columns to be removed are not the same in each data frame. Sample data is provided below. Any suggestions are much appreciated.
    WW1_Data <- structure(list(Alnön = structure(list(Site_Name = structure(1L, .Label = c("Alnön","Ammarnäs", "Anjan", "Bäcksand", "Fittjebodarna", "Flatruet", "Glen", "Idre", "Klångstavallen", "Kramfors", "Ljungdalen", "Ljungris", "Mårdsund", "Mörtsjön", "Nordmaling", "Öster_Galåbodarna", "Ramundberget", "Rätan", "Särvfjället", "Smedstorp", "Söderhamn",

How to substitute NA by 0 in 20 columns?

你离开我真会死。 submitted on 2019-12-01 04:48:16
Question: I want to substitute NA by 0 in 20 columns. I found this approach for 2 columns, but I guess it is not optimal when the number of columns is 20. Is there a more compact alternative?
    mydata[,c("a", "c")] <- apply(mydata[,c("a","c")], 2, function(x){replace(x, is.na(x), 0)})
UPDATE: For simplicity, let's take this data with 8 columns and substitute NAs in columns b, c, e, f and d:
    a b  c  d  e  f g d
    1 NA NA 2  3  4 7 6
    2 g  3  NA 4  5 4 Y
    3 r  4  4  NA t 5 5
The result must be this one:
    a b c

Replacing Missing Value in R

时光毁灭记忆、已成空白 submitted on 2019-12-01 01:51:48
I need to replace each missing Value with the maximum Value for the same ID. How can I do this in R?
    ID Value
     1    NA
     5    15
     8    16
     6     8
     7    65
     8    NA
     5    25
     1    62
     6    14
     7    NA
     9    11
     8    12
     9    36
     1    26
     4    13
I would first precompute the max values using a call to aggregate(), and also precompute which rows of the data.frame have an NA value. Then you can match the IDs into the aggregation table to extract the corresponding max value.
    # Per-ID maximum of Value, ignoring NAs
    maxes <- aggregate(Value ~ ID, df, max, na.rm = TRUE)
    # Row positions whose Value is missing
    nas <- which(is.na(df$Value))
    df$Value[nas] <- maxes$Value[match(df$ID[nas], maxes$ID)]
    df
    ##    ID Value
    ## 1   1    62
    ## 2   5    15
    ## 3   8    16
    ## 4   6     8
    ## 5   7    65
    ## 6   8    16

Getting boolean pandas column that supports NA/ is nullable

梦想的初衷 submitted on 2019-11-30 20:59:14
How can I create a pandas dataframe column with dtype bool (or int, for that matter) that supports NaN/missing values? When I try it like this:
    import numpy as np
    import pandas as pd
    d = {'one' : np.ma.MaskedArray([True, False, True, True], mask = [0,0,1,0]),
         'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
    df = pd.DataFrame(d)
    print(df.dtypes)
    print(df)
column one is implicitly converted to object. The same happens for ints:
    d = {'one' : np.ma.MaskedArray([1,3,2,1], mask = [0,0,1,0]),
         'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
    df = pd.DataFrame(d)
    print(df.dtypes)
    print(df)
one is here
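Later pandas releases added nullable extension dtypes that address this. The sketch below assumes pandas 1.0 or newer, where the `"boolean"` and `"Int64"` dtypes are available; the column name `one_int` is made up for illustration, and the missing entries mirror the masked positions in the question.

```python
import pandas as pd

# Assumes pandas >= 1.0 (nullable "boolean" and "Int64" extension dtypes).
df = pd.DataFrame(
    {
        # Missing entries are stored as pd.NA instead of forcing dtype object.
        "one": pd.array([True, False, None, True], dtype="boolean"),
        "one_int": pd.array([1, 3, None, 1], dtype="Int64"),
        "two": [1.0, 2.0, 3.0, 4.0],
    },
    index=["a", "b", "c", "d"],
)

print(df.dtypes)
print(df)
```

Reductions such as `df["one_int"].sum()` skip the missing entry by default, and `df["one"].isna()` reports which positions are missing.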

Replace NA with average of the case before and after the NA

心不动则不痛 submitted on 2019-11-30 20:37:37
Question: Say I have the following data.frame:
    t<-c(1,1,2,4,5,4)
    u<-c(1,3,4,5,4,2)
    v<-c(2,3,4,5,NA,2)
    w<-c(NA,3,4,5,2,3)
    x<-c(2,3,4,5,6,NA)
    df<-data.frame(t,u,v,w,x)
I would like to replace the NAs with values that represent the average of the case before and after the NA, unless a row starts (row 4) or ends (row 5) with an NA. When the row begins with NA, I would like to substitute the NA with the following case. When the row ends with NA, I would like to substitute the NA with the previous case. Thus

specifying “skip NA” when calculating mean of the column in a data frame created by Pandas

我的梦境 submitted on 2019-11-30 20:04:24
I am learning the pandas package by replicating the output from some of the R vignettes. Now I am using the dplyr package from R as an example: http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
R script:
    planes <- group_by(hflights_df, TailNum)
    delay <- summarise(planes, count = n(), dist = mean(Distance, na.rm = TRUE))
    delay <- filter(delay, count > 20, dist < 2000)
Python script:
    planes = hflights.groupby('TailNum')
    planes['Distance'].agg({'count' : 'count', 'dist' : 'mean'})
How can I state explicitly in Python that NA needs to be skipped? That's a trick question, since you
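For context, pandas reductions such as `mean` skip NaN by default, so the grouped mean already behaves like `na.rm = TRUE`. The sketch below uses a tiny made-up `hflights` frame (the tail numbers and distances are not from the real data set) and assumes pandas 0.25 or newer for the keyword form of `agg`, since the dict-renaming form shown in the question is rejected by newer pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the hflights data set.
hflights = pd.DataFrame(
    {
        "TailNum": ["N1", "N1", "N2", "N2", "N2"],
        "Distance": [100.0, np.nan, 200.0, 400.0, np.nan],
    }
)

# Named aggregation: "count" counts non-NaN values, "mean" skips NaN.
delay = hflights.groupby("TailNum")["Distance"].agg(count="count", dist="mean")
print(delay)

# On a plain Series the behaviour can be stated explicitly via skipna.
strict_mean = hflights["Distance"].mean(skipna=False)  # NaN, because NaN is present
default_mean = hflights["Distance"].mean()             # NaN values are skipped
```

Note that `"count"` only counts non-missing values, whereas dplyr's `n()` counts rows; using `"size"` instead of `"count"` would be the closer match if rows with a missing Distance should be included.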

Dealing with NAs when calculating mean (summarize_each) on group_by

只谈情不闲聊 submitted on 2019-11-30 19:35:22
I have a data frame md:
    md <- data.frame(x = c(3,5,4,5,3,5), y = c(5,5,5,4,4,1), z = c(1,3,4,3,5,5),
                     device1 = c("c","a","a","b","c","c"), device2 = c("B","A","A","A","B","B"))
    md[2,3] <- NA
    md[4,1] <- NA
    md
I want to calculate means by device1 / device2 combinations using dplyr:
    library(dplyr)
    md %>% group_by(device1, device2) %>% summarise_each(funs(mean))
However, I am getting some NAs. I want the NAs to be ignored (na.rm = TRUE). I tried, but the function does not accept this argument. Both of these lines result in an error:
    md %>% group_by(device1, device2) %>% summarise_each(funs(mean)

scale_fill_manual define color for NA values

孤街醉人 submitted on 2019-11-30 19:22:28
I am trying to make a barplot with ggplot2 and am facing some issues with defining the color for NA.
    ggh <- ggplot(data=dat, aes(x=var1, fill=var2))+
      geom_bar(position="dodge")+
      scale_fill_manual(
        values=c("s"="steelblue", "i"="darkgoldenrod2", "r"="firebrick4", na.value="black"))
In my var2 I have the values c("s", "i", "r", NA). For some reason my code above inside scale_fill_manual does not work for NA, even though it works fine for all the other values. Can someone help me figure out why? Thanks for the help.
The na.value needs to be outside of the values argument. Here is an example: library