na | 易学教程

How to substitute NA by 0 in 20 columns?

I want to substitute NA by 0 in 20 columns. I found this approach for 2 columns, but I guess it's not optimal when the number of columns is 20. Is there a more compact alternative?

mydata[,c("a", "c")] <- apply(mydata[,c("a","c")], 2, function(x){replace(x, is.na(x), 0)})

UPDATE: For simplicity, let's take this data with 8 columns and substitute NAs in columns b, c, e, f and d:

a  b  c  d  e  f  g  d
1  NA NA 2  3  4  7  6
2  g  3  NA 4  5  4  Y
3  r  4  4  NA t  5  5

The result must be this one:

a  b  c  d  e  f  g  d
1  0  0  2  3  4  7  6
2  g  3  NA 4  5  4  Y
3  r  4  4  0  t  5  5

We can use NAer from qdap to convert the NA

Getting last non na value across rows in a pandas dataframe

I have a dataframe of shape (40, 500). Each row has numerical values up to some column number k, which varies by row, and every entry after that is NaN. I am trying to get the value in the last non-NaN column of each row. Is there a way to do this without looping through all the rows of the dataframe?

Sample dataframe:

2016-06-02  7.080  7.079  7.079  7.079  7.079  7.079  nan  nan  nan
2016-06-08  7.053  7.053  7.053  7.053  7.053  7.054  nan  nan  nan
2016-06-09  7.061  7.061  7.060  7.060  7.060  7.060  nan  nan  nan
2016-06-14  nan    nan    nan    nan    nan    nan    nan  nan  nan
2016-06-15  7.066  7.066  7.066  7.066  nan    nan    nan
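One loop-free way to approach this is to forward-fill along each row and then read off the final column. The following is a minimal sketch, not necessarily the answer given in the original thread: the frame is a small made-up stand-in for the (40, 500) data, and `last_valid` is a name chosen for illustration.

```python
import numpy as np
import pandas as pd

# Small illustrative frame: values up to some column k, NaN afterwards.
df = pd.DataFrame(
    [
        [7.080, 7.079, 7.079, np.nan],
        [7.053, 7.054, np.nan, np.nan],
        [np.nan, np.nan, np.nan, np.nan],
    ],
    index=["2016-06-02", "2016-06-08", "2016-06-14"],
)

# Forward-fill across columns (axis=1); the last column then holds the
# last non-NaN value of each row. A row that is entirely NaN stays NaN.
last_valid = df.ffill(axis=1).iloc[:, -1]
print(last_valid)
```

The forward fill copies the whole frame, which is cheap at this size but worth keeping in mind for much larger data.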

remove columns with NAs from all dataframes in list

I have a list made up of several data frames. I would like to remove all of the columns with NAs in each data frame. Note that the columns to be removed are not the same in each data frame. Sample data is provided below. Any suggestions much appreciated.

WW1_Data <- structure(list(Alnön = structure(list(Site_Name = structure(1L, .Label = c("Alnön","Ammarnäs", "Anjan", "Bäcksand", "Fittjebodarna", "Flatruet", "Glen", "Idre", "Klångstavallen", "Kramfors", "Ljungdalen", "Ljungris", "Mårdsund", "Mörtsjön", "Nordmaling", "Öster_Galåbodarna", "Ramundberget", "Rätan", "Särvfjället", "Smedstorp", "Söderhamn",

Replacing Missing Value in R

I have to replace each missing value with the maximum Value for its ID. How can I do this in R?

ID  Value
1   NA
5   15
8   16
6   8
7   65
8   NA
5   25
1   62
6   14
7   NA
9   11
8   12
9   36
1   26
4   13

I would first precompute the max values using a call to aggregate(), and also precompute which rows of the data.frame have an NA value. Then you can match the IDs into the aggregation table to extract the corresponding max value.

maxes <- aggregate(Value~ID, df, max, na.rm=TRUE)
nas <- which(is.na(df$Value))
df$Value[nas] <- maxes$Value[match(df$ID[nas], maxes$ID)]
df
##   ID Value
## 1  1    62
## 2  5    15
## 3  8    16
## 4  6     8
## 5  7    65
## 6  8    16

Getting boolean pandas column that supports NA/ is nullable

How can I create a pandas dataframe column with dtype bool (or int, for that matter) that supports NaN/missing values? When I try it like this:

d = {'one' : np.ma.MaskedArray([True, False, True, True], mask = [0,0,1,0]), 'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.dtypes)
print(df)

column one is implicitly converted to object. The same happens for ints:

d = {'one' : np.ma.MaskedArray([1,3,2,1], mask = [0,0,1,0]), 'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.dtypes)
print(df)

one is here
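For reference, pandas 1.0 and later provide nullable extension dtypes, `"boolean"` and `"Int64"`, which store missing entries as `pd.NA` instead of falling back to object. A minimal sketch, assuming such a version is installed; the masked entries from the question are written as `None` here, and the column name `ints` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        # Nullable boolean column: the missing entry becomes pd.NA.
        "one": pd.array([True, False, None, True], dtype="boolean"),
        # Nullable integer column (note the capital "I" in "Int64").
        "ints": pd.array([1, 3, None, 1], dtype="Int64"),
        "two": [1.0, 2.0, 3.0, 4.0],
    },
    index=["a", "b", "c", "d"],
)

print(df.dtypes)
print(df)
```

An existing column can usually be converted with `astype("boolean")` or `astype("Int64")` as well, provided its non-missing values fit the target type.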

Replace NA with average of the case before and after the NA

Question: Say I have the following data.frame:

t <- c(1,1,2,4,5,4)
u <- c(1,3,4,5,4,2)
v <- c(2,3,4,5,NA,2)
w <- c(NA,3,4,5,2,3)
x <- c(2,3,4,5,6,NA)
df <- data.frame(t,u,v,w,x)

I would like to replace the NAs with values that represent the average of the case before and after the NA, unless a row starts (row 4) or ends (row 5) with an NA. When the row begins with NA, I would like to substitute the NA with the following case. When the row ends with NA, I would like to substitute the NA with the previous case. Thus

specifying “skip NA” when calculating mean of the column in a data frame created by Pandas

I am learning the pandas package by replicating the output from some of the R vignettes. Now I am using the dplyr package from R as an example: http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html

R script

planes <- group_by(hflights_df, TailNum)
delay <- summarise(planes, count = n(), dist = mean(Distance, na.rm = TRUE))
delay <- filter(delay, count > 20, dist < 2000)

Python script

planes = hflights.groupby('TailNum')
planes['Distance'].agg({'count' : 'count', 'dist' : 'mean'})

How can I state explicitly in Python that NA needs to be skipped?

That's a trick question, since you
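For context, pandas reductions such as `mean` skip NaN by default, so a grouped mean already behaves like `na.rm = TRUE`. The sketch below uses made-up data in place of `hflights` and relies on named aggregation, which assumes pandas 0.25 or later; the names `n_rows`, `delay` and `overall` are chosen for illustration.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the hflights data, with one missing distance.
hflights = pd.DataFrame({
    "TailNum": ["A", "A", "A", "B", "B"],
    "Distance": [100.0, np.nan, 300.0, 50.0, 150.0],
})

planes = hflights.groupby("TailNum")

# "count" counts non-missing values only; "mean" ignores NaN by default.
delay = planes["Distance"].agg(count="count", dist="mean")

# size() counts every row in the group, missing or not (closer to dplyr's n()).
n_rows = planes.size()

# On an ungrouped Series the behaviour can be spelled out explicitly.
overall = hflights["Distance"].mean(skipna=True)

print(delay)
print(n_rows)
print(overall)
```

The difference between `count` and `size()` matters when replicating the dplyr filter on `count`, since the two disagree for any group that contains missing distances.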

Dealing with NAs when calculating mean (summarize_each) on group_by

I have a data frame md:

md <- data.frame(x = c(3,5,4,5,3,5), y = c(5,5,5,4,4,1), z = c(1,3,4,3,5,5), device1 = c("c","a","a","b","c","c"), device2 = c("B","A","A","A","B","B"))
md[2,3] <- NA
md[4,1] <- NA
md

I want to calculate means by device1 / device2 combinations using dplyr:

library(dplyr)
md %>% group_by(device1, device2) %>% summarise_each(funs(mean))

However, I am getting some NAs. I want the NAs to be ignored (na.rm = TRUE). I tried, but the function doesn't accept this argument. Both of these lines result in an error:

md %>% group_by(device1, device2) %>% summarise_each(funs(mean)

scale_fill_manual define color for NA values

I am trying to make a barplot with ggplot2 and am having trouble defining the color for NA.

ggh <- ggplot(data=dat, aes(x=var1, fill=var2)) +
  geom_bar(position="dodge") +
  scale_fill_manual(
    values=c("s"="steelblue", "i"="darkgoldenrod2", "r"="firebrick4", na.value="black"))

In my var2 I have values c("s", "i", "r", NA). For some reason the code above inside scale_fill_manual does not work for NA, even though it works fine for all the other values. Can someone help me figure out why? Thanks for the help.

The na.value needs to be outside of the values argument. Here is an example: library