na

remove columns with NAs from all dataframes in list

这一生的挚爱 submitted on 2019-12-19 06:52:27
Question: I have a list made up of several data frames. I would like to remove all of the columns with NAs in each data frame. Note that the columns to be removed are not the same in each data frame. Sample data is provided below. Any suggestions are much appreciated. WW1_Data <- structure(list(Alnön = structure(list(Site_Name = structure(1L, .Label = c("Alnön","Ammarnäs", "Anjan", "Bäcksand", "Fittjebodarna", "Flatruet", "Glen", "Idre", "Klångstavallen", "Kramfors", "Ljungdalen", "Ljungris", "Mårdsund", "Mörtsjön

Replacing Missing Value in R

冷暖自知 submitted on 2019-12-19 05:09:04
Question: I have to replace each missing Value with the maximum Value for its ID. How can I do this in R?
ID  Value
1   NA
5   15
8   16
6   8
7   65
8   NA
5   25
1   62
6   14
7   NA
9   11
8   12
9   36
1   26
4   13
Answer 1: I would first precompute the max values using a call to aggregate(), and also precompute which rows of the data.frame have an NA value. Then you can match the IDs into the aggregation table to extract the corresponding max value. maxes <- aggregate(Value~ID,df,max,na.rm=T); nas <- which(is.na(df$Value)); df$Value[nas] <- maxes
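The approach described in Answer 1 can be sketched roughly as follows. The answer's code is cut off in this excerpt, so the final lookup line (using match() on the ID column) is an assumption based on the answer's description, not the answerer's exact code. The data frame df is rebuilt here from the sample values in the question.

```r
# Sample data from the question
df <- data.frame(
  ID    = c(1, 5, 8, 6, 7, 8, 5, 1, 6, 7, 9, 8, 9, 1, 4),
  Value = c(NA, 15, 16, 8, 65, NA, 25, 62, 14, NA, 11, 12, 36, 26, 13)
)

# Per-ID maximum; the formula interface of aggregate() drops NA rows by default
maxes <- aggregate(Value ~ ID, data = df, FUN = max)

# Row positions of the missing values
nas <- which(is.na(df$Value))

# Assumed completion: look up each missing row's ID in the table of maxima
df$Value[nas] <- maxes$Value[match(df$ID[nas], maxes$ID)]

print(df$Value[nas])  # 62 16 65
```

An ID whose values are all NA would not appear in maxes, so match() returns NA for it and those rows stay missing.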

specifying “skip NA” when calculating mean of the column in a data frame created by Pandas

大憨熊 submitted on 2019-12-19 01:15:06
Question: I am learning the pandas package by replicating the output from some of the R vignettes. Now I am using the dplyr package from R as an example: http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html R script planes <- group_by(hflights_df, TailNum) delay <- summarise(planes, count = n(), dist = mean(Distance, na.rm = TRUE)) delay <- filter(delay, count > 20, dist < 2000) Python script planes = hflights.groupby('TailNum') planes['Distance'].agg({'count' : 'count', 'dist' : 'mean'})

scale_fill_manual define color for NA values

雨燕双飞 submitted on 2019-12-19 00:17:50
Question: I am trying to make a barplot with ggplot2 and am facing some issues with defining the color for NA. ggh <- ggplot(data=dat, aes(x=var1, fill=var2))+ geom_bar(position="dodge")+ scale_fill_manual( values=c("s"="steelblue", "i"="darkgoldenrod2", "r"="firebrick4", na.value="black")) In my var2 I have values c("s", "i", "r", NA). For some reason my code above inside the scale_fill_manual does not work for NA, even though it works fine for all the other values. Can someone help me figure out why? Thanks

How to remove NA values in vector in R [duplicate]

跟風遠走 submitted on 2019-12-18 19:00:29
Question (marked as a duplicate of "Remove NA values from a vector"): I have a vector which stores over 1000 values. The first 50 values are NAs. How can I get rid of them? c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1.5741, 1.583, 1.605, 1.633, 1.6465, 1.6475, 1.6329, 1.6413, 1.685, 1.692, 1.7087, 1.7055

Getting boolean pandas column that supports NA/ is nullable

被刻印的时光 ゝ submitted on 2019-12-18 16:45:32
Question: How can I create a pandas dataframe column with dtype bool (or int for that matter) with support for NaN/missing values? When I try like this: d = {'one' : np.ma.MaskedArray([True, False, True, True], mask = [0,0,1,0]), 'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])} df = pd.DataFrame(d) print (df.dtypes) print (df) column one is implicitly converted to object. The same happens for ints: d = {'one' : np.ma.MaskedArray([1,3,2,1], mask = [0,0,1,0]), 'two' : pd.Series([1., 2.,

dplyr join define NA values

烈酒焚心 submitted on 2019-12-18 12:54:34
Question: Can I define a "fill" value for NA in a dplyr join? For example, can I define in the join that all NA values should be 1? require(dplyr) lookup <- data.frame(cbind(c("USD","MYR"),c(0.9,1.1))) names(lookup) <- c("rate","value") fx <- data.frame(c("USD","MYR","USD","MYR","XXX","YYY")) names(fx)[1] <- "rate" left_join(x=fx,y=lookup,by=c("rate")) The code above will create NA for the values "XXX" and "YYY". In my case I am joining a large number of columns and there will be a lot of non-matches. All non-matches

What's the best way to replace missing values with NA when reading in a .csv?

会有一股神秘感。 submitted on 2019-12-18 11:01:06
Question: I have a .csv dataset with many missing values, and I'd like R to recognize them all the same way (the "correct" way) when I read the table in. I've been using: import = read.csv("/Users/dataset.csv", header =T, na.strings=c("")) This script fills all the empty cells with something, but it's not consistent. When I look at the data with head(import), some missing cells are filled with <NA> and some missing cells are filled with NA. I fear that R treats these two ways of identifying missing