na | 易学教程

Counting non NAs in a data frame; getting answer as a vector

Say I have the following R data.frame ZZZ:

    ( ZZZ <- structure(list(n = c(1, 2, NA), m = c(6, NA, NA), o = c(7, 8, 8)),
                       .Names = c("n", "m", "o"), row.names = c(NA, -3L),
                       class = "data.frame") )

    ## not run
       n  m o
    1  1  6 7
    2  2 NA 8
    3 NA NA 8

I want to know, in the form of a vector, how many non-NAs I've got. I want the answer available to me as: 2, 1, 3

When I use the command length(ZZZ), I get 3, which of course is the number of vectors in the data.frame, a valuable enough piece of information. I have other functions that operate on this data.frame and give me answers in the form of vectors,

Save pandas dataframe but conserving NA values

Question: I have this code:

    import pandas as pd
    import numpy as np
    import csv

    df = pd.DataFrame({'animal': 'cat dog cat fish dog cat cat'.split(),
                       'size': list('SSMMMLL'),
                       'weight': [8, 10, 11, 1, 20, 12, 12],
                       'adult': [False] * 5 + [True] * 2})

Then I replaced the weights with NA values:

    df['weight'] = np.nan

And finally I saved it:

    df.to_csv("ejemplo.csv", sep=";", decimal=",", quoting=csv.QUOTE_NONNUMERIC, index=False)

But when I read the file I have "" instead of NA. I want to put NA instead of NaN I
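The excerpt stops before any answer, so the following is only a sketch of one possible approach, not the accepted solution: pandas' DataFrame.to_csv takes an na_rep parameter that sets the text written for missing values. Writing to an in-memory buffer instead of ejemplo.csv, and leaving out the quoting option to keep the output easy to inspect, are assumptions made for this illustration.

```python
import io

import numpy as np
import pandas as pd

# Same frame as in the question, with every weight replaced by NaN.
df = pd.DataFrame({
    "animal": "cat dog cat fish dog cat cat".split(),
    "size": list("SSMMMLL"),
    "weight": [8, 10, 11, 1, 20, 12, 12],
    "adult": [False] * 5 + [True] * 2,
})
df["weight"] = np.nan

# Assumption: an in-memory buffer stands in for "ejemplo.csv".
buffer = io.StringIO()

# na_rep controls how missing values are written; the default is an empty string.
df.to_csv(buffer, sep=";", decimal=",", index=False, na_rep="NA")

csv_text = buffer.getvalue()
print(csv_text)
```

Passing a file name instead of the buffer writes the same text to disk. How the NA marker interacts with quoting=csv.QUOTE_NONNUMERIC is not shown here and would need checking against the pandas version in use.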

R function prcomp fails with NA's values even though NA's are allowed

Question: I am using the function prcomp to calculate the first two principal components. However, my data has some NA values, and therefore the function throws an error. The na.action I define does not seem to work, even though it is mentioned in the help file ?prcomp. Here is my example:

    d <- data.frame(V1 = sample(1:100, 10), V2 = sample(1:100, 10))
    prcomp(d, center = TRUE, scale = TRUE, na.action = na.omit)

    d$V1[5] <- NA
    d$V2[7] <- NA
    prcomp(d, center = TRUE, scale = TRUE, na.action = na.omit)

I am using

How to replace empty string with NA in R dataframe?

Question: My first approach was to use na.strings="" when I read the data in from a csv. This doesn't work for some reason. I also tried:

    df[df==''] <- NA

Which gave me an error: Can't use matrix or array for column indexing.

I tried just the column:

    df$col[df$col==''] <- NA

This converts every value in the entire dataframe to NA, even though there are values besides empty strings.

Then I tried to use mutate_all:

    replace.empty <- function(a) { a[a==""] <- NA }
    #dplyr pipe
    df %>% mutate_all(funs

With the R package xlsx, is it possible to set na.strings when reading an Excel file?

Question: I'm reading in an Excel file using read.xlsx, and I would like to set na.strings as you can with read.table. Is this possible? It doesn't work to just add na.strings to the call like this:

    Data <- read.xlsx("my file.xlsx", sheetName = "MyData", na.strings = "no info")

Is there some other way to do it?

Answer 1: No, this is not possible, for the simple reason that read.xlsx doesn't take care of special missing values. But this could be a possible enhancement for the getCellvalue function. You can either

Aligning Data frame with missing values

I'm using a data frame with many NA values. While I'm able to create a linear model, I am subsequently unable to line the fitted values of the model up with the original data due to the missing values and lack of indicator column. Here's a reproducible example:

    library(MASS)
    dat <- Aids2

    # Add NA's
    dat[floor(runif(100, min = 1, max = nrow(dat))),3] <- NA

    # Create a model
    model <- lm(death ~ diag + age, data = dat)

    # Different Values
    length(fitted.values(model))
    # 2745
    nrow(dat)
    # 2843

There are actually three solutions here: pad NA to fitted values ourselves; use predict() to compute fitted

R count NA by group

Could someone please explain why I get different answers using the aggregate function to count missing values by group? Also, is there a better way to count missing values by group using a native R function?

    DF <- data.frame(YEAR = c(2000,2000,2000,2001,2001,2001,2001,2002,2002,2002),
                     X = c(1,NA,3,NA,NA,NA,7,8,9,10))
    DF

    aggregate(X ~ YEAR, data=DF, function(x) { sum(is.na(x)) })
    with(DF, aggregate(X, list(YEAR), function(x) { sum(is.na(x)) }))

    aggregate(X ~ YEAR, data=DF, function(x) { sum(! is.na(x)) })
    with(DF, aggregate(X, list(YEAR), function(x) { sum(! is.na(x)) }))

The help page at

Handle Continuous Missing values in time-series data

I have time-series data as shown below.

    2015-04-26 23:00:00  5704.27388916015661380
    2015-04-27 00:00:00  4470.30868326822928793
    2015-04-27 01:00:00  4552.57241617838553793
    2015-04-27 02:00:00  4570.22250032825650123
    2015-04-27 03:00:00  NA
    2015-04-27 04:00:00  NA
    2015-04-27 05:00:00  NA
    2015-04-27 06:00:00  12697.37724086216439900
    2015-04-27 07:00:00  5538.71119009653739340
    2015-04-27 08:00:00  81.95060647328695325
    2015-04-27 09:00:00  8550.65816895300667966
    2015-04-27 10:00:00  2925.76573206583680076

How should I handle continuous NA values? In cases where I have only one NA, I use to take the average

Find columns with all missing values

Question: I am writing a function that needs to check whether (and which!) column (variable) has all missing values (NA, <NA>). The following is a fragment of the function:

    test1 <- data.frame (matrix(c(1,2,3,NA,2,3,NA,NA,2), 3,3))
    test2 <- data.frame (matrix(c(1,2,3,NA,NA,NA,NA,NA,2), 3,3))

    na.test <- function (data) {
      if (colSums(!is.na(data) == 0)){
        stop ("The some variable in the dataset has all missing value, remove the column to proceed")
      }
    }

    na.test (test1)
    Warning message:
    In if (colSums(

Remove columns from dataframe where some of values are NA

I have a dataframe where some of the values are NA. I would like to remove the columns that contain them. My data.frame looks like this:

      v1 v2
    1  1 NA
    2  1  1
    3  2  2
    4  1  1
    5  2  2
    6  1 NA

I tried to compute the column means and select the columns whose mean is not NA. I tried this statement, but it does not work:

    data=subset(Itun, select=c(is.na(colMeans(Itun))))

I got an error:

    error : 'x' must be an array of at least two dimensions

Can anyone give me some help?

The data:

    Itun <- data.frame(v1 = c(1,1,2,1,2,1), v2 = c(NA, 1, 2, 1, 2, NA))

This will remove all columns containing at least one NA:

    Itun[ , colSums(is.na(Itun)) == 0]

An