na | 易学教程 (Easy Learning Tutorial)

Converting Character to Numeric without NA Coercion in R

I'm working in R and have a data frame, dd_2006, with numeric vectors. When I first imported the data, I needed to remove $'s, decimal points, and some blank spaces from 3 of my variables: SumOfCost, SumOfCases, and SumOfUnits. To do that, I used str_replace_all. However, once I used str_replace_all, the vectors were converted to character vectors. So I used as.numeric(var) to convert the vectors to numeric, but NAs were introduced. This happened even though there were no NAs in the vectors when I ran the code below BEFORE running the as.numeric code:

    sum(is.na(dd_2006$SumOfCost))
    [1] 0
    sum(is.na(dd_2006
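
The cleaning code is not shown, so this is only a guess at the cause. str_replace_all() (like base gsub()) treats its pattern as a regular expression. In a regex, "$" is an end-of-string anchor and "." matches any character. A regex pattern of "$" therefore leaves the dollar signs in place, and a pattern of "." deletes every character. Either way, as.numeric() then returns NA. The sketch below uses base gsub() with fixed = TRUE so it runs without stringr; wrapping the pattern in stringr's fixed() has the same effect. The sample values are hypothetical. The sketch also keeps the decimal point, because stripping it would change each value's scale.

```r
# Hypothetical sample values; the question's real data is not shown.
cost <- c("$1,234.50", " $56 ", "$7.25")

# Pitfall: as regular expressions, "$" is an end-of-string anchor and "."
# matches any character, so neither call removes what was intended.
bad_dollar <- gsub("$", "", cost)   # dollar signs are left in place
bad_dot <- gsub(".", "", cost)      # every character is removed
bad_dollar_num <- suppressWarnings(as.numeric(bad_dollar))  # all NA
bad_dot_num <- suppressWarnings(as.numeric(bad_dot))        # all NA

# Fix: match the characters literally with fixed = TRUE, then trim spaces.
clean <- gsub("$", "", cost, fixed = TRUE)
clean <- gsub(",", "", clean, fixed = TRUE)
clean <- trimws(clean)
num <- as.numeric(clean)
num
```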

visual structure of a data.frame: locations of NAs and much more

Question: I want to represent the structure of a data frame (or matrix, or data.table, whatever) on a single plot with color-coding. I think that could be very useful for many people handling various types of data, since they could visualize it at a single glance. Perhaps someone has already developed a package to do it, but I couldn't find one (just this). So here is a rough mockup of my "vision", kind of a heatmap, showing in color codes: the NA locations, the class of variables (factors (how many levels?),
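
The excerpt names no package, so this is a minimal base R sketch of the idea, not an existing tool. It builds a per-cell code (0 for NA, otherwise an id for the column's class) and draws it with image(). The example data frame and palette are hypothetical. Details from the mockup such as factor level counts are left out.

```r
# Hypothetical example data frame mixing column classes and NAs.
df <- data.frame(
  num = c(1.5, NA, 3, 4),
  int = c(1L, 2L, NA, 4L),
  fac = factor(c("a", "b", "a", NA)),
  chr = c("x", NA, "z", "w"),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE where a cell is NA (rows = observations).
na_mask <- is.na(df)

# One class label per column, e.g. "numeric", "integer", "factor".
col_class <- vapply(df, function(x) class(x)[1], character(1))
n_class <- length(unique(col_class))

# Encode each cell: 0 for NA, otherwise the id of its column's class.
class_id <- match(col_class, unique(col_class))
codes <- matrix(rep(class_id, each = nrow(df)), nrow = nrow(df))
codes[na_mask] <- 0

# image() puts matrix rows on the x axis and row 1 at the bottom, so flip the
# rows and transpose to get columns along x and observation 1 at the top.
image(t(codes[nrow(codes):1, , drop = FALSE]),
      col = c("black", hcl.colors(n_class)), zlim = c(0, n_class),
      axes = FALSE, main = "NA cells (black) and column classes")
axis(1, at = seq(0, 1, length.out = ncol(df)), labels = names(df))
```

Fixing zlim keeps NA cells black even when a data frame has no missing values.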

Complete.obs of cor() function

I am establishing a correlation matrix for my data, which looks like this:

    df <- structure(list(V1 = c(56, 123, 546, 26, 62, 6, NA, NA, NA, 15),
                         V2 = c(21, 231, 5, 5, 32, NA, 1, 231, 5, 200),
                         V3 = c(NA, NA, 24, 51, 53, 231, NA, 153, 6, 700),
                         V4 = c(2, 10, NA, 20, 56, 1, 1, 53, 40, 5000)),
                    .Names = c("V1", "V2", "V3", "V4"),
                    row.names = c(NA, 10L), class = "data.frame")

This gives the following data frame:

        V1  V2  V3   V4
    1   56  21  NA    2
    2  123 231  NA   10
    3  546   5  24   NA
    4   26   5  51   20
    5   62  32  53   56
    6    6  NA 231    1
    7   NA   1  NA    1
    8   NA 231 153   53
    9   NA   5   6   40
    10  15 200 700 5000

I normally use a complete.obs command
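
For reference, use = "complete.obs" drops every row that has an NA in any column, then computes all the correlations from the remaining rows. In contrast, use = "pairwise.complete.obs" uses, for each pair of columns, every row where both values are present. The sketch rebuilds the question's df with data.frame() (same values) and compares the two. The other variable names are illustrative.

```r
# Same values as the question's df, built with data.frame() instead of dput output.
df <- data.frame(V1 = c(56, 123, 546, 26, 62, 6, NA, NA, NA, 15),
                 V2 = c(21, 231, 5, 5, 32, NA, 1, 231, 5, 200),
                 V3 = c(NA, NA, 24, 51, 53, 231, NA, 153, 6, 700),
                 V4 = c(2, 10, NA, 20, 56, 1, 1, 53, 40, 5000))

# "complete.obs": keep only rows with no NA in any column (rows 4, 5 and 10 here).
sum(complete.cases(df))
cor_complete <- cor(df, use = "complete.obs")

# "pairwise.complete.obs": each pair uses every row where both columns are present.
cor_pairwise <- cor(df, use = "pairwise.complete.obs")

# Number of rows behind each pairwise coefficient.
present <- !is.na(as.matrix(df))
n_pairwise <- t(present) %*% present
n_pairwise
```

Pairwise estimates use more data, but each coefficient can rest on a different subset of rows.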

pheatmap: Color for NA

I'm using the R package pheatmap to draw heatmaps. Is there a way to assign a color to NAs in the input matrix? It seems NA gets colored white by default. E.g.:

    library(pheatmap)
    m <- matrix(c(1:100), nrow = 10)
    m[1, 1] <- NA
    m[10, 10] <- NA
    pheatmap(m, cluster_rows = FALSE, cluster_cols = FALSE)

Thanks.

nico: It is possible, but requires some hacking. First of all, let's see how pheatmap draws a heatmap. You can check that just by typing pheatmap in the console and scrolling through the output, or alternatively by using edit(pheatmap). You will find that colours are mapped using mat = scale_colours(mat, col =
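
Recent versions of pheatmap expose an na_col argument (for example pheatmap(m, na_col = "grey")), which avoids the hacking described above. Check the documentation for your installed version. The base R sketch below shows the underlying idea without the package. It draws the same matrix with image(), which leaves NA cells unpainted. It then overlays a second layer that is NA everywhere except the missing cells. The colours are arbitrary choices.

```r
# Same matrix as in the question.
m <- matrix(1:100, nrow = 10)
m[1, 1] <- NA
m[10, 10] <- NA

# image() leaves NA cells unpainted (they show the plot background).
# Note: image() puts row 1 at the bottom, unlike pheatmap.
image(m, col = hcl.colors(50), axes = FALSE, main = "NA cells drawn in red")

# Second layer: 1 where m is NA, NA elsewhere, so only missing cells get painted.
na_layer <- ifelse(is.na(m), 1, NA)
image(na_layer, col = "red", breaks = c(0, 2), add = TRUE)
```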

R gbm handling of missing values

Question: Does anyone know how gbm in R handles missing values? I can't seem to find any explanation using Google.

Answer 1: To explain what gbm does with missing predictors, let's first visualize a single tree of a gbm object. Suppose you have a gbm object mygbm. Using pretty.gbm.tree(mygbm, i.tree=1) you can visualize the first tree of mygbm, e.g.:

      SplitVar SplitCodePred LeftNode RightNode MissingNode ErrorReduction Weight    Prediction
    0       46  1.629728e+01        1         5           9      26.462908   1585 -4.396393e-06
    1       45  1.850000e+01
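
The answer is cut off before its explanation, but the MissingNode column already hints at the mechanism. Each split has a third child, and observations whose split variable is missing are sent there. The sketch below walks a small hypothetical tree laid out like pretty.gbm.tree() output. It rests on several assumptions:

- Only continuous splits are handled; categorical splits are ignored.
- Values strictly below SplitCodePred go left.
- SplitVar is a 0-based column index, and -1 marks a terminal node whose prediction is stored in SplitCodePred.
- Node ids are 0-based.

The helper predict_one and all tree values are made up for illustration.

```r
# Hypothetical one-split tree in the pretty.gbm.tree() column layout.
# Node k (0-based id) is stored in row k + 1.
tree <- data.frame(
  SplitVar      = c(0, -1, -1, -1),      # -1 = terminal node
  SplitCodePred = c(16.3, -0.5, 0.8, 0.1), # split point, or prediction if terminal
  LeftNode      = c(1, -1, -1, -1),
  RightNode     = c(2, -1, -1, -1),
  MissingNode   = c(3, -1, -1, -1)
)

# Hypothetical helper: route one observation x down the tree.
predict_one <- function(tree, x) {
  node <- 0
  repeat {
    row <- tree[node + 1, ]
    if (row$SplitVar == -1) return(row$SplitCodePred)
    value <- x[[row$SplitVar + 1]]
    if (is.na(value)) {
      node <- row$MissingNode   # missing predictor: follow the third branch
    } else if (value < row$SplitCodePred) {
      node <- row$LeftNode
    } else {
      node <- row$RightNode
    }
  }
}

predict_one(tree, c(10))
predict_one(tree, c(20))
predict_one(tree, c(NA))
```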

How to deal with NA in a panel data regression?

Question: I am trying to predict fitted values over data containing NAs, based on a model generated by plm. Here's some sample code:

    require(plm)
    test.data <- data.frame(id = c(1, 1, 2, 2, 3), time = c(1, 2, 1, 2, 1),
                            y = c(1, 3, 5, 10, 8), x = c(1, NA, 3, 4, 5))
    model <- plm(y ~ x, data = test.data, index = c("id", "time"),
                 model = "pooling", na.action = na.exclude)
    yhat <- predict(model, test.data, na.action = na.pass)
    test.data$yhat <- yhat

When I run the last line I get an error stating that the replacement has 4 rows while
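
plm may not be installed, so the sketch below reproduces the alignment problem with base lm(). For a pooled model, lm() gives the same coefficients as plm(..., model = "pooling"). Whether plm's own predict() pads NAs the same way is not shown here. The sketch shows two ways to get a column of the right length:

- Fit with na.exclude, which makes fitted() pad the omitted rows with NA.
- Fill a pre-allocated NA vector only in the rows the model actually used.

```r
test.data <- data.frame(id = c(1, 1, 2, 2, 3), time = c(1, 2, 1, 2, 1),
                        y = c(1, 3, 5, 10, 8), x = c(1, NA, 3, 4, 5))

# Pooled OLS via lm(), standing in for plm(..., model = "pooling").
model <- lm(y ~ x, data = test.data, na.action = na.exclude)

# With na.exclude, fitted() is padded with NA back to all 5 rows.
test.data$yhat <- fitted(model)

# Generic alternative: write predictions only into the rows the model used.
used <- complete.cases(test.data[, c("y", "x")])
yhat2 <- rep(NA_real_, nrow(test.data))
yhat2[used] <- predict(model, newdata = test.data[used, ])
```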

Insert NA values into dataframe blank cells when importing read.csv/read.xlsx

The attached screenshot shows part of a data frame which I have just imported into R from an Excel file. In the cells which are blank, I need to insert 'NA'. How can I insert NA into any cell which is blank (whilst leaving the already populated cells alone)?

The better question is how can I read it into R so the missing cells will already be NAs. Maybe you used something like this: read.csv(file, header=FALSE, strip.white = TRUE, sep=","). Specify the NA strings like this when you read it in: read.csv(file, header=FALSE, strip.white = TRUE, sep=",", na.strings= c("999", "NA", " ", "")) to
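
read.table() and read.csv() already treat blank fields as missing in logical, integer, numeric and complex columns. In character columns, however, a blank stays "" unless na.strings includes "". The sketch shows both routes: setting na.strings at import, and converting blanks after import. The inline CSV text is hypothetical and stands in for a real file.

```r
# Hypothetical CSV content with blanks in numeric and character columns.
csv <- "a,b,c\n1,,x\n2,5,\n,7,z"

# Route 1: declare "" as an NA string when reading.
df <- read.csv(text = csv, na.strings = c("", "NA"),
               strip.white = TRUE, stringsAsFactors = FALSE)

# Route 2: read with defaults, then turn blank strings into NA column by column.
raw <- read.csv(text = csv, stringsAsFactors = FALSE)
raw[] <- lapply(raw, function(col) {
  if (is.character(col)) col[!is.na(col) & trimws(col) == ""] <- NA
  col
})
```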

Efficient method to subset drop rows with NA values in R

Question

Background: Before running a stepwise model selection, I need to remove missing values for any of my model terms. With quite a few terms in my model, there are therefore quite a few vectors that I need to check for NA values (and drop any rows that have NA values in any of those vectors). However, there are also vectors that contain NA values that I do not want to use as terms/criteria for dropping rows.

Question: How do I drop rows from a data frame which contain NA values for any of a list
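
complete.cases() accepts a subset of columns, so only the listed model terms decide which rows are dropped. Other columns can keep their NAs. The data frame and the model_terms vector below are hypothetical.

```r
# Hypothetical data: y, x1, x2 are model terms; notes is not.
dat <- data.frame(y = c(1, 2, NA, 4, 5),
                  x1 = c(1, NA, 3, 4, 5),
                  x2 = c(1, 2, 3, 4, 5),
                  notes = c(NA, NA, "a", NA, "b"))
model_terms <- c("y", "x1", "x2")

# TRUE for rows with no NA in any of the model-term columns.
keep <- complete.cases(dat[, model_terms])
dat_model <- dat[keep, ]
dat_model  # rows 1, 4 and 5; NAs in notes are kept
```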

data.table do not compute NA groups in by

This question has a partial answer here, but the question is too specific and I'm not able to apply it to my own problem. I would like to skip a potentially heavy computation of the NA group when using by.

    library(data.table)
    DT = data.table(X = sample(10), Y = sample(10),
                    g1 = sample(letters[1:2], 10, TRUE),
                    g2 = sample(letters[1:2], 10, TRUE))
    set(DT, 1L, 3L, NA)
    set(DT, 1L, 4L, NA)
    set(DT, 6L, 3L, NA)
    set(DT, 6L, 4L, NA)
    DT[, mean(X*Y), by = .(g1,g2)]

Here we can see there are up to 5 groups including the (NA, NA) group. Considering that (i) the group is useless (ii) the groups can be very
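
One way to skip the NA group is to filter its rows out before grouping, so the expensive function never sees them. In data.table the filter goes in i, e.g. DT[!(is.na(g1) & is.na(g2)), mean(X*Y), by = .(g1, g2)]. The sketch below shows the same idea in base R so it runs without packages. A call counter confirms that the expensive step runs once per remaining group. The heavy() function and the seed are hypothetical stand-ins.

```r
set.seed(42)
DF <- data.frame(X = sample(10), Y = sample(10),
                 g1 = sample(letters[1:2], 10, TRUE),
                 g2 = sample(letters[1:2], 10, TRUE))
# Mirror the question's set() calls: rows 1 and 6 get NA in both keys.
DF[c(1, 6), c("g1", "g2")] <- NA

calls <- 0
heavy <- function(d) {  # hypothetical stand-in for an expensive per-group step
  calls <<- calls + 1
  mean(d$X * d$Y)
}

# Drop the (NA, NA) rows first, then group; that group never reaches heavy().
kept <- DF[!(is.na(DF$g1) & is.na(DF$g2)), ]
groups <- split(kept, list(kept$g1, kept$g2), drop = TRUE)
res <- vapply(groups, heavy, numeric(1))
res
```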

na.locf fill NAs up to maxgap even if gap > maxgap, with groups

I've seen a solution to this (Fill NA in a time series only to a limited number), but can't get it to work for groups, and thought there must be a neater way to do this too. Say I have the following dt:

    dt <- data.table(ID = c(rep("A", 10), rep("B", 10)),
                     Price = c(seq(1, 10, 1), seq(11, 20, 1)))
    dt[c(1:2, 5:10), 2] <- NA
    dt[c(11:13, 15:19), 2] <- NA
    dt
        ID Price
     1:  A    NA
     2:  A    NA
     3:  A     3
     4:  A     4
     5:  A    NA
     6:  A    NA
     7:  A    NA
     8:  A    NA
     9:  A    NA
    10:  A    NA
    11:  B    NA
    12:  B    NA
    13:  B    NA
    14:  B    14
    15:  B    NA
    16:  B    NA
    17:  B    NA
    18:  B    NA
    19:  B    NA
    20:  B    20

What I would like to do is to fill NAs both forward and back
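
The full requirement is cut off, so this sketch assumes the goal is the following: within each ID, fill each NA from the nearest observed value at most n positions away, preferring the earlier value. zoo::na.locf() with maxgap leaves any run of NAs longer than maxgap completely unfilled, which is presumably why it does not fit here. The helpers locf_limited, nocb_limited and fill_both are hypothetical. The example uses a base data.frame instead of data.table so it runs without packages.

```r
# Hypothetical helper: carry the last observed value forward, at most n steps.
locf_limited <- function(x, n) {
  idx <- seq_along(x)
  # Position of the most recent non-NA value at or before each index (0 = none).
  last <- cummax(ifelse(is.na(x), 0L, idx))
  fill <- is.na(x) & last > 0 & (idx - last) <= n
  x[fill] <- x[last[fill]]
  x
}

# Backward version: reverse, fill forward, reverse back.
nocb_limited <- function(x, n) rev(locf_limited(rev(x), n))

# Forward fill takes precedence; remaining gaps are filled backward.
fill_both <- function(x, n) {
  fwd <- locf_limited(x, n)
  bwd <- nocb_limited(x, n)
  ifelse(is.na(fwd), bwd, fwd)
}

dt <- data.frame(ID = c(rep("A", 10), rep("B", 10)),
                 Price = c(seq(1, 10, 1), seq(11, 20, 1)))
dt$Price[c(1:2, 5:10, 11:13, 15:19)] <- NA

# Apply per ID; ave() returns a vector aligned with the original rows.
dt$filled <- ave(dt$Price, dt$ID, FUN = function(v) fill_both(v, n = 2))
dt
```

With n = 2, a run of NAs longer than 2 is still filled up to 2 positions from each end, which is the behaviour maxgap does not provide.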