na | 易学教程

Setting <NA> to blank

I have a data frame with an NA row:

    df = data.frame(c("classA", NA, "classB"), t(data.frame(rep("A", 5), rep(NA, 5), rep("B", 5))))
    rownames(df) <- c(1,2,3)
    colnames(df) <- c("class", paste("Year", 1:5, sep = ""))

    > df
       class Year1 Year2 Year3 Year4 Year5
    1 classA     A     A     A     A     A
    2   <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
    3 classB     B     B     B     B     B

I introduced the empty (NA) row on purpose, because I wanted some space between the classA row and the classB row. Now I would like to replace each <NA> with a blank, so that the second row looks like an empty row. I tried:

    df[is.na(df)] <- ""

and

    df[df == "NA"] <- ""

but
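
The question is about R, and its answers are not shown here. For comparison only, here is the same display trick in pandas, the library used in the last question on this page. `fillna("")` swaps every missing cell for an empty string. The frame is a hypothetical stand-in for the asker's `df`.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data frame: the middle row is all missing.
df = pd.DataFrame(
    {
        "class": ["classA", None, "classB"],
        **{f"Year{i}": ["A", None, "B"] for i in range(1, 6)},
    },
    index=[1, 2, 3],
)

# Replace every missing cell with an empty string so row 2 prints as blank.
# fillna returns a new frame; df itself still holds the missing values.
blank = df.fillna("")
print(blank)
```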

R: need finite 'ylim' values in function

Question: I'd like to plot the data in the data frame xy for each group (defined by ID). When a group contains a year before 1946, plot2 should be drawn; when the years are between 1946 and 2014, plot1 should be drawn. My problem: this works fine without NA values, but my data have gaps and I rely on NAs to mark them. That is why I get the error "Error in plot.window(...) : need finite 'ylim' values". I tried putting finite=T in plot1 for the y-axis, but that gives a "subscript out of bounds" error. Is
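
The asker's code is cut off, so the following is an assumption. One common way to hit this error is a group whose y values are all NA. With nothing finite to take a range over, the limits come out infinite, and plot.window rejects them. The Python sketch below computes axis limits from finite values only and reports an all-missing group instead of producing infinite limits. The helper `finite_limits` and the group data are hypothetical.

```python
import math


def finite_limits(values):
    """Return (lo, hi) over the finite values, or None if there are none.

    None and NaN mark gaps; +/-inf are also excluded, since an axis needs
    finite limits. An all-missing group yields None so the caller can skip it.
    """
    finite = [v for v in values if v is not None and math.isfinite(v)]
    if not finite:
        return None
    return min(finite), max(finite)


groups = {
    "A": [1.5, float("nan"), 3.0],      # a gap marked as missing
    "B": [float("nan"), float("nan")],  # nothing plottable
}
for gid, ys in groups.items():
    limits = finite_limits(ys)
    if limits is None:
        print(f"group {gid}: no finite values, skipping plot")
    else:
        print(f"group {gid}: ylim = {limits}")
```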

How to remove rows with NAs only if they are present in more than certain percentage of columns?

Question: I want to use na.omit(data) on the following example dataset, but conditionally: a row should be removed only when NAs are present in more than, say, 30% of its columns. Data:

          C1   C2   C3   C4   C5
    Gene1 0.07 NA   0.05 0.07 0.07
    Gene2 0.2  0.18 0.16 0.15 0.15
    Gene3 NA   0.93 0.9  NA   0.92
    Gene4 0.32 0.05 0.12 0.13 0.05
    Gene5 0.44 0.53 0.46 0.03 0.47
    Gene6 NA   0.34 NA   0.8  NA
    Gene7 0.49 0.55 0.67 0.49 0.89
    Gene8 0.25 NA   0.49 NA   NA
    Gene9 0.1  0.1  0.05 NA   0.09

So the resulting file should be as
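
The rule can be stated as: keep a row if the fraction of its cells that are missing is at most 0.3. Below is a minimal sketch of that rule on the question's data. It uses pandas, which is an assumption, since the question asks about R.

```python
import numpy as np
import pandas as pd

nan = np.nan
data = pd.DataFrame(
    [
        [0.07, nan, 0.05, 0.07, 0.07],
        [0.2, 0.18, 0.16, 0.15, 0.15],
        [nan, 0.93, 0.9, nan, 0.92],
        [0.32, 0.05, 0.12, 0.13, 0.05],
        [0.44, 0.53, 0.46, 0.03, 0.47],
        [nan, 0.34, nan, 0.8, nan],
        [0.49, 0.55, 0.67, 0.49, 0.89],
        [0.25, nan, 0.49, nan, nan],
        [0.1, 0.1, 0.05, nan, 0.09],
    ],
    index=[f"Gene{i}" for i in range(1, 10)],
    columns=[f"C{i}" for i in range(1, 6)],
)

# Fraction of missing cells per row; drop rows where it exceeds 30%.
na_fraction = data.isna().mean(axis=1)
kept = data[na_fraction <= 0.3]
print(kept)
```

With five columns, one NA (20%) is tolerated and two or more (40% or more) drop the row. That removes Gene3, Gene6 and Gene8. In R the same mask is usually written as `data[rowMeans(is.na(data)) <= 0.3, ]`.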

Marginal effects of mlogit in R

Question: I am new to R and don't yet fully understand the logic of its calculations, and previous posts haven't helped me solve my problem. I have a data set of about 600 observations of 11 variables. I have successfully fitted the multinomial model to it, but I cannot compute the marginal effects because my mean() call returns NA. The data set:

    > head(data, n=50)
       ID time      CHINN DEBT ERA       INFL MONEY OPENNESS RESERVES RGDP RSVS
    1 POL 1993 -1.8639720   NA   0 32.8815343 33
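
The symptom is generic. A mean over data that contains missing values is itself missing unless those values are dropped first, which in R means `mean(x, na.rm = TRUE)`. The Python sketch below shows the same propagation with NaN. The column `debt` and its values are made up for illustration.

```python
import math


def naive_mean(xs):
    # A single NaN makes the sum, and therefore the mean, NaN.
    return math.fsum(xs) / len(xs)


def mean_skip_missing(xs):
    # Counterpart of R's mean(x, na.rm = TRUE): drop NaNs, then average.
    present = [x for x in xs if not math.isnan(x)]
    if not present:
        return math.nan  # R also returns NaN when nothing is left
    return math.fsum(present) / len(present)


debt = [1.0, float("nan"), 3.0]  # hypothetical column with a gap
print(naive_mean(debt))         # nan
print(mean_skip_missing(debt))  # 2.0
```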

How does R represent NA internally?

Question: R seems to support an efficient NA value in floating-point arrays. How does it represent it internally? My (perhaps flawed) understanding is that modern CPUs can carry out floating-point calculations in hardware, including efficient handling of Inf, -Inf and NaN values. How does NA fit into this, and how is it implemented without compromising performance?

Answer 1: R uses the special values defined for IEEE floats: Inf is the IEEE infinity, and NA_real_ (the numeric NA) is stored as an IEEE NaN with a particular bit pattern. We can use a simple C++ function to make this
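
The answer's C++ code is cut off above. As a stand-in, here is a small Python sketch that builds the bit pattern R uses for NA_real_. In R's C source (`R_ValueOfNA` in arithmetic.c), the high 32-bit word is 0x7FF00000 and the low word is 1954. Under IEEE 754, an all-ones exponent with a nonzero fraction is a NaN, so the hardware treats NA like any other NaN at full speed. R tells the two apart only when it needs to, by checking the low word (`R_IsNA`). The helper names below are hypothetical.

```python
import math
import struct

# Bit layout of R's NA_real_ as built in R's C source:
# high word 0x7FF00000 (exponent all ones), low word 1954 (0x7A2).
R_NA_BITS = (0x7FF00000 << 32) | 1954


def bits_to_float(bits: int) -> float:
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def is_r_na_bits(bits: int) -> bool:
    """Mimic R_IsNA on a raw bit pattern: a NaN whose low word is 1954."""
    return math.isnan(bits_to_float(bits)) and (bits & 0xFFFFFFFF) == 1954


na = bits_to_float(R_NA_BITS)
print(hex(R_NA_BITS))           # 0x7ff00000000007a2
print(math.isnan(na))           # True: the hardware sees an ordinary NaN
print(na + 1.0 != na + 1.0)     # True: arithmetic on it stays NaN
print(is_r_na_bits(R_NA_BITS))  # True
```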

Add a box for the NA values to the ggplot legend for a continuous map

I have a map with a gradient legend, and I would like to add a box for the NA values. My question is very similar to this one and this one. I have also read this topic, but I can't find a "nice" solution anywhere; maybe there isn't one? Here is a reproducible example:

    library(ggplot2)
    map <- map_data("world")
    map$value <- setNames(sample(-50:50, length(unique(map$region)), TRUE),
                          unique(map$region))[map$region]
    map[map$region == "Russia", "value"] <- NA
    ggplot() +
      geom_polygon(data = map, aes(long, lat, group = group, fill = value)) +
      scale_fill_gradient2(low = "brown3", mid =

NaN is removed when using na.rm=TRUE

Question: This reproducible example is a much simplified version of my code:

    x <- c(NaN, 2, 3)

    # This is fine, as expected
    max(x)
    [1] NaN

    # Why does na.rm remove NaN?
    max(x, na.rm=TRUE)
    [1] 3

To me, NA (missing value) and NaN (not a number) are two completely different entities, so why does na.rm remove NaN? How can I ignore NA but not NaN?

PS: I am using 64-bit R version 3.0.0 on Windows 7.

Edit: After some more study I found that is.na returns TRUE for NaN too! This is the source of my confusion.

    is.na(NaN)
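
In R, `is.na()` is TRUE for both NA and NaN, which is why `na.rm = TRUE` drops both. `is.nan()` is TRUE only for NaN. So one way to drop NA but keep NaN is `max(x[!is.na(x) | is.nan(x)])`. Plain Python floats have no separate NA, so the sketch below models the distinction with a hypothetical sentinel (`None`) for NA. It adds an NA to the question's vector to show the difference.

```python
import math

NA = None  # hypothetical stand-in for R's NA; NaN stays float("nan")


def is_na(v):
    # Like R's is.na(): true for NA *and* NaN.
    return v is NA or math.isnan(v)


def is_nan(v):
    # Like R's is.nan(): true only for NaN.
    return v is not NA and math.isnan(v)


def r_max(values):
    # Like R's max() on NA-free input: NaN propagates. Assumes NAs were removed.
    if any(math.isnan(v) for v in values):
        return math.nan
    return max(values)


x = [math.nan, 2.0, 3.0, NA]

drop_all_missing = [v for v in x if not is_na(v)]           # what na.rm=TRUE does
drop_only_na = [v for v in x if not is_na(v) or is_nan(v)]  # keeps NaN

print(r_max(drop_all_missing))  # 3.0
print(r_max(drop_only_na))      # nan
```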

Combining more than two columns by removing NAs in R

At first sight this looks like a duplicate of "Combine/merge columns while avoiding NA?", but it isn't: I sometimes have more than two columns to combine, not just two. My data frame looks like this:

         col1 col2 col3 col4 col5
    [1,]    1   NA   NA   13   NA
    [2,]   NA   NA   10   NA   18
    [3,]   NA    7   NA   15   NA
    [4,]    4   NA   NA   16   NA

Now I want to "collapse" it into a data frame with fewer columns and no NAs. In effect I am looking for the "Excel way": delete a cell and the rest of the row shifts one cell to the left. The result in this example would be:

         col1 col2
    [1,]    1   13
    [2,]   10   18
    [3,
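
The "Excel way" amounts to this: for each row, keep the non-missing values in their original order and pack them to the left. Below is a minimal Python sketch on the question's example, with `None` standing in for NA. Padding shorter rows with NA is an assumption, since the question's expected result is cut off.

```python
NA = None  # stand-in for R's NA


def collapse_left(rows):
    """Shift non-missing values left in each row, padding the tail with NA."""
    packed = [[v for v in row if v is not NA] for row in rows]
    width = max((len(r) for r in packed), default=0)
    return [r + [NA] * (width - len(r)) for r in packed]


m = [
    [1, NA, NA, 13, NA],
    [NA, NA, 10, NA, 18],
    [NA, 7, NA, 15, NA],
    [4, NA, NA, 16, NA],
]
print(collapse_left(m))  # [[1, 13], [10, 18], [7, 15], [4, 16]]
```

In R, when every row has the same number of non-NA values, as here, `t(apply(m, 1, function(r) r[!is.na(r)]))` gives the same shape. If the counts differ, `apply` returns a list instead of a matrix.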

Why does dplyr's filter drop NA values from a factor variable?

When I use filter from the dplyr package to drop a level of a factor variable, filter also drops the NA values. Here's an example:

    library(dplyr)
    set.seed(919)
    (dat <- data.frame(var1 = factor(sample(c(1:3, NA), size = 10, replace = T))))
    #    var1
    # 1  <NA>
    # 2     3
    # 3     3
    # 4     1
    # 5     1
    # 6  <NA>
    # 7     2
    # 8     2
    # 9  <NA>
    # 10    1

    filter(dat, var1 != 1)
    #   var1
    # 1    3
    # 2    3
    # 3    2
    # 4    2

This does not seem ideal: I only wanted to drop the rows where var1 == 1. It looks like this happens because any comparison with NA returns NA, which filter then drops. So, for example, filter(dat, !(var1 %in% 1)) produces
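
The explanation at the end of the question is the key point. In R, `NA != 1` is NA rather than TRUE, and `filter()` keeps only rows where the condition is TRUE. `%in%`, by contrast, always returns TRUE or FALSE. The Python sketch below models that three-valued logic with `None` as NA, using the `var1` values printed above. The helper names are hypothetical.

```python
NA = None  # stand-in for R's NA

var1 = [NA, 3, 3, 1, 1, NA, 2, 2, NA, 1]


def r_ne(a, b):
    # R's `!=`: comparing with NA gives NA, not TRUE/FALSE.
    return NA if a is NA or b is NA else a != b


def r_in(a, table):
    # R's `%in%`: always TRUE or FALSE (NA %in% 1 is FALSE).
    return a in table


def r_filter(values, cond):
    # dplyr::filter() keeps a row only when the condition is exactly TRUE.
    return [v for v in values if cond(v) is True]


print(r_filter(var1, lambda v: r_ne(v, 1)))        # [3, 3, 2, 2]
print(r_filter(var1, lambda v: not r_in(v, {1})))  # [None, 3, 3, None, 2, 2, None]
```

In R itself, keeping the NA rows explicitly with `filter(dat, var1 != 1 | is.na(var1))` works too, because `NA | TRUE` is TRUE.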

Getting the last non-NaN value across rows in a pandas DataFrame

Question: I have a DataFrame of shape (40, 500). Each row has numerical values up to some column k, which varies from row to row, and all entries after that are NaN. I am trying to get the value of the last non-NaN column in each row. Is there a way to do this without looping through all the rows of the DataFrame? Sample DataFrame:

    2016-06-02  7.080  7.079  7.079  7.079  7.079  7.079  nan  nan  nan
    2016-06-08  7.053  7.053  7.053  7.053  7.053  7.054  nan  nan  nan
    2016-06-09  7.061  7.061  7.060  7.060  7.060  7.060
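
One vectorised approach, among several and not necessarily the accepted answer, is to forward-fill along each row and read the last column. After `ffill(axis=1)`, the final column holds each row's last non-NaN value. The sketch uses the first two sample rows, since the third is cut off above.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame(
    [
        [7.080, 7.079, 7.079, 7.079, 7.079, 7.079, nan, nan, nan],
        [7.053, 7.053, 7.053, 7.053, 7.053, 7.054, nan, nan, nan],
    ],
    index=pd.to_datetime(["2016-06-02", "2016-06-08"]),
)

# Carry each value rightwards over the NaN tail, then take the last column.
last_valid = df.ffill(axis=1).iloc[:, -1]
print(last_valid)
```

A row that is entirely NaN stays NaN under this approach, which is usually the behaviour you want.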