How to avoid NA columns in dcast() output?

折月煮酒 提交于 2019-12-04 04:27:27

问题


How can I avoid NA columns in dcast() output from the reshape2 package?

In this dummy example the dcast() output will include an NA column:

require(reshape2)
data(iris)
iris[ , "Species2"] <- iris[ , "Species"]
iris[ 2:7, "Species2"] <- NA
(x <- dcast(iris, Species ~ Species2, value.var = "Sepal.Width", 
            fun.aggregate = length))
##     Species setosa versicolor virginica NA
##1     setosa     44          0         0  6
##2 versicolor      0         50         0  0
##3  virginica      0          0        50  0

For a somewhat similar usecase, table() does have an option that allows to avoid this:

table(iris[ , c(5,6)], useNA = "ifany")  ##same output as from dcast()
##            Species2
##Species      setosa versicolor virginica <NA>
##  setosa         44          0         0    6
##  versicolor      0         50         0    0
##  virginica       0          0        50    0
table(iris[ , c(5,6)], useNA = "no")  ##avoid NA columns
##            Species2
##Species      setosa versicolor virginica
##  setosa         44          0         0
##  versicolor      0         50         0
##  virginica       0          0        50

Does dcast() have a similar option that removes NA columns in the output? How can I avoid getting NA columns? (This function has a number of rather cryptic options that are sternly documented and that I cannot quite grasp...)


回答1:


Here is how I was able to get around it:

iris[is.na(iris)] <- 'None'

x <- dcast(iris, Species ~ Species2, value.var="Sepal.Width", fun.aggregate = length)

x$None <- NULL

The idea is that you replace all the NAs with 'None', so that dcast creates a column called 'None' rather than 'NA'. Then, you can just delete that column in the next step if you don't need it.




回答2:


One solution that I've found, which I'm not positively unhappy with, is based on the dropping NA values approach suggested in the comments. It leverages the subset argument in dcast() along with .() from plyr:

require(plyr)
(x <- dcast(iris, Species ~ Species2, value.var = "Sepal.Width",
            fun.aggregate = length, subset = .(!is.na(Species2))))
##     Species setosa versicolor virginica
##1     setosa     44          0         0
##2 versicolor      0         50         0
##3  virginica      0          0        50

For my particular purpose (within a custom function) the following works better:

(x <- dcast(iris, Species ~ Species2, value.var = "Sepal.Width", 
            fun.aggregate = length, subset = .(!is.na(get("Species2")))))
##     Species setosa versicolor virginica
##1     setosa     44          0         0
##2 versicolor      0         50         0
##3  virginica      0          0        50



回答3:


You could rename the NA column of the output and then make it NULL. (This works for me).

require(reshape2)
data(iris)
iris[ , "Species2"] <- iris[ , "Species"]
iris[ 2:7, "Species2"] <- NA

(x <- dcast(iris, Species ~ Species2, value.var = "Sepal.Width", 
            fun.aggregate = length)) 

setnames(x , c("setosa", "versicolor", "virginica", "newname"))

x$newname <- NULL



回答4:


library(dplyr)
library(tidyr)
iris %>%
  filter(!is.na(Species2)) %>%
  group_by(Species, Species2) %>%
  summarize(freq = n()) %>%
  spread(Species2, freq)


来源:https://stackoverflow.com/questions/32931473/how-to-avoid-na-columns-in-dcast-output

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!