Question
I have a data frame with many factors and want to create frequency tables that show the distribution of each factor, including factor levels with zero observations. For instance, these data:
csc2 <- structure(list(
  engag11 = structure(c(5L, 4L, 4L), .Label = c("Strongly Disagree", "Disagree", "Neither A or D", "Agree", "Strongly Agree"), class = "factor"),
  encor11 = structure(c(1L, 1L, 1L), .Label = c("Agree", "Neither Agree or Disagree", "Strongly Agree"), class = "factor"),
  know11 = structure(c(3L, 1L, 1L), .Label = c("Agree", "Neither Agree or Disagree", "Strongly Agree"), class = "factor")),
  .Names = c("engag11", "encor11", "know11"), row.names = c(NA, 3L), class = "data.frame")
These data have 3 rows, but only some of the factor levels are observed in each column. When I produce a table, I'd like to show counts for the observed levels and also for levels that are NOT observed (such as "Strongly Disagree"). Like this:
# define the factor and levels
library(dplyr); library(pander); library(forcats)
eLevels <- factor(c(1, 2, 3, 4, 5), levels = 1:5,
                  labels = c("Strongly Disagree", "Disagree", "Neither A or D", "Agree", "Strongly Agree"),
                  ordered = TRUE)
# apply the factor to one variable
csc2$engag11 <- factor(csc2$engag11, eLevels)
t1 <- table(csc2$engag11)
pander(t1)
This produces a frequency table with a count for every level, including zeros for levels that were not observed.
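The zeros appear because table() tabulates a factor over its full set of levels. For a plain character vector, it can only count the values that actually occur.

The sketch below shows the difference on a self-contained example. The response vector is made up (hypothetical data, not taken from the question).

```r
# Hypothetical responses; only two of the five levels occur
answers <- c("Strongly Agree", "Agree", "Agree")
likert  <- c("Strongly Disagree", "Disagree", "Neither A or D",
             "Agree", "Strongly Agree")

# Character vector: table() only sees the observed values
t_chr <- table(answers)

# Factor with the full level set: unobserved levels are counted as 0
t_fac <- table(factor(answers, levels = likert))

print(t_chr)
print(t_fac)
```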
But I have dozens of variables to convert, and a simple lapply() call recommended on Stack Overflow, such as this one, doesn't seem to work:
csc2[1:3]<-lapply(csc[1:3],eLevels)
I also tried writing a simple function for this (where n is a list of columns), but it failed:
facConv <- function(df, n) {
  df$n <- factor(c(1, 2, 3, 4, 5), levels = 1:5,
                 labels = c("Strongly Disagree", "Disagree", "Neither A or D", "Agree", "Strongly Agree"))
  return(result)
}
Can someone offer a solution?
Answer 1:
An lapply() call should work fine; you just need to pass it a function that calls factor():
csc2[1:3] <- lapply(csc2[1:3], function(x) factor(x, eLevels))
Then you can call table() on each column:
table(csc2[1])
#Strongly Disagree          Disagree    Neither A or D             Agree    Strongly Agree
#                0                 0                 0                 2                 1
table(csc2[2])
#Strongly Disagree          Disagree    Neither A or D             Agree    Strongly Agree
#                0                 0                 0                 3                 0
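If you prefer a reusable helper like the facConv() the question attempted, the same lapply() call can be wrapped in a function. The original attempt fails for three reasons:

- df$n looks for a column literally named "n" rather than the columns listed in n.
- The right-hand side builds a new five-element factor instead of converting the existing values.
- result is never defined.

Below is a sketch under stated assumptions. The arguments lvls and ordered are hypothetical additions. The data frame is a simplified stand-in for csc2, built from character columns.

```r
# Sketch of a working facConv(); the lvls and ordered arguments are assumptions
facConv <- function(df, n, lvls, ordered = FALSE) {
  # df[n] (not df$n) selects the columns named or indexed by n
  df[n] <- lapply(df[n], function(x) factor(x, levels = lvls, ordered = ordered))
  df  # return the modified data frame
}

likert <- c("Strongly Disagree", "Disagree", "Neither A or D",
            "Agree", "Strongly Agree")

# Simplified stand-in for the question's csc2 (character columns)
csc2 <- data.frame(
  engag11 = c("Strongly Agree", "Agree", "Agree"),
  encor11 = c("Agree", "Agree", "Agree"),
  know11  = c("Strongly Agree", "Agree", "Agree")
)

csc2 <- facConv(csc2, 1:3, likert, ordered = TRUE)
print(table(csc2$encor11))
```

The ordered argument is included because factor(x, eLevels) returns an unordered factor even though eLevels is ordered. factor()'s ordered argument defaults to is.ordered(x), so the Likert ordering has to be requested explicitly.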
Answer 2:
The inelegant, quick-and-dirty way is to use a for loop:
df <- data.frame(A = c("A", "A", "B"),
                 B = c("A", "C", "A"),
                 C = c("A", "A", "D"))
lvl <- c("A", "B", "C", "D", "E")
for (i in 1:ncol(df)) {
  df[, i] <- factor(df[, i], levels = lvl)
}
table(df$A)
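Rather than calling table() once per column, sapply() can tabulate every converted column at once. Because all columns share the same five levels, each table has the same length. sapply() then binds them into a levels-by-columns count matrix. This relies on every column having the same level set.

The sketch below reuses the data and loop above.

```r
df <- data.frame(A = c("A", "A", "B"),
                 B = c("A", "C", "A"),
                 C = c("A", "A", "D"))
lvl <- c("A", "B", "C", "D", "E")
for (i in 1:ncol(df)) {
  df[, i] <- factor(df[, i], levels = lvl)
}

# One row per level, one column per variable; unobserved levels show 0
counts <- sapply(df, table)
print(counts)
```

A matrix like this can be passed to pander() in the same way as the single table in the question.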
And if your original data are numeric codes, use the codes as the levels and supply the names as labels:
df <- data.frame(A = c(1, 1, 2),
                 B = c(1, 3, 1),
                 C = c(1, 1, 4))
lvl <- c("A", "B", "C", "D", "E")
for (i in 1:ncol(df)) {
  df[, i] <- factor(df[, i], levels = 1:5, labels = lvl)
}
df
table(df$A)
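One caveat applies to both answers: factor() silently turns any value that is not among the supplied levels into NA. The same happens with numeric codes outside levels = 1:5.

This is a real risk with the question's data. There, encor11 and know11 spell the middle category "Neither Agree or Disagree", while eLevels uses "Neither A or D".

Below is a minimal check to run before converting. check_levels() is a hypothetical helper and the survey data are made up.

```r
# Hypothetical helper: report values that factor() would silently turn into NA
check_levels <- function(df, lvls) {
  bad <- lapply(df, function(x) {
    x <- as.character(x)                # factors and numbers compared as text
    unique(x[!is.na(x) & !(x %in% lvls)])
  })
  bad[lengths(bad) > 0]                 # keep only columns with unmatched values
}

likert <- c("Strongly Disagree", "Disagree", "Neither A or D",
            "Agree", "Strongly Agree")

# Made-up data with one spelling that does not match the level set
survey <- data.frame(
  q1 = c("Agree", "Neither Agree or Disagree", "Strongly Agree"),
  q2 = c("Agree", "Disagree", "Agree")
)

problems <- check_levels(survey, likert)
print(problems)
```

The same check also works for numeric codes, e.g. check_levels(df, 1:5). This is because %in% coerces 1:5 to character when matching against the converted values.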
Source: https://stackoverflow.com/questions/48196217/apply-factor-levels-to-multiple-columns-with-missing-factor-levels