Question
I have a data.frame of character data, and I want to end up with a matrix that has the same column headings but holds the count of each value in each column. So far I can create an empty matrix with the dimensions I want, but when I try to populate myMatrix with counts, it doesn't work.
myData <- data.frame(a=LETTERS[5:8], b=LETTERS[6:9], c=rep(LETTERS[5:6],2), d=rep(LETTERS[7],4))
# a b c d
# 1 E F E G
# 2 F G F G
# 3 G H E G
# 4 H I F G
myValues <- sort(unique(unlist(myData))) # E F G H I
myList <- lapply(myData, table)
myMatrix <- matrix(nrow=length(myValues), ncol=length(myList), dimnames=list(myValues,names(myList)))
# a b c d
# E NA NA NA NA
# F NA NA NA NA
# G NA NA NA NA
# H NA NA NA NA
# I NA NA NA NA
So far so good. This is the part that doesn't do what I expect:
lapply(seq_along(myList), function(i) {myMatrix[names(myList[[i]]),names(myList[i])] <- myList[[i]]})
It returns the right values, but myMatrix is still full of NAs. Oddly, this one works:
myMatrix[names(myList[[2]]),names(myList[2])] <- myList[[2]]
# a b c d
# E NA NA NA NA
# F NA 1 NA NA
# G NA 1 NA NA
# H NA 1 NA NA
# I NA 1 NA NA
Why is the assignment to myMatrix failing within lapply, and how can I get it to work (without a for loop)?
Answer 1:
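@orizon is correct about why your use of lapply is not working as you expected: the assignment inside the anonymous function modifies a copy of myMatrix that is local to that function call, so the myMatrix in your workspace is never touched. You would have to replace <- with <<- for it to work, but it is generally considered bad practice for *apply functions to have such side effects.
Instead, you can use: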
sapply(lapply(myData, factor, unique(unlist(myData))), table)
# a b c d
# E 1 0 2 0
# F 1 1 2 0
# G 1 1 0 4
# H 1 1 0 0
# I 0 1 0 0
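The scoping behaviour described in this answer can be seen in a small sketch. The names m, m2 and counts are hypothetical and only used for illustration; the vapply variant is one possible side-effect-free alternative under the same idea as the sapply call above, not the only way to do it.

```r
myData <- data.frame(a = LETTERS[5:8], b = LETTERS[6:9],
                     c = rep(LETTERS[5:6], 2), d = rep(LETTERS[7], 4))
myValues <- sort(unique(as.character(unlist(myData))))

# A tiny matrix to demonstrate the scoping issue (hypothetical name `m`)
m <- matrix(NA_integer_, nrow = 2, ncol = 1, dimnames = list(c("E", "F"), "a"))

# `<-` inside the function only changes a copy local to that call
invisible(lapply(1, function(i) { m["E", "a"] <- 1L }))
is.na(m["E", "a"])   # the outer `m` is unchanged

# `<<-` reaches the enclosing environment, but relies on a side effect
m2 <- m
invisible(lapply(1, function(i) { m2["E", "a"] <<- 1L }))

# Side-effect-free alternative: return one column of counts per call
counts <- vapply(myData, function(column) {
  as.vector(table(factor(column, levels = myValues)))
}, integer(length(myValues)))
rownames(counts) <- myValues
```

Here each call returns its own column and vapply assembles the matrix, so nothing outside the function needs to be modified.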
Answer 2:
Here is an approach that will return a data.frame:
# create table, convert to data.frames then give appropriate column names
myList <- Map(setNames, lapply(lapply(myData, table), data.frame), Map(c, 'Var', names(myData)))
# merge recursively
Reduce(function(...) merge(..., by = 'Var', all = TRUE), myList)
Var a b c d
1 E 1 NA 2 NA
2 F 1 1 2 NA
3 G 1 1 NA 4
4 H 1 1 NA NA
5 I NA 1 NA NA
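If a matrix with zeros instead of NAs is wanted, as in the question, the merged data.frame could be converted afterwards. The following is a minimal sketch, assuming the merge above is stored in a variable; the names tabs, merged and counts are hypothetical.

```r
myData <- data.frame(a = LETTERS[5:8], b = LETTERS[6:9],
                     c = rep(LETTERS[5:6], 2), d = rep(LETTERS[7], 4))

# Per-column frequency tables as data.frames with columns 'Var' and the column name
tabs <- Map(setNames, lapply(lapply(myData, table), data.frame),
            Map(c, "Var", names(myData)))

# Full outer merge on 'Var'; values missing from a column become NA
merged <- Reduce(function(...) merge(..., by = "Var", all = TRUE), tabs)

# Drop the key column, use it as row names, and treat missing counts as zero
counts <- as.matrix(merged[, -1])
rownames(counts) <- as.character(merged$Var)
counts[is.na(counts)] <- 0L
```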
Answer 3:
A single call to table can get the desired result once you collapse everything into two vectors: one holding the values in the data.frame, and one holding the column identifier of each value, built using col:
table(unlist(myData), colnames(myData)[col(myData)])
Result:
a b c d
E 1 0 2 0
F 1 1 2 0
G 1 1 0 4
H 1 1 0 0
I 0 1 0 0
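To make the two vectors explicit, the one-liner can be split into steps. This is only an illustrative sketch; the names values, columns and counts are hypothetical.

```r
myData <- data.frame(a = LETTERS[5:8], b = LETTERS[6:9],
                     c = rep(LETTERS[5:6], 2), d = rep(LETTERS[7], 4))

# All cell values, read column by column
values <- as.character(unlist(myData, use.names = FALSE))

# col() gives the column index of every cell; indexing the column names
# with it yields one label per cell, in the same column-by-column order
columns <- colnames(myData)[col(myData)]

# Cross-tabulate value against originating column
counts <- table(values, columns)
```

Because both vectors run through the cells in the same order, each value is paired with the column it came from, and table counts the pairs.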
Source: https://stackoverflow.com/questions/13735525/matrix-assignment-failing-within-lapply