I have some data where I want to extract the frequency with which the integers appear. Here is some sample data:
df <- read.table(header=TRUE, text="A B C D
1 1 5 3 1
2 1 2 3 2
3 2 3 5 3
4 1 4 5 3
5 3 1 4 2
6 5 2 5 1
")
df
I can loop through these and get the counts as follows:
for (i in 1:5){
print(colSums(df==i))
}
But every time I try to store the output I get an error. What is the neatest way to store the resultant output in a dataframe? I think I'm getting confused about the way to store data that's run through a loop. Thanks for your help.
We can use mtabulate from the qdapTools package:
library(qdapTools)
t(mtabulate(df))
# A B C D
#1 3 1 0 2
#2 1 2 0 2
#3 1 1 2 2
#4 0 1 1 0
#5 1 1 3 0
In base R, we can also unlist the dataset, replicate the column names, and use table, without any loop, whether explicit (for) or implicit (lapply):
table(unlist(df),names(df)[col(df)])
# A B C D
# 1 3 1 0 2
# 2 1 2 0 2
# 3 1 1 2 2
# 4 0 1 1 0
# 5 1 1 3 0
Or, as @nicola mentioned, instead of col(df) we can use rep (which should be faster):
table(unlist(df), rep(names(df),each=nrow(df)))
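Since the question asks for a data frame, the table returned by this call can be converted afterwards. A minimal sketch, assuming the df from the question; the names counts and counts_df are only illustrative:

```r
# Sample data from the question (first field of each row becomes the row name)
df <- read.table(header = TRUE, text = "A B C D
1 1 5 3 1
2 1 2 3 2
3 2 3 5 3
4 1 4 5 3
5 3 1 4 2
6 5 2 5 1
")

# Cross-tabulate every value against the column it came from
counts <- table(unlist(df), rep(names(df), each = nrow(df)))

# as.data.frame.matrix() keeps the wide layout (one column per variable);
# plain as.data.frame() would instead return a long table with a Freq column
counts_df <- as.data.frame.matrix(counts)
counts_df
```

The row names of counts_df hold the integer values, so a single count can be looked up with, for example, counts_df["3", "C"].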
We could also do this in base-R without a for-loop:
do.call(cbind, lapply(df, function(x){table(factor(x,levels=1:6))}))
A B C D
1 3 1 0 2
2 1 2 0 2
3 1 1 2 2
4 0 1 1 0
5 1 1 3 0
6 0 0 0 0
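The levels 1:6 are hard-coded in this answer, which is why a row of zeros appears for 6. A possible variation, assuming the set of values should come from the data itself (lev and counts are illustrative names, not part of the answer):

```r
# Sample data from the question
df <- read.table(header = TRUE, text = "A B C D
1 1 5 3 1
2 1 2 3 2
3 2 3 5 3
4 1 4 5 3
5 3 1 4 2
6 5 2 5 1
")

# Collect every distinct value that occurs anywhere in the data
lev <- sort(unique(unlist(df)))

# Tabulate each column against the shared set of levels, so values that
# are missing from a column are still reported with a count of zero
counts <- sapply(df, function(x) table(factor(x, levels = lev)))
counts
```

Sharing one set of levels across columns is what keeps the per-column tables the same length, so sapply can bind them into a single matrix.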
Here's another option:
library(reshape2)
table(melt(df))
#No id variables; using all as measure variables
# value
#variable 1 2 3 4 5
# A 3 1 1 0 1
# B 1 2 1 1 1
# C 0 0 2 1 3
# D 2 2 2 0 0
Unlike @akrun, I prefer to use base R when possible.
out <- matrix(0, nrow= 6, ncol=4, dimnames= list(1:6, LETTERS[1:4]))
for (i in 1:6) {
out[i,] <- unlist(lapply(df, function(j) sum(j == i)))
}
R> out
A B C D
1 3 1 0 2
2 1 2 0 2
3 1 1 2 2
4 0 1 1 0
5 1 1 3 0
6 0 0 0 0
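The loop from the question can also be kept almost as is. A common pattern is to create a container before the loop and fill one element per iteration instead of printing. A sketch, assuming df as defined in the question; vals, res and counts_df are illustrative names:

```r
# Sample data from the question
df <- read.table(header = TRUE, text = "A B C D
1 1 5 3 1
2 1 2 3 2
3 2 3 5 3
4 1 4 5 3
5 3 1 4 2
6 5 2 5 1
")

vals <- 1:5

# Pre-allocate a list with one slot per value to be counted
res <- vector("list", length(vals))

for (k in seq_along(vals)) {
  # Store the per-column counts instead of printing them
  res[[k]] <- colSums(df == vals[k])
}

# Stack the stored vectors row-wise and convert to a data frame
counts_df <- as.data.frame(do.call(rbind, res))
rownames(counts_df) <- vals
counts_df
```

Assigning into res[[k]] is the step that the printing loop lacks; everything else is the original colSums(df == i) call.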
Source: https://stackoverflow.com/questions/35582835/extracting-a-series-of-integers-using-a-loop