Question
I am very new to R, so my question might be a little too obvious.
I have a file that has the following data:
Score Frequency
100 10
200 30
300 40
How do I read this file and compute the mean, median, variance and standard deviation?
If the table above were just raw scores without any frequency information, I could do this:
x <- scan(file="scores.txt", what = integer())
median(x)
and so on, but I am not able to understand how to do these computations when given a frequency table.
Answer 1:
Read the data with read.table (see ?read.table for how to read from a file). Then expand the data by creating a vector of individual scores, repeating each score as many times as its frequency. We can then write a function to get the desired statistics. You can, of course, calculate each one individually if you don't wish to write a function.
d <- read.table(header = TRUE, text = "Score Frequency
100 10
200 30
300 40")
d2 <- rep(d$Score, d$Frequency) ## expands the data by frequency of score
multi.fun <- function(x) {
c(mean = mean(x), median = median(x), var = var(x), sd = sd(x))
}
multi.fun(d2)
# mean median var sd
# 237.50000 250.00000 4905.06329 70.03616
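If the frequencies are large, expanding with rep can create a very long vector. As a minimal sketch (not part of the original answer), the same statistics can be computed directly from the frequency table. The names freq_mean, freq_var, freq_sd, freq_median and score_at are made up for illustration, and the frequencies are assumed to be positive whole-number counts.

```r
# Frequency table from the question (Score and Frequency columns).
d <- data.frame(Score = c(100, 200, 300), Frequency = c(10, 30, 40))

# Sort by score so cumulative frequencies line up with sorted positions.
d <- d[order(d$Score), ]
n <- sum(d$Frequency)

# Weighted mean: each score counts Frequency times.
freq_mean <- weighted.mean(d$Score, d$Frequency)

# Sample variance with an n - 1 denominator (the convention var() uses),
# and the standard deviation as its square root.
freq_var <- sum(d$Frequency * (d$Score - freq_mean)^2) / (n - 1)
freq_sd <- sqrt(freq_var)

# Median: average the scores found at the middle position(s) of the sorted data.
# score_at(k) is a hypothetical helper returning the k-th smallest score.
cum <- cumsum(d$Frequency)
score_at <- function(k) d$Score[which(cum >= k)[1]]
freq_median <- (score_at(floor((n + 1) / 2)) + score_at(ceiling((n + 1) / 2))) / 2

c(mean = freq_mean, median = freq_median, var = freq_var, sd = freq_sd)
```

For a table this small the rep approach is simpler. The direct version is mainly useful when the total count is too large to expand comfortably.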
Answer 2:
Depending on what format your input file is in, you can use read.csv("scores.txt"). You can change the separator with read.csv("scores.txt", sep="\t"). If your data doesn't have a header, you can use the option header=FALSE.
I am going to use a comma as the separator, since it is easier to read here.
INPUT FILE
Score,Frequency
100,10
200,30
300,40
R Source Code
x <- read.csv("scores.txt")
mean(x$Score)
median(x$Score)
var(x$Score)
sd(x$Score)
R Output
> mean(x$Score)
[1] 200
> median(x$Score)
[1] 200
> var(x$Score)
[1] 10000
> sd(x$Score)
[1] 100
If you want to take the frequency into account, expand the scores with rep first:
R Source Code
x <- read.csv("scores.txt")
mean(rep(x$Score, x$Frequency))
median(rep(x$Score, x$Frequency))
var(rep(x$Score, x$Frequency))
sd(rep(x$Score, x$Frequency))
R Output
> x <- read.csv("scores.txt")
> mean(rep(x$Score, x$Frequency))
[1] 237.5
> median(rep(x$Score, x$Frequency))
[1] 250
> var(rep(x$Score, x$Frequency))
[1] 4905.063
> sd(rep(x$Score, x$Frequency))
[1] 70.03616
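The file in the question is separated by spaces rather than commas. A minimal sketch for that layout, assuming the file has a header line and whitespace-separated columns: it writes the sample data to a temporary file (so the example is self-contained) and reads it back with read.table. The names path and scores are made up for illustration.

```r
# Write the question's whitespace-separated table to a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(c("Score Frequency", "100 10", "200 30", "300 40"), path)

# read.table splits on whitespace by default; header = TRUE takes the
# column names from the first line.
x <- read.table(path, header = TRUE)

# Expand once and reuse the vector instead of calling rep() for every statistic.
scores <- rep(x$Score, x$Frequency)
c(mean = mean(scores), median = median(scores), var = var(scores), sd = sd(scores))
```

With a real file, replace path with the file name, for example "scores.txt".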
Answer 3:
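lines <- readLines("scores.txt")[-1] # drop the header line
# Capture the two integers on each line; \\D stops a greedy match from swallowing digits
mat <- matrix(as.numeric(unlist(
strsplit(gsub("^\\D*(\\d+)\\D+(\\d+)\\D*$", "\\1,\\2", lines), ","))),
ncol = 2, byrow = TRUE)
# Note: summary() describes each column on its own and does not weight scores by frequency
print(summary(mat[, 1]))
print(summary(mat[, 2]))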
Source: https://stackoverflow.com/questions/22644481/r-computing-mean-median-variance-from-file-with-frequency-distribution