Read from File Variance Calculation

Submitted by 孤人 on 2020-01-07 04:55:11

Question


@Jerry Coffin

I get the logic: while(File>>value) // while the input just taken from the file is valid, do the computation. Yet when I implemented this, the counter only went to 1 and its value was very high. Something is wrong, but I have no idea what. The file is valid.

File.open(FileName, ifstream::in);  
while(File>>value){  
    ++counter;  
    sum += value;  
    sumsqr+= value * value;  
}  
average=sum/counter;  
variance = sumsqr/counter - average*average;  
File.close();  

Here are the contents of the input file I am using, "text.txt":

23244564 1486415241250586205864104818638684840823244564 1486415241250586205864104818638684840823244564 1486415241250586205864104818638684840823244564 1486415241250586205864104818638684840823244564 1486415241250586205864104818638684840823244564 1486415241250586205864104818638684840823244564 1486415241250586205864104818638684840823244564 1486415241250586205864104818638684840823244564 1486415241250586205864104818638684840823244564 14864152412505862058641048186386848408


Answer 1:


Sadly, (at least) three answers have quoted your while (!File.eof()) without commenting on the fact that this is just plain wrong. What you want is something like this:

while (File>>value) {
    ++counter;
    sum += value;
    sumsqr += value * value;
}
average = sum/counter;
variance = sumsqr/counter - average * average;

The bug from using while (!File.eof()) is insidious: you'll typically get results that look reasonable and are actually fairly close to correct. The problem is that eof() doesn't become true until after you've attempted to read from the file and the attempted read has failed. When it fails, value still holds the last value you read, so if the file ends with a newline (as most text files do), it acts as if the last number in the list were really there twice (e.g., if your file contained 21 numbers, your loop would execute 22 times, and on the 22nd iteration it would use the 21st number again). This throws your calculations off a bit, but usually not enough that it's immediately obvious, which makes it nearly the worst possible kind of bug.

Edit: Here's a complete test program:

#include <fstream>
#include <iostream>

// Correct: the loop body runs only after a read has succeeded.
double variance1(std::istream &File) {
    double value = 0.0, sum = 0.0, counter = 0.0, sumsqr = 0.0;
    while (File>>value) {
        ++counter;
        sum += value;
        sumsqr += value * value;
    }
    double average = sum/counter;
    return sumsqr/counter - average * average;
}

// Buggy: eof() becomes true only after a read has already failed, so when
// the file ends with a newline the last value is counted twice.
double variance2(std::istream &File) {
    double value = 0.0, sum = 0.0, counter = 0.0, sumsqr = 0.0;
    while (!File.eof()) {
        ++counter;
        File >> value;
        sum += value;
        sumsqr += value * value;
    }
    double average = sum/counter;
    return sumsqr/counter - average * average;
}

int main() {
    std::ifstream in("data.txt");
    double v1 = variance1(in);
    in.clear();     // clear eofbit/failbit left by the first pass
    in.seekg(0);    // rewind so the second function sees the same data
    double v2 = variance2(in);

    std::cout << "Using \"while (file>>value)\": " << v1 << "\n";
    std::cout << "Using \"while (!file.eof())\": " << v2 << "\n";
    return 0;
}

Here's some test data to go with:

1
2
3
4
5
6
7
8
9
10

When I run this on that data, I get:

Using "while (file>>value)": 8.25 
Using "while (!file.eof())": 9.17355

As a cross-check, I did the computation in Excel, using two sets of data:

1           1
2           2
3           3
4           4
5           5
6           6
7           7
8           8
9           9
10          10
8.25        10
            9.173553719

The last line in each column is the result of a formula doing "VARP" on the preceding data. Note that my function matches what Excel produces for the correct input data. The function using while (!file.eof()) matches what Excel produces when the last number is duplicated.
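Both figures can also be reproduced by hand from the population-variance formula that VARP computes (divide by n, not n - 1):

```latex
\operatorname{VARP}(x) = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \Bigl(\frac{1}{n}\sum_{i=1}^{n} x_i\Bigr)^2

% Correct data 1,\dots,10: n = 10, \sum x_i = 55, \sum x_i^2 = 385
\frac{385}{10} - \Bigl(\frac{55}{10}\Bigr)^2 = 38.5 - 30.25 = 8.25

% Last value duplicated (1,\dots,10,10): n = 11, \sum x_i = 65, \sum x_i^2 = 485
\frac{485}{11} - \Bigl(\frac{65}{11}\Bigr)^2 = \frac{5335 - 4225}{121} = \frac{1110}{121} \approx 9.1736
```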

I can't even begin to guess what's happening to make the loop run only once and read an incorrect value. Without being able to either guess at or reproduce the problem, I'm afraid I can't provide much in the way of useful suggestions about how to fix it.




Answer 2:


Your computation of variance is totally incorrect. In statistical terms, variance is

E(x^2) - [E(x)]^2

So get rid of that second loop (I'm not even sure what you think it does) and change the first loop to:

while(!File.eof()){
    counter++;
    File >> value;    // File.get() would read a single character, not a number
    sum += value;
    sumsqr += value*value;
}
average = sum/counter;
variance = (sumsqr/counter) - (average*average);

EDIT: Jerry Coffin's answer is even better as it demonstrates the issue with eof().
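The formula in this answer is the expanded form of the mean squared deviation. With \mu the mean of the n values:

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i

\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2
  = \frac{1}{n}\sum_{i=1}^{n}x_i^2 - 2\mu\cdot\frac{1}{n}\sum_{i=1}^{n}x_i + \mu^2
  = \frac{1}{n}\sum_{i=1}^{n}x_i^2 - \mu^2
  = E(x^2) - [E(x)]^2
```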




Answer 3:


You can write it like this:

variance=counter*(average*average)



Answer 4:


In your second !File.eof() loop, you are not reading from the file. Isn't the variance the average of the squared differences between the values and their mean? Your loop doesn't look at the values from the file at all. Also, using integer variables for the sum, average, and variance is likely to lead to inaccuracy; you might want double for those instead.




Answer 5:


while(!File.eof()){
    variance +=(average*average);
}

The above lines don't appear to make much sense. You are not reading anything in that while block, so nothing inside it can change the result of File.eof(); unless it is already true when the loop starts, the loop will never terminate.




Answer 6:


Well, if the question doesn't limit which libraries you can use, I would suggest using Boost.Accumulators, which make this type of thing trivial.

You get variance, mean, and whatever other basic statistical value you desire. They have a few issues working with long double, but otherwise they are great!



Source: https://stackoverflow.com/questions/4790219/read-from-file-variance-calculation
