How to fit the 2D scatter data with a line with C++

春和景丽 2020-12-16 23:19

I used to work with MATLAB, and for the question I raised I can use p = polyfit(x,y,1) to estimate the best fit line for the scatter data in a plate. I was wondering which r

7 Answers
  •  死守一世寂寞
    2020-12-17 00:13

    This page describes the algorithm more simply than Wikipedia does, without extra steps to calculate the means etc.: http://faculty.cs.niu.edu/~hutchins/csci230/best-fit.htm . Quoting it almost verbatim, in C++ it is:

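    #include <cmath>
    #include <vector>
    
    struct Point {
      double _x, _y;
    };
    struct Line {
      double _slope, _yInt;
      double getYforX(double x) const {
        return _slope*x + _yInt;
      }
      // Construct line from points
      bool fitPoints(const std::vector<Point> &pts) {
        int nPoints = static_cast<int>(pts.size());
        if( nPoints < 2 ) {
          // Fail: infinitely many lines passing through this single point
          return false;
        }
        double sumX=0, sumY=0, sumXY=0, sumX2=0;
        // From the loop body onward the listing was cut off in the source;
        // the rest is reconstructed from the formulas discussed below (assumption).
        for(int i=0; i<nPoints; i++) {
          sumX += pts[i]._x;
          sumY += pts[i]._y;
          sumXY += pts[i]._x * pts[i]._y;
          sumX2 += pts[i]._x * pts[i]._x;
        }
        double xMean = sumX / nPoints;
        double yMean = sumY / nPoints;
        double denominator = sumX2 - sumX * xMean;
        // The tolerance 1e-7 is an assumption; tune it for your data
        if( std::fabs(denominator) < 1e-7 ) {
          // Fail: it seems a vertical line
          return false;
        }
        _slope = (sumXY - sumX * yMean) / denominator;
        _yInt = yMean - _slope * xMean;
        return true;
      }
    };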

    Be aware that both this algorithm and the one from Wikipedia ( http://en.wikipedia.org/wiki/Simple_linear_regression#Fitting_the_regression_line ) fail when the "best" description of the points is a vertical line. They fail because they use the

    y = k*x + b 
    

    line equation, which inherently cannot describe vertical lines. If you also want to cover the cases where the data points are "best" described by a vertical line, you need a line fitting algorithm that uses the

    A*x + B*y + C = 0
    

    line equation. You can still modify the current algorithm to produce that equation:

    y = k*x + b <=>
    y - k*x - b = 0 <=>
    B=1, A=-k, C=-b
    

    In terms of the above code:

    B=1, A=-_slope, C=-_yInt
    

    And in the "then" block of the if that checks whether the denominator is equal to 0, instead of // Fail: it seems a vertical line, produce the following line equation:

    x = xMean <=>
    x - xMean = 0 <=>
    A=1, B=0, C=-xMean
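
    The modification described above can be sketched as a single function that always reports the result as A*x + B*y + C = 0. This is only a minimal sketch: the names GeneralLine and fitGeneralLine and the eps parameter are hypothetical and do not come from the original article, and the default tolerance is an assumption to be tuned for your data.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Point {
  double _x, _y;
};

// Line in general form A*x + B*y + C = 0 (hypothetical helper type).
struct GeneralLine {
  double A, B, C;
};

// Ordinary least-squares fit of y on x, reported in general form.
// Returns false only when fewer than two points are given.
// eps is an assumed tolerance for treating the denominator as zero.
bool fitGeneralLine(const std::vector<Point> &pts, GeneralLine &out,
                    double eps = 1e-7) {
  assert(eps > 0.0);
  const int nPoints = static_cast<int>(pts.size());
  if (nPoints < 2) {
    return false;
  }
  double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0;
  for (const Point &p : pts) {
    sumX += p._x;
    sumY += p._y;
    sumXY += p._x * p._y;
    sumX2 += p._x * p._x;
  }
  const double xMean = sumX / nPoints;
  const double yMean = sumY / nPoints;
  const double denominator = sumX2 - sumX * xMean;
  if (std::fabs(denominator) < eps) {
    // All x values (nearly) coincide: x = xMean <=> A=1, B=0, C=-xMean
    out = GeneralLine{1.0, 0.0, -xMean};
    return true;
  }
  const double slope = (sumXY - sumX * yMean) / denominator;
  const double yInt = yMean - slope * xMean;
  // y = k*x + b <=> -k*x + 1*y - b = 0
  out = GeneralLine{-slope, 1.0, -yInt};
  return true;
}
```

    For the points (2,0), (2,1), (2,5) this sketch yields A=1, B=0, C=-2, i.e. the vertical line x = 2. Note that it still minimizes vertical distances only, so nearly vertical data that does not trip the eps check gets a very steep, poorly conditioned slope.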
    

    I've just noticed that the original article I was referring to has been deleted. This web page proposes a slightly different formula for line fitting: http://hotmath.com/hotmath_help/topics/line-of-best-fit.html

    double denominator = sumX2 - 2 * sumX * xMean + nPoints * xMean * xMean;
    ...
    _slope = (sumXY - sumY*xMean - sumX * yMean + nPoints * xMean * yMean) / denominator;
    

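    The formulas are identical because nPoints*xMean == sumX and nPoints*xMean*yMean == sumX * yMean == sumY * xMean, so the extra terms cancel: the denominator reduces to sumX2 - sumX * xMean and the numerator to sumXY - sumX * yMean.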
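
    The equivalence can also be checked numerically. The sketch below computes the slope both ways from the same running sums; the names Sums, computeSums, slopeShort and slopeExpanded are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Running sums used by both slope formulas (hypothetical helper type).
struct Sums {
  double n, sumX, sumY, sumXY, sumX2;
};

Sums computeSums(const std::vector<double> &xs, const std::vector<double> &ys) {
  assert(xs.size() == ys.size());
  Sums s{static_cast<double>(xs.size()), 0, 0, 0, 0};
  for (std::size_t i = 0; i < xs.size(); ++i) {
    s.sumX += xs[i];
    s.sumY += ys[i];
    s.sumXY += xs[i] * ys[i];
    s.sumX2 += xs[i] * xs[i];
  }
  return s;
}

// Short form: (sumXY - sumX*yMean) / (sumX2 - sumX*xMean)
double slopeShort(const Sums &s) {
  const double xMean = s.sumX / s.n;
  const double yMean = s.sumY / s.n;
  return (s.sumXY - s.sumX * yMean) / (s.sumX2 - s.sumX * xMean);
}

// Expanded form, as written on the hotmath page quoted above.
double slopeExpanded(const Sums &s) {
  const double xMean = s.sumX / s.n;
  const double yMean = s.sumY / s.n;
  const double denominator =
      s.sumX2 - 2 * s.sumX * xMean + s.n * xMean * xMean;
  return (s.sumXY - s.sumY * xMean - s.sumX * yMean + s.n * xMean * yMean) /
         denominator;
}
```

    Both functions should agree up to floating-point rounding for any data set whose x values are not all equal; for x = 1, 2, 3, 4 and y = 2, 4.1, 5.9, 8.2 both give a slope of about 2.04.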
