MySQL Math - Is it possible to calculate a correlation in a query?

日久生厌 2021-02-06 06:06

In a MySQL (5.1) database table there is data that represents:

  • how long a user takes to perform a task and
  • how many items the user handled during the task
2 Answers
  •  离开以前
    2021-02-06 06:26

    Single-Pass Solution

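    The Pearson correlation coefficient is commonly written in two forms, one labeled for a sample and one for an entire population. These are single-pass and, I believe, correct formulas for both. Note that the two expressions are algebraically equivalent: multiplying the numerator and denominator of the population form by count(*) squared gives the sample form, so as long as x and y contain no NULLs, both columns should return the same value up to rounding.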

    -- Methods for calculating the two Pearson correlation coefficients
    SELECT  
            -- For Population
            (avg(x * y) - avg(x) * avg(y)) / 
            (sqrt(avg(x * x) - avg(x) * avg(x)) * sqrt(avg(y * y) - avg(y) * avg(y))) 
            AS correlation_coefficient_population,
            -- For Sample
            (count(*) * sum(x * y) - sum(x) * sum(y)) / 
            (sqrt(count(*) * sum(x * x) - sum(x) * sum(x)) * sqrt(count(*) * sum(y * y) - sum(y) * sum(y))) 
            AS correlation_coefficient_sample
        FROM your_table;
    

    I developed and tested this as T-SQL. The code that generated the test data didn't translate to MySQL, but the formulas should. Make sure your x and y are decimal values; integer math can significantly affect these calculations.
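
As a minimal sketch of the decimal advice above, the query below casts both columns once in a derived table so that every aggregate works on decimals. The table `task_stats`, its columns `task_seconds` and `items_handled`, and the five sample rows are hypothetical, made up only to give the formula something to run against. It has not been checked against the asker's schema.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMPORARY TABLE task_stats (
    task_seconds  INT NOT NULL,  -- how long the user took
    items_handled INT NOT NULL   -- how many items the user handled
);

INSERT INTO task_stats (task_seconds, items_handled) VALUES
    (10, 1), (20, 2), (30, 3), (40, 5), (50, 4);

-- Single-pass Pearson correlation over the casted columns.
SELECT
    (COUNT(*) * SUM(x * y) - SUM(x) * SUM(y)) /
    (SQRT(COUNT(*) * SUM(x * x) - SUM(x) * SUM(x)) *
     SQRT(COUNT(*) * SUM(y * y) - SUM(y) * SUM(y)))
    AS correlation_coefficient
FROM (
    -- Cast once here so the outer aggregates never see integers.
    SELECT CAST(task_seconds  AS DECIMAL(20, 6)) AS x,
           CAST(items_handled AS DECIMAL(20, 6)) AS y
    FROM task_stats
) AS t;
```

For these five rows the sums work out to a numerator of 450 and a denominator of 500, so the result should be approximately 0.9. If either column is constant, the denominator is zero and MySQL returns NULL for the division, so a NULL result usually means one of the variables has no variance.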
