Question
Problem: the cov=True option of np.polyfit() produces a diagonal with nonsensical negative values.
UPDATE: after playing with this some more, I am really starting to suspect a bug in numpy? Is that possible? Deleting any pair of 13 values from the dataset will fix the problem.
I am using np.polyfit() to calculate the slope and intercept coefficients of a dataset. A plot of the values produces a very (but not perfectly) linear graph. I am attempting to get the standard deviation on these coefficients with np.sqrt(np.diag(cov)); however, this throws an error because the diagonal contains negative values.
It should be mathematically impossible to produce a covariance matrix with a negative diagonal. What is numpy doing wrong?
Here is a snippet that reproduces the problem:
import numpy as np
x = [1476728821.797, 1476728821.904, 1476728821.911, 1476728821.920, 1476728822.031, 1476728822.039,
1476728822.047, 1476728822.153, 1476728822.162, 1476728822.171, 1476728822.280, 1476728822.289,
1476728822.297, 1476728822.407, 1476728822.416, 1476728822.423, 1476728822.530, 1476728822.539,
1476728822.547, 1476728822.657, 1476728822.666, 1476728822.674, 1476728822.759, 1476728822.788,
1476728822.797, 1476728822.805, 1476728822.915, 1476728822.923, 1476728822.931, 1476728823.038,
1476728823.047, 1476728823.054, 1476728823.165, 1476728823.175, 1476728823.182, 1476728823.292,
1476728823.300, 1476728823.308, 1476728823.415, 1476728823.424, 1476728823.432, 1476728823.551,
1476728823.559, 1476728823.567, 1476728823.678, 1476728823.689, 1476728823.697, 1476728823.808,
1476728823.828, 1476728823.837, 1476728823.947, 1476728823.956, 1476728823.964, 1476728824.074,
1476728824.083, 1476728824.091, 1476728824.201, 1476728824.209, 1476728824.217, 1476728824.324,
1476728824.333, 1476728824.341, 1476728824.451, 1476728824.460, 1476728824.468, 1476728824.579,
1476728824.590, 1476728824.598, 1476728824.721, 1476728824.730, 1476728824.788]
y = [6309927, 6310105, 6310116, 6310125, 6310299, 6310317, 6310326, 6310501, 6310513, 6310523, 6310688,
6310703, 6310712, 6310875, 6310891, 6310900, 6311058, 6311069, 6311079, 6311243, 6311261, 6311272,
6311414, 6311463, 6311479, 6311490, 6311665, 6311683, 6311692, 6311857, 6311867, 6311877, 6312037,
6312054, 6312065, 6312230, 6312248, 6312257, 6312430, 6312442, 6312455, 6312646, 6312665, 6312675,
6312860, 6312879, 6312894, 6313071, 6313103, 6313117, 6313287, 6313304, 6313315, 6313489, 6313505,
6313518, 6313675, 6313692, 6313701, 6313875, 6313888, 6313898, 6314076, 6314093, 6314104, 6314285,
6314306, 6314321, 6314526, 6314541, 6314638]
z, cov = np.polyfit(np.asarray(x), np.asarray(y), 1, cov=True)
std = np.sqrt(np.diag(cov))
print(z)
print(cov)
print(std)
Answer 1:
It looks like it's related to your x values: they have a total range of about 3, with an offset of about 1.5 billion.
In your code, np.asarray(x) converts the x values into an ndarray of float64. While this is enough to represent the x values themselves correctly, it might not be enough to carry out the computations required to get the covariance matrix.
np.asarray(x, dtype=np.float128) would solve the problem, but polyfit can't work with float128:
TypeError: array type float128 is unsupported in linalg
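To get a feel for why float64 runs short here, the sketch below looks at two things: the gap between adjacent float64 values near x and near x squared, and the condition number of the design matrix [x, 1] before and after removing the offset. It uses a synthetic stand-in for the timestamps (an assumption, not the dataset from the question). It also assumes that the covariance estimate involves inverting something like X^T X, whose entries contain sums of x squared.

```python
import numpy as np

# Synthetic stand-in for the question's timestamps (an assumption, not the
# original data): 71 points spanning about 3 s on top of a ~1.5e9 offset.
offset = 1476728821.797
x = offset + np.linspace(0.0, 3.0, 71)

# Gap between adjacent float64 values near x and near x**2.
ulp_x = np.spacing(x[0])        # 2**-22, about 2.4e-07
ulp_x2 = np.spacing(x[0] ** 2)  # 2**8 = 256.0

# Condition number of the degree-1 design matrix [x, 1],
# before and after removing the offset.
cond_raw = np.linalg.cond(np.vander(x, 2))
cond_centered = np.linalg.cond(np.vander(x - x.mean(), 2))

print(ulp_x, ulp_x2)
print(cond_raw, cond_centered)
```

Near x squared, float64 can only resolve steps of 256, while the whole spread of the data is about 3. Any sum of squares therefore loses the information that distinguishes the points, and the raw design matrix comes out numerically close to singular. After centering, the condition number is close to 1.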
As a workaround, you can subtract the offset from x and then use polyfit. This produces a covariance matrix with a positive diagonal:
x1 = x - np.mean(x)
z1, cov1 = np.polyfit(np.asarray(x1), np.asarray(y), 1, cov=True)
std1 = np.sqrt(np.diag(cov1))
print(z1)    # prints: array([ 1.56607841e+03, 6.31224162e+06])
print(cov1)  # prints: array([[ 4.56066546e+00, -2.90980285e-07],
             #                [ -2.90980285e-07, 3.36480951e+00]])
print(std1)  # prints: array([ 2.13557146, 1.83434171])
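You'll have to rescale the results accordingly: shifting x leaves the slope and its variance unchanged, but the intercept now refers to x = np.mean(x) instead of x = 0.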
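One possible way to carry the centered fit back to the original x axis is sketched below. It assumes ordinary linear error propagation, with the Jacobian J of the map from (slope, centered intercept) to (slope, intercept at x = 0). The data are synthetic, and the slope, intercept and noise level used to generate them are made up for illustration.

```python
import numpy as np

# Synthetic data (an assumption for illustration, not the question's dataset).
rng = np.random.default_rng(0)
offset = 1476728821.797
x = offset + np.linspace(0.0, 3.0, 71)
y = 6.31e6 + 1566.0 * (x - offset) + rng.normal(0.0, 5.0, x.size)

# Fit against centered x, as in the workaround above.
x_mean = x.mean()
z1, cov1 = np.polyfit(x - x_mean, y, 1, cov=True)
slope, intercept_centered = z1

# Shift the intercept back to the original x origin.
intercept = intercept_centered - slope * x_mean

# Linear error propagation: (slope, intercept) = J @ (slope, intercept_centered).
J = np.array([[1.0, 0.0], [-x_mean, 1.0]])
cov = J @ cov1 @ J.T
std = np.sqrt(np.diag(cov))

print(slope, intercept)
print(std)
```

The intercept at x = 0 is an extrapolation about 1.5 billion units away from the data, so its standard deviation comes out enormous. For reporting, the centered intercept and its uncertainty are often the more meaningful quantities.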
Source: https://stackoverflow.com/questions/40095325/covariance-matrix-from-np-polyfit-has-negative-diagonal