“Approximate” greatest common divisor

清歌不尽 2020-12-13 17:11

Suppose you have a list of floating point numbers that are approximately multiples of a common quantity, for example

2.468, 3.700, 6.1699

w

8 Answers
  • 2020-12-13 17:51

    This reminds me of the problem of finding good rational-number approximations of real numbers. The standard technique is a continued-fraction expansion:

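    def rationalizations(x):
        """Yield successive continued-fraction approximations of x as (numer, denom)."""
        assert 0 <= x
        ix = int(x)              # integer part: the next continued-fraction term
        yield ix, 1
        if x == ix:
            return               # x is an integer, so the expansion ends here
        # Recurse on the reciprocal of the fractional part, then fold ix back in.
        for numer, denom in rationalizations(1.0/(x-ix)):
            yield denom + ix * numer, numer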
    

    We could apply this directly to Jonathan Leffler's and Sparr's approach:

    >>> import itertools
    >>> a, b, c = 2.468, 3.700, 6.1699
    >>> b/a, c/a
    (1.4991896272285252, 2.4999594813614263)
    >>> list(itertools.islice(rationalizations(b/a), 3))
    [(1, 1), (3, 2), (925, 617)]
    >>> list(itertools.islice(rationalizations(c/a), 3))
    [(2, 1), (5, 2), (30847, 12339)]
    

    picking off the first good-enough approximation from each sequence (3/2 and 5/2 here). Or, instead of directly comparing 3.0/2.0 to 1.499189..., you could notice that 925/617 uses much larger integers than 3/2, making 3/2 an excellent place to stop.

    It shouldn't much matter which of the numbers you divide by. (Using a/b and c/b you get 2/3 and 5/3, for instance.) Once you have integer ratios, you could refine the implied estimate of the fundamental using shsmurfy's linear regression. Everybody wins!
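
    As a rough sketch of "picking off the first good-enough approximation", the helper below walks the sequence and stops at the first ratio within a relative tolerance. The name `first_good_ratio` and the `rel_tol=1e-3` and `max_terms` defaults are assumptions for illustration, not part of the original answer.

```python
import itertools

def rationalizations(x):
    """Yield continued-fraction approximations (numer, denom) of x >= 0."""
    assert 0 <= x
    ix = int(x)
    yield ix, 1
    if x == ix:
        return
    for numer, denom in rationalizations(1.0 / (x - ix)):
        yield denom + ix * numer, numer

def first_good_ratio(x, rel_tol=1e-3, max_terms=10):
    """Return the first approximation within rel_tol of x, or None.

    rel_tol and max_terms are assumed thresholds; tune them to the data.
    """
    for numer, denom in itertools.islice(rationalizations(x), max_terms):
        if abs(numer / denom - x) <= rel_tol * x:
            return numer, denom
    return None

a, b, c = 2.468, 3.700, 6.1699
print(first_good_ratio(b / a))  # (3, 2)
print(first_good_ratio(c / a))  # (5, 2)
```

    Capping the number of terms keeps the search from running into the very large numerators and denominators that only reproduce floating-point noise.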

  • 2020-12-13 17:53

    This is a reformulation of shsmurfy's solution for the case where you choose 3 positive tolerances (e1,e2,e3) a priori.
    The problem is then to search for the smallest positive integers (n1,n2,n3), and thus the largest root frequency f, such that:

    f1 = n1*f +/- e1
    f2 = n2*f +/- e2
    f3 = n3*f +/- e3
    

    We assume 0 <= f1 <= f2 <= f3 (with e1 < f1, so the bounds below stay positive).
    If we fix n1, then we get these relations:

    f  is in interval I1=[(f1-e1)/n1 , (f1+e1)/n1]
    n2 is in interval I2=[n1*(f2-e2)/(f1+e1) , n1*(f2+e2)/(f1-e1)]
    n3 is in interval I3=[n1*(f3-e3)/(f1+e1) , n1*(f3+e3)/(f1-e1)]
    

    We start with n1 = 1, then increment n1 until both intervals I2 and I3 contain an integer, that is, until ceil(I2min) <= floor(I2max), and likewise for I3.
    We then choose the smallest integer n2 in interval I2 and the smallest integer n3 in interval I3.

    Assuming normally distributed measurement errors, the most probable estimate of the root frequency f is the one minimizing

    J = (f1/n1 - f)^2 + (f2/n2 - f)^2 + (f3/n3 - f)^2
    

    That is

    f = (f1/n1 + f2/n2 + f3/n3)/3
    

    If there are several integers n2,n3 in intervals I2,I3, we could also choose the pair that minimizes the residual

    min(J)*3/2=(f1/n1)^2+(f2/n2)^2+(f3/n3)^2-(f1/n1)*(f2/n2)-(f1/n1)*(f3/n3)-(f2/n2)*(f3/n3)
    

    Another variant could be to continue the iteration and try to minimize another criterion such as min(J(n1))*n1, until f falls below a certain frequency (n1 reaches an upper limit)...
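
    The basic search (smallest n1 whose intervals I2 and I3 both contain an integer, then the mean estimate of f) might be sketched as follows. The function name, the `n1_max` cap, and the tolerance of 0.005 used in the example call are assumptions chosen for illustration.

```python
import math

def search_multiples(f1, f2, f3, e1, e2, e3, n1_max=1000):
    """Return (n1, n2, n3, f) for the smallest feasible n1, or None.

    Assumes 0 < e1 < f1 <= f2 <= f3 so every interval bound is positive.
    """
    for n1 in range(1, n1_max + 1):
        lo2 = n1 * (f2 - e2) / (f1 + e1)
        hi2 = n1 * (f2 + e2) / (f1 - e1)
        lo3 = n1 * (f3 - e3) / (f1 + e1)
        hi3 = n1 * (f3 + e3) / (f1 - e1)
        # Smallest integer candidate in each interval.
        n2, n3 = math.ceil(lo2), math.ceil(lo3)
        if n2 <= hi2 and n3 <= hi3:
            # Minimizer of J: the plain mean of the three per-sample estimates.
            f = (f1 / n1 + f2 / n2 + f3 / n3) / 3
            return n1, n2, n3, f
    return None

n1, n2, n3, f = search_multiples(2.468, 3.700, 6.1699, 0.005, 0.005, 0.005)
print((n1, n2, n3), round(f, 4))  # (2, 3, 5) 1.2338
```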

  • 2020-12-13 17:57

    Interesting question...not easy.

    I suppose I would look at the ratios of the sample values:

    • 3.700 / 2.468 = 1.499...
    • 6.1699 / 2.468 = 2.4999...
    • 6.1699 / 3.700 = 1.6675...

    And I'd then be looking for a simple ratio of integers in those results.

    • 1.499 ~= 3/2
    • 2.4999 ~= 5/2
    • 1.6675 ~= 5/3

    I haven't chased it through, but somewhere along the line, you decide that an error of 1:1000 or something is good enough, and you back-track to find the base approximate GCD.
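
    One way to chase it through, offered only as a sketch: use `fractions.Fraction.limit_denominator` to find the simple ratio, reject it if the error exceeds the accepted level, and back-track through the common denominator. The `max_denominator=10` and `rel_tol=1e-3` defaults and the function name are assumptions.

```python
from fractions import Fraction
from math import gcd

def approx_gcd_from_ratios(values, max_denominator=10, rel_tol=1e-3):
    """Return (integer multiples, approximate GCD), or None if no simple ratio fits."""
    base = values[0]
    common = 1  # least common multiple of the ratio denominators
    ratios = []
    for v in values:
        ratio = v / base
        r = Fraction(ratio).limit_denominator(max_denominator)
        if abs(float(r) - ratio) > rel_tol * ratio:
            return None  # error larger than the accepted 1:1000
        ratios.append(r)
        common = common * r.denominator // gcd(common, r.denominator)
    multiples = [int(r * common) for r in ratios]
    return multiples, base / common

print(approx_gcd_from_ratios([2.468, 3.700, 6.1699]))  # ([2, 3, 5], 1.234)
```

    Here the base estimate comes from the first sample alone; averaging value/multiple over all samples would use more of the data.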

  • 2020-12-13 18:01

    The solution which I've seen and used myself is to choose some constant, say 1000, multiply all numbers by this constant, round them to integers, find the GCD of these integers using the standard algorithm and then divide the result by the said constant (1000). The larger the constant, the higher the precision.
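
    A minimal sketch of this scale-and-round approach, with `scaled_gcd` as a hypothetical name and 1000 as the assumed constant. Note that on the sample values from the question the rounding noise survives into the integers, so the integer GCD collapses to a tiny value; the method behaves well only when the inputs are exact multiples at the chosen precision.

```python
from functools import reduce
from math import gcd

def scaled_gcd(values, scale=1000):
    """Multiply by scale, round to integers, take the integer GCD, scale back."""
    ints = [round(v * scale) for v in values]
    return reduce(gcd, ints) / scale

# Exact multiples of 1.234 at three decimals: works as intended.
print(scaled_gcd([2.468, 3.702, 6.170]))   # 1.234
# The noisy sample values: 2468, 3700, 6170 share only the factor 2.
print(scaled_gcd([2.468, 3.700, 6.1699]))  # 0.002
```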

  • 2020-12-13 18:03

    Express your measurements as multiples of the lowest one. Thus your list becomes 1.00000, 1.49919, 2.49996. The fractional parts of these values will be very close to multiples of 1/N, for some value of N dictated by how close your lowest value is to the fundamental frequency. I would suggest looping through increasing N until you find a sufficiently refined match.

    In this case, for N=1 (that is, assuming X=2.468 is your fundamental frequency) you would find an average deviation of 0.3333 (two of the three values are about .5 off of a multiple of X), which is unacceptably high. For N=2 (that is, assuming 2.468/2 is your fundamental frequency) you would find a deviation of virtually zero (all three values are within .001 of a multiple of X/2), thus 2.468/2 is your approximate GCD.

    The major flaw in my plan is that it works best when the lowest measurement is the most accurate, which is likely not the case. This could be mitigated by performing the entire operation multiple times, discarding the lowest value on the list of measurements each time, then using the list of results of each pass to determine a more precise result. Another way to refine the results would be to adjust the GCD to minimize the standard deviation between integer multiples of the GCD and the measured values.
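
    The loop over N could look roughly like this. It scores each candidate by the mean absolute distance of the samples from the nearest integer multiple, measured in units of the candidate; that scoring choice, the `tol=0.01` threshold, and the names are assumptions.

```python
def smallest_divisor(values, tol=0.01, n_max=100):
    """Return (N, lowest/N) for the first N whose multiples fit all values, or None."""
    base = min(values)
    for n in range(1, n_max + 1):
        unit = base / n
        # Distance of each sample from its nearest integer multiple of unit.
        devs = [abs(v / unit - round(v / unit)) for v in values]
        if sum(devs) / len(devs) <= tol:
            return n, unit
    return None

print(smallest_divisor([2.468, 3.700, 6.1699]))  # (2, 1.234)
```

    For N=1 the mean deviation is about 0.333, so the loop moves on; for N=2 it drops below 0.001 and the search stops.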

  • 2020-12-13 18:07

    I'm assuming all of your numbers are integer multiples of a common value. For the rest of my explanation, A will denote the "root" frequency you are trying to find and B will be an array of the numbers you have to start with.

    What you are trying to do is superficially similar to linear regression. You are trying to find a linear model y=mx+b that minimizes the average distance between a linear model and a set of data. In your case, b=0, m is the root frequency, and y represents the given values. The biggest problem is that the independent variables X are not explicitly given. The only thing we know about X is that all of its members must be integers.

    Your first task is to determine these independent variables. The best method I can think of at the moment assumes that the given frequencies have nearly consecutive indexes (x_1 = x_0 + n), so B_0/B_1 = x_0/(x_0 + n) for a (hopefully) small integer n. Solving for x_0 gives the candidate value k = n*B_0/(B_1 - B_0). Start with n=1 and keep ratcheting it up until k-rnd(k) is within a certain threshold, then take x_0 = rnd(k). After you have x_0 (the initial index), you can approximate the root frequency (A = B_0/x_0). Then you can approximate the other indexes by finding x_n = rnd(B_n/A). This method is not very robust and will probably fail if the error in the data is large.
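
    A sketch of this index search, under the assumption that solving B_0/B_1 = x_0/(x_0 + n) for x_0 gives the candidate k = n*B_0/(B_1 - B_0). The function name, the `threshold=0.01` default, and the `n_max` cap are illustrative choices.

```python
def estimate_indexes(B, threshold=0.01, n_max=20):
    """Return (A, indexes) from the two smallest values of B, or None."""
    B = sorted(B)
    for n in range(1, n_max + 1):
        # Candidate initial index; requires B[1] > B[0].
        k = n * B[0] / (B[1] - B[0])
        x0 = round(k)
        if x0 >= 1 and abs(k - x0) <= threshold:
            A = B[0] / x0  # first estimate of the root frequency
            return A, [round(b / A) for b in B]
    return None

print(estimate_indexes([2.468, 3.700, 6.1699]))  # (1.234, [2, 3, 5])
```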

    If you want a better approximation of the root frequency A, you can use linear regression to minimize the error of the linear model now that you have the corresponding independent variables. The easiest method uses least-squares fitting. Wolfram's MathWorld has an in-depth mathematical treatment of the issue, but a fairly simple explanation can be found with some googling.
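
    For a line through the origin (b=0), the least-squares slope has the closed form A = sum(x_i*B_i) / sum(x_i^2). A small sketch, assuming the indexes 2, 3, 5 have already been determined for the sample values:

```python
def fit_root_frequency(B, x):
    """Least-squares slope of y = A*x through the origin."""
    num = sum(xi * bi for xi, bi in zip(x, B))
    den = sum(xi * xi for xi in x)
    return num / den

A = fit_root_frequency([2.468, 3.700, 6.1699], [2, 3, 5])
print(round(A, 4))  # 1.2338
```

    This fit weights samples with larger indexes more heavily, which differs from simply averaging B_i/x_i.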
