Match set of x,y points to another set that is scaled, rotated, translated, and with missing elements

问题

(Why am I doing this? See explanation below)

Consider two sets of points, A and B as shown below

It might not look like it, but set A is "hidden" within set B. It can not be easily seen because points in B are scaled, rotated, and translated in (x, y) with respect to A. Even worse, some points that are present in A are missing in B, and B contains many points that are not in A.

I need to find the appropriate scaling, rotation, and translation that must be applied to the B set in order to match it with set A. In the case shown above, the correct values are:

scale = 0.14, rot_angle = 0.0, x_transl = 35.0, y_transl = 2.0

which produces the (good enough) match

(in red, only the matched B points are shown; these are located in the sector 1000<x<2000, y~2000 in the first figure to the right). Given the many degrees of freedom (DoF: scaling + rotation + 2D translation) I am aware of the possibility of a miss-match, but the coordinates of the points are not random (although they may look like it) so the probability of this happening is very small.

The code I wrote (see below) uses brute force to loop through all possible DoF values taken from pre-defined ranges for each one. The core of the code is based on minimizing the distance of each point in A to any point in B

The code works (it actually generated the solution mentioned above), but since the number of solutions (i.e., the combinations of accepted values for each DoF) scales with larger ranges, it can become unacceptably slow pretty quick (also it eats up all the RAM in my system)

How can I improve the performance of the code? I'm open to any solution including numpy and/or scipy. Perhaps something like Basing-Hopping to search for the best match (or a relatively close one) instead of the brute force method I currently employ?

import numpy as np
from scipy.spatial import distance
import math


def scalePoints(B_center, delta_x, delta_y, scale):
    """
    Scales xy points.

    http://codereview.stackexchange.com/q/159183/35351
    """
    x_scale = B_center[0] - scale * delta_x
    y_scale = B_center[1] - scale * delta_y

    return x_scale, y_scale


def rotatePoints(center, x, y, angle):
    """
    Rotates points in 'xy' around 'center'. Angle is in degrees.
    Rotation is counter-clockwise

    http://stackoverflow.com/a/20024348/1391441
    """
    angle = math.radians(angle)
    xy_rot = x - center[0], y - center[1]
    xy_rot = (xy_rot[0] * math.cos(angle) - xy_rot[1] * math.sin(angle),
              xy_rot[0] * math.sin(angle) + xy_rot[1] * math.cos(angle))
    xy_rot = xy_rot[0] + center[0], xy_rot[1] + center[1]

    return xy_rot


def distancePoints(set_A, x_transl, y_transl):
    """
    Find the sum of the minimum distance of points in set_A to points in set_B.
    """
    d = distance.cdist(set_A, zip(*[x_transl, y_transl]), 'euclidean')
    # Sum of all minimal distances.
    d_sum = np.sum(np.min(d, axis=1))

    return d_sum


def match_frames(
        set_A, B_center, delta_x, delta_y, tol, sc_min, sc_max, sc_step,
        ang_min, ang_max, ang_step, xmin, xmax, xstep, ymin, ymax, ystep):
    """
    Process all possible solutions in the defined ranges.
    """
    N = len(set_A)
    # Ranges
    sc_range = np.arange(sc_min, sc_max, sc_step)
    ang_range = np.arange(ang_min, ang_max, ang_step)
    x_range = np.arange(xmin, xmax, xstep)
    y_range = np.arange(ymin, ymax, ystep)
    print("Total solutions: {:.2e}".format(
          np.prod([len(_) for _ in [sc_range, ang_range, x_range, y_range]])))

    d_sum, params_all = [], []
    for scale in sc_range:
        # Scaled points.
        x_scale, y_scale = scalePoints(B_center, delta_x, delta_y, scale)
        for ang in ang_range:
            # Rotated points.
            xy_rot = rotatePoints(B_center, x_scale, y_scale, ang)
            # x translation
            for x_tr in x_range:
                x_transl = xy_rot[0] + x_tr
                # y translation
                for y_tr in y_range:
                    y_transl = xy_rot[1] + y_tr

                    # Find minimum distance sum.
                    d_sum.append(distancePoints(set_A, x_transl, y_transl))

                    # Store solutions.
                    params_all.append([scale, ang, x_tr, y_tr])

                    # Condition to break out if given tolerance for match
                    # is achieved.
                    if d_sum[-1] <= tol * N:
                        print("Match found:", scale, ang, x_tr, y_tr)
                        i_min = d_sum.index(min(d_sum))
                        return i_min, params_all

        # Print best solution found so far.
        i_min = d_sum.index(min(d_sum))
        print("d_sum_min = {:.2f}".format(d_sum[i_min]))

    return i_min, params_all


# Data.
set_A = [[2015.81, 1981.665], [1967.31, 1960.865], [1962.91, 1951.365],
         [1964.91, 1994.565], [1894.41, 1957.065]]
set_B = [
    [2689.28, 3507.04, 2895.67, 1051.3, 1929.49, 1035.97, 752.44, 130.62,
     620.06, 2769.06, 1580.77, 281.76, 224.54, 3848.3],
    [2061.19, 3700.27, 2131.2, 1837.3, 2017.52, 80.96, 3524.61, 3821.22,
     3711.53, 1812.12, 1868.33, 3865.77, 3273.77, 2100.71]]

# This is necessary to apply the scaling.
x, y = np.asarray(set_B)
# Center of B points, defined as the center of the minimal rectangle that
# contains all points.
B_center = [(min(x) + max(x)) * .5, (min(y) + max(y)) * .5]
# Difference between the center coordinates and the xy points.
delta_x, delta_y = B_center[0] - x, B_center[1] - y

# Tolerance in pixels for match.
tol = 1.
# Ranges for each DoF.
sc_min, sc_max, sc_step = .01, .2, .01
ang_min, ang_max, ang_step = -30., 30., 1.
xmin, xmax, xstep = -150., 150., 1.
ymin, ymax, ystep = -150., 150., 1.

# Find proper scaling + rotation + translation for set_B.
i_min, params_all = match_frames(
    set_A, B_center, delta_x, delta_y, tol, sc_min, sc_max, sc_step,
    ang_min, ang_max, ang_step, xmin, xmax, xstep, ymin, ymax, ystep)

# Best match found
print(params_all[i_min])

Why am I doing this?

When an astronomer observes a star field, it also needs to observe what is called a "standard field of stars". This is needed to be able to transform the "instrumental magnitudes" (a logarithmic measure of the brightness) of the observed stars to a common universal scale, since these magnitudes depend on the optical system used (telescope + CCD array). In the example shown here, the standard field can be seen below to the left, and the observed one to the right.

Notice that the points in the set A (used above) are the marked stars in the standard field, and the set B are those stars detected in the observed field (marked in red above)

The process of identifying those stars in the observed field that correspond to the stars marked in the standard field is done by-eye, even today. This is due to the complexity of the task.

In the observed image above, there's quite a bit of scaling, but no rotation and little translation. This constitutes a rather favorable scenario; it could be much worse. I'm trying to develop a simple algorithm to avoid having to manually identify stars in the observed field as stars in the standard field, one by one.

Solution proposed by litepresence

This is a script I made following the answer by litepresence.

import itertools
import numpy as np
import matplotlib.pyplot as plt


def getTriangles(set_X, X_combs):
    """
    Inefficient way of obtaining the lengths of each triangle's side.
    Normalized so that the minimum length is 1.
    """
    triang = []
    for p0, p1, p2 in X_combs:
        d1 = np.sqrt((set_X[p0][0] - set_X[p1][0]) ** 2 +
                     (set_X[p0][1] - set_X[p1][1]) ** 2)
        d2 = np.sqrt((set_X[p0][0] - set_X[p2][0]) ** 2 +
                     (set_X[p0][1] - set_X[p2][1]) ** 2)
        d3 = np.sqrt((set_X[p1][0] - set_X[p2][0]) ** 2 +
                     (set_X[p1][1] - set_X[p2][1]) ** 2)
        d_min = min(d1, d2, d3)
        d_unsort = [d1 / d_min, d2 / d_min, d3 / d_min]
        triang.append(sorted(d_unsort))

    return triang


def sumTriangles(A_triang, B_triang):
    """
    For each normalized triangle in A, compare with each normalized triangle
    in B. find the differences between their sides, sum their absolute values,
    and select the two triangles with the smallest sum of absolute differences.
    """
    tr_sum, tr_idx = [], []
    for i, A_tr in enumerate(A_triang):
        for j, B_tr in enumerate(B_triang):
            # Absolute value of lengths differences.
            tr_diff = abs(np.array(A_tr) - np.array(B_tr))
            # Sum the differences
            tr_sum.append(sum(tr_diff))
            tr_idx.append([i, j])

    # Index of the triangles in A and B with the smallest sum of absolute
    # length differences.
    tr_idx_min = tr_idx[tr_sum.index(min(tr_sum))]
    A_idx, B_idx = tr_idx_min[0], tr_idx_min[1]
    print("Smallest difference: {}".format(min(tr_sum)))

    return A_idx, B_idx


# Data.
set_A = [[2015.81, 1981.665], [1967.31, 1960.865], [1962.91, 1951.365],
         [1964.91, 1994.565], [1894.41, 1957.065]]
set_B = [
    [2689.28, 3507.04, 2895.67, 1051.3, 1929.49, 1035.97, 752.44, 130.62,
     620.06, 2769.06, 1580.77, 281.76, 224.54, 3848.3],
    [2061.19, 3700.27, 2131.2, 1837.3, 2017.52, 80.96, 3524.61, 3821.22,
     3711.53, 1812.12, 1868.33, 3865.77, 3273.77, 2100.71]]
set_B = zip(*set_B)

# All possible triangles.
A_combs = list(itertools.combinations(range(len(set_A)), 3))
B_combs = list(itertools.combinations(range(len(set_B)), 3))

# Obtain normalized triangles.
A_triang, B_triang = getTriangles(set_A, A_combs), getTriangles(set_B, B_combs)

# Index of the A and B triangles with the smallest difference.
A_idx, B_idx = sumTriangles(A_triang, B_triang)

# Indexes of points in A and B of the best match triangles.
A_idx_pts, B_idx_pts = A_combs[A_idx], B_combs[B_idx]
print 'triangle A %s matches triangle B %s' % (A_idx_pts, B_idx_pts)

# Matched points in A and B.
print "A:", [set_A[_] for _ in A_idx_pts]
print "B:", [set_B[_] for _ in B_idx_pts]

# Plot
A_pts = zip(*[set_A[_] for _ in A_idx_pts])
B_pts = zip(*[set_B[_] for _ in B_idx_pts])
plt.scatter(*A_pts, s=10, c='k')
plt.scatter(*B_pts, s=10, c='r')
plt.show()

The method is almost instantaneous and produces the correct match:

Smallest difference: 0.0314154749597
triangle A (0, 1, 4) matches triangle B (3, 4, 10)
A: [[2015.81, 1981.665], [1967.31, 1960.865], [1894.41, 1957.065]]
B: [(1051.3, 1837.3), (1929.49, 2017.52), (1580.77, 1868.33)]

回答1:

1) I would approach this problem by labeling all points and finding all possible combinations of 3 points from each set.

# normalize B data to same format as A
set_Bx, set_By = (set_B)
set_B = []
for i in range(len(set_Bx)):
    set_B.append([set_Bx[i],set_By[i]])
'''
set_B = [[2689.28, 2061.19], [3507.04, 3700.27], [2895.67, 2131.2], 
[1051.3, 1837.3], [1929.49, 2017.52], [1035.97, 80.96], [752.44, 
3524.61], [130.62, 3821.22], [620.06, 3711.53], [2769.06, 1812.12], 
[1580.77, 1868.33], [281.76, 3865.77], [224.54, 3273.77], [3848.3, 
2100.71]]
'''

list(itertools.combinations(range(len(set_A)), 3))
list(itertools.combinations(range(len(set_B)), 3))

How to generate all permutations of a list in Python

2) For each 3 point group, calculate the sides of the respective triangle; repeating the process for group A and group B.

dist = sqrt( (x2 - x1)**2 + (y2 - y1)**2 )

How do I find the distance between two points?

3) Then reduce the ratio of sides for each such that each triangle's smallest side was then equal to 1; the other sides reduced in respective ratio.

In two similar triangles:

The perimeters of the two triangles are in the same ratio as the sides. The corresponding sides, medians and altitudes will all be in this same ratio.

http://www.mathopenref.com/similartrianglesparts.html

4) Finally for each a triangle from group A, compare to each triangle from group B, with element-wise subtraction. Then sum the resulting elements and find the triangles from A and B with the smallest sum.

list(numpy.array(list1)-numpy.array(list2))

Subtracting 2 lists in Python

5) Given matched triangles; finding the appropriate scaling, translation, and rotation should be relatively trivial in terms of CPU/RAM.

ETA1: rough draft script

ETA2: patched error discussed in comments: with sum(abs()) instead of abs(sum()). Now it works, fast too!

'''
known correct solution

A = [[1894.41, 1957.065],[1967.31, 1960.865],[2015.81, 1981.665]]
B = [[1051.3, 1837.3],[1580.77, 1868.33],[1929.49, 2017.52]]

'''
import numpy as np
import itertools
import math
import operator

set_A = [[2015.81, 1981.665], [1967.31, 1960.865], [1962.91, 1951.365],
        [1964.91, 1994.565], [1894.41, 1957.065]]
set_B = [[2689.28, 3507.04, 2895.67, 1051.3, 1929.49, 1035.97, 752.44, 130.62,
        620.06, 2769.06, 1580.77, 281.76, 224.54, 3848.3],
        [2061.19, 3700.27, 2131.2, 1837.3, 2017.52, 80.96, 3524.61, 3821.22,
        3711.53, 1812.12, 1868.33, 3865.77, 3273.77, 2100.71]]

# normalize set B data to set A format
set_Bx, set_By = (set_B)
set_B = []
for i in range(len(set_Bx)):
    set_B.append([set_Bx[i],set_By[i]])

'''
set_B = [[2689.28, 2061.19], [3507.04, 3700.27], [2895.67, 2131.2], 
[1051.3, 1837.3], [1929.49, 2017.52], [1035.97, 80.96], [752.44, 3524.61], 
[130.62, 3821.22], [620.06, 3711.53], [2769.06, 1812.12], [1580.77, 1868.33], 
[281.76, 3865.77], [224.54, 3273.77], [3848.3, 2100.71]]
'''

print set_A
print set_B
print len(set_A)
print len(set_B)

set_A_tri = list(itertools.combinations(range(len(set_A)), 3))
set_B_tri = list(itertools.combinations(range(len(set_B)), 3))

print set_A_tri
print set_B_tri
print len(set_A_tri)
print len(set_B_tri)

'''
set_A = [[2015.81, 1981.665], [1967.31, 1960.865], [1962.91, 1951.365], 
[1964.91, 1994.565], [1894.41, 1957.065]]

set_A_tri = [(0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 2, 3), (0, 2, 4), (0, 3, 4), 
(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]
'''

def distance(x1,y1,x2,y2):
    return math.sqrt((x2 - x1)**2 + (y2 - y1)**2 )

def tri_sides(set_x, set_x_tri):

    triangles = []
    for i in range(len(set_x_tri)):

        point1 = set_x_tri[i][0]
        point2 = set_x_tri[i][1]
        point3 = set_x_tri[i][2]

        point1x, point1y = set_x[point1][0], set_x[point1][1]
        point2x, point2y = set_x[point2][0], set_x[point2][1]
        point3x, point3y = set_x[point3][0], set_x[point3][1] 

        len1 = distance(point1x,point1y,point2x,point2y)
        len2 = distance(point1x,point1y,point3x,point3y)
        len3 = distance(point2x,point2y,point3x,point3y)

        min_side = min(len1,len2,len3)
        len1/=min_side
        len2/=min_side
        len3/=min_side
        t=[len1,len2,len3]
        t.sort()
        triangles.append(t)

    return triangles

A_triangles = tri_sides(set_A, set_A_tri)
B_triangles = tri_sides(set_B, set_B_tri)

print A_triangles
'''
[[1.0, 5.0405616860744304, 5.822935502560814], 
[1.0, 1.5542012854321234, 1.5619803879976761], 
[1.0, 1.3832883678507584, 2.347214708755337], 
[1.0, 1.2141910838179129, 1.4096730529373076], 
[1.0, 1.1275138587537166, 2.0318412465223665], 
[1.0, 1.5207417600732074, 2.3589630093994876], 
[1.0, 3.2270326342163584, 4.13069930678442], 
[1.0, 6.565440477766354, 6.972550347780966], 
[1.0, 2.1606693015281997, 2.3635387983160885], 
[1.0, 1.589425903498476, 1.846471085870448]]
'''
print B_triangles

def list_subtract(list1,list2):

    return np.absolute(np.array(list1)-np.array(list2))

sums = []
threshold = 1
for i in range(len(A_triangles)):
    for j in range(len(B_triangles)):
        k = sum(list_subtract(A_triangles[i], B_triangles[j]))
        if k < threshold:
            sums.append([i,j,k])
# sort by smallest sum
sums = sorted(sums, key=operator.itemgetter(2))

print sums
print 'winner %s' % sums[0]
print sums[0][0]
print sums[0][1]
match_A = set_A_tri[sums[0][0]]
match_B = set_B_tri[sums[0][1]]
print 'triangle A %s matches triangle B %s' % (match_A, match_B)

match_A_pts = []
match_B_pts = []
for i in range(3):
    match_A_pts.append(set_A[match_A[i]])
    match_B_pts.append(set_B[match_B[i]])

print 'triangle A has points %s' % match_A_pts
print 'triangle B has points %s' % match_B_pts

'''
winner [2, 204, 0.031415474959738399]
2
204
triangle A (0, 1, 4) matches triangle B (3, 4, 10)
triangle A has points [[2015.81, 1981.665], [1967.31, 1960.865], [1894.41, 1957.065]]
triangle B has points [[1051.3, 1837.3], [1929.49, 2017.52], [1580.77, 1868.33]]
'''

回答2:

There is an algorithm called Multi Dimensional Scaling, or MDS (http://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html) that finds the scale of such transforms. It's closely related to principal component analysis, but uses a vector of linear dissimilarities instead of covariances (which are a kind of squared dissimilarity).

To recover the rotation and offset, you can use RANSAC (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RANSACRegressor.html).

来源：https://stackoverflow.com/questions/43126580/match-set-of-x-y-points-to-another-set-that-is-scaled-rotated-translated-and

标签

python

performance

geometry

affinetransform

astronomy