Efficient manipulation of a list of cartesian coordinates in Python

问题

Background:

I'm writing a program which handles large quantities of data related to the networks of vertices of various regular shapes. I have a working generator which produces a list of cartesian coordinates corresponding to the vertices of said shapes based on a range of user input parameters. The data is then passed to filters which clear up duplicate entries, sort the data and various other functions, from where the cleaned data is fed to a canvas module which loops through and draws the vertices.

Question:

I need to implement a new filter that efficiently loops through the coordinates, comparing each pair against every other pair, i.e. (x1,y1)->(x2,y2) to (x1,y1)->(xn,yn), (x2,y2)->(x3,y3) to (x2,y2)->(xn,yn) etc. for all entries and, for example, if the relationship between (x1,y1) and (x5,y5) fits [(x5-x1)^2+(y5-y1)^2]=vertex_spacing^2, then the two sets of coordinates are then paired with their respective list entry numbers and appended to a new list where one entry would be of the form: [(x1,y1), (x5,y5), 0, 4] for example. What is the most efficient method for achieving this?

My Attempts:

I've looked at quite a few methods for handling lists on here and on various guides. I've attempted nested 'for' and 'if' loops, but find while this method can work it leads to excessively long run times, as well as attempting to breaking the problem down into numerous smaller for loops.

Further Notes:

The ultimate aim of this is to use the resulting coordinates for front-end interface elements, and to be saved and imported as necessary. The function of the list positions 0 and 4 in [(x1,y1), (x5,y5), 0, 4] is to enable the interface to group coordinates for later use in canvas objects. The method should be able to process potentially thousands of coordinates.

Thank you in advance for any help, I am of course willing to improve the phrasing/information I've supplied and/or add example code if it is unclear what I am asking in any way- I'm still quite new to this! :)

回答1:

What you're basically checking is:

for each vertex v, find all vertices u such that u is on the circle of radius vertex_spacing around v.

If the distribution of your points is such that not all points are close together, I think of two approaches to speed up the search:

The simplest way to speed up this process would be to sort the points by x-coordinate. That way, you can possible skip many comparisons. As a simple example, assume that the x-coordinates are [1, 2, 10, 15, 18, 20, 21] and vertex_spacing = 5. You only need to compare the first vertex with the second one, because all the other vertices are obviously outside the circle around the first vertex.

Note that this approach is useless if all points are close together. In other words, if vertex_spacing = 25, you cannot skip any comparison.
Along the same lines, you could use a 2-dimensional k-d tree. This is equivalent to the sorting approach, but in two dimensions. Given a vertex (x, y) and vertex_spacing = v, you would have to check all points in the range ([x-v, x+v], [y-v, y+v]). Using the same example as before, assume the first point had coordinates (1, 0) and the second one (2, 10), there would be no need to compare the first vertex to anything.

Both approaches are heuristics and do not give any improvements on the worst-case running time (quite contrary: you also have the overhead of the sorting / building the k-d tree), but if the vertices are typically at least vertex_space apart, this could speed up your search a lot.

回答2:

I was too slow to beat Heuster's description of the algorithm, but here's an implementation of the sort-by-x-co-ordinate approach:

def pairs(coords, vertex_spacing):
    results = []
    vsquared = vertex_spacing * vertex_spacing
    coords = sorted(coords)
    for ia, (xa, ya) in enumerate(coords):
        for ib, (xb, yb) in enumerate(coords[ia:]):
            dx = xb - xa
            if dx > vertex_spacing:
                break
            dy = yb - ya
            if dx * dx + dy * dy == vsquared:
                results.append([(xa, ya), (xb, yb), ia, ia + ib])
    return results

... and here it is in action:

>>> coords = list((x, y) for x in range(100) for y in range(100))
>>> p = pairs(coords, 5)
>>> from random import choice
>>> choice(p)
[(93, 36), (96, 40), 9336, 9640]
>>> choice(p)
[(9, 57), (13, 54), 957, 1354]
>>> choice(p)
[(46, 69), (46, 74), 4669, 4674]

On my machine, pairs(coords, 5) takes 1.5 seconds to check 10,000 co-ordinate pairs (and 0.15 seconds to check 2,500).

EDIT: I forgot to add ia to ib to compensate for enumerating over a slice - fixed now.

回答3:

The slowest parts of your algorithm are the separate handling of the x and y coordinates and the hypotenuse calculation. Both of these can be sped-up by using Python's native complex number type:

>>> from itertools import starmap
>>> parray = list(starmap(complex, [(5, 1), (8.5, 3), (3.75, 4.25)]))
>>> a = parray[0]
>>> b = parray[1]
>>> a
(5+1j)
>>> b
(8.5+3j)
>>> a-b
(-3.5-2j)
>>> abs(a-b)
4.031128874149275

回答4:

One way to speed things up is to use some kind of spatial index, so that you rule out searching points that are obviously far apart. Here is a module that might be useful: http://toblerity.org/rtree/. See also http://en.wikipedia.org/wiki/R-tree.

来源：https://stackoverflow.com/questions/18291829/efficient-manipulation-of-a-list-of-cartesian-coordinates-in-python

标签

python

algorithm

list

python-2.7

coordinates