Efficiently delete arrays that are close to each other given a threshold in Python

Posted by 时间秒杀一切 on 2019-12-05 17:56:16

A generic approach might be:

def filter_quadratic(data,condition):
    result = []
    for element in data:
        if all(condition(element,other) for other in result):
            result.append(element)
    return result

This is a generic higher-order filter that takes a condition. An element is added only if the condition holds between it and every element already in the result list.

Now we still need to define the condition:

def the_condition(xs,ys):
    # working with squares, 2.5e-05 is 0.005*0.005 
    return sum((x-y)*(x-y) for x,y in zip(xs,ys)) > 2.5e-05

This gives:

>>> filter_quadratic([[ 5.024,  1.559,  0.281], [ 6.198,  4.827,  1.653], [ 6.199,  4.828,  1.653]],the_condition)
[[5.024, 1.559, 0.281], [6.198, 4.827, 1.653]]
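
Because each element is compared only against the elements already kept, the result depends on the order of the input. The following sketch repeats the two functions above so that it runs on its own, and uses made-up one-dimensional points (not from the original data) to illustrate this:

```python
def filter_quadratic(data, condition):
    result = []
    for element in data:
        # keep the element only if it passes the condition against every kept element
        if all(condition(element, other) for other in result):
            result.append(element)
    return result

def the_condition(xs, ys):
    # squared distance must exceed 0.005 * 0.005
    return sum((x - y) * (x - y) for x, y in zip(xs, ys)) > 2.5e-05

# 0.004 is dropped (too close to 0.0); 0.008 is far enough from 0.0 and is kept
a = filter_quadratic([[0.0], [0.004], [0.008]], the_condition)
print(a)  # [[0.0], [0.008]]

# same points, different order: 0.004 comes first and suppresses both neighbours
b = filter_quadratic([[0.004], [0.0], [0.008]], the_condition)
print(b)  # [[0.004]]
```

If a stable result matters, it may help to sort the input before filtering.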

The algorithm runs in O(n^2) time, where n is the number of elements you pass to the function. You can make it more efficient with k-d trees, but that requires a more advanced data structure.

If you can avoid comparing each list element to every other one in a nested loop (which is unavoidably an O(n^2) operation), that would be much more efficient.

One approach is to generate a key such that two "almost duplicates" would produce the same key. Then you just iterate over your data once and only insert the values which are not already in your result set.

def remove_near_duplicates(unique_cluster_centers):
    result = {}
    for row in unique_cluster_centers:
        # round each value to 2 decimal places:
        # [5.024,  1.559,  0.281] => (5.02,  1.56,  0.28)
        # you can be inventive and, say, multiply each value by 3 before rounding
        # if you want a precision other than a whole number of decimal places.
        key = tuple(round(v, 2) for v in row)  # tuples can be keys of a dict
        if key not in result:
            result[key] = row
    # dicts keep insertion order on Python 3.7+; on older versions use
    # OrderedDict if the order of the items is important
    return list(result.values())
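
One caveat with a rounded key is that two values on either side of a rounding boundary get different keys even when they are very close: round(0.0049, 2) is 0.0 while round(0.0051, 2) is 0.01. A possible way around this, sketched below, is to combine both ideas: bucket the points into grid cells whose side equals the threshold, then run the exact distance check only against points in the same or adjacent cells. This is a plain grid (spatial hashing) rather than a k-d tree, and the name filter_grid and its threshold parameter are made up for this illustration.

```python
from itertools import product
from math import floor

def filter_grid(data, threshold=0.005):
    """Keep a point only if no already-kept point lies within `threshold` of it."""
    cells = {}    # integer cell coordinates -> kept points that fall in that cell
    result = []
    limit = threshold * threshold  # compare squared distances, as in the_condition
    for point in data:
        cell = tuple(floor(v / threshold) for v in point)
        too_close = False
        # points within `threshold` can only be in this cell or a neighbouring one
        for offset in product((-1, 0, 1), repeat=len(cell)):
            neighbour = tuple(c + o for c, o in zip(cell, offset))
            for other in cells.get(neighbour, ()):
                if sum((x - y) ** 2 for x, y in zip(point, other)) <= limit:
                    too_close = True
                    break
            if too_close:
                break
        if not too_close:
            cells.setdefault(cell, []).append(point)
            result.append(point)
    return result

points = [[5.024, 1.559, 0.281], [6.198, 4.827, 1.653], [6.199, 4.828, 1.653]]
print(filter_grid(points))  # [[5.024, 1.559, 0.281], [6.198, 4.827, 1.653]]
```

Each point is checked against 3^d neighbouring cells for d dimensions, so this is only attractive for low-dimensional data such as the 3-element rows here.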