Find subset with K elements that are closest to each other

Given an array of N integers, how can you efficiently find a subset of size K whose elements are closest to each other?

6 answers

情话喂你

2020-12-30 04:55
This can be done in O(N*K) time if A is sorted. If A is not sorted, add the cost of sorting it first.

This is based on three facts (relevant only when A is sorted):
- The closest subset is always a run of consecutive elements.
- For K consecutive elements, the sum of pairwise distances equals the sum over i = 1, ..., K-1 of (K-i)*i times the gap between the i-th and (i+1)-th elements of the run.
- When sliding along the sorted array, the sum need not be recomputed from scratch: the closeness of a window can be derived from the previous window's by subtracting the distances from the element that leaves to the K-1 elements that stay, and adding the distances from the element that enters to those same elements. With running sums this update takes O(1) per step, although the pseudo-code below recomputes each window in O(K).
Here's the pseudo-code:
```
List<pair> FindClosestSubsets(int[] A, int K)
{
    // A must already be sorted in ascending order.
    List<pair> minList = new List<pair>();
    int minVal = infinity;
    int tempSum;
    int N = A.length;

    for (int i = K - 1; i < N; i++)
    {
        tempSum = 0;
        int start = i - K + 1;   // first index of the current window

        // The m-th gap of the window (m = 1..K-1) is counted (K-m)*m times.
        for (int m = 1; m < K; m++)
              tempSum += (K - m) * m * (A[start + m] - A[start + m - 1]);

        if (tempSum < minVal)
        {
              minVal = tempSum;
              minList.clear();
              minList.add(new pair(start, i));
        }

        else if (tempSum == minVal)
              minList.add(new pair(start, i));
    }

    return minList;
}
```
This function returns a list of index pairs representing the optimal solutions (the starting and ending index of each one). The question implied that you want all solutions with the minimal value.
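For readers who want something executable, here is a minimal Python sketch of the same idea. The function name `find_closest_windows` and its return shape are my own choices, not part of the answer above, and the indexes it reports refer to a sorted copy of the input.

```python
def find_closest_windows(values, k):
    """Return (min_cost, [(start, end), ...]) for a sorted copy of values."""
    xs = sorted(values)
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= len(values)")

    best_cost, best_windows = None, []
    for start in range(n - k + 1):
        # Gap m of the window (1-based) has m elements on its left and
        # k - m on its right, so it is counted m * (k - m) times.
        cost = sum(m * (k - m) * (xs[start + m] - xs[start + m - 1])
                   for m in range(1, k))
        if best_cost is None or cost < best_cost:
            best_cost, best_windows = cost, [(start, start + k - 1)]
        elif cost == best_cost:
            best_windows.append((start, start + k - 1))  # keep ties
    return best_cost, best_windows
```

For `[1, 2, 3, 10, 11, 12]` with `k = 3`, both `[1, 2, 3]` and `[10, 11, 12]` have a pairwise sum of 4, so the sketch reports the two windows `(0, 2)` and `(3, 5)`.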
被撕碎了的回忆

2020-12-30 04:55
After sorting, we can be sure that, if x1, x2, ... xk are the solution, then x1, x2, ... xk are contiguous elements, right?

So,
1. Take the gaps between adjacent numbers.
2. Sum each run of k-1 consecutive gaps to get the span of every k-number window.
3. Choose the window with the smallest span.
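The steps above could be sketched as follows. Note the assumption: this measures closeness by the span of the window (largest minus smallest element), which is what summing the k-1 gaps gives, and that may rank windows differently from a sum of all pairwise distances. The name `closest_by_span` is hypothetical.

```python
def closest_by_span(values, k):
    """Return the k sorted elements whose max - min is smallest."""
    xs = sorted(values)
    if not 1 <= k <= len(xs):
        raise ValueError("need 1 <= k <= len(values)")
    # The k - 1 gaps inside a window telescope to xs[i + k - 1] - xs[i],
    # so each window is scored in O(1) after the sort.
    start = min(range(len(xs) - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start:start + k]
```

After the O(N log N) sort, the scan itself is O(N). On ties, `min` keeps the first window it sees.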
遇见更好的自我

2020-12-30 05:01
Your current solution is O(NK^2) (assuming K > log N). With some analysis, I believe you can reduce this to O(NK).

The closest set of size K will consist of elements that are adjacent in the sorted list. You essentially have to first sort the array, so the subsequent analysis will assume that each sequence of K numbers is sorted, which allows the double sum to be simplified.

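Assuming that the array is sorted such that x[j] >= x[i] when j > i, we can rewrite your closeness metric to eliminate the absolute value:

$$\sum_{1 \le i < j \le K} |x[i] - x[j]| = \sum_{1 \le i < j \le K} (x[j] - x[i])$$

Next we rewrite your notation into a double summation with simple bounds:

$$\sum_{i=1}^{K-1} \sum_{j=i+1}^{K} (x[j] - x[i])$$

Notice that we can rewrite the inner distance between x[i] and x[j] as a third summation:

$$\sum_{i=1}^{K-1} \sum_{j=i+1}^{K} \sum_{l=i}^{j-1} d[l]$$

where I've used d[l] to simplify the notation going forward:

$$d[l] = x[l+1] - x[l]$$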

Notice that d[l] is the distance between each adjacent element in the list. Look at the structure of the inner two summations for a fixed i:
```
j=i+1         d[i]
j=i+2         d[i] + d[i+1]
j=i+3         d[i] + d[i+1] + d[i+2]
...
j=K=i+(K-i)   d[i] + d[i+1] + d[i+2] + ... + d[K-1]
```
Notice the triangular structure of the inner two summations. This allows us to rewrite the inner two summations as a single summation in terms of the distances of adjacent terms:
```
total: (K-i)*d[i] + (K-i-1)*d[i+1] + ... + 2*d[K-2] + 1*d[K-1]
```
which reduces the total sum to:

$$\sum_{i=1}^{K-1} \sum_{l=i}^{K-1} (K-l)\, d[l]$$

Now we can look at the structure of this double summation:
```
i=1     (K-1)*d[1] + (K-2)*d[2] + (K-3)*d[3] + ... + 2*d[K-2] + d[K-1]
i=2                  (K-2)*d[2] + (K-3)*d[3] + ... + 2*d[K-2] + d[K-1]
i=3                               (K-3)*d[3] + ... + 2*d[K-2] + d[K-1]
...
i=K-2                                                2*d[K-2] + d[K-1]
i=K-1                                                           d[K-1]
```
Again, notice the triangular pattern. The total sum then becomes:
```
1*(K-1)*d[1] + 2*(K-2)*d[2] + 3*(K-3)*d[3] + ... + (K-2)*2*d[K-2] 
  + (K-1)*1*d[K-1]
```
Or, written as a single summation:

$$\sum_{l=1}^{K-1} l\,(K-l)\, d[l]$$

This compact single summation of adjacent differences is the basis for a more efficient algorithm:
1. Sort the array, order O(N log N)
2. Compute the differences of each adjacent element, order O(N)
3. Iterate over each of the N-K+1 windows of K-1 consecutive differences and calculate the above sum, order O(NK)
Note that the second and third step could be combined, although with Python your mileage may vary.

The code:
```
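def closeness(diff, K):
  # Weighted sum of the K-1 adjacent gaps: gap i (1-based) counts i*(K-i) times
  acc = 0.0
  for (i, v) in enumerate(diff):
    acc += (i+1)*(K-(i+1))*v
  return acc

def closest(a, K):
  a.sort()  # note: sorts the caller's list in place
  N = len(a)
  diff = [ a[i+1] - a[i] for i in range(N-1) ]

  # Score the first window, then every later window of K-1 gaps
  min_ind = 0
  min_val = closeness(diff[0:K-1], K)

  for ind in range(1, N-K+1):
    cl = closeness(diff[ind:ind+K-1], K)
    if cl < min_val:
      min_ind = ind
      min_val = cl

  return a[min_ind:min_ind+K]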
```
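As a possible refinement that is not part of the answer above, the O(K) work per window can be avoided. When the window shifts by one, only the distances involving the element that leaves and the element that enters change, and both can be read off a prefix-sum array. The sketch below assumes the pairwise-distance metric derived above; `closest_window` is a hypothetical name.

```python
from itertools import accumulate

def closest_window(values, k):
    """Return (window, cost) minimising the sum of pairwise distances."""
    xs = sorted(values)
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= len(values)")
    prefix = [0] + list(accumulate(xs))  # prefix[i] == sum(xs[:i])

    # First window: weighted sum of adjacent gaps, weight l * (k - l).
    cost = sum(l * (k - l) * (xs[l] - xs[l - 1]) for l in range(1, k))
    best_cost, best_start = cost, 0

    for s in range(1, n - k + 1):
        # Sum of the k - 1 elements shared by the old and the new window.
        shared = prefix[s + k - 1] - prefix[s]
        cost -= shared - (k - 1) * xs[s - 1]       # element leaving on the left
        cost += (k - 1) * xs[s + k - 1] - shared   # element entering on the right
        if cost < best_cost:
            best_cost, best_start = cost, s
    return xs[best_start:best_start + k], best_cost
```

With this update the scan after sorting is O(N), so the sort dominates the total cost.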

刺人心

2020-12-30 05:10

Try the following:

```
N = int(input())
K = int(input())
assert 2 <= N <= 10**5
assert 2 <= K <= N
a = some_unsorted_list  # placeholder for the N input values
a.sort()

# Sum of the K-1 adjacent gaps in the first window
cur_diff = sum(a[i + 1] - a[i] for i in range(K - 1))
min_diff = cur_diff
min_last_idx = K - 1
for last_idx in range(K, N):
    # Drop the gap that leaves on the left, add the gap that enters on the right
    cur_diff = (cur_diff
                - (a[last_idx - K + 1] - a[last_idx - K])
                + (a[last_idx] - a[last_idx - 1]))
    if min_diff > cur_diff:
        min_diff = cur_diff
        min_last_idx = last_idx
```

From min_last_idx you can compute min_first_idx as min_last_idx - K + 1. I use range to preserve the order of the indices; in Python 2.7 range builds a full list, so it takes O(N) extra memory. This is the same algorithm that you use, but slightly more efficient (a smaller constant factor), since it updates a running sum instead of re-summing every window.

死守一世寂寞

2020-12-30 05:15
My initial solution was to slide over every K-element window, multiply each element by a weight m, and sum the products over the window, where m starts at -(K-1) and increases by 2 at each step; the answer is the window with the minimum sum. For a window of size 3, m starts at -2 and the weights across the window are -2, 0, 2. This works because each element of the window contributes to the pairwise sum with a fixed weight. For example, for the elements [10 20 30] the sum is (30-10) + (30-20) + (20-10), which expands to 2*30 + 0*20 + (-2)*10.

Each window can be evaluated in O(K) time, so the entire scan is O(NK). However, it turns out that this solution is not optimal, and there are edge cases where the algorithm fails. I have yet to figure out those cases, but I am sharing the solution anyway in case anyone can get something useful from it.
```
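// Assumes a[] is sorted in ascending order and min starts at a very large value
for(i = 0 ;i <= n - k;++i)
{
    diff = 0;
    l = -(k-1);              // weight of the first element in the window
    for(j = i;j < i + k;++j)
    {
        diff += a[j]*l;
        if(min < diff)       // early exit once the partial sum exceeds the best so far
            break;
        l += 2;
    }
    // Reached only if the window was summed in full without the early exit
    if(j == i + k && diff > 0)
        min = diff;
}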
```
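A minimal Python sketch of the same weighting idea, written without the early `break` and without the `diff > 0` test, is given below. Those two checks are plausible but unconfirmed sources of the failures mentioned above: with negative values the partial sums are not monotone, so an early exit can discard a good window, and a window of equal values has a sum of exactly 0. The name `closest_by_weights` is hypothetical.

```python
def closest_by_weights(values, k):
    """Return (window, cost) using weights -(k-1), -(k-3), ..., k-1."""
    xs = sorted(values)
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= len(values)")

    best_cost, best_start = None, 0
    for start in range(n - k + 1):
        # Element m of the sorted window is added m times (it is the larger
        # of m pairs) and subtracted k - 1 - m times: weight 2*m - (k - 1).
        cost = sum((2 * m - (k - 1)) * xs[start + m] for m in range(k))
        if best_cost is None or cost < best_cost:
            best_cost, best_start = cost, start
    return xs[best_start:best_start + k], best_cost
```

Every window is summed in full, so the scan stays O(NK); the only change is that no window is skipped.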

悲哀的现实

2020-12-30 05:16

itertools to the rescue?

```
from itertools import combinations

def closest_elements(iterable, K):
    # Brute force: score every K-subset by its sum of pairwise distances.
    # There are C(N, K) subsets, so this is only practical for small inputs.
    N = sorted(set(iterable))  # duplicates are dropped; sorting fixes the order
    assert 2 <= K <= len(N) <= 10**5

    def cost(subset):
        return sum(abs(x - y) for x, y in combinations(subset, 2))

    return min(combinations(N, K), key=cost)
```

```
>>> a = [10,100,300,200,1000,20,30]
>>> b = [1,2,3,4,10,20,30,40,100,200]
>>> print(closest_elements(a, 3), closest_elements(b, 4))
(10, 20, 30) (1, 2, 3, 4)
```
