Question
I need an algorithm for dividing different manufacturing parts into uneven groups. The main condition is that the difference between the maximum number in each group and all the others should be as low as possible. For
example:
if we have the list [1,3,4,11,12,19,20,21] and we decide it should be divided into 3 parts, it should be divided into [1,3,4], [11,12], [19,20,21]. In the same case, if we decide to divide it into 4 we would get:
[1,3,4], [11], [12], [19,20,21].
To clarify "difference between the maximum number in the group and all others": [1,3,4] gives 4 - 1 + 4 - 3 + 4 - 4 = 4; [11] gives 11 - 11 = 0; [12,19] gives 19 - 12 + 19 - 19 = 7; [20,21] gives 21 - 20 + 21 - 21 = 1. Total difference = 12. In the other possible case, [1,3,4] gives 4 as before; [11,12,19] gives 19 - 11 + 19 - 12 + 19 - 19 = 15; [20,21] gives 21 - 20 + 21 - 21 = 1. Total difference = 20. This is a measure of over-performance: the largest number in a group (representing, for example, strength) has to stand in for the smallest (weakest) one, and using a super-strong part everywhere would be too expensive or heavy, so optimization is needed.
So first I was thinking to slice the list into all possible combinations, calculate the "difference between the maximum number in the group and all others in the group" for each, and then select as the final result the one with the smallest total difference.
I was wondering if there is some built-in function in Python or Spyder or similar. If I need to write code, could you help me?
I'm trying to work on a random list divided into 10 parts, in order to reapply the approach in different situations: l = sorted(random.sample(range(100), 10)).
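For reference, the brute-force idea described above (trying every way to cut the sorted list into k contiguous parts and keeping the cheapest) can be sketched as follows; `best_partition` and `group_cost` are illustrative names, not library functions:

```python
from itertools import combinations

def group_cost(group):
    # sum of (max - x) over the group: the "over-performance" of each part
    m = max(group)
    return sum(m - x for x in group)

def best_partition(lst, k):
    # try every choice of k - 1 interior cut points in the sorted list
    best, best_cost = None, float('inf')
    for cuts in combinations(range(1, len(lst)), k - 1):
        bounds = [0, *cuts, len(lst)]
        parts = [lst[bounds[i]:bounds[i + 1]] for i in range(k)]
        cost = sum(group_cost(p) for p in parts)
        if cost < best_cost:
            best, best_cost = parts, cost
    return best, best_cost

print(best_partition([1, 3, 4, 11, 12, 19, 20, 21], 3))
# -> ([[1, 3, 4], [11, 12], [19, 20, 21]], 8)
```

This is exponential-free but still O(C(n-1, k-1)) over cut choices, so it is only practical for short lists; the answers below discuss faster approaches.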
Answer 1:
Based on your updated comments, it sounds like you are looking for the K-Means algorithm, or similar things, that will cluster your list elements into distinct groups based on their distance from proposed centers (this is what your difference calculation is really measuring).
In your criterion, note that it never makes sense to subtract the max of each subgroup from itself, since this is always zero by definition. So really you're looking at the sum of the max minus each element, over all non-max elements (what to do with duplicates is also a question you need to answer). K-Means will do something different (it will look at every point's distance from the average of the points), but in spirit it's the same. You can modify k-means to use your notion of a group score, although I don't really see any benefit to that in terms of the clustering output -- I'd need to see some kind of math proofs about the limiting behavior of the different criteria to be convinced that it matters.
You can achieve this reasonably easily with the sklearn and numpy modules:
from sklearn import cluster
import numpy as np

km = cluster.KMeans(n_clusters=4)
example_data = np.asarray([1, 2, 3, 11, 12, 20, 21, 22, 30, 35])[:, None]
km.fit(example_data)
Then look at km.labels_:
In [65]: km.labels_
Out[65]: array([0, 0, 0, 3, 3, 1, 1, 1, 2, 2], dtype=int32)
You can see that this would put together [1, 2, 3], [11, 12], [20, 21, 22], [30, 35]. Below is some code that actually recovers the groups for you (note that example_data has shape (10, 1), so it is flattened first):
In [74]: flat = example_data.ravel().tolist()
In [75]: flat
Out[75]: [1, 2, 3, 11, 12, 20, 21, 22, 30, 35]
In [76]: [[x for i, x in enumerate(flat) if km.labels_[i] == j]
          for j in range(km.n_clusters)]
Out[76]: [[1, 2, 3], [20, 21, 22], [30, 35], [11, 12]]
But note that this is not perfect: it is an iterative method not guaranteed to converge to any "true" solution, and for bizarre enough input data, you can get bizarre output.
Alternatively, a more basic understanding of what you want is to choose index integers i[0] through i[k], such that
sub_lists[j] = original_list[i[j]:i[j+1]]
with i[0] = 0 and i[k] understood to mean "the end of the list." Then define:
sub_lens = [len(s) for s in sub_lists]
max_len = max(sub_lens)
criterion(k, i[0], ..., i[k]) = max(max_len - s_len for s_len in sub_lens)
So a solution is a tuple of parameters (k, i[0], ..., i[k]), and you want the choice that minimizes the criterion above.
A generic solution to this problem is quite complicated. But if you're willing to accept a greedy solution that will be very balanced except for the final sublist, many simple slicing schemes will do.
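As a small illustration, the length-balance criterion sketched above can be written out like this (all names are illustrative, following the notation in the text):

```python
def length_balance_criterion(original_list, cuts):
    # cuts are the interior split indices i[1], ..., i[k-1];
    # i[0] = 0 and the end of the list are implicit
    bounds = [0, *cuts, len(original_list)]
    sub_lists = [original_list[bounds[j]:bounds[j + 1]]
                 for j in range(len(bounds) - 1)]
    sub_lens = [len(s) for s in sub_lists]
    max_len = max(sub_lens)
    return max(max_len - s_len for s_len in sub_lens)

# sublist lengths 3, 2, 3 -> max_len 3, criterion 1
print(length_balance_criterion([1, 3, 4, 11, 12, 19, 20, 21], [3, 5]))
# -> 1
```

A criterion of 0 means all sublists have equal length; minimizing it over all cut choices yields the most balanced split.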
Answer 2:
Since you don't mention the logic behind your slicing, as a starting point I suggest this function:
>>> import numpy as np
>>> def slicer(l, n):
...     le = len(l)
...     S = int(np.around(float(le) / n))
...     return [l[i:i+S] for i in range(0, le, S)]
...
>>> slicer([1,3,4,11,12,19,20,21],2)
[[1, 3, 4, 11], [12, 19, 20, 21]]
>>> slicer([1,3,4,11,12,19,20,21],3)
[[1, 3, 4], [11, 12, 19], [20, 21]]
>>> slicer([1,3,4,11,12,19,20,21],4)
[[1, 3], [4, 11], [12, 19], [20, 21]]
Here I use numpy.around to round float(le)/n so that the slice size comes out right!
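Note that rounding len(l)/n does not guarantee exactly n slices for every input (for instance, slicer(list(range(8)), 5) yields only 4 slices, since round(8/5) = 2). A stdlib-only variant that always returns exactly n contiguous, near-equal chunks might look like this (`slicer_exact` is an illustrative name):

```python
def slicer_exact(l, n):
    # split l into exactly n contiguous chunks whose lengths differ by at most 1;
    # the first r chunks get one extra element
    q, r = divmod(len(l), n)
    out, start = [], 0
    for i in range(n):
        size = q + (1 if i < r else 0)
        out.append(l[start:start + size])
        start += size
    return out

print(slicer_exact([1, 3, 4, 11, 12, 19, 20, 21], 3))
# -> [[1, 3, 4], [11, 12, 19], [20, 21]]
```

Like slicer, this ignores the gap structure of the values entirely; it only balances group sizes.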
Answer 3:
Edit: based on clarified question, here is another algorithm. I still kept the original response below in case it's relevant.
You can solve the problem using dynamic programming. Note that the code below is not optimized for speed, because I thought that would make it too hard to understand. If you implement it carefully, you can do it in O(N * K), where N is the length of a and K is the number of sets to partition into.
a = [1, 3, 4, 11, 12, 19, 20, 21]
K = 3

# memoize results in a (K + 1) by (len(a) + 1) array
memo_partitions = [[None for j in range(len(a) + 1)] for i in range(K + 1)]

def compute_cost(arr):
    # this is the objective to be minimized
    if len(arr) == 0:
        return 0
    return sum(arr[-1] - x for x in arr)

def compute_best_partition(k, n):
    # computes the best partition of the first `n` elements of `a`
    # into `k` parts
    if n == 0:
        return [[] for _ in range(k)], 0
    if k == 1:
        return [a[:n]], compute_cost(a[:n])
    if memo_partitions[k][n] is not None:
        return memo_partitions[k][n]
    best_partition = [[] for _ in range(k - 1)] + [a[:n]]
    best_cost = compute_cost(a[:n])
    for i in range(1, n):
        # try making a[i:n] the last group and recurse on the prefix
        last_group = a[i:n]
        additional_cost = compute_cost(last_group)
        partition, cost = compute_best_partition(k - 1, i)
        if cost + additional_cost < best_cost:
            best_partition = partition[:]
            best_partition.append(last_group)
            best_cost = cost + additional_cost
    memo_partitions[k][n] = (best_partition, best_cost)
    return memo_partitions[k][n]

best_partition, cost = compute_best_partition(K, len(a))
print(best_partition)
Original response below.
Here are two approaches that might do what you want. Suppose your numbers are, in ascending order,
a[0], a[1], ... , a[n - 1]
Let max_diff(S) denote the maximum difference between two elements of a set S. We want to split up the numbers into sets S[0], ... , S[k - 1] such that the max_diff(S[i]) are small.
First, suppose we are trying to minimize the sum of the max_diff(S[i]). Notice that the sum of the max_diff(S[i]) is just a[n - 1] - a[0] minus the "gaps" between the S[i]. Thus, you can just find the k - 1 largest of the a[i + 1] - a[i] and cut there. In Python code,
a = [1, 3, 4, 11, 12, 19, 20, 21]
S = []
k = 3

# the k - 1 largest gaps between consecutive elements become the cuts
diffs = [(a[i + 1] - a[i], i) for i in range(len(a) - 1)]
diffs.sort()
best_cuts = [i for diff, i in diffs[-(k - 1):]]
best_cuts.sort()

last_cut = 0
for cut in best_cuts:
    S.append(a[last_cut:cut + 1])
    last_cut = cut + 1
S.append(a[last_cut:])
print(S)
Alternatively, suppose we are trying to minimize the maximum of the max_diff(S[i]). Then, we can do a binary search on the achievable value. In code,
a = [1, 3, 4, 11, 12, 19, 20, 21]
S = []
k = 3

best_partition = None
low, high = 0, max(a)
while low < high:
    mid = (low + high) // 2
    # greedily try to get all max_diffs <= mid
    full_partition = []
    last_set = [a[0]]
    for val in a[1:]:
        if val > last_set[0] + mid:
            full_partition.append(last_set)
            last_set = [val]
        else:
            last_set.append(val)
    full_partition.append(last_set)
    if len(full_partition) > k:
        low = mid + 1
    else:
        high = mid
        best_partition = full_partition

S = best_partition
print(S)
Source: https://stackoverflow.com/questions/26935861/splitting-up-a-list-into-different-length-parts-under-special-condition