Split a list of numbers into n chunks such that the chunks have (close to) equal sums and keep the original order

◇◆丶佛笑我妖孽 提交于 2019-12-17 10:54:12

问题


This is not the standard partitioning problem, as I need to maintain the order of elements in the list.

So for example if I have a list

[1, 6, 2, 3, 4, 1, 7, 6, 4]

and I want two chunks, then the split should give

[[1, 6, 2, 3, 4, 1], [7, 6, 4]] 

for a sum of 17 on each side. For three chunks the result would be

[[1, 6, 2, 3], [4, 1, 7], [6, 4]]

for sums of 12, 12, and 10.

Edit for additional explanation

I currently divide the sum with the number of chunks and use that as a target, then iterate till I get close to that target. The problem is that certain data sets can mess the algorithm up, for example trying to divide the following into 3:-

[95, 15, 75, 25, 85, 5]

Sum is 300, target is 100. The first chunk would sum to 95, second would be sum to 90, third would sum to 110, and 5 would be 'leftover'. Appending it where it's supposed to be would give 95, 90, 115, where a more 'reasonable' solution would be 110, 100, 90.

end edit

Background:

I have a list containing text (song lyrics) of varying heights, and I want to divide the text into an arbitrary number of columns. Currently I calculate a target height based on the total height of all lines, but obviously this is a consistent underestimate, which in some cases results in a suboptimal solution (the last column is significantly taller).


回答1:


This approach defines partition boundaries that divide the array in roughly equal numbers of elements, and then repeatedly searches for better partitionings until it can't find any more. It differs from most of the other posted solutions in that it looks to find an optimal solution by trying multiple different partitionings. The other solutions attempt to create a good partition in a single pass through the array, but I can't think of a single pass algorithm that's guaranteed optimal.

The code here is an efficient implementation of this algorithm, but it can be hard to understand so a more readable version is included as an addendum at the end.

def partition_list(a, k):
    if k <= 1: return [a]
    if k >= len(a): return [[x] for x in a]
    partition_between = [(i+1)*len(a)/k for i in range(k-1)]
    average_height = float(sum(a))/k
    best_score = None
    best_partitions = None
    count = 0

    while True:
        starts = [0]+partition_between
        ends = partition_between+[len(a)]
        partitions = [a[starts[i]:ends[i]] for i in range(k)]
        heights = map(sum, partitions)

        abs_height_diffs = map(lambda x: abs(average_height - x), heights)
        worst_partition_index = abs_height_diffs.index(max(abs_height_diffs))
        worst_height_diff = average_height - heights[worst_partition_index]

        if best_score is None or abs(worst_height_diff) < best_score:
            best_score = abs(worst_height_diff)
            best_partitions = partitions
            no_improvements_count = 0
        else:
            no_improvements_count += 1

        if worst_height_diff == 0 or no_improvements_count > 5 or count > 100:
            return best_partitions
        count += 1

        move = -1 if worst_height_diff < 0 else 1
        bound_to_move = 0 if worst_partition_index == 0\
                        else k-2 if worst_partition_index == k-1\
                        else worst_partition_index-1 if (worst_height_diff < 0) ^ (heights[worst_partition_index-1] > heights[worst_partition_index+1])\
                        else worst_partition_index
        direction = -1 if bound_to_move < worst_partition_index else 1
        partition_between[bound_to_move] += move * direction

def print_best_partition(a, k):
    print 'Partitioning {0} into {1} partitions'.format(a, k)
    p = partition_list(a, k)
    print 'The best partitioning is {0}\n    With heights {1}\n'.format(p, map(sum, p))

a = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print_best_partition(a, 1)
print_best_partition(a, 2) 
print_best_partition(a, 3)
print_best_partition(a, 4)

b = [1, 10, 10, 1]
print_best_partition(b, 2)

import random
c = [random.randint(0,20) for x in range(100)]
print_best_partition(c, 10)

d = [95, 15, 75, 25, 85, 5]
print_best_partition(d, 3)

There may be some modifications to make depending on what you are doing with this. For example, to determine whether the best partitioning has been found, this algorithm stops when there is no height difference among partitions, it doesn't find anything better than the best thing it's seen for more than 5 iterations in a row, or after 100 total iterations as a catch-all stopping point. You may need to adjust those constants or use a different scheme. If your heights form a complex landscape of values, knowing when to stop can get into classic problems of trying to escape local maxima and things like that.

Output

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 1 partitions
The best partitioning is [[1, 6, 2, 3, 4, 1, 7, 6, 4]]
With heights [34]

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 2 partitions
The best partitioning is [[1, 6, 2, 3, 4, 1], [7, 6, 4]]
With heights [17, 17]

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 3 partitions
The best partitioning is [[1, 6, 2, 3], [4, 1, 7], [6, 4]]
With heights [12, 12, 10]

Partitioning [1, 6, 2, 3, 4, 1, 7, 6, 4] into 4 partitions
The best partitioning is [[1, 6], [2, 3, 4], [1, 7], [6, 4]]
With heights [7, 9, 8, 10]

Partitioning [1, 10, 10, 1] into 2 partitions
The best partitioning is [[1, 10], [10, 1]]
With heights [11, 11]

Partitioning [7, 17, 17, 1, 8, 8, 12, 0, 10, 20, 17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9, 12, 3, 18, 9, 6, 7, 19, 20, 17, 7, 4, 3, 16, 20, 6, 7, 12, 16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16, 14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5, 13, 16, 0, 16, 7, 3, 8, 1, 20, 16, 11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18, 20, 3, 10, 9, 13, 12, 15, 6, 14, 16, 6, 12, 9, 9, 16, 14, 19, 1] into 10 partitions
The best partitioning is [[7, 17, 17, 1, 8, 8, 12, 0, 10, 20], [17, 13, 12, 4, 1, 1, 7, 11, 7, 13, 9], [12, 3, 18, 9, 6, 7, 19, 20], [17, 7, 4, 3, 16, 20, 6, 7, 12], [16, 3, 6, 12, 9, 4, 3, 2, 18, 1, 16], [14, 17, 7, 0, 14, 13, 3, 5, 3, 1, 5, 5], [13, 16, 0, 16, 7, 3, 8, 1, 20, 16], [11, 15, 3, 10, 10, 2, 0, 12, 12, 0, 18], [20, 3, 10, 9, 13, 12, 15, 6, 14], [16, 6, 12, 9, 9, 16, 14, 19, 1]]
With heights [100, 95, 94, 92, 90, 87, 100, 93, 102, 102]

Partitioning [95, 15, 75, 25, 85, 5] into 3 partitions
The best partitioning is [[95, 15], [75, 25], [85, 5]]
With heights [110, 100, 90]

Edit

Added the new test case, [95, 15, 75, 25, 85, 5], which this method handles correctly.

Addendum

This version of the algorithm is easier to read and understand, but is a bit longer due to taking less advantage of built-in Python features. It seems to execute in a comparable or even slightly faster amount of time, however.

#partition list a into k partitions
def partition_list(a, k):
    #check degenerate conditions
    if k <= 1: return [a]
    if k >= len(a): return [[x] for x in a]
    #create a list of indexes to partition between, using the index on the
    #left of the partition to indicate where to partition
    #to start, roughly partition the array into equal groups of len(a)/k (note
    #that the last group may be a different size) 
    partition_between = []
    for i in range(k-1):
        partition_between.append((i+1)*len(a)/k)
    #the ideal size for all partitions is the total height of the list divided
    #by the number of paritions
    average_height = float(sum(a))/k
    best_score = None
    best_partitions = None
    count = 0
    no_improvements_count = 0
    #loop over possible partitionings
    while True:
        #partition the list
        partitions = []
        index = 0
        for div in partition_between:
            #create partitions based on partition_between
            partitions.append(a[index:div])
            index = div
        #append the last partition, which runs from the last partition divider
        #to the end of the list
        partitions.append(a[index:])
        #evaluate the partitioning
        worst_height_diff = 0
        worst_partition_index = -1
        for p in partitions:
            #compare the partition height to the ideal partition height
            height_diff = average_height - sum(p)
            #if it's the worst partition we've seen, update the variables that
            #track that
            if abs(height_diff) > abs(worst_height_diff):
                worst_height_diff = height_diff
                worst_partition_index = partitions.index(p)
        #if the worst partition from this run is still better than anything
        #we saw in previous iterations, update our best-ever variables
        if best_score is None or abs(worst_height_diff) < best_score:
            best_score = abs(worst_height_diff)
            best_partitions = partitions
            no_improvements_count = 0
        else:
            no_improvements_count += 1
        #decide if we're done: if all our partition heights are ideal, or if
        #we haven't seen improvement in >5 iterations, or we've tried 100
        #different partitionings
        #the criteria to exit are important for getting a good result with
        #complex data, and changing them is a good way to experiment with getting
        #improved results
        if worst_height_diff == 0 or no_improvements_count > 5 or count > 100:
            return best_partitions
        count += 1
        #adjust the partitioning of the worst partition to move it closer to the
        #ideal size. the overall goal is to take the worst partition and adjust
        #its size to try and make its height closer to the ideal. generally, if
        #the worst partition is too big, we want to shrink the worst partition
        #by moving one of its ends into the smaller of the two neighboring
        #partitions. if the worst partition is too small, we want to grow the
        #partition by expanding the partition towards the larger of the two
        #neighboring partitions
        if worst_partition_index == 0:   #the worst partition is the first one
            if worst_height_diff < 0: partition_between[0] -= 1   #partition too big, so make it smaller
            else: partition_between[0] += 1   #partition too small, so make it bigger
        elif worst_partition_index == len(partitions)-1: #the worst partition is the last one
            if worst_height_diff < 0: partition_between[-1] += 1   #partition too small, so make it bigger
            else: partition_between[-1] -= 1   #partition too big, so make it smaller
        else:   #the worst partition is in the middle somewhere
            left_bound = worst_partition_index - 1   #the divider before the partition
            right_bound = worst_partition_index   #the divider after the partition
            if worst_height_diff < 0:   #partition too big, so make it smaller
                if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]):   #the partition on the left is bigger than the one on the right, so make the one on the right bigger
                    partition_between[right_bound] -= 1
                else:   #the partition on the left is smaller than the one on the right, so make the one on the left bigger
                    partition_between[left_bound] += 1
            else:   #partition too small, make it bigger
                if sum(partitions[worst_partition_index-1]) > sum(partitions[worst_partition_index+1]): #the partition on the left is bigger than the one on the right, so make the one on the left smaller
                    partition_between[left_bound] -= 1
                else:   #the partition on the left is smaller than the one on the right, so make the one on the right smaller
                    partition_between[right_bound] += 1

def print_best_partition(a, k):
    #simple function to partition a list and print info
    print '    Partitioning {0} into {1} partitions'.format(a, k)
    p = partition_list(a, k)
    print '    The best partitioning is {0}\n    With heights {1}\n'.format(p, map(sum, p))

#tests
a = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print_best_partition(a, 1)
print_best_partition(a, 2) 
print_best_partition(a, 3)
print_best_partition(a, 4)
print_best_partition(a, 5)

b = [1, 10, 10, 1]
print_best_partition(b, 2)

import random
c = [random.randint(0,20) for x in range(100)]
print_best_partition(c, 10)

d = [95, 15, 75, 25, 85, 5]
print_best_partition(d, 3)



回答2:


Here's the best O(n) greedy algorithm I got for now. The idea is to greedily append items from the list to a chunk until the sum for the current chunk exceeds the average expected sum for a chunk at that point. The average expected sum is updated constantly. This solution is not perfect, but as I said, it is O(n) and worked not bad with my tests. I am eager to hear feedback and suggestions for improvement.

I left my debug print statements in the code to provide some documentation. Feel free to comment them in to see what's going on in each step.

CODE

def split_list(lst, chunks):
    #print(lst)
    #print()
    chunks_yielded = 0
    total_sum = sum(lst)
    avg_sum = total_sum/float(chunks)
    chunk = []
    chunksum = 0
    sum_of_seen = 0

    for i, item in enumerate(lst):
        #print('start of loop! chunk: {}, index: {}, item: {}, chunksum: {}'.format(chunk, i, item, chunksum))
        if chunks - chunks_yielded == 1:
            #print('must yield the rest of the list! chunks_yielded: {}'.format(chunks_yielded))
            yield chunk + lst[i:]
            raise StopIteration

        to_yield = chunks - chunks_yielded
        chunks_left = len(lst) - i
        if to_yield > chunks_left:
            #print('must yield remaining list in single item chunks! to_yield: {}, chunks_left: {}'.format(to_yield, chunks_left))
            if chunk:
                yield chunk
            yield from ([x] for x in lst[i:])
            raise StopIteration

        sum_of_seen += item
        if chunksum < avg_sum:
            #print('appending {} to chunk {}'.format(item, chunk))
            chunk.append(item)
            chunksum += item
        else:
            #print('yielding chunk {}'.format(chunk))
            yield chunk
            # update average expected sum, because the last yielded chunk was probably not perfect:
            avg_sum = (total_sum - sum_of_seen)/(to_yield - 1)
            chunks_yielded += 1
            chunksum = item
            chunk = [item]

TEST CODE

import random
lst = [1, 6, 2, 3, 4, 1, 7, 6, 4]
#lst = [random.choice(range(1,101)) for _ in range(100)]
chunks = 3
print('list: {}, avg sum: {}, chunks: {}\n'.format(lst, sum(lst)/float(chunks), chunks))
for chunk in split_list(lst, chunks):
    print('chunk: {}, sum: {}'.format(chunk, sum(chunk)))

TESTS with your list:

list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 17.0, chunks: 2

chunk: [1, 6, 2, 3, 4, 1], sum: 17
chunk: [7, 6, 4], sum: 17

---

list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 11.33, chunks: 3

chunk: [1, 6, 2, 3], sum: 12
chunk: [4, 1, 7], sum: 12
chunk: [6, 4], sum: 10

---

list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 8.5, chunks: 4

chunk: [1, 6, 2], sum: 9
chunk: [3, 4, 1], sum: 8
chunk: [7], sum: 7
chunk: [6, 4], sum: 10

---

list: [1, 6, 2, 3, 4, 1, 7, 6, 4], avg sum: 6.8, chunks: 5

chunk: [1, 6], sum: 7
chunk: [2, 3, 4], sum: 9
chunk: [1, 7], sum: 8
chunk: [6], sum: 6
chunk: [4], sum: 4

TESTS with random lists of length 100 and elements from 1 to 100 (printing of the random list omitted):

avg sum: 2776.0, chunks: 2

chunk: [25, 8, 71, 39, 5, 69, 29, 64, 31, 2, 90, 73, 72, 58, 52, 19, 64, 34, 16, 8, 16, 89, 70, 67, 63, 36, 9, 87, 38, 33, 22, 73, 66, 93, 46, 48, 65, 55, 81, 92, 69, 94, 43, 68, 98, 70, 28, 99, 92, 69, 24, 74], sum: 2806
chunk: [55, 55, 64, 93, 97, 53, 85, 100, 66, 61, 5, 98, 43, 74, 99, 56, 96, 74, 63, 6, 89, 82, 8, 25, 36, 68, 89, 84, 10, 46, 95, 41, 54, 39, 21, 24, 8, 82, 72, 51, 31, 48, 33, 77, 17, 69, 50, 54], sum: 2746

---

avg sum: 1047.6, chunks: 5

chunk: [19, 76, 96, 78, 12, 33, 94, 10, 38, 87, 44, 76, 28, 18, 26, 29, 44, 98, 44, 32, 80], sum: 1062
chunk: [48, 70, 42, 85, 87, 55, 44, 11, 50, 48, 47, 50, 1, 17, 93, 78, 25, 10, 89, 57, 85], sum: 1092
chunk: [30, 83, 99, 62, 48, 66, 65, 98, 94, 54, 14, 97, 58, 53, 3, 98], sum: 1022
chunk: [80, 34, 63, 20, 27, 36, 98, 97, 7, 6, 9, 65, 91, 93, 2, 27, 83, 35, 65, 17, 26, 41], sum: 1022
chunk: [80, 80, 42, 32, 44, 42, 94, 31, 50, 23, 34, 84, 47, 10, 54, 59, 72, 80, 6, 76], sum: 1040

---

avg sum: 474.6, chunks: 10

chunk: [4, 41, 47, 41, 32, 51, 81, 5, 3, 37, 40, 26, 10, 70], sum: 488
chunk: [54, 8, 91, 42, 35, 80, 13, 84, 14, 23, 59], sum: 503
chunk: [39, 4, 38, 40, 88, 69, 10, 19, 28, 97, 81], sum: 513
chunk: [19, 55, 21, 63, 99, 93, 39, 47, 29], sum: 465
chunk: [65, 88, 12, 94, 7, 47, 14, 55, 28, 9, 98], sum: 517
chunk: [19, 1, 98, 84, 92, 99, 11, 53], sum: 457
chunk: [85, 79, 69, 78, 44, 6, 19, 53], sum: 433
chunk: [59, 20, 64, 55, 2, 65, 44, 90, 37, 26], sum: 462
chunk: [78, 66, 32, 76, 59, 47, 82], sum: 440
chunk: [34, 56, 66, 27, 1, 100, 16, 5, 97, 33, 33], sum: 468

---

avg sum: 182.48, chunks: 25

chunk: [55, 6, 16, 42, 85], sum: 204
chunk: [30, 68, 3, 94], sum: 195
chunk: [68, 96, 23], sum: 187
chunk: [69, 19, 12, 97], sum: 197
chunk: [59, 88, 49], sum: 196
chunk: [1, 16, 13, 12, 61, 77], sum: 180
chunk: [49, 75, 44, 43], sum: 211
chunk: [34, 86, 9, 55], sum: 184
chunk: [25, 82, 12, 93], sum: 212
chunk: [32, 74, 53, 31], sum: 190
chunk: [13, 15, 26, 31, 35, 3, 14, 71], sum: 208
chunk: [81, 92], sum: 173
chunk: [94, 21, 34, 71], sum: 220
chunk: [1, 55, 70, 3, 92], sum: 221
chunk: [38, 59, 56, 57], sum: 210
chunk: [7, 20, 10, 81, 100], sum: 218
chunk: [5, 71, 19, 8, 82], sum: 185
chunk: [95, 14, 72], sum: 181
chunk: [2, 8, 4, 47, 75, 17], sum: 153
chunk: [56, 69, 42], sum: 167
chunk: [75, 45], sum: 120
chunk: [68, 60], sum: 128
chunk: [29, 25, 62, 3, 50], sum: 169
chunk: [54, 63], sum: 117
chunk: [57, 37, 42], sum: 136

As you can see, as expected it gets worse the more chunks you want to generate. I hope I was able to help a bit.

edit: The yield from syntax requires Python 3.3 or newer, if you are using an older version just turn the statement into a normal for loop.




回答3:


Simple and concise way using numpy. Assuming

import numpy.random as nr
import numpy as np

a = (nr.random(10000000)*1000).astype(int)

Then, assuming you need to divide the list into p parts with approximately equal sums

def equisum_partition(arr,p):
    ac = arr.cumsum()

    #sum of the entire array
    partsum = ac[-1]//p 

    #generates the cumulative sums of each part
    cumpartsums = np.array(range(1,p))*partsum

    #finds the indices where the cumulative sums are sandwiched
    inds = np.searchsorted(ac,cumpartsums) 

    #split into approximately equal-sum arrays
    parts = np.split(arr,inds)

    return parts

Importantly, this is vectorised:

In [3]: %timeit parts = equisum_partition(a,20)
53.5 ms ± 962 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

You could checking the quality of the splitting,

partsums = np.array([part.sum() for part in parts]).std()

The splits are not great, but I suspect they are optimal given that the ordering is not changed.




回答4:


I think a good approach would be to sort the input list. Then add the smallest and largest to one list. The second smallest and second largest to the next list and so on, until all elements are added to the list.

def divide_list(A):
    A.sort()
    l = 0
    r = len(A) - 1
    l1,l2= [],[]
    i = 0
    while l < r:
        ends = [A[l], A[r]]
        if i %2 ==0:
            l1.extend(ends)
        else:
            l2.extend(ends)
        i +=1
        l +=1
        r -=1
    if r == l:
        smaller = l1 if sum(l1) < sum(l2) else l2
        smaller.append(A[r])

    return l1, l2

myList = [1, 6, 2, 3, 4, 1, 7, 6, 4]
print divide_list(myList)

myList = [1,10,10,1]
print divide_list(myList)

Output

([1, 7, 2, 6], [1, 6, 3, 4, 4])
([1, 10], [1, 10])



回答5:


This is coming kind of late but i came up with a function that does what you need it takes a second parameter that tells it how it should split the list

import math

my_list = [1, 6, 2, 3, 4, 1, 7, 6, 4]

def partition(my_list, split):
    solution = []

    total = sum(my_list)
    div = total / split
    div = math.ceil(div)

    criteria = [div] * (total // div)
    criteria.append(total - sum(criteria)) if sum(criteria) != total else criteria

    temp = []
    pivot = 0
    for crit in criteria:
        for count in range(len(my_list) + 1):
            if sum(my_list[pivot:count]) == crit:
                solution.append(my_list[pivot:count])
                pivot = count
                break



    return solution

print(partition(my_list, 2)) # Outputs [[1, 6, 2, 3, 4, 1], [7, 6, 4]]

print(partition(my_list, 3)) # Outputs [[1, 6, 2, 3], [4, 1, 7], [6, 4]]

it would fail for 4 divisions, because you obviously stated in your question that you want to maintain the order

4 divisions = [9, 9, 9, 7]

and your sequence can't match that




回答6:


Here is some code that returns 2-ples of slice indexes for each sublist.

weights = [1, 6, 2, 3, 4, 1, 7, 6, 4]

def balance_partitions(weights:list, n:int=2) -> tuple:
    if n < 1:
        raise ValueError("Parameter 'n' must be 2+")

    target = sum(weights) // n
    results = []
    cost = 0
    start = 0

    for i, w in enumerate(weights):
        delta = target - cost
        cost += w
        if cost >= target:
            if i == 0 or cost - target <= delta:
                results.append( (start, i+1) )
                start = i+1
            elif cost - target > delta:
                # Better if we didn't include this one.
                results.append( (start, i) )
                start = i

            cost -= target

            if len(results) == n-1:
                results.append( (start, len(weights)) )
                break

    return tuple(results)

def print_parts(w, n):
    result = balance_partitions(w, n)
    print("Suggested partition indices: ", result)
    for t in result:
        start,end = t
        sublist = w[start:end]
        print(" - ", sublist, "(sum: {})".format(sum(sublist)))

print(weights, '=', sum(weights))

for i in range(2, len(weights)+1):
    print_parts(weights, i)

Output is:

[1, 6, 2, 3, 4, 1, 7, 6, 4] = 34
Suggested partition indices:  ((0, 6), (6, 9))
 -  [1, 6, 2, 3, 4, 1] (sum: 17)
 -  [7, 6, 4] (sum: 17)
Suggested partition indices:  ((0, 4), (4, 7), (7, 9))
 -  [1, 6, 2, 3] (sum: 12)
 -  [4, 1, 7] (sum: 12)
 -  [6, 4] (sum: 10)
Suggested partition indices:  ((0, 3), (3, 5), (5, 7), (7, 9))
 -  [1, 6, 2] (sum: 9)
 -  [3, 4] (sum: 7)
 -  [1, 7] (sum: 8)
 -  [6, 4] (sum: 10)
Suggested partition indices:  ((0, 2), (2, 4), (4, 6), (6, 7), (7, 9))
 -  [1, 6] (sum: 7)
 -  [2, 3] (sum: 5)
 -  [4, 1] (sum: 5)
 -  [7] (sum: 7)
 -  [6, 4] (sum: 10)
Suggested partition indices:  ((0, 2), (2, 3), (3, 5), (5, 6), (6, 7), (7, 9))
 -  [1, 6] (sum: 7)
 -  [2] (sum: 2)
 -  [3, 4] (sum: 7)
 -  [1] (sum: 1)
 -  [7] (sum: 7)
 -  [6, 4] (sum: 10)
Suggested partition indices:  ((0, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 9))
 -  [1, 6] (sum: 7)
 -  [2] (sum: 2)
 -  [3] (sum: 3)
 -  [4] (sum: 4)
 -  [1] (sum: 1)
 -  [7] (sum: 7)
 -  [6, 4] (sum: 10)
Suggested partition indices:  ((0, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9))
 -  [1, 6] (sum: 7)
 -  [2] (sum: 2)
 -  [3] (sum: 3)
 -  [4] (sum: 4)
 -  [1] (sum: 1)
 -  [7] (sum: 7)
 -  [6] (sum: 6)
 -  [4] (sum: 4)
Suggested partition indices:  ((0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9))
 -  [1] (sum: 1)
 -  [6] (sum: 6)
 -  [2] (sum: 2)
 -  [3] (sum: 3)
 -  [4] (sum: 4)
 -  [1] (sum: 1)
 -  [7] (sum: 7)
 -  [6] (sum: 6)
 -  [4] (sum: 4)



回答7:


Here's how I might attack this problem for the case of two desired sublists. It's probably not as efficient as it could be, but it's a first cut.

def divide(l):
    total = sum(l)
    half = total / 2
    l1 = []
    l2 = []
    for e in l:
        if half - e >= 0 or half > abs(half - e):
          l1.append(e)
          half -= e
        else:
          l2.append(e)
    return (l1, l2)

You can see it in action here:

(l1, l2) = divide([1, 6, 2, 3, 4, 1, 7, 6, 4])

print(l1)
# [1, 6, 2, 3, 4, 1]

print(l2)
#[7, 6, 4]

(l1, l2) = divide([1,1,10,10])

print(l1)
# [1, 1, 10]

print(l2)
#[10]

I'll leave other cases to you as an exercise. :)




回答8:


This is a minorly edited version of @Milind R's numpy-approach (BTW a big thanks, sir). Namely, I realized that in a real-life-scenario, the partitions suggested by the script may end up being sub-optimal, if the elements are not "uniformly" spread in the array in terms of their values. To counter this I "uniformified" the array by rearranging the elements of the as 'smallest', 'largest', 'second smallest', 'second largest', etc. The down part is that this makes the script considerably (~5x) slower.

import numpy.random as nr
import numpy as np
a = (nr.random(10000000)*1000).astype(int)

The edited partitioning algorithm:

def equisum_partition(arr,p, uniformify=True):

    #uniformify: rearrange to ['smallest', 'largest', 'second smallest', 'second largest', etc..]
    if uniformify:
        l = len(arr)
        odd = l%2!=0
        arr = np.sort(arr)

        #add a dummy element if odd length
        if odd:
            arr = np.append(np.min(arr)-1, arr)
            l = l+1

        idx = np.arange(l)
        idx = np.multiply(idx, 
                  np.subtract(1,
                              np.multiply(
                                  np.mod(idx, 2),
                                  2))
                 )
        arr = arr[idx]

        #remove the dummy element
        if odd:
            arr = arr[1:]

    #cumulative summation
    ac = arr.cumsum()

    #sum of the entire array
    partsum = ac[-1]//p 

    #generates the cumulative sums of each part
    cumpartsums = np.array(range(1,p))*partsum

    #finds the indices where the cumulative sums are sandwiched
    inds = np.searchsorted(ac,cumpartsums) 

    #split into approximately equal-sum arrays
    parts = np.split(arr,inds)

    return parts

In the original answer's example this doesn't play too much of a role since due the randomness of the example array.

With uniformify:

%%time
parts = equisum_partition(a,20)
partsums = np.array([part.sum() for part in parts])#
partsums.std()
Wall time: 624 ms
266.6111212984185

Without uniformify:

%%time
parts = equisum_partition(a,20, uniformify=False)
partsums = np.array([part.sum() for part in parts])#
partsums.std()
Wall time: 105 ms
331.19071544957296


来源:https://stackoverflow.com/questions/35517051/split-a-list-of-numbers-into-n-chunks-such-that-the-chunks-have-close-to-equal

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!