Getting the lowest possible sum from numbers' difference

滥情空心 2020-12-23 22:19

I have to find the lowest possible sum from numbers' difference.

Let's say I have 4 numbers: 1515, 1520, 1500 and 1535. The lowest sum of difference is 30, becaus

10 Answers
  •  北海茫月
    2020-12-23 22:46

    I've taken an approach which uses a recursive algorithm, but it does take some of what other people have contributed.

    First of all we sort the numbers:

    [1561,1572,1572,1609,1682,1731,1731,2041]
    

    Then we compute the differences between neighbouring numbers, keeping track of the indices of the numbers that contributed to each difference:

    [(11,(0,1)),(0,(1,2)),(37,(2,3)),(73,(3,4)),(49,(4,5)),(0,(5,6)),(310,(6,7))]
    

    So we got 11 from the difference between the numbers at indices 0 and 1, and 37 from the numbers at indices 2 and 3.
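
    A minimal sketch of these two steps (sorting, then taking neighbouring differences). The name `adjacentDiffs` and the `Diff` type alias are assumptions made for illustration; they do not appear in the code further down.

```haskell
import Data.List (sort)

-- A difference paired with the indices (into the sorted list) that produced it.
type Diff = (Int, (Int, Int))

-- Sort the input, then subtract each element from its right-hand neighbour.
adjacentDiffs :: [Int] -> [Diff]
adjacentDiffs xs = zipWith3 mk [0 ..] sorted (drop 1 sorted)
  where
    sorted = sort xs
    mk i a b = (b - a, (i, i + 1))
```

    Applied to `[1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731]`, this should give the list of differences shown above.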

    I then sorted this list, so it tells me which pairs give me the smallest difference:

    [(0,(1,2)),(0,(5,6)),(11,(0,1)),(37,(2,3)),(49,(4,5)),(73,(3,4)),(310,(6,7))]
    

    What we can see here is that, given that we want to select n numbers, a naive solution might be to select the first n / 2 items of this list. The trouble is that the third item shares an index with the first, so for n = 6 the first three items give us only 5 numbers, not 6. In this case you need to select the fourth pair as well to get a set of 6 numbers.
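
    The shortfall can be checked with a small sketch. It assumes the differences from the example are written out by hand, and `coveredBy` is a hypothetical helper name, not part of the code further down.

```haskell
import Data.List (nub, sort)

type Diff = (Int, (Int, Int))

-- The differences from the running example, in index order.
diffs :: [Diff]
diffs =
  [ (11, (0, 1)), (0, (1, 2)), (37, (2, 3)), (73, (3, 4))
  , (49, (4, 5)), (0, (5, 6)), (310, (6, 7)) ]

-- Number of distinct indices covered by the k smallest differences.
coveredBy :: Int -> Int
coveredBy k = length (nub (concat [[i, j] | (_, (i, j)) <- take k (sort diffs)]))
```

    Here `coveredBy 3` evaluates to 5 and `coveredBy 4` to 6, which is why the fourth pair is needed for n = 6.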

    From here, I came up with this algorithm. Throughout, there is a set of accepted indices, which starts empty, and a count n of numbers still left to select:

    1. If n is 0, we're done.
    2. If n is 1, and the first item will provide just 1 index which isn't in our set, we take the first item, and we're done.
    3. If n is 2 or more, and the first item will provide 2 indices which aren't in our set, we take the first item, and we recurse (i.e. go back to step 1), this time looking for n - 2 numbers that make the smallest difference in the remainder of the list.

    This is the basic routine, but life isn't that simple. There are cases we haven't covered yet, but make sure you get the idea before you move on.

    Actually step 3 is wrong (found that just before I posted this :-/), as it may be unnecessary to include an early difference to cover indices which are covered by later, essential differences. The first example ([1515, 1520, 1500, 1535]) falls foul of this. Because of this I've thrown it away in the section below, and expanded step 4 to deal with it.
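
    To see the problem concretely, here is a small check on that first example. The names `withSmallest` and `withoutSmallest` are made up for illustration.

```haskell
-- The first example, already sorted.
sorted :: [Int]
sorted = [1500, 1515, 1520, 1535]

-- Neighbouring differences, tagged with the indices that produced them.
diffs :: [(Int, (Int, Int))]
diffs = zip (zipWith (-) (drop 1 sorted) sorted) [(0, 1), (1, 2), (2, 3)]

-- Taking the cheapest difference (1,2) first forces both neighbours in
-- as well, because indices 0 and 3 still have to be covered.
withSmallest :: Int
withSmallest = sum [d | (d, _) <- diffs]

-- Skipping the cheapest difference still covers all four indices.
withoutSmallest :: Int
withoutSmallest = sum [d | (d, ix) <- diffs, ix /= (1, 2)]
```

    `withSmallest` comes to 35 and `withoutSmallest` to 30, so greedily accepting the smallest difference gives the worse answer here.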

    So, now we get to look at the special cases:

    1. ** as above **
    2. ** as above **
    3. If n is 1, but the first item will provide two indices, we can't select it. We have to throw that item away and recurse. This time we're still looking for n indices, and there have been no changes to our accepted set.
    4. If n is 2 or more, we have a choice. Either we can a) choose this item, and recurse looking for n - (1 or 2) indices, or b) skip this item, and recurse looking for n indices.

    4 is where it gets tricky, and where this routine turns into a search rather than just a sorting exercise. How can we decide which branch (a or b) to take? Well, we're recursive, so let's call both, and see which one is better. How will we judge them?

    • We'll want to take whichever branch produces the lowest sum.
    • ...but only if it will use up the right number of indices.

    So step 4 becomes something like this (pseudocode):

    x       = numberOfIndicesProvidedBy(currentDifference)
    branchA = findSmallestDifference (n-x, remainingDifferences) // recurse looking for **n-(1 or 2)**
    branchB = findSmallestDifference (n  , remainingDifferences) // recurse looking for **n** 
    sumA    = currentDifference + sumOf(branchA)
    sumB    =                     sumOf(branchB) 
    
    validA  = x + indicesAddedBy(branchA) == n   // count the indices from currentDifference too
    validB  =     indicesAddedBy(branchB) == n
    
    if not validA && not validB then return an empty branch
    
    if validA && not validB then return currentDifference + branchA
    if validB && not validA then return branchB
    
    // Here, both must be valid.
    if sumA <= sumB then return currentDifference + branchA else return branchB
    

    I coded this up in Haskell (because I'm trying to get good at it). I'm not sure about posting the whole thing, because it might be more confusing than useful, but here's the main part:

    findSmallestDifference = findSmallestDifference' Set.empty
    
    findSmallestDifference' _     _ [] = []
    findSmallestDifference' taken n (d:ds)
        | n == 0                = []    -- Case 1
        | n == 1 && provides1 d = [d]   -- Case 2
        | n == 1 && provides2 d = findSmallestDifference' taken n ds -- Case 3
        | provides0 d           = findSmallestDifference' taken n ds -- Case 3a (See Edit)
        | validA && not validB             = branchA -- Case 4
        | validB && not validA             = branchB -- Case 4
        | validA && validB && sumA <= sumB = branchA -- Case 4
        | validA && validB && sumB <= sumA = branchB -- Case 4
        | otherwise             = []                 -- Case 4
            where branchA = d : findSmallestDifference' (newTaken d) (n - (provides taken d)) ds
                  branchB = findSmallestDifference' taken n ds
                  sumA    = sumDifferences branchA
                  sumB    = sumDifferences branchB
                  validA  = n == (indicesTaken branchA)
                  validB  = n == (indicesTaken branchB)
                  newTaken x = insertIndices x taken 
    

    Hopefully you can see all the cases there. That code(-ish), plus some wrapper produces this:

    *Main> findLeastDiff 6 [1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731]
    Smallest Difference found is 48
          1572 -   1572 =      0
          1731 -   1731 =      0
          1572 -   1561 =     11
          1609 -   1572 =     37
    *Main> findLeastDiff 4 [1515, 1520, 1500,1535]
    Smallest Difference found is 30
          1515 -   1500 =     15
          1535 -   1520 =     15
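
    Both results can be cross-checked by exhaustive search. The sketch below is not the recursive routine above; it simply tries every subset of neighbouring differences and keeps the cheapest one that covers exactly n indices. The names `adjacentDiffs`, `indexCount` and `bruteForce` are assumptions made for this check, and the search is exponential, so it is only suitable for small inputs.

```haskell
import Data.List (nub, sort, subsequences)

type Diff = (Int, (Int, Int))

-- Neighbouring differences of the sorted input, tagged with their indices.
adjacentDiffs :: [Int] -> [Diff]
adjacentDiffs xs = zipWith3 (\i a b -> (b - a, (i, i + 1))) [0 ..] s (drop 1 s)
  where s = sort xs

-- Number of distinct indices touched by a set of differences.
indexCount :: [Diff] -> Int
indexCount ds = length (nub (concat [[i, j] | (_, (i, j)) <- ds]))

-- Try every subset of differences; keep the cheapest that covers exactly n indices.
bruteForce :: Int -> [Int] -> Maybe Int
bruteForce n xs =
  case [sum (map fst ds) | ds <- subsequences (adjacentDiffs xs), indexCount ds == n] of
    []   -> Nothing
    sums -> Just (minimum sums)
```

    `bruteForce 6 [1731, 1572, 2041, 1561, 1682, 1572, 1609, 1731]` gives `Just 48` and `bruteForce 4 [1515, 1520, 1500, 1535]` gives `Just 30`, matching the output above.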
    

    This has become long, but I've tried to be explicit. Hopefully it was worthwhile.


    Edit: There is a case 3a that can be added to avoid some unnecessary work. If the current difference provides no additional indices, it can be skipped. This is taken care of in step 4 above, but there's no point in evaluating both halves of the tree for no gain. I've added this to the Haskell.
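
    The Haskell excerpt relies on helpers that are not shown in the post. As an assumption about what they might look like (not the original definitions), here is one possible sketch using `Data.Set`; the `provides0`, `provides1` and `provides2` predicates would then compare the result of `provides` with 0, 1 or 2.

```haskell
import qualified Data.Set as Set

type Diff = (Int, (Int, Int))

-- How many indices of this difference are not yet in the accepted set (0, 1 or 2).
provides :: Set.Set Int -> Diff -> Int
provides taken (_, (i, j)) = length (filter (`Set.notMember` taken) [i, j])

-- Add both indices of a difference to the accepted set.
insertIndices :: Diff -> Set.Set Int -> Set.Set Int
insertIndices (_, (i, j)) = Set.insert i . Set.insert j

-- Total of the differences in a branch.
sumDifferences :: [Diff] -> Int
sumDifferences = sum . map fst

-- Number of distinct indices a branch covers.
indicesTaken :: [Diff] -> Int
indicesTaken = Set.size . foldr insertIndices Set.empty
```

    With indices 0 and 1 already accepted, `provides` returns 0 for the difference `(11,(0,1))`, which is exactly the situation case 3a skips.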
