How to do an inverse `range`, i.e. create a compact range based on a set of numbers?

问题

Python has a range method, which allows for stuff like:

>>> range(1, 6)
[1, 2, 3, 4, 5]

What I’m looking for is kind of the opposite: take a list of numbers, and return the start and end.

>>> magic([1, 2, 3, 4, 5])
[1, 5] # note: 5, not 6; this differs from `range()`

This is easy enough to do for the above example, but is it possible to allow for gaps or multiple ranges as well, returning the range in a PCRE-like string format? Something like this:

>>> magic([1, 2, 4, 5])
['1-2', '4-5']
>>> magic([1, 2, 3, 4, 5])
['1-5']

Edit: I’m looking for a Python solution, but I welcome working examples in other languages as well. It’s more about figuring out an elegant, efficient algorithm. Bonus question: is there any programming language that has a built-in method for this?

回答1:

A nice trick to simplify the code is to look at the difference of each element of the sorted list and its index:

a = [4, 2, 1, 5]
a.sort()
print [x - i for i, x in enumerate(a)]

prints

[1, 1, 2, 2]

Each run of the same number corresponds to a run of consecutive numbers in a. We can now use itertools.groupby() to extract these runs. Here's the complete code:

from itertools import groupby

def sub(x):
    return x[1] - x[0]

a = [5, 3, 7, 4, 1, 2, 9, 10]
ranges = []
for k, iterable in groupby(enumerate(sorted(a)), sub):
     rng = list(iterable)
     if len(rng) == 1:
         s = str(rng[0][1])
     else:
         s = "%s-%s" % (rng[0][1], rng[-1][1])
     ranges.append(s)
print ranges

printing

['1-5', '7', '9-10']

回答2:

Sort numbers, find consecutive ranges (remember RLE compression?).

Something like this:

input = [5,7,9,8,6, 21,20, 3,2,1, 22,23, 50]

output = []
first = last = None # first and last number of current consecutive range
for item in sorted(input):
  if first is None:
    first = last = item # bootstrap
  elif item == last + 1: # consecutive
    last = item # extend the range
  else: # not consecutive
    output.append((first, last)) # pack up the range
    first = last = item
# the last range ended by iteration end
output.append((first, last))

print output

Result: [(1, 3), (5, 9), (20, 23), (50, 50)]. You figure out the rest :)

回答3:

I thought you might like my generalised clojure solution.

(def r [1 2 3 9 10])

(defn successive? [a b]
  (= a (dec b)))

(defn break-on [pred s]
  (reduce (fn [memo n]
            (if (empty? memo)
              [[n]]
              (if (pred (last (last memo)) n)
                (conj (vec (butlast memo))
                      (conj (last memo) n))
                (conj memo [n]))))
          []
          s))

(break-on successive? r)

回答4:

Since 9000 beat me to it, I'll just post the second part of the code, that prints pcre-like ranges from the previously computed output plus the added type check:

for i in output:
    if not isinstance(i, int) or i < 0:
        raise Exception("Only positive ints accepted in pcre_ranges")
result = [ str(x[0]) if x[0] == x[1] else '%s-%s' % (x[0], x[1]) for x in output ]
print result

Output: ['1-3', '5-9', '20-23', '50']

回答5:

Let's try generators!

# ignore duplicate values
l = sorted( set( [5,7,9,8,6, 21,20, 3,2,1, 22,23, 50] ) )

# get the value differences 
d = (i2-i1 for i1,i2 in zip(l,l[1:]))

# get the gap indices
gaps = (i for i,e in enumerate(d) if e != 1)

# get the range boundaries
def get_ranges(gaps, l):
  last_idx = -1
  for i in gaps:
    yield (last_idx+1, i)
    last_idx = i
  yield (last_idx+1,len(l)-1)

# make a list of strings in the requested format (thanks Frg!)
ranges = [ "%s-%s" % (l[i1],l[i2]) if i1!=i2 else str(l[i1]) \
  for i1,i2 in get_ranges(gaps, l) ]

This has become rather scary, I think :)

回答6:

This is kind of elegant but also kind of disgusting, depending on your point of view. :)

import itertools

def rangestr(iterable):
    end = start = iterable.next()
    for end in iterable:
        pass
    return "%s" % start if start == end else "%s-%s" % (start, end)

class Rememberer(object):
    last = None

class RangeFinder(object):
    def __init__(self):
        self.o = Rememberer()

    def __call__(self, x):
        if self.o.last is not None and x != self.o.last + 1:
            self.o = Rememberer()
        self.o.last = x
        return self.o

def magic(iterable):
    return [rangestr(vals) for k, vals in
            itertools.groupby(sorted(iterable), RangeFinder())]


>>> magic([5,7,9,8,6, 21,20, 3,2,1, 22,23, 50])
['1-3', '5-9', '20-23', '50']

Explanation: it uses itertools.groupby to group the sorted elements together by a key, where the key is a Rememberer object. The RangeFinder class keeps a Rememberer object as long as a consecutive bunch of items belongs to the same range block. Once you've passed out of a given block, it replaces the Rememberer so that the key won't compare equal and groupby will make a new group. As groupby walks over the sorted list, it passes the elements one-by-one into rangestr, which constructs the string by remembering the first and the last element and ignoring everything in between.

Is there any practical reason to use this instead of 9000's answer? Probably not; it's basically the same algorithm.

来源：https://stackoverflow.com/questions/9470611/how-to-do-an-inverse-range-i-e-create-a-compact-range-based-on-a-set-of-numb

标签

python

numbers

range

pcre