Avoiding repeat of code after loop?

问题

I often end up writing a bit of code twice when using a loops. For example, while going over the Udacity computer science course, I wrote the code (for a function to find the most sequentially repeated element):

def longest_repetition(l):
    if not l:
        return None
    most_reps = count = 0 
    longest = prv = None
    for i in l:
        if i == prv:
            count += 1
        else:
            if count > most_reps:
                longest = prv
                most_reps = count
            count = 1
        prv = i
    if count > most_reps:
        longest = prv
    return longest

In this case, I'm checking twice if the count is greater than the previously most repeated element. This happens both when the current element is different from the last and when I've reached the end of the list.

I've also run into this a few times when parsing a string character by character. There have also been a few times where it's been up to about 5 lines of code. Is this common, or a result of the way I think/code. What should I do?

edit: Similarly, in a contrived string splitting example:

def split_by(string, delimeter):
    rtn = []
    tmp = ''
    for i in string:
        if i == delimeter:
            if tmp != '':
                rtn.append(tmp)
                tmp = ''
        else:
            tmp += i
    if tmp != '':
        rtn.append(tmp)
    return rtn

edit: The exam this was from was written for students of the course who are not expected to have any outside knowledge of Python; only what was taught in the previous units. Although I do have prior experience in Python, I'm trying to adhere to these restrictions to get the most of the course. Things like str.split, lists, and a lot of the fundamentals of Python were taught, but nothing yet on imports - especially not things like groupby. That being said, how should it be written without any of the language features that probably wouldn't be taught in a programming introduction course.

回答1:

Since you tagged language-agnostic, I see that you wont be much interested in python-specific stuff you could use to make your code efficient, compact and readable. For the same reason, I am not going to show how beautiful a code can be written in python.

In some of the cases that extra if at the end can be avoided depending on your algorithm, but most cases it's like "If it exists, it should be significant and/or efficient." I dont know about the how the python interpreter works, but in compiled languages like C/C++/etc. the compiler performs various kinds of loop optimisations, including moving the if-blocks out of a loop if it does the same thing.

I ran and compared the running time of various snippets:

@JFSebastian - 8.9939801693
@srgerg - 3.13302302361
yours - 2.8182990551.

It's not a generalisation that a trailing if gives you the best time. My point is: just follow your algorithm, and try to optimise it. There's nothing wrong with an if at the end. Probably alternative solutions are expensive.

About the second example you have put in: The check tmp == '' is done to ensure only non-empty strings are returned. That actually is a sort of additional condition over your splitting algorithm. In any case, you need an additional rtn.append after the loop because there's still something beyond the last delimiter. You could always push an if condition inside the loop like if curCharIndex == lastIndex: push items to list which will execute in every iteration, and its sort of the same case again.

My answer in short:

Your code is as efficient as your algorithm that you have in mind.
The ifs in the end are encountered in many cases -- no need to worry about them, they may be making the code more efficient than alternative approaches without such an if (examples are right here).
Additionally compilers can also spot and modify/move the blocks around your code.
If there's a language feature/library that makes your code fast and at the same time readable, use it. (Other answers here point out what python offers :))

回答2:

Have a look at the implementation of itertools.groupby which does almost exactly what you want. http://docs.python.org/library/itertools.html#itertools.groupby

Here is the algorithm using said code:

from itertools import groupby

string = "AAABBCCDDDD"

maximum = 0
max_char = ""

for i in groupby(string):
    x, xs = i
    n = len(list(xs))
    if n > maximum:
        max_char = x
        maximum = n

print max_char

My recommendation for thinking about writing algorithms like this in the future is to try not to do everything in one function. Think about the smaller functions that solve the problem you're trying to solve, such as "grouping each sequence of equal items in a sequence into smaller sequences".

Also of course it doesn't have to be characters in the above algorithm -- it could be anything that is groupable.

Edit: In response to the OP's edit, I figured you wouldn't be allowed to use/know about libraries like itertools in a class setting, but I wasn't suggesting that you should rely on external libraries, but more that you should think about problems by splitting them into smaller subproblems. So in this case you'd implement your own groupby and use it.

回答3:

Language-agnostic technique to avoid repeating a condition after a loop is to append sentinel values to the input data e.g., if delimiter appended to the end of string then the condition is not necessary in split_by(). Canonical example: in the linear search algorithm a needle can be appended to a haystack to avoid the end of sequence check.

Another option is to delegate some work to a separate function e.g., one function counts number of repetitions, the other finds maximum as in longest_repetition():

from itertools import groupby

def longest_repetition(iterable):
    return max(groupby(iterable), key=lambda x: sum(1 for _ in x[1]))[0]

If the repeated code is trivial; it might not be worth the effort.

回答4:

It is not uncommon to need to re-check a condition at the end of a loop that was also being checked inside the loop. If you're prepared to sacrifice a little efficiency, one way to avoid the repeated check is to over-check for it inside the loop. For example:

def my_longest_repetition(l):
    if not l:
        return None
    most_reps = count = 0
    longest = prv = None
    for i in l:
        count = (count + 1) if i == prv else 1
        if count > most_reps:
            longest = prv
            most_reps = count
        prv = i
    return longest

This code checks for count > most_reps more often than it needs to, but avoids the need to check it again after the loop.

Unfortunately this kind of change won't be applicable in all circumstances.

回答5:

I think there are three general approaches that could help you avoid repeating code at the end of the loop. For all three I'm going to use an example problem slightly different from your own, counting words in a string. Here's a "default" version that, like your code, repeats some logic at the end of the loop:

from collections import Counter

def countWords0(text):
    counts = Counter()
    word = ""

    for c in text.lower():
        if c not in "abcdefghijklmnopqrstuvwxyz'-":
            if word:
                counts[word] += 1
            word = ""
        else:
            word += c

    if word:
        counts[word] += 1 # repeated code at end of loop

    return counts

The first approach is to do (some of) the "end of subsequence" processing after every character, so that the bookkeeping is correct if the sequence ends immediately after that character. In your example, you could eliminate the "else" condition on your and run the code within it every time. (This is sergerg's answer.)

This may not be easy for some kinds of checks though. For counting words, you need to add some extra logic to avoid accumulating cruft from the "partial" subsequences you process. Here's code that does that:

def countWords1(text):
    counts = Counter()
    word = ""

    for c in text.lower():
        if c not in "abcdefghijklmnopqrstuvwxyz'-":
            word = ""
        else:
            if word:
                counts[word] -= 1 # new extra logic
            word += c
            counts[word] += 1 # this line was moved from above

    return counts + Counter() # more new stuff, to remove crufty zero-count items

The second option would be to append a sentinel value to the end of the sequence which will trigger the desired "end of subsequence" behavior. This can be tricky if you need to avoid the sentinel contaminating your data (especially for things like numbers). For your longest consecutive subsequence problem, you could add any value that is not equal to the last item in the sequence. None might be a good choice. For my counting words example, a non-word character (such as a newline) will do:

def countWords2(text):
    counts = Counter()
    word = ""

    for c in text.lower() + "\n": # NOTE: added a sentinel to the string!
        if c not in "abcdefghijklmnopqrstuvwxyz'-":
            if word:
                counts[word] += 1
            word = ""
        else:
            word += c

    # no need to recheck at the end, since we know we ended with a space

    return counts

The third approach is to change the structure of the code to avoid iterating over a sequence that might end unexpectedly. You might use generators to preprocess the sequence, as in the other answers that use groupby from itertools. (Of course, the generator functions, if you have to write them yourself, may have similar issues.)

For my word counting example, I can use regular expressions from the re module to find the words:

from re import finditer

def countWords3(text):
    return Counter(match.group() for match in
                   finditer("[\w'-]+", text.lower()))

Output, when given a suitably Pythonic text (it is the same for all four versions of countWords):

>>> text = """Well, there's egg and bacon; egg sausage and bacon;
              egg and spam; egg bacon and spam; egg bacon sausage and spam;
              spam bacon sausage and spam; spam egg spam spam bacon and spam;
              spam sausage spam spam bacon spam tomato and spam;
              spam spam spam egg and spam; spam spam spam spam spam spam
              baked beans spam spam spam; or Lobster Thermidor a Crevette
              with a mornay sauce served in a Provencale manner with shallots
              and aubergines garnished with truffle pate, brandy and with a
              fried egg on top and spam."""

>>> countWords0(text)
Counter({'spam': 28, 'and': 12, 'egg': 8, 'bacon': 7, 'sausage': 4, 'a': 4,
         'with': 4, 'well': 1, 'lobster': 1, 'manner': 1, 'in': 1, 'top': 1,
         'thermidor': 1, "there's": 1, 'truffle': 1, 'provencale': 1,
         'sauce': 1, 'brandy': 1, 'pate': 1, 'shallots': 1, 'garnished': 1,
         'tomato': 1, 'on': 1, 'baked': 1, 'aubergines': 1, 'mornay': 1,
         'beans': 1, 'served': 1, 'fried': 1, 'crevette': 1, 'or': 1})

回答6:

Iterators provide an nice way to break up loops:

def longest_repetition(l):
  i=iter(l)
  n=next(i,None)
  longest=None
  most_reps=0
  while n is not None:
    p=n
    count=0
    while p==n:
      n=next(i,None)
      count+=1
    if count>most_reps:
      most_reps=count
      longest=p
  return longest

Many languages have a similar concept.

来源：https://stackoverflow.com/questions/11149997/avoiding-repeat-of-code-after-loop

标签

python

language-agnostic

code-organization