How to return alphabetical substrings?

Question

I'm trying to write a function that takes a string s as an input and returns a list of those substrings within s that are alphabetical. For example, s = 'acegibdh' should return ['acegi', 'bdh'].

Here's the code I've come up with:

s = 'acegibdh'
ans = []
subs = []
i = 0
while i != len(s) - 1:
    while s[i] < s[i+1]:
        subs.append(s[i])
        i += 1
    if s[i] > s[i-1]:
        subs.append(s[i])
        i += 1
    subs = ''.join(subs)
    ans.append(subs)
    subs = []
print ans

It keeps having trouble with the last letter of the string, because the s[i+1] test reads past the end of the string. I've spent a long time tinkering with it to try to find a way to avoid that problem. Does anyone know how to do this?

Answer 1:

Why not hard-code the first letter into ans, and then just work with the rest of the string? You can just iterate over the string itself instead of using indices.

>>> s = 'acegibdh'
>>> ans = []
>>> ans.append(s[0])
>>> for letter in s[1:]:
...     if letter >= ans[-1][-1]:
...             ans[-1] += letter
...     else:
...             ans.append(letter)
...
>>> ans
['acegi', 'bdh']
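
For reuse, the same idea can be wrapped in a function. This is only a sketch: the name alphabetical_runs is made up for illustration, and the empty-string guard is an addition, since s[0] would otherwise raise an IndexError.

```python
def alphabetical_runs(s):
    """Split s into maximal runs of non-decreasing letters (hypothetical helper)."""
    if not s:
        # Guard added for this sketch: s[0] raises IndexError on ''.
        return []
    runs = [s[0]]
    for letter in s[1:]:
        if letter >= runs[-1][-1]:
            runs[-1] += letter   # still in order: extend the current run
        else:
            runs.append(letter)  # order broken: start a new run
    return runs

print(alphabetical_runs('acegibdh'))  # ['acegi', 'bdh']
print(alphabetical_runs(''))          # []
```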

Answer 2:

s = 'acegibdh'
ans = []
subs = []
subs.append(s[0])
for x in range(len(s)-1):
    if s[x] <= s[x+1]:
        subs.append(s[x+1])
    else:
        # Order broken: store the finished substring and start a new one.
        ans.append(''.join(subs))
        subs = [s[x+1]]
subs = ''.join(subs)
ans.append(subs)
print ans

I decided to change your code a bit. Let me know if you have any questions.

Answer 3:

Just for fun, a one-line solution.

>>> s='acegibdh'
>>> [s[l:r] for l,r in (lambda seq:zip(seq,seq[1:]))([0]+[idx+1 for idx in range(len(s)-1) if s[idx]>s[idx+1]]+[len(s)])]
['acegi', 'bdh']
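
The one-liner is easier to follow when unrolled into steps. The sketch below is meant to do the same thing with named intermediates; the function name split_at_descents and the variable names are made up for illustration.

```python
def split_at_descents(s):
    """Unrolled version of the one-liner (hypothetical helper name)."""
    # A new substring starts at idx + 1 whenever s[idx] > s[idx + 1].
    starts = [idx + 1 for idx in range(len(s) - 1) if s[idx] > s[idx + 1]]
    # Add the two outer boundaries so every substring has a left and right edge.
    bounds = [0] + starts + [len(s)]
    # Slice between each pair of neighbouring boundaries.
    return [s[l:r] for l, r in zip(bounds, bounds[1:])]

print(split_at_descents('acegibdh'))  # ['acegi', 'bdh']
```

Note that for an empty string the boundaries are [0, 0], so this approach returns [''] rather than [].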

Answer 4:

You should try to avoid loops that increment the position by more than one char per iteration.

It is often clearer to introduce an additional variable that stores information about the previous state:

s = 'acegibdh'
prev = None
ans = []
subs = []
for ch in s:
    if prev is None or ch > prev:
        subs.append(ch)
    else:
        ans.append(''.join(subs))
        subs = [ch]
    prev = ch
ans.append(''.join(subs))

I think this reads more straightforwardly: if there is no previous character, or the current character still follows the previous one alphabetically, append it; otherwise start a new substring. You also can't run into index-out-of-range problems with this approach.

Answer 5:

More than one while loop is overkill. I think this is simpler and satisfies your requirement. Note that this fails on an empty string, because s[0] raises an IndexError.

s = 'acegibdh'
ans = []
current = s[0]
i = 1
while i < len(s):
    if s[i] > s[i-1]:
        current += s[i]
    else:
        ans.append(current)
        current = s[i]  # start the next substring with the current letter
    i += 1
ans.append(current)  # current always holds at least one letter here
print ans

Answer 6:

Just for fun, because I like doing things a little differently sometimes:

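from itertools import groupby

def my_gen(s):
    # Pair each letter with its successor and group the pairs by whether they ascend.
    for k, v in groupby(zip(s, s[1:]), lambda x: x[0] < x[1]):
        if k:
            firsts, seconds = zip(*v)
            # Rebuild the run: every first letter plus the final second letter.
            yield ''.join(firsts) + seconds[-1]

print list(my_gen('acegibdhabcdefghijk'))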
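
Because the generator above works on pairs of adjacent letters, a run consisting of a single letter never forms an ascending pair and is skipped (for example the trailing 'a' in 'acegibdha'). One possible way to keep groupby and still report such runs is to label every letter with a run number first. This is a sketch, not the approach above: the name runs_with_singles is hypothetical, and it treats equal neighbouring letters as still in order, which is an assumption.

```python
from itertools import groupby

def runs_with_singles(s):
    """Group letters by a running run number (hypothetical helper)."""
    run_id = 0
    ids = []
    for idx, ch in enumerate(s):
        # Bump the run number whenever a letter is smaller than its predecessor.
        if idx and ch < s[idx - 1]:
            run_id += 1
        ids.append(run_id)
    # Letters sharing a run number are adjacent, so groupby collects each run.
    return [''.join(ch for _, ch in grp)
            for _, grp in groupby(zip(ids, s), key=lambda pair: pair[0])]

print(runs_with_singles('acegibdha'))  # ['acegi', 'bdh', 'a']
```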

Answer 7:

Some of the solutions posted have an index error for empty strings.

Also, instead of keeping a list of characters, or doing repeated string concatenations, you can track the start index, i, of a solution substring and yield s[i:j] where s[j] < s[j-1], then set i to j.

Here is a generator that yields a substring whenever the next letter is lexicographically less than the previous one:

def alpha_subs(s):
    i, j = 0, 1
    while j < len(s):
        if s[j] < s[j-1]:
            yield s[i:j]
            i = j
        j += 1
    if s[i:j]:
        yield s[i:j]

print(list(alpha_subs('')))
print(list(alpha_subs('acegibdh')))
print(list(alpha_subs('acegibdha')))

[]
['acegi', 'bdh']
['acegi', 'bdh', 'a']

For case insensitivity:

def alpha_subs(s, ignore_case=False):
    qs = s.lower() if ignore_case else s
    i, j = 0, 1
    while j < len(s):
        if qs[j] < qs[j-1]:
            yield s[i:j]
            i = j
        j += 1
    if s[i:j]:
        yield s[i:j]

print(list(alpha_subs('acEgibDh', True)))
print(list(alpha_subs('acEgibDh')))

['acEgi', 'bDh']
['ac', 'Egi', 'b', 'Dh']
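
The ignore_case flag could be generalized to an arbitrary comparison key, in the spirit of sorted(). The following is only a sketch: alpha_subs_key and its key parameter are hypothetical and not part of the code above. Applying the key per character keeps the indices of the key list aligned with s, even for the few Unicode characters whose lowercase form is longer than one character.

```python
def alpha_subs_key(s, key=None):
    """Like alpha_subs, but compares key(ch) instead of ch (hypothetical variant)."""
    # One key per character, so ks[j] always corresponds to s[j].
    ks = [key(ch) for ch in s] if key is not None else s
    i = 0
    for j in range(1, len(s)):
        if ks[j] < ks[j - 1]:
            yield s[i:j]
            i = j
    if s[i:]:
        yield s[i:]

print(list(alpha_subs_key('acEgibDh', key=str.lower)))  # ['acEgi', 'bDh']
print(list(alpha_subs_key('acEgibDh')))                 # ['ac', 'Egi', 'b', 'Dh']
```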

Source: https://stackoverflow.com/questions/30922659/how-to-return-alphabetical-substrings

Tags

python

python-2.7

while-loop