Count consecutive characters

别那么骄傲 2020-11-28 06:44

How would I count consecutive characters in Python to see the number of times each unique digit repeats before the next unique digit?

At first, I thought I could do

9 answers
  •  挽巷 (OP)
    2020-11-28 07:26

    Consecutive counts:

    Ooh nobody's posted itertools.groupby yet!

    s = "111000222334455555"
    
    from itertools import groupby
    
    groups = groupby(s)
    result = [(label, sum(1 for _ in group)) for label, group in groups]
    

    After which, result looks like:

    [("1", 3), ("0", 3), ("2", 3), ("3", 2), ("4", 2), ("5", 5)]
    

    And you could format with something like:

    ", ".join("{}x{}".format(label, count) for label, count in result)
    # "1x3, 0x3, 2x3, 3x2, 4x2, 5x5"
    

    Total counts:

    Someone in the comments suggested that you might want the total count of each digit instead, so that "11100111" -> {"1": 6, "0": 2}. In that case, use collections.Counter:

    from collections import Counter
    
    s = "11100111"
    result = Counter(s)
    # Counter({"1": 6, "0": 2})
    

    Your method:

    As many have pointed out, your method fails because you're looping through range(len(s)) but addressing s[i+1]. This is an off-by-one error: when i points at the last index of s, s[i+1] raises an IndexError. One way to fix this would be to loop through range(len(s)-1), but it's more Pythonic to generate something to iterate over.
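
    For reference, the index-based fix mentioned above might look like the sketch below. The function name count_runs_indexed is made up for illustration, and because the original loop isn't shown here, this is only one plausible shape for it. Note that stopping at len(s)-1 means the final run has to be appended after the loop.

```python
def count_runs_indexed(s):
    """Return (character, run_length) pairs using explicit indices."""
    counts = []
    if not s:
        return counts  # nothing to count in an empty string
    count = 1
    # Stop one short of the end so that s[i + 1] is always a valid index.
    for i in range(len(s) - 1):
        if s[i] == s[i + 1]:
            count += 1
        else:
            counts.append((s[i], count))
            count = 1
    # The loop never flushes the final run, so append it here.
    counts.append((s[-1], count))
    return counts


print(count_runs_indexed("111000222334455555"))
# [('1', 3), ('0', 3), ('2', 3), ('3', 2), ('4', 2), ('5', 5)]
```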

    For a string that's not absolutely huge, zip(s, s[1:]) isn't a performance issue, so you could do:

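    counts = []
    count = 1
    # Compare each character with its successor.
    for a, b in zip(s, s[1:]):
        if a == b:
            count += 1
        else:
            # The run of `a` just ended: record it and start a new run.
            counts.append((a, count))
            count = 1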
    

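    The only problem is that this loop never appends the final run, so you would have to handle it separately after the loop. itertools.zip_longest fixes that: it pads the shorter sequence with None, so the last character is compared against None and its run gets appended like any other: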

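    import itertools
    
    counts = []
    count = 1
    # The final pair is (last_char, None), which never compares equal,
    # so the last run is appended inside the loop.
    for a, b in itertools.zip_longest(s, s[1:], fillvalue=None):
        if a == b:
            count += 1
        else:
            counts.append((a, count))
            count = 1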
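
    As a quick sanity check, the zip_longest loop can be wrapped in a function and compared against the groupby version from the top of this answer. The names count_runs and count_runs_groupby are made up for this sketch, and it assumes the input never contains None itself (always true for a str).

```python
from itertools import groupby, zip_longest


def count_runs(s):
    """Run-length counts using the zip_longest loop."""
    counts = []
    count = 1
    for a, b in zip_longest(s, s[1:], fillvalue=None):
        if a == b:
            count += 1
        else:
            counts.append((a, count))
            count = 1
    return counts


def count_runs_groupby(s):
    """Run-length counts using itertools.groupby."""
    return [(label, sum(1 for _ in group)) for label, group in groupby(s)]


# Both approaches should agree, including on one-character and empty strings.
for sample in ["111000222334455555", "11100111", "7", ""]:
    assert count_runs(sample) == count_runs_groupby(sample)

print(count_runs("11100111"))
# [('1', 3), ('0', 2), ('1', 3)]
```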
    

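    If you do have a truly huge string and can't stand to hold two of them in memory at a time, you can use a variant of the itertools pairwise recipe. The recipe in the docs uses plain zip; this version swaps in zip_longest so that the last character is still paired with None and the final run gets appended.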

    def pairwise(iterable):
        """Yield (item, next_item) pairs without copying the iterable; the last item is paired with None."""
        a, b = itertools.tee(iterable)
        next(b, None)  # advance the second iterator by one
        return itertools.zip_longest(a, b, fillvalue=None)
    
    counts = []
    count = 1
    for a, b in pairwise(s):
        ...
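
    One way the loop could be completed is as a generator, so that the results are streamed as well. This is only a sketch: iter_runs and pairwise_padded are names invented here (the helper is renamed to avoid confusion with itertools.pairwise in Python 3.10+, which does not pad), and it assumes None never appears in the input.

```python
import itertools


def pairwise_padded(iterable):
    """Yield (item, next_item) pairs; the last item is paired with None."""
    a, b = itertools.tee(iterable)
    next(b, None)  # advance the second iterator by one
    return itertools.zip_longest(a, b, fillvalue=None)


def iter_runs(iterable):
    """Lazily yield (item, run_length) for each run of equal items."""
    count = 1
    for a, b in pairwise_padded(iterable):
        if a == b:
            count += 1
        else:
            yield a, count
            count = 1


print(list(iter_runs("111000222334455555")))
# [('1', 3), ('0', 3), ('2', 3), ('3', 2), ('4', 2), ('5', 5)]

# Works on one-shot iterators too, not just strings.
print(list(iter_runs(iter("aab"))))
# [('a', 2), ('b', 1)]
```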
    
