Python: using re to find the longest sequence

清歌不尽 2021-01-06 07:28

I have a string that is randomly generated:

polymer_str = "diol diNCO diamine diNCO diamine diNCO diamine diNCO diol diNCO diamine"

I'd

5 answers
  •  傲寒 (original poster)
    2021-01-06 08:00

    Expanding on Ealdwulf's answer:

    Documentation for re.findall is in the Python standard library documentation for the re module.

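    import re

    def getLongestSequenceSize(search_str, polymer_str):
        # Match runs of search_str as whole words, each followed by at most one whitespace character
        matches = re.findall(r'(?:\b%s\b\s?)+' % re.escape(search_str), polymer_str)
        # Compare by length: plain max(matches) compares lexicographically and may not pick the longest run
        longest_match = max(matches, key=len, default='')
        return longest_match.count(search_str)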
    

    This could be written as one line, but it becomes less readable in that form.
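
For reference, the one-line form might look like the sketch below. It assumes Python 3.4+ for the `default` argument of `max`, adds `re.escape` (not in the original) so that special characters in `search_str` are treated literally, and uses the hypothetical name `longest_run_count` to avoid clashing with the function above. The sample string is made up for illustration.

```python
import re

def longest_run_count(search_str, polymer_str):
    # Hypothetical one-line variant: find all runs, keep the longest by length, count repeats
    return max(re.findall(r'(?:\b%s\b\s?)+' % re.escape(search_str), polymer_str), key=len, default='').count(search_str)

# Made-up sample input, not the string from the question
polymer_str = "diol diNCO diNCO diamine diNCO diNCO diNCO diol diNCO"
print(longest_run_count("diNCO", polymer_str))  # 3
```

The `default=''` argument means a search string that never occurs yields 0 instead of raising `ValueError`.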

    Alternative:

    If polymer_str is huge, it will be more memory efficient to use re.finditer. Here's how you might go about it:

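    import re

    def getLongestSequenceSize(search_str, polymer_str):
        longest_match = ''
        # finditer yields one Match object at a time instead of building a full list
        for match in re.finditer(r'(?:\b%s\b\s?)+' % re.escape(search_str), polymer_str):
            # Keep only the longest matched text seen so far
            if len(match.group(0)) > len(longest_match):
                longest_match = match.group(0)
        return longest_match.count(search_str)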
    

    The main difference between findall and finditer is that findall builds and returns a complete list (here, a list of matched strings), while finditer returns an iterator that yields Match objects one at a time. The finditer version may also be somewhat slower.
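
A small sketch can make the difference concrete. The sample string and variable names below are made up for illustration, and `re.escape` is an addition that is not in the original code.

```python
import re

sample = "diNCO diNCO diol diNCO diNCO diNCO"  # made-up sample input
pattern = r'(?:\b%s\b\s?)+' % re.escape("diNCO")

# findall builds the whole list of matched strings up front
as_list = re.findall(pattern, sample)
print(as_list)  # ['diNCO diNCO ', 'diNCO diNCO diNCO']

# finditer yields Match objects lazily; each one also exposes positions via span()
spans = [m.span() for m in re.finditer(pattern, sample)]
print(spans)  # [(0, 12), (17, 34)]

# max() can consume a generator over finditer directly, without an intermediate list
longest = max((m.group(0) for m in re.finditer(pattern, sample)), key=len, default='')
print(longest.count("diNCO"))  # 3
```

Because the pattern uses a non-capturing group `(?:...)`, findall returns the full matched text of each run and not just the contents of a group.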
