Regex that matches 3 consecutive words that start and end with the same letter

只愿长相守 submitted on 2021-01-28 08:50:13

Question


I have to match 3 consecutive words in which each word starts with the same letter the previous word ends with.

I have code like this:

import re

def regex(file):
    with open(file) as f:
        s = f.read()
    rx = re.compile(r"([a-z])+\s+\1", re.I)
    r = re.findall(rx, s)
    print(r)
    return len(r)

The text from the file is something like this:

dcvs xa Allo ozo zn bnro ce erdda anfgato e csdfa

and I'm expecting this result:

dcvs xa Allo ozo zn bnro ce erdda anfgato e csdfa

[('a','o'),('e','a')]
2

but I'm getting this:

['a', 'o', 'e', 'a']
4

Any clue?


Answer 1:


You may use

re.compile(r"[^a-z][a-z]*([a-z])[^a-z]+\1[a-z]*([a-z])[^a-z]+\2[a-z]*[^a-z]", re.I)

See the regex demo.

Note that re.findall will return a list of tuples (of the captured values) in this case, since there are 2 capturing groups, and re.findall only returns captures if capturing groups are defined in a regex pattern.
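
As a small illustration of that point, the sketch below runs re.findall on the same string with zero, one, and two capturing groups. The sample string and the variable names are made up for the demonstration and are not part of the question.

```python
import re

sample = "ab ba"  # hypothetical sample text

# No capturing group: findall returns the full matches as strings.
no_groups = re.findall(r"[a-z]\s+[a-z]", sample)

# One capturing group: findall returns only that group's text.
one_group = re.findall(r"([a-z])\s+[a-z]", sample)

# Two capturing groups: findall returns one tuple per match.
two_groups = re.findall(r"([a-z])\s+([a-z])", sample)

print(no_groups)   # ['b b']
print(one_group)   # ['b']
print(two_groups)  # [('b', 'b')]
```

This is why the pattern in the question, which has a single group, produces a flat list of letters rather than pairs.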

Details

  • [^a-z] - any char but an ASCII letter
  • [a-z]* - 0+ ASCII letters
  • ([a-z]) - Group 1: any ASCII letter
  • [^a-z]+ - any 1+ chars other than ASCII letters
  • \1 - backreference to the Group 1 contents, same text as captured in Group 1
  • [a-z]* - 0+ ASCII letters
  • ([a-z]) - Group 2: an ASCII letter
  • [^a-z]+ - any 1+ chars other than ASCII letters
  • \2 - backreference to the Group 2 contents, same text as captured in Group 2
  • [a-z]* - 0+ ASCII letters
  • [^a-z] - any 1 char other than an ASCII letter
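
The breakdown above can also be written into the pattern itself. The following is a sketch using re.VERBOSE, which lets each piece carry its own comment; the name CHAINED_WORDS is hypothetical, and the pattern is otherwise meant to be the same one.

```python
import re

# Same pattern, spelled out piece by piece. CHAINED_WORDS is a made-up name.
CHAINED_WORDS = re.compile(
    r"""
    [^a-z]        # a non-letter before the first word
    [a-z]*        # the first word, up to its last letter
    ([a-z])       # Group 1: last letter of the first word
    [^a-z]+       # separator between the words
    \1            # the second word must start with the Group 1 letter
    [a-z]*        # middle of the second word
    ([a-z])       # Group 2: last letter of the second word
    [^a-z]+       # separator between the words
    \2            # the third word must start with the Group 2 letter
    [a-z]*        # rest of the third word
    [^a-z]        # a non-letter after the third word
    """,
    re.I | re.VERBOSE,
)

text = "dcvs xa Allo ozo zn bnro ce erdda anfgato e csdfa"
pairs = CHAINED_WORDS.findall(text)
print(pairs)  # [('a', 'o'), ('e', 'a')]
```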

Python demo:

import re

def regex(s):
    rx = re.compile(r"[^a-z][a-z]*([a-z])[^a-z]+\1[a-z]*([a-z])[^a-z]+\2[a-z]*[^a-z]", re.I)
    d = rx.findall(s)
    print(d)
    return len(d)

print(regex('dcvs xa Allo ozo zn bnro ce erdda anfgato e csdfa'))

Output:

[('a', 'o'), ('e', 'a')]
2
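
One thing to be aware of: because the pattern begins and ends with [^a-z], a triple sitting at the very start or very end of the text has no surrounding non-letter character and is not matched. A possible variant, offered as a sketch rather than as part of the answer, replaces those two consuming classes with lookarounds. The names original, variant, and edge_text are made up here.

```python
import re

# The pattern from the answer: needs a real non-letter on both sides.
original = re.compile(
    r"[^a-z][a-z]*([a-z])[^a-z]+\1[a-z]*([a-z])[^a-z]+\2[a-z]*[^a-z]", re.I
)

# Assumed variant: lookarounds only check that no letter is adjacent,
# so the start and end of the string are acceptable boundaries too.
variant = re.compile(
    r"(?<![a-z])[a-z]*([a-z])[^a-z]+\1[a-z]*([a-z])[^a-z]+\2[a-z]*(?![a-z])", re.I
)

edge_text = "xa Allo ozo"  # the triple touches both ends of the string
full_text = "dcvs xa Allo ozo zn bnro ce erdda anfgato e csdfa"

print(original.findall(edge_text))  # []
print(variant.findall(edge_text))   # [('a', 'o')]
print(variant.findall(full_text))   # [('a', 'o'), ('e', 'a')]
```

Both versions still require the middle word to have at least two letters, since \1 and Group 2 each consume one.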



Answer 2:


Use this pattern:

r"([a-z])\s\1\w*([a-z])\s\2"

The pattern you are using only looks for 2 words where the second starts with the letter the first ends with.
To match 3 words, you have to tell the regex that the letter starting the second word and the letter ending it come from the same word, which is what \1\w*([a-z]) does. This is the simplest way that came to mind, but it is probably not the optimal one.

P.S. The answer was edited as suggested in the comments.
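
For comparison, here is how this shorter pattern behaves on the sample text from the question. The answer does not say which flags to use, so passing re.I (as the question's code does) is an assumption; without it, the capital A in Allo keeps the first triple from matching.

```python
import re

text = "dcvs xa Allo ozo zn bnro ce erdda anfgato e csdfa"
pattern = r"([a-z])\s\1\w*([a-z])\s\2"

# Assumption: case-insensitive matching, as in the question's code.
with_flag = re.findall(pattern, text, re.I)

# Case-sensitive: the backreference \1 ("a") no longer matches "A".
without_flag = re.findall(pattern, text)

print(with_flag)     # [('a', 'o'), ('e', 'a')]
print(without_flag)  # [('e', 'a')]
```

Note that \s matches exactly one whitespace character here, so words separated by several spaces or by punctuation are not matched, and \w also accepts digits and underscores.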



Source: https://stackoverflow.com/questions/47730939/regex-that-match-3-consecutive-words-that-start-and-end-with-the-same-letter
