Question
I need to match three consecutive words in which each word starts with the letter the previous word ends with.
My code looks like this:
import re
def regex(file):
    with open(file) as f:
        s = f.read()
    rx = re.compile(r"([a-z])+\s+\1", re.I)
    r = re.findall(rx, s)
    print(r)
    return len(r)
The text in the file looks something like this:
dcvs xa Allo ozo zn bnro ce erdda anfgato e csdfa
and I'm expecting this result:
dcvs xa Allo ozo zn bnro ce erdda anfgato e csdfa
[('a','o'),('e','a')]
2
but I'm getting this:
['a', 'o', 'e', 'a']
4
Any clue?
Answer 1:
You may use
re.compile(r"[^a-z][a-z]*([a-z])[^a-z]+\1[a-z]*([a-z])[^a-z]+\2[a-z]*[^a-z]",re.I)
Note that re.findall returns a list of tuples of the captured values in this case. The pattern defines 2 capturing groups, and whenever a pattern contains capturing groups, re.findall returns only the captures rather than the whole match.
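The effect of capturing groups on the return value can be checked directly. The following is a minimal sketch that reuses the sample text and the two patterns from this thread; the no-group pattern `\b[a-z]\b` (single-letter words) is only an illustrative assumption added for contrast.

```python
import re

text = "dcvs xa Allo ozo zn bnro ce erdda anfgato e csdfa"

# No capturing group: findall returns the full matched text.
no_groups = re.findall(r"\b[a-z]\b", text, re.I)

# One capturing group (the question's pattern): a flat list of strings.
one_group = re.findall(r"([a-z])+\s+\1", text, re.I)

# Two capturing groups (the pattern from this answer): a list of 2-tuples.
two_groups = re.findall(
    r"[^a-z][a-z]*([a-z])[^a-z]+\1[a-z]*([a-z])[^a-z]+\2[a-z]*[^a-z]",
    text,
    re.I,
)

print(no_groups)   # ['e']
print(one_group)   # ['a', 'o', 'e', 'a']
print(two_groups)  # [('a', 'o'), ('e', 'a')]
```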
Details
[^a-z] - any char other than an ASCII letter
[a-z]* - 0+ ASCII letters
([a-z]) - Group 1: any ASCII letter
[^a-z]+ - 1+ chars other than ASCII letters
\1 - backreference to the Group 1 contents (the same text as captured in Group 1)
[a-z]* - 0+ ASCII letters
([a-z]) - Group 2: an ASCII letter
[^a-z]+ - 1+ chars other than ASCII letters
\2 - backreference to the Group 2 contents (the same text as captured in Group 2)
[a-z]* - 0+ ASCII letters
[^a-z] - any 1 char other than an ASCII letter
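The breakdown above can also be kept next to the pattern itself. This is a sketch of the same regex written with the re.X (verbose) flag, so each piece carries its own comment; the variable names and the two edge-case strings are assumptions made for illustration.

```python
import re

rx = re.compile(r"""
    [^a-z]      # one non-letter before the first word
    [a-z]*      # word 1: leading letters
    ([a-z])     # Group 1: last letter of word 1
    [^a-z]+     # separator between word 1 and word 2
    \1          # word 2 must start with the Group 1 letter
    [a-z]*      # word 2: middle letters
    ([a-z])     # Group 2: last letter of word 2
    [^a-z]+     # separator between word 2 and word 3
    \2          # word 3 must start with the Group 2 letter
    [a-z]*      # word 3: remaining letters
    [^a-z]      # one non-letter after the third word
""", re.I | re.X)

text = "dcvs xa Allo ozo zn bnro ce erdda anfgato e csdfa"
matches = rx.findall(text)
print(matches)  # [('a', 'o'), ('e', 'a')]

# The leading and trailing [^a-z] each consume a character, so a triple
# that touches the start or end of the string is not matched.
print(rx.findall("xa Allo ozo"))    # []
print(rx.findall(" xa Allo ozo "))  # [('a', 'o')]
```

If the triple may sit at the very start or end of the input, padding the text with a space on both sides before matching is one simple workaround.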
Python demo:
import re
def regex(s):
    rx = re.compile(r"[^a-z][a-z]*([a-z])[^a-z]+\1[a-z]*([a-z])[^a-z]+\2[a-z]*[^a-z]", re.I)
    d = rx.findall(s)
    print(d)
    return len(d)
print(regex('dcvs xa Allo ozo zn bnro ce erdda anfgato e csdfa'))
Output:
[('a', 'o'), ('e', 'a')]
2
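One way to sanity-check this output is to compare it with a version that uses no backreferences. The sketch below is not part of the answer: `chained_triples` is a hypothetical helper that splits the text into words and tests every window of three. Unlike re.findall, it also counts overlapping triples and triples at the edges of the string, so the two approaches can disagree on other inputs.

```python
import re

def chained_triples(s):
    """Hypothetical helper: one (letter, letter) pair per chained word triple."""
    words = re.findall(r"[a-z]+", s, re.I)
    found = []
    # Slide a window of three consecutive words over the text.
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        link1 = w1[-1].lower() == w2[0].lower()
        link2 = w2[-1].lower() == w3[0].lower()
        if link1 and link2:
            found.append((w1[-1].lower(), w2[-1].lower()))
    return found

result = chained_triples('dcvs xa Allo ozo zn bnro ce erdda anfgato e csdfa')
print(result)       # [('a', 'o'), ('e', 'a')]
print(len(result))  # 2
```

For example, `chained_triples("ab bc cd de")` reports two triples, because "bc" and "cd" are shared between the two windows.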
Answer 2:
Use this pattern (together with the re.I flag from your code):
r"([a-z])\s\1\w*([a-z])\s\2"
The pattern you are using only looks for 2 words, where the second word starts with the letter the first one ends with.
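This can be seen by printing the full text of each match instead of only the captured group. A small sketch using re.finditer on the question's pattern and sample text (the variable names are assumptions):

```python
import re

text = "dcvs xa Allo ozo zn bnro ce erdda anfgato e csdfa"
rx = re.compile(r"([a-z])+\s+\1", re.I)

# group(0) is the whole match; each one covers one word plus only the
# first letter of the next word, so no match spans three words.
spans = [m.group(0) for m in rx.finditer(text)]
print(spans)  # ['xa A', 'llo o', 'ce e', 'rdda a']
```

Each triple is therefore reported as two separate two-word matches, which is why the question's code prints 4 items instead of 2.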
To match 3 words, the pattern has to state that the letter matched by \1 and the letter captured by the second group belong to the same word, which is what the \w* between them does. This is the simplest way that came to mind, but it may not be the optimal one.
PS: this answer was edited as suggested in the comments.
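As a quick check, here is a sketch that runs the pattern from this answer on the sample text from the question, once with re.I and once without it (the variable names are assumptions):

```python
import re

text = "dcvs xa Allo ozo zn bnro ce erdda anfgato e csdfa"
pattern = r"([a-z])\s\1\w*([a-z])\s\2"

# With re.I, the backreference \1 also matches "A" after capturing "a".
with_flag = re.findall(pattern, text, re.I)

# Without re.I, "xa Allo ozo" is missed because "a" != "A".
without_flag = re.findall(pattern, text)

print(with_flag)     # [('a', 'o'), ('e', 'a')]
print(without_flag)  # [('e', 'a')]
```

Note that \s matches exactly one whitespace character here, so words separated by several spaces or by punctuation would need \s+ or a character class such as [^a-z]+ instead.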
Source: https://stackoverflow.com/questions/47730939/regex-that-match-3-consecutive-words-that-start-and-end-with-the-same-letter