RegExp match repeated characters

For example, I have the string:

 aacbbbqq

As a result, I want to get the following matches:

 (aa, c, bbb, qq)

6 Answers

日久生厌

2020-11-30 04:27
The findall method will work if you capture the back-reference like so:
```
import re
result = [first + rest for first, rest in re.findall(r"(.)(\1*)", string)]
```
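To make it clearer what findall hands back with two capturing groups, here is a small sketch using the string from the question. The variable names `pairs` and `runs` are chosen for illustration only.

```python
import re

text = "aacbbbqq"

# With two groups, findall returns one (first_char, remaining_repeats) tuple per run.
pairs = re.findall(r"(.)(\1*)", text)
print(pairs)  # [('a', 'a'), ('c', ''), ('b', 'bb'), ('q', 'q')]

# Gluing the two parts back together rebuilds each full run.
runs = [first + rest for first, rest in pairs]
print(runs)  # ['aa', 'c', 'bbb', 'qq']
```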

别那么骄傲

2020-11-30 04:29
itertools.groupby is not a RegExp, but it's not self-written either. :-) A quote from the Python docs:
```
# [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
```
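Applied to the string from the question, a minimal sketch might look like this. The helper name `split_runs` is made up for illustration and is not part of itertools.

```python
from itertools import groupby

def split_runs(text):
    """Split text into runs of identical consecutive characters."""
    # groupby yields (key, group_iterator) pairs for consecutive equal items;
    # joining each group rebuilds the run as a string.
    return ["".join(group) for _key, group in groupby(text)]

print(split_runs("aacbbbqq"))  # ['aa', 'c', 'bbb', 'qq']
```

Unlike the regex versions with `.`, this also groups newline characters, since groupby does not treat any character specially.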

野趣味

2020-11-30 04:35
Generally

The trick is to match a single char of the range you want, and then make sure you match all repetitions of the same character:
```
>>> matcher= re.compile(r'(.)\1*')
```
This matches any single character (.) and then its repetitions (\1*) if any.

For your input string, you can get the desired output as:
```
>>> [match.group() for match in matcher.finditer('aacbbbqq')]
['aa', 'c', 'bbb', 'qq']
```
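NB: because the pattern contains a capturing group, re.findall returns only that group's contents (the first character of each run) instead of the whole match, so it won't give the desired output here.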
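To see the difference between the two calls on the question's input, here is a small self-contained sketch. The names `first_chars` and `runs` are chosen for illustration.

```python
import re

matcher = re.compile(r"(.)\1*")
text = "aacbbbqq"

# findall returns the contents of the single capturing group,
# i.e. only the first character of each run.
first_chars = matcher.findall(text)

# finditer yields match objects; group() is the whole matched run.
runs = [m.group() for m in matcher.finditer(text)]

print(first_chars)  # ['a', 'c', 'b', 'q']
print(runs)         # ['aa', 'c', 'bbb', 'qq']
```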

Other ranges

If you don't want to match just any character, change the . in the regular expression accordingly:
```
>>> matcher= re.compile(r'([a-z])\1*') # only lower case ASCII letters
>>> matcher= re.compile(r'(?i)([a-z])\1*') # only ASCII letters
>>> matcher= re.compile(r'(\w)\1*') # ASCII letters or digits or underscores (Python 2.x; in Python 3.x, str patterns are Unicode-aware by default)
>>> matcher= re.compile(r'(?u)(\w)\1*') # against unicode values, any letter or digit known to Unicode, or underscore
```
Check the latter against u'hello²²' (Python 2.x) or 'hello²²' (Python 3.x):
```
>>> text= u'hello=\xb2\xb2'
>>> print('\n'.join(match.group() for match in matcher.finditer(text)))
h
e
ll
o
²²
```
The behavior of \w against non-Unicode strings / byte strings may change if you compile the pattern with the re.LOCALE flag and have first issued a locale.setlocale call.

自闭症患者

2020-11-30 04:36
This will work; see a working example here: http://www.rubular.com/r/ptdPuz0qDV
```
(\w)\1*
```
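In Python, the same pattern could be used with re.finditer. Keep in mind that \w only matches letters, digits, and underscores, so runs of punctuation or spaces are skipped rather than returned. A small sketch, with a sample string made up for illustration:

```python
import re

sample = "aa--bb  c"

# Only word characters start a match; '-' and ' ' never appear in the result.
runs = [m.group() for m in re.finditer(r"(\w)\1*", sample)]
print(runs)  # ['aa', 'bb', 'c']
```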

说谎

2020-11-30 04:39
You can use:
```
re.sub(r"(\w)\1*", r'\1', 'tessst')
```
The output would be:
```
'test'
```
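Note that this collapses each run to a single character instead of listing the runs. A short sketch of that behavior, using the question's string plus a made-up input with trailing punctuation:

```python
import re

# Each run of a repeated word character is replaced by one copy of it.
collapsed = re.sub(r"(\w)\1*", r"\1", "aacbbbqq")
print(collapsed)  # 'acbq'

# Characters that \w does not match (here '!') are left untouched.
with_punct = re.sub(r"(\w)\1*", r"\1", "tessst!!")
print(with_punct)  # 'test!!'
```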

广开言路

2020-11-30 04:45

You can match that with: (\w)\1*


