Matching an apostrophe only within a word or string

Submitted by 依然范特西╮ on 2021-02-05 12:28:37

Question


I'm looking for a Python regex that can match 'didn't' and return only the character that is immediately preceded by an apostrophe, like 't, but not the 'd or t' at the beginning and end.

I have tried (?=.*\w)^(\w|')+$ but it only matches the apostrophe at the beginning.

Some more examples:

'I'm' should only match 'm and not 'I

'Erick's' should only return 's and not 'E

The text will always start and end with an apostrophe and can include apostrophes within the text.


Answer 1:


To match an apostrophe inside a whole string, match it anywhere except at the start or end of the string:

(?!^)'(?!$)
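
As a quick check in Python, the sketch below lists the positions where this pattern matches. The helper name inner_apostrophe_positions and the sample string are illustrative only and are not part of the original answer.

```python
import re

# Hypothetical helper: positions of every apostrophe that is neither
# the first nor the last character of the string.
INNER_APOSTROPHE = re.compile(r"(?!^)'(?!$)")

def inner_apostrophe_positions(text):
    return [m.start() for m in INNER_APOSTROPHE.finditer(text)]

# Only the apostrophe between "n" and "t" is reported.
print(inner_apostrophe_positions("'didn't'"))
```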


Often, the apostrophe is searched for only inside a word (in fact, between a pair of words where the second one is shortened). In that case, you may use

\b'\b

Here, the ' is preceded and followed by a word boundary, so the ' must have a word char (a letter, digit or _) on each side. Yes, digits and the _ char are allowed on both sides.
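
To see what the word-boundary version accepts, here is a small sketch. The sample string is made up for illustration; each hit is shown together with the characters on either side of the apostrophe.

```python
import re

# Made-up sample: apostrophes next to letters, a digit pair, an underscore,
# and a quoted word whose outer apostrophes should not match.
sample = "it's 5'6 a_'b 'quoted'"

# Show each matched apostrophe with one char of context on both sides.
hits = [sample[m.start() - 1 : m.end() + 1] for m in re.finditer(r"\b'\b", sample)]
print(hits)
```

The apostrophes around quoted are skipped because each of them has a non-word char (or the string end) on one side.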

If you need to match a ' only between two letters, use

(?<=[A-Za-z])'(?=[A-Za-z])    # ASCII only
(?<=[^\W\d_])'(?=[^\W\d_])    # Any Unicode letters
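
The difference between the two variants shows up with non-ASCII letters. The following is a minimal sketch with made-up sample strings; it assumes Python 3, where str patterns are Unicode-aware by default.

```python
import re

ASCII_LETTERS = re.compile(r"(?<=[A-Za-z])'(?=[A-Za-z])")
ANY_LETTERS = re.compile(r"(?<=[^\W\d_])'(?=[^\W\d_])")

# Made-up samples; "\u00e9" is the letter e with an acute accent.
samples = ["don't", "l'\u00e9t\u00e9", "5'6", "a_'b"]
for text in samples:
    print(text, bool(ASCII_LETTERS.search(text)), bool(ANY_LETTERS.search(text)))
```

Both patterns reject the digit and underscore cases, and only the second one accepts the accented letter.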


As for the current question, here are several possible solutions:

import re

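s = "'didn't'"
# Plain string methods: strip the outer apostrophes, take the char after the first inner one
print(s.strip("'")[s.strip("'").find("'")+1])  # t
# Regex variants: any word char, any letter, ASCII letter (case-insensitive)
print(re.search(r'\b\'(\w)', s).group(1))  # t
print(re.search(r'\b\'([^\W\d_])', s).group(1))  # t
print(re.search(r'\b\'([a-z])', s, flags=re.I).group(1))  # t
# All matches in a longer string
print(re.findall(r'\b\'([a-z])', "'didn't know I'm a student'", flags=re.I))  # ['t', 'm']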

The s.strip("'")[s.strip("'").find("'")+1] gets the character after the first ' after stripping the leading/trailing apostrophes.

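The re.search(r'\b\'(\w)', s).group(1) solution gets the word char (a letter, digit or _; in Python 3, \w also matches non-ASCII letters and digits unless re.ASCII is set) that follows a ' preceded by a word char (due to the \b word boundary). The character class can be adjusted from here.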

The re.search(r'\b\'([^\W\d_])', s).group(1) is almost identical to the above solution, but it only fetches a letter, since [^\W\d_] matches any char other than a non-word char, a digit or _.

Note that the re.search(r'\b\'([a-z])', s, flags=re.I).group(1) solution is nearly identical to the above one, but [a-z] does not become a general Unicode letter class, and re.UNICODE will not change that.
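
The three capture classes can be compared side by side. This is only a sketch: the dictionary labels, the helper first_capture and the sample strings are made up for illustration, and Python 3 is assumed.

```python
import re

# Capture-group variants from the answer; the labels are illustrative.
VARIANTS = {
    "word char": re.compile(r"\b'(\w)"),
    "any letter": re.compile(r"\b'([^\W\d_])"),
    "ASCII letter": re.compile(r"\b'([a-z])", flags=re.I),
}

def first_capture(regex, text):
    """Return group 1 of the first match, or None when nothing matches."""
    match = regex.search(text)
    return match.group(1) if match else None

# Made-up samples: an ASCII word, an accented letter, a digit after the apostrophe.
for text in ["'didn't'", "'l'\u00e9t\u00e9'", "'route'66'"]:
    print(text, {name: first_capture(rx, text) for name, rx in VARIANTS.items()})
```

Because re.search(...) returns None when there is no match, calling .group(1) directly on the result raises AttributeError in those cases, which is why the helper checks the match first.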

The last re.findall(r'\b\'([a-z])', "'didn't know I'm a student'", flags=re.I) just shows how to fetch multiple letter chars from a string input.
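
Applied to the examples from the question, the findall approach could be wrapped as follows. The function name letters_after_apostrophes is hypothetical, and the letter-only class [^\W\d_] is used here instead of [a-z] as one possible choice.

```python
import re

# Hypothetical helper built on the letter-only pattern from the answer.
AFTER_INNER_APOSTROPHE = re.compile(r"\b'([^\W\d_])")

def letters_after_apostrophes(text):
    """Return each letter that directly follows an apostrophe preceded by a word char."""
    return AFTER_INNER_APOSTROPHE.findall(text)

for text in ["'didn't'", "'I'm'", "'Erick's'", "'didn't know I'm a student'"]:
    print(text, letters_after_apostrophes(text))
```

The leading apostrophe never matches because nothing precedes it, and the trailing one never matches because no letter follows it.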



Source: https://stackoverflow.com/questions/38758873/matching-an-apostrophe-only-within-a-word-or-string
