Python Regex for hyphenated words

試著忘記壹切 submitted on 2019-12-06 20:22:24

Question


I'm looking for a regex to match hyphenated words in python.

The closest I've managed to get is: '\w+-\w+[-\w+]*'

import re
text = "one-hundered-and-three- some text foo-bar some--text"
hyphenated = re.findall(r'\w+-\w+[-\w+]*', text)

which returns the list ['one-hundered-and-three-', 'foo-bar'].
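The trailing hyphen in that result comes from `[-\w+]*`. Square brackets make a character class: `[-\w+]` matches ONE character that is a hyphen, a word character, or a literal `+`, and the `*` repeats that single-character test. It does not mean "hyphen followed by a word". A small sketch on a made-up sample string (`sample` is purely illustrative) shows the class swallowing a `+` and a dangling `--`:

```python
import re

# [-\w+] is a character class: a single '-', word character, or literal '+'.
# The * repeats it, so any mix of those characters is consumed,
# including a trailing hyphen or a plus sign.
pattern = r'\w+-\w+[-\w+]*'

text = "one-hundered-and-three- some text foo-bar some--text"
from_question = re.findall(pattern, text)
print(from_question)  # ['one-hundered-and-three-', 'foo-bar']

# Hypothetical sample: the class happily matches '+c--'
sample = "a-b+c-- x"
odd = re.findall(pattern, sample)
print(odd)  # ['a-b+c--']
```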

This is almost perfect except for the trailing hyphen after 'three'. I only want an additional hyphen if it is followed by another word. So instead of '[-\w+]*' I need something like '(-\w+)*', which I thought would work but doesn't: it returns ['-three', '']. In other words, I want something that matches a word, followed by a hyphen, followed by a word, followed by (hyphen + word) zero or more times.
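The `(-\w+)*` idea is actually right as a pattern; what misleads is `re.findall`. When a pattern contains a capturing group, `findall` returns what the group captured instead of the whole match. A repeated group keeps only its last repetition (`'-three'`), and a group that did not participate yields `''` (for `foo-bar`). A minimal sketch, assuming the same input text, comparing `findall` with `finditer` plus `group(0)`:

```python
import re

text = "one-hundered-and-three- some text foo-bar some--text"

# Pattern from the question, with a capturing group
capturing = r'\w+-\w+(-\w+)*'

# findall reports the group's last capture, not the full match
grouped = re.findall(capturing, text)
print(grouped)  # ['-three', '']

# finditer yields match objects; group(0) is the entire match
whole = [m.group(0) for m in re.finditer(capturing, text)]
print(whole)  # ['one-hundered-and-three', 'foo-bar']
```

Making the group non-capturing with `(?:...)` is the other way to get whole matches back from `findall`.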


Answer 1:


Try this:

re.findall(r'\w+(?:-\w+)+',text)

Here we consider a hyphenated word to be:

  • one or more word characters
  • followed by one or more repetitions of:
    • a single hyphen
    • followed by one or more word characters
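The same pattern can be written in verbose mode so each piece lines up with the bullet list above. This is a sketch; the name `HYPHENATED` is just an illustrative choice. The `(?:...)` group is non-capturing, which is why `findall` returns whole words here.

```python
import re

HYPHENATED = re.compile(r"""
    \w+          # one or more word characters
    (?:          # non-capturing group, so findall returns whole matches
        -        # a single hyphen
        \w+      # followed by one or more word characters
    )+           # the hyphen-plus-word part, one or more times
""", re.VERBOSE)

text = "one-hundered-and-three- some text foo-bar some--text"
found = HYPHENATED.findall(text)
print(found)  # ['one-hundered-and-three', 'foo-bar']

# \w also covers digits, underscores and (for str patterns) Unicode letters
extra = HYPHENATED.findall("café-au-lait x_1-y2")
print(extra)  # ['café-au-lait', 'x_1-y2']
```

Note that `some--text` is not matched, because a hyphen must be followed directly by a word character; if digits or underscores should not count, replace `\w` with a narrower class.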


Source: https://stackoverflow.com/questions/8383213/python-regex-for-hyphenated-words
