How to match all unicode alphabetic characters and spaces in a regex?

て烟熏妆下的殇ゞ submitted on 2020-02-25 13:41:06

Question


I am trying to validate place names in Python 3 / Django forms. I want to match strings like: Los Angeles, Canada, 中国, and Россия. That is, the string may contain:

  • spaces
  • alphabetic characters (from any language)
  • no numbers
  • no special characters (punctuation, symbols etc.)

The pattern I am currently using is r'^[^\W\d]+$', as suggested in "How to match alphabetical chars without numeric chars with Python regexp?". However, it seems to behave just like the pattern r'^[a-zA-Z]+$'. That is, Россия, Los Angeles and 中国 do not match; only Canada does.

An example of my code:

import re
re.search(r'^[^\W\d]+$', 'Россия')

This returns None, that is, no match.


Answer 1:


Your example works for me, but it also matches underscores and does not match spaces. Excluding the underscore and allowing a space as an alternative works:

>>> re.search(r'^(?:[^\W\d_]| )+$', 'Los Angeles')
<_sre.SRE_Match object at 0x0000000003C612A0>
>>> re.search(r'^(?:[^\W\d_]| )+$', 'Россия')
<_sre.SRE_Match object at 0x0000000003A0D030>
>>> re.search(r'^(?:[^\W\d_]| )+$', 'Los_Angeles') # not found
>>> re.search(r'^(?:[^\W\d_]| )+$', '中国')
<_sre.SRE_Match object at 0x0000000003C612A0>
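
As a rough sketch of how this pattern could be wrapped for validation, assuming Python 3, where str patterns are Unicode-aware by default. The names PLACE_NAME_RE and is_place_name are hypothetical and not from the original post. The last line uses re.ASCII to show one way the question's pattern can end up matching only ASCII letters; that is a guess at the behaviour described in the question, not a confirmed diagnosis.

```python
import re

# Hypothetical names for illustration; the pattern is the one used in the answer.
# [^\W\d_] means: any word character that is not a decimal digit or an underscore.
PLACE_NAME_RE = re.compile(r'(?:[^\W\d_]| )+')


def is_place_name(text):
    """Return True if text consists only of such characters and spaces."""
    # fullmatch requires the whole string to match, so ^ and $ are not needed.
    return PLACE_NAME_RE.fullmatch(text) is not None


for name in ['Los Angeles', 'Canada', '中国', 'Россия', 'Los_Angeles', 'Area 51']:
    print(name, is_place_name(name))

# With re.ASCII, \W and \d only consider ASCII characters,
# so non-ASCII letters such as Cyrillic no longer match.
print(re.search(r'^[^\W\d]+$', 'Россия', re.ASCII))
```

Note that this sketch also accepts a string made up only of spaces, so a real form would probably strip the input and reject empty values first.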


Source: https://stackoverflow.com/questions/34957154/how-to-match-all-unicode-alphabetic-characters-and-spaces-in-a-regex
