What's a good regex to include accented characters in a simple way?

死守一世寂寞 2020-12-14 02:12

Right now my regex is something like this:

[a-zA-Z0-9]

but it does not include accented characters, which I would like it to. I would also like the hyphen (-), the apostrophe (') and the comma (,) to be included.

4 Answers
  • 2020-12-14 02:54

    Accented Characters: DIY Character Range Subtraction

    If your regex engine supports lookaheads and inline modifiers such as (?i) (most modern engines do), this will work:

    (?i)^(?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ])+$
    


    Explanation

    • (?i) sets case-insensitive mode
    • The ^ anchor asserts that we are at the beginning of the string
    • (?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ]) matches one character, using the two parts below:
    • The lookahead (?![×Þß÷þø]) asserts that the char is not one of those in the brackets
    • [-'0-9a-zÀ-ÿ] allows dash, apostrophe, digits, letters, and characters in the wide accented range À-ÿ (U+00C0 to U+00FF), from which the lookahead subtracts the unwanted characters
    • The + matches that one or more times
    • The $ anchor asserts that we are at the end of the string
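    A minimal Python sketch of this pattern, assuming Python 3's built-in re module (str patterns are Unicode-aware, and lookaheads and a leading (?i) are supported). The helper name is_allowed is hypothetical. fullmatch is used because a plain $ also matches just before a trailing newline.

```python
import re

# The pattern above, copied unchanged. The negative lookahead rejects
# × Þ ß ÷ þ ø before the character class is tried.
PATTERN = re.compile(r"(?i)^(?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ])+$")


def is_allowed(text: str) -> bool:
    """Hypothetical helper: True if every character of text passes the pattern."""
    # fullmatch rather than match, so that "abc\n" is rejected.
    return PATTERN.fullmatch(text) is not None


for word in ["Zoë", "O'Connor-Smith", "Café2", "a×b", "Straße"]:
    print(word, is_allowed(word))
```

    Note that the lookahead also rejects ß, Þ/þ and ø, which are ordinary letters in German, Icelandic and Danish/Norwegian, so "Straße" fails with this pattern.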

    Reference

    Extended ASCII Table

  • 2020-12-14 02:57

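    Put

    \p{L}\p{M}

    inside your character class, e.g. [-'0-9\p{L}\p{M}]. In an engine with Unicode property support (Java, .NET, PCRE, or JavaScript with the u flag), this will match: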

    • any letter character (L) from any language
    • and any mark (M), i.e. a character meant to be combined with another, such as a combining accent
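    Python's built-in re module does not understand \p{L} or \p{M} (the third-party regex module does). As a stdlib-only sketch, the same letter-or-mark test can be written with unicodedata.category, which returns the Unicode general categories that \p{L} and \p{M} refer to. The function names and the extra -'0-9 characters are assumptions for illustration.

```python
import unicodedata


def is_letter_or_mark(ch: str) -> bool:
    # General categories beginning with "L" are letters (Lu, Ll, Lt, Lm, Lo);
    # those beginning with "M" are marks (Mn, Mc, Me), e.g. combining accents.
    return unicodedata.category(ch)[0] in ("L", "M")


def is_allowed(text: str) -> bool:
    """Hypothetical helper mirroring the class [-'0-9\\p{L}\\p{M}]."""
    return bool(text) and all(
        ch in "-'0123456789" or is_letter_or_mark(ch) for ch in text
    )


# "Zoe\u0308" spells Zoë with a separate combining diaeresis (category Mn).
for word in ["Zoë", "Zoe\u0308", "Łódź", "Ωμέγα", "a×b", "a b"]:
    print(repr(word), is_allowed(word))
```

    Allowing marks as well as letters matters for decomposed text: the second sample only passes because U+0308 is a mark.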
  • 2020-12-14 03:03

    A version without the lookahead exclusions, using ranges that simply skip × and ÷:

    ^[-'0-9a-zA-ZÀ-ÖØ-öø-ÿ]+$
    

    Explanation

    • The ^ anchor asserts that we are at the beginning of the string
    • [...] allows dash, apostrophe, digits, ASCII letters, and the accented letters of the Latin-1 range À-ÿ; the three sub-ranges skip × (U+00D7) and ÷ (U+00F7)
    • The + matches that one or more times
    • The $ anchor asserts that we are at the end of the string
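    A Python sketch of this version, assuming the built-in re module, which accepts literal non-ASCII range endpoints in str patterns. The class includes 0-9 so that digits are accepted, as the explanation describes; is_allowed is a hypothetical name. The samples show where the ranges' gaps fall.

```python
import re

# Three Latin-1 sub-ranges: À-Ö (U+00C0-U+00D6), Ø-ö (U+00D8-U+00F6),
# ø-ÿ (U+00F8-U+00FF). The gaps are exactly × (U+00D7) and ÷ (U+00F7).
PATTERN = re.compile(r"[-'0-9a-zA-ZÀ-ÖØ-öø-ÿ]+")

assert ord("×") == 0xD7 and ord("÷") == 0xF7


def is_allowed(text: str) -> bool:
    """Hypothetical helper: True if text consists only of allowed characters."""
    return PATTERN.fullmatch(text) is not None


for word in ["Straße", "Ørsted", "Zoë-2", "1÷2", "Łódź"]:
    print(word, is_allowed(word))
```

    The last sample shows the limit of this approach: anything outside Latin-1, such as the Polish Ł or ź, is rejected.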

    Reference

    • Extended ASCII Table
  • 2020-12-14 03:03

    Use a POSIX character class (http://www.regular-expressions.info/posixbrackets.html):

    [-'[:alpha:]0-9] or [-'[:alnum:]]

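    The [:alpha:] character class matches whatever your locale considers an alphabetic character, and [:alnum:] additionally matches digits. These classes are only valid inside a bracket expression; they are supported by POSIX tools such as grep and sed, but not by Python's re module or JavaScript.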
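    Since Python's re module does not support POSIX classes, here is a rough stdlib analogue of [-'[:alnum:]]+. This is a swapped-in technique, not POSIX: str.isalnum() follows Unicode properties rather than the current locale, and it also accepts characters such as ² or ½. The NFC normalization step and the name is_allowed are assumptions.

```python
import unicodedata


def is_allowed(text: str) -> bool:
    """Hypothetical helper: a Unicode analogue of [-'[:alnum:]]+."""
    # Compose first, so "e" + U+0308 (combining diaeresis) becomes the single
    # letter "ë"; isalnum() is False for a bare combining mark.
    text = unicodedata.normalize("NFC", text)
    return bool(text) and all(ch in "-'" or ch.isalnum() for ch in text)


for word in ["Zoe\u0308", "Łódź", "O'Brien-2", "a×b", "a b"]:
    print(repr(word), is_allowed(word))
```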
