RegEx: \w - “_” + “-” in UTF-8

星月不相逢 2020-12-14 11:36

I need a regular expression that matches UTF-8 letters and digits and the dash sign (-), but doesn't match underscores (_). I tried these silly attemp

2 answers
  • 2020-12-14 12:03

    I am not sure which language you use, but in Perl you can simply write [[:alnum:]-]+, provided the correct locale is set.

  • 2020-12-14 12:19

    Try this:

    (?:[\w\-](?<!_))+
    

    It does a simple match on anything that matches \w (or a dash), and then uses a zero-width negative lookbehind to ensure that the character just matched is not an underscore.
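
    As a quick check of the lookbehind version, here is a minimal sketch in Python (an assumption on my part, since the question never names a language). The names NO_UNDERSCORE and find_runs are made up for the example; only the pattern comes from the answer.

```python
import re

# Hypothetical name; the pattern is copied from the answer above.
# For str patterns, Python 3 treats \w as Unicode-aware by default.
NO_UNDERSCORE = re.compile(r"(?:[\w\-](?<!_))+")

def find_runs(text):
    # Each match is a maximal run of word characters and dashes
    # in which no character is an underscore.
    return NO_UNDERSCORE.findall(text)

print(find_runs("foo_bar"))         # ['foo', 'bar']
print(find_runs("привет-мир_123"))  # ['привет-мир', '123']
```

    The underscore is consumed by [\w\-] first and then rejected by (?<!_), so it ends the current run instead of becoming part of it.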

    Otherwise you could pick this one:

    (?:[^_\W]|-)+
    

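    which is a more set-based approach (note the uppercase W): [^_\W] matches any character that is neither an underscore nor a non-word character, i.e. any word character except the underscore.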
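
    The set-based pattern can be tried the same way. Again, this is only a sketch in Python (my assumption, not the asker's language), and SET_BASED and find_runs_set are hypothetical names.

```python
import re

# Hypothetical name; the pattern is copied from the answer above.
# [^_\W] is a double negation: "not underscore and not non-word".
SET_BASED = re.compile(r"(?:[^_\W]|-)+")

def find_runs_set(text):
    # The dash is a non-word character, so it has to be added back
    # through the alternation.
    return SET_BASED.findall(text)

print(find_runs_set("snake_case-and-кебаб_1"))
# ['snake', 'case-and-кебаб', '1']
```

    No lookaround is involved here, which makes this form usable in engines that lack lookbehind support.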

    OK, I had a lot of fun with Unicode in PHP's flavor of PCRE :D Peekaboo says there is a simple solution available:

    [\p{L}\p{N}\-]+
    

    \p{L} matches anything in Unicode that qualifies as a letter (note: not a word character, thus no underscores), while \p{N} matches anything that counts as a number (including Roman numerals and more exotic things).
    \- is just an escaped dash. Although not strictly necessary, I tend to make it a point to escape dashes in character classes... Note that there are dozens of different dashes in Unicode, which gives rise to the following version:

    [\p{L}\p{N}\p{Pd}]+
    

    Where "Pd" is the Dash Punctuation category, which includes, but is not limited to, our minus-dash-thingy. (Note: again, no underscore here.)
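
    To see which general category a character falls into, the Unicode database can be queried directly. The sketch below is in Python (my assumption); its built-in re module has no \p{...} support, so this swaps in unicodedata.category to emulate the two property-based classes rather than running the patterns themselves. is_allowed and allowed_runs are hypothetical helper names.

```python
import unicodedata
from itertools import groupby

def is_allowed(ch, any_dash=False):
    # Emulates [\p{L}\p{N}\-], or [\p{L}\p{N}\p{Pd}] when any_dash is True.
    cat = unicodedata.category(ch)
    if cat.startswith(("L", "N")):   # any letter or number category
        return True
    if any_dash:
        return cat == "Pd"           # Dash Punctuation
    return ch == "-"                 # only the ASCII hyphen-minus

def allowed_runs(text, any_dash=False):
    # Split text into maximal runs of allowed characters.
    key = lambda c: is_allowed(c, any_dash)
    return ["".join(group) for ok, group in groupby(text, key=key) if ok]

print(unicodedata.category("_"))       # Pc (connector punctuation)
print(unicodedata.category("-"))       # Pd
print(unicodedata.category("\u2167"))  # Nl (Roman numeral eight)
print(allowed_runs("foo_bar-baz"))     # ['foo', 'bar-baz']
```

    With any_dash=True, an en dash (U+2013) joins two words into one run; with the default it splits them, which mirrors the difference between the last two patterns.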
