Why does \w match only English words in JavaScript regex?

借酒劲吻你 2020-12-09 20:26

I'm trying to find URLs in some text, using JavaScript code. The problem is, the regular expression I'm using uses \w to match letters and digits inside the URL, but it d

10 Answers
  • 2020-12-09 20:54

    The ECMA 262 v3 standard, which defines the programming language commonly known as JavaScript, stipulates that \w should be equivalent to [a-zA-Z0-9_] and that \d should be equivalent to [0-9]. \s on the other hand matches both ASCII and Unicode whitespace, according to the standard.

    JavaScript engines that predate ES2018 do not support the \p syntax for matching Unicode properties either, so in those engines there isn't a good general way to do this. You could match all Hebrew characters with:

    [\u0590-\u05FF]
    

    This simply matches any code point in the Hebrew block.

    You can match any ASCII word character or any Hebrew character with:

    [\w\u0590-\u05FF]
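
    As a rough illustration of how the combined class behaves, here is a minimal sketch. The helper name findWordRuns and the sample string are assumptions made for this example, not part of the original answer; the Hebrew letters are written as \u escapes to keep the source unambiguous.

    ```javascript
    // Hypothetical helper (not from the original answer): returns every run of
    // ASCII word characters and/or code points in the Hebrew block.
    const wordOrHebrew = /[\w\u0590-\u05FF]+/g;

    function findWordRuns(text) {
      // String.prototype.match returns null when nothing matches.
      return text.match(wordOrHebrew) || [];
    }

    // Sample text: "abc", then four Hebrew letters followed by "_1", then "!".
    const sample = "abc \u05E9\u05DC\u05D5\u05DD_1 !";

    const runs = findWordRuns(sample);      // Hebrew letters stay inside the run
    const asciiOnly = sample.match(/\w+/g); // plain \w skips the Hebrew letters

    console.log(runs, asciiOnly);
    ```

    With plain \w the Hebrew letters are dropped and only the ASCII pieces survive, which is the behavior the question describes.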
    
  • 2020-12-09 20:58

    Try \p{L}, the Unicode property escape that matches any letter. In JavaScript it requires the u flag, which is available from ES2018 onward.
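
    A minimal sketch of \p{L} in JavaScript, assuming an engine with ES2018 Unicode property escapes (the pattern must carry the u flag, otherwise \p is not treated as a property escape). The sample string is made up for illustration.

    ```javascript
    // Requires ES2018 Unicode property escapes; note the `u` flag.
    const letters = /\p{L}+/gu;

    // Sample text: "naive" with a diaeresis on the i, four Hebrew letters, then digits.
    const text = "na\u00EFve \u05E9\u05DC\u05D5\u05DD 123";

    // Digits are not letters, so "123" is not matched.
    const found = text.match(letters) || [];

    console.log(found);
    ```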

  • 2020-12-09 20:58

    Check out this SO question about JavaScript and Unicode. It looks like Jan Goyvaerts's answer there provides some hope for you.

    Edit: But then it seems all browsers don't support \p ... anyway. That question should contain useful info.

  • 2020-12-09 21:00

    Have a look at http://www.regular-expressions.info/refunicode.html.

    It looks like there is no \w equivalent for Unicode, but you can match individual Unicode letters, so you can build one yourself.
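
    One way to "create it" is to assemble a character class from explicit code point ranges. The sketch below is only an illustration: buildWordClass is a hypothetical helper, the two ranges (the Hebrew and Cyrillic blocks) are arbitrary choices for the example, and it only handles code points in the Basic Multilingual Plane.

    ```javascript
    // Hypothetical helper: builds a regex matching runs of \w plus the given
    // [start, end] code point ranges (BMP only, since it emits \uXXXX escapes).
    function buildWordClass(ranges) {
      const hex = (n) => n.toString(16).padStart(4, "0");
      const extra = ranges
        .map(([lo, hi]) => `\\u${hex(lo)}-\\u${hex(hi)}`)
        .join("");
      return new RegExp(`[\\w${extra}]+`, "g");
    }

    // Assumed ranges for this example: Hebrew block and Cyrillic block.
    const wordRe = buildWordClass([
      [0x0590, 0x05ff],
      [0x0400, 0x04ff],
    ]);

    // Sample text: "abc", two Hebrew letters, two Cyrillic letters, then "!".
    const words = "abc \u05E9\u05DC \u043F\u0440 !".match(wordRe) || [];

    console.log(wordRe.source, words);
    ```

    Whole blocks can include a few code points that are not letters, so ranges chosen this way are an approximation rather than an exact Unicode-aware \w.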
