I'm trying to find URLs in some text, using JavaScript code. The problem is, the regular expression I'm using uses \w to match letters and digits inside the URL, but it d
The ECMA-262 v3 standard, which defines the programming language commonly known as JavaScript, stipulates that \w should be equivalent to [a-zA-Z0-9_] and that \d should be equivalent to [0-9]. \s, on the other hand, matches both ASCII and Unicode whitespace, according to the standard.
JavaScript does not support the \p syntax for matching Unicode properties either, so there isn't a good way to do this. You could match all Hebrew characters with:
[\u0590-\u05FF]
This simply matches any code point in the Hebrew block.
You can match any ASCII word character or any Hebrew character with:
[\w\u0590-\u05FF]
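As a rough sketch of how that character class could be used, the snippet below pulls runs of word characters out of a mixed string. The sample text and the variable names are made up for illustration; only the character class itself comes from the answer above.

```javascript
// Character class from above: ASCII word characters plus the Hebrew block.
// The + and the g flag are additions so that whole runs are collected.
const wordOrHebrew = /[\w\u0590-\u05FF]+/g;

// Hypothetical sample: an English word, a Hebrew word (written with
// \u escapes), and an identifier containing an underscore and a digit.
const sample = "hello \u05E9\u05DC\u05D5\u05DD world_1";

// match() with a global regex returns every non-overlapping match.
const tokens = sample.match(wordOrHebrew);
console.log(tokens);
```

With plain \w+ the Hebrew word would be skipped entirely, since \w only covers [a-zA-Z0-9_].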
Try \p{L}, the Unicode regex class that matches letters.
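A minimal sketch of what that would look like, assuming an engine that implements the Unicode property escapes added in ES2018 (older engines reject or misread this pattern). The sample string and variable names are hypothetical.

```javascript
// \p{L} is only treated as a Unicode property escape when the regex
// carries the u flag; this assumes ES2018 support in the engine.
const letters = /\p{L}+/gu;

// Hypothetical sample: Latin letters, a Hebrew word, and digits.
const sample = "abc \u05E9\u05DC\u05D5\u05DD 123";

// Digits are not in the Letter category, so "123" is not matched.
const found = sample.match(letters);
console.log(found);
```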
Check out this SO question about JavaScript and Unicode. It looks like Jan Goyvaerts' answer there provides some hope for you.
Edit: But then it seems browsers don't support \p ... anyway. That question should still contain useful info.
Have a look at http://www.regular-expressions.info/refunicode.html.
It looks like there is no \w equivalent for Unicode, but you can match individual Unicode letters, so you can build one yourself.