How can I make a regular expression which takes accented characters into account?

Posted by 狂风中的少年 on 2019-11-29 10:40:40

While JavaScript regexes recognize non-ASCII characters in some cases (like \s), they're hopelessly inadequate when it comes to \w and \b, which only know about the ASCII word characters [A-Za-z0-9_]. If you want them to work with anything beyond that, you'll have to either use a different language, or install Steve Levithan's XRegExp library with the Unicode plugin.
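To make the problem concrete, here is a small sketch of how \w, \s and \b behave on accented text. The sample strings are made up for illustration. The last two lines are not part of the XRegExp suggestion above: they assume an engine that supports ES2018 Unicode property escapes and lookbehind (recent browsers and Node.js), so treat them as one possible workaround rather than the recommended fix.

```javascript
// \w covers only [A-Za-z0-9_], so an accented letter is not a word character.
const accentIsWordChar = /\w/.test("é");

// \s, by contrast, does recognize non-ASCII whitespace such as U+00A0 (no-break space).
const nbspIsWhitespace = /\s/.test("\u00A0");

// Since "é" counts as a non-word character, \b sees a boundary between "f" and "é",
// so this pattern wrongly treats "caf" as a whole word inside "café".
const asciiBoundaryMatch = /\bcaf\b/.test("café");

// Assumption: ES2018+ engine. Emulate a Unicode-aware word boundary with
// lookarounds on \p{L} (any letter) and the "u" flag.
const unicodeBoundaryMatch = /(?<!\p{L})caf(?!\p{L})/u.test("café");
const unicodeStandalone = /(?<!\p{L})caf(?!\p{L})/u.test("un caf noir");

console.log(accentIsWordChar, nbspIsWhitespace, asciiBoundaryMatch,
            unicodeBoundaryMatch, unicodeStandalone);
```

The lookaround version only checks for letters; if digits or underscores should also count as word characters, the character class inside the lookarounds would need to be widened.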

By the way, there's an error in your regex. You have a \b after the optional trailing comma, but it should be in front:

"\\b([a-z]{2})\\b,?"

I also removed the square brackets; you would only need those if the comma had a special meaning in regexes, which it doesn't. But I suspect you don't need to match the comma at all; \b should be sufficient to make sure you're at the end of the word. And if you don't need the comma, you don't need the capturing group either:

"\\b[a-z]{2}\\b"
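As a quick check of the simplified pattern, here is a minimal sketch that builds it with the RegExp constructor (hence the doubled backslashes in the string). The input string is a made-up example, not taken from the question.

```javascript
// The pattern from above, built from a string, with the global flag to find every match.
const twoLetterWord = new RegExp("\\b[a-z]{2}\\b", "g");

// Hypothetical input: a comma-separated list of short codes.
const sample = "fr, de, eng, it";

// "eng" is skipped because no word boundary falls after its first two letters.
const matches = sample.match(twoLetterWord);

console.log(matches);
```

Note that the commas never need to be matched: \b on both sides is enough to isolate each two-letter word.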
Beel

Have you set JavaScript to use non-ASCII? Here is a page that suggests setting JavaScript to use UTF-8: http://blogs.oracle.com/shankar/entry/how_to_handle_utf_8

It says:

add a charset attribute (charset="utf-8") to your script tags in the parent page:

<script type="text/javascript" src="[path]/myscript.js" charset="utf-8"></script>