Can regular expressions work with different languages?


慢半拍i

English, of course, is a no-brainer for regex because that's what it was originally developed in/for:

Can regular expressions understand this charact

8 answers

甜味超标

2020-12-10 05:55

Generally speaking, regex is better suited to machine-readable text than to human-readable text. This is in many ways a more general version of the answer to the whole parsing-XML-with-regex question: regex is by its very nature incapable of properly parsing human language, because the language is more complex than the tool you are using to parse it.

If you want to break down human language (English included), you would want to use a language analysis tool or even an AI, not mere regular expressions.

甜味超标

2020-12-10 06:02

It is not about the regular expression itself but about the framework that executes it. Java and .NET are, I think, very good at handling Unicode, so "è and e are both considered word characters by regex" is true.
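To illustrate how much the answer depends on the engine and its settings, here is a minimal sketch in Python (chosen only for illustration; the answer above talks about Java and .NET, whose defaults may differ). It assumes Python 3 and its built-in `re` module, where `\w` on a `str` pattern is Unicode-aware unless the `re.ASCII` flag is set.

```python
import re

# "\u00e8" is the precomposed character è (LATIN SMALL LETTER E WITH GRAVE).
text = "\u00e8 e"

# Default behaviour for str patterns in Python 3: \w is Unicode-aware,
# so both è and e count as word characters.
unicode_hits = re.findall(r"\w", text)

# With re.ASCII, \w is restricted to [a-zA-Z0-9_], so è is no longer matched.
ascii_hits = re.findall(r"\w", text, flags=re.ASCII)

print(unicode_hits)
print(ascii_hits)
```

The same pattern gives different results depending on a single flag, which is why it is worth checking the documentation of whichever engine you actually use.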

無奈伤痛

2020-12-10 06:03

Short answer: yes.

More specifically, it depends on your regex engine supporting Unicode matches (as described here).

Such matches can complicate your regular expressions enormously, so I can recommend reading this Unicode regex tutorial (also note that Unicode implementations themselves can be quite a mess, so you might also benefit from reading Joel Spolsky's article about the inner workings of character sets).

误落风尘

2020-12-10 06:05

This SO thread might help. It includes the Unicode character classes you can use in a regex (e.g., the general category Ll, written \p{Ll} in engines that support it, is all lowercase letters, regardless of language).
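As a rough sketch of what the Ll category covers: Python's built-in `re` module does not accept `\p{...}` classes, so this example uses `unicodedata.category` from the standard library as a stand-in for the regex class. The helper name `lowercase_letters` and the sample string are made up for illustration.

```python
import unicodedata

def lowercase_letters(text):
    """Return the characters whose Unicode general category is Ll.

    Hypothetical helper: a stand-in for a \\p{Ll} regex class, which
    Python's built-in re module does not support.
    """
    return [ch for ch in text if unicodedata.category(ch) == "Ll"]

# Mixed-script sample: a B è É я Ж (Latin and Cyrillic, lower and upper case).
sample = "aB\u00e8\u00c9\u044f\u0416"
found = lowercase_letters(sample)

print(found)
```

Only the lowercase letters are kept, whether they are Latin or Cyrillic, which is the "regardless of language" behaviour described above.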

北海茫月

2020-12-10 06:13

It depends on the implementation and the character set. In general the answer is "Yes," but it may require additional setup on your part.

In Perl, for example, the meaning of things like \w is altered by the chosen locale (use locale).

我在风中等你

2020-12-10 06:13

/[\p{Latin}]/ should, for example, match characters from the Latin alphabet. You can get the full explanation and reference here.
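For engines without script properties, the idea can only be approximated. The sketch below is a heuristic, not an equivalent of `\p{Latin}`: Python's built-in `re` has no script classes, so it checks whether a letter's official Unicode name contains the word "LATIN". The helper name `looks_latin` is hypothetical.

```python
import unicodedata

def looks_latin(ch):
    """Heuristic stand-in for \\p{Latin}: a letter whose Unicode name mentions LATIN.

    This is an approximation based on character names, not the real
    Unicode Script property.
    """
    return ch.isalpha() and "LATIN" in unicodedata.name(ch, "")

# Sample: e è я α (two Latin letters, one Cyrillic, one Greek).
sample = "e\u00e8\u044f\u03b1"
latin_only = [ch for ch in sample if looks_latin(ch)]

print(latin_only)
```

An engine with real script support, as in the pattern above, is the more reliable choice when one is available.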
