C# Regular Expressions with \Uxxxxxxxx characters in the pattern

孤城傲影 2020-12-19 07:21
Regex.IsMatch("foo", "[\U00010000-\U0010FFFF]")

Throws: System.ArgumentException: parsing "[-]" - [x-y] range in reverse order.

3 answers
  •  南笙
    南笙 (OP)
    2020-12-19 08:03

    To work around this limitation of the .NET regex engine, I use the following trick: replace [\U00010000-\U0010FFFF] with [\uD800-\uDBFF][\uDC00-\uDFFF]. The idea is that .NET regexes operate on UTF-16 code units rather than code points, so we give the engine the surrogate ranges as ordinary characters. Narrower ranges can be expressed by handling the edges separately, e.g. [\U00011DEF-\U00013E07] is equivalent to (?:\uD807[\uDDEF-\uDFFF])|(?:[\uD808-\uD80E][\uDC00-\uDFFF])|(?:\uD80F[\uDC00-\uDE07])
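A minimal sketch of the full-range case, assuming a console app with top-level statements (C# 9 or later). It first reproduces the failure from the question, where the C# compiler turns each \U escape into a surrogate pair so the character class ends up holding a reversed range of code units, and then checks the surrogate-pair pattern. The variable names are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// "\U00010000" is stored as two UTF-16 code units (D800 DC00) and
// "\U0010FFFF" as DBFF DFFF, so the class below really contains
// D800, the range DC00-DBFF (reversed), and DFFF.
string original = "[\U00010000-\U0010FFFF]";
bool originalThrows = false;
try
{
    Regex.IsMatch("foo", original);
}
catch (ArgumentException)
{
    originalThrows = true; // "[x-y] range in reverse order"
}

// Workaround: one high surrogate followed by one low surrogate.
// The verbatim string passes \uXXXX through to the regex engine unchanged.
string supplementary = @"[\uD800-\uDBFF][\uDC00-\uDFFF]";

Console.WriteLine(originalThrows);                                // True
Console.WriteLine(Regex.IsMatch("foo", supplementary));           // False
Console.WriteLine(Regex.IsMatch("a\U0001F600b", supplementary));  // True
```
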

    It's harder to read and work with, and it's less flexible, but it still serves as a workaround.
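Writing the narrowed ranges by hand is error-prone, so the edge splitting can be generated instead. The following is a rough sketch, not a .NET API: `SupplementaryRange` is a hypothetical helper that assumes both endpoints lie in U+10000..U+10FFFF and emits the same three-part alternation described above (partial first block, full middle blocks, partial last block).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: builds a regex fragment that matches any code point
// in [first, last], where both are supplementary (U+10000..U+10FFFF).
static string SupplementaryRange(int first, int last)
{
    if (first < 0x10000 || last > 0x10FFFF || first > last)
        throw new ArgumentOutOfRangeException(nameof(first));

    // Split each endpoint into its UTF-16 high and low surrogate.
    int hiFirst = 0xD800 + ((first - 0x10000) >> 10);
    int loFirst = 0xDC00 + ((first - 0x10000) & 0x3FF);
    int hiLast = 0xD800 + ((last - 0x10000) >> 10);
    int loLast = 0xDC00 + ((last - 0x10000) & 0x3FF);

    static string U(int unit) => $"\\u{unit:X4}";
    static string Class(int a, int b) => a == b ? U(a) : $"[{U(a)}-{U(b)}]";

    // Both endpoints share one high surrogate: a single alternative suffices.
    if (hiFirst == hiLast)
        return $"(?:{U(hiFirst)}{Class(loFirst, loLast)})";

    var parts = new List<string>();
    string trailing = null;

    // Partial block at the lower edge.
    if (loFirst != 0xDC00)
    {
        parts.Add(U(hiFirst) + Class(loFirst, 0xDFFF));
        hiFirst++;
    }
    // Partial block at the upper edge (appended last to keep the order readable).
    if (loLast != 0xDFFF)
    {
        trailing = U(hiLast) + Class(0xDC00, loLast);
        hiLast--;
    }
    // Whole blocks in between: any low surrogate is allowed.
    if (hiFirst <= hiLast)
        parts.Add(Class(hiFirst, hiLast) + Class(0xDC00, 0xDFFF));
    if (trailing != null)
        parts.Add(trailing);

    return "(?:" + string.Join("|", parts) + ")";
}

string pattern = SupplementaryRange(0x11DEF, 0x13E07);
Console.WriteLine(pattern);
Console.WriteLine(Regex.IsMatch("\U00011DEF", pattern)); // lower edge, inside the range
Console.WriteLine(Regex.IsMatch("\U00011DEE", pattern)); // one below the lower edge
```

For the range from the answer this should produce an alternation equivalent to the hand-written one, just with a single enclosing group instead of one group per branch.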
