Regex: what is InCombiningDiacriticalMarks?

逝去的感伤 2020-11-30 21:26

The following code is a well-known way to convert accented characters into plain text:

Normalizer.normalize(text, Normalizer.Form.NFD).replaceAll("\\p{InCombini
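
Since the snippet above is cut off, here is a minimal self-contained sketch of the idiom the question seems to refer to. It assumes the truncated pattern is the `\p{InCombiningDiacriticalMarks}` block named in the title and that matches are replaced with an empty string; the class and method names are made up for illustration.

```java
import java.text.Normalizer;

// Hypothetical helper class, not part of the original question.
class AccentStripper {
    // NFD splits a precomposed letter into a base letter plus combining marks
    // (e-acute becomes "e" followed by U+0301). The regex then drops every
    // code point in the Combining Diacritical Marks block (U+0300..U+036F).
    static String stripAccents(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // Input is "cafe naive" with an acute accent and a diaeresis.
        System.out.println(stripAccents("caf\u00e9 na\u00efve")); // prints: cafe naive
    }
}
```

In Java regular expressions, `\p{InX}` matches any character in the Unicode block X, so `InCombiningDiacriticalMarks` is simply the block U+0300 to U+036F. Letters that have no canonical decomposition are left untouched by this approach.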


        
2 Answers
  •  情书的邮戳
    2020-11-30 21:40

    Took me a while, but I fished them all out:

    Here's a regex that should include all the Zalgo characters, including the ones that the 'normal' range misses.

    ([\u0300-\u036F\u1AB0-\u1AFF\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F\u0483-\u0486\u05C7\u0610-\u061A\u0656-\u065F\u0670\u06D6-\u06ED\u0711\u0730-\u073F\u0743-\u074A\u0F18-\u0F19\u0F35\u0F37\u0F72-\u0F73\u0F7A-\u0F81\u0F84\u0e00-\u0eff\uFC5E-\uFC62])
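
As a rough sketch of how the character class above could be used from Java: the version below writes every range with a plain ASCII hyphen (a dash of any other kind inside a character class is treated as a literal character, not a range operator) and drops the capturing group, which is not needed for removal. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper class wrapping the character class from the answer.
class ZalgoStripper {
    // Same code points as the answer's regex, with ASCII hyphens in all ranges.
    // Caveat: the U+0E00..U+0EFF range spans the whole Thai and Lao blocks,
    // so ordinary Thai and Lao letters are removed too.
    private static final Pattern ZALGO = Pattern.compile(
        "[\\u0300-\\u036F\\u1AB0-\\u1AFF\\u1DC0-\\u1DFF\\u20D0-\\u20FF\\uFE20-\\uFE2F"
        + "\\u0483-\\u0486\\u05C7\\u0610-\\u061A\\u0656-\\u065F\\u0670\\u06D6-\\u06ED"
        + "\\u0711\\u0730-\\u073F\\u0743-\\u074A\\u0F18-\\u0F19\\u0F35\\u0F37"
        + "\\u0F72-\\u0F73\\u0F7A-\\u0F81\\u0F84\\u0E00-\\u0EFF\\uFC5E-\\uFC62]");

    // Removes every character matched by the class above.
    static String stripZalgo(String text) {
        return ZALGO.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // "Zalgo" with marks from several of the listed ranges mixed in.
        String noisy = "Z\u0351\u0300a\u0483l\u20D0g\u0E47o\uFE20";
        System.out.println(stripZalgo(noisy)); // prints: Zalgo
    }
}
```

If the input may legitimately contain Thai or Lao text, narrowing that one range to the specific combining marks you want to strip would likely be safer than removing both blocks wholesale.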
    

    Hope this saves you some time.
