I have a regular expression to get the initials of a name like below:
/\b\p{L}\./gu
It works fine with English and other languages until there ar
You need to match the diacritic marks that follow a base letter using \p{M}*:
'~\b(?<!\p{M})\p{L}\p{M}*\.~u'
The pattern matches
\b - a word boundary
(?<!\p{M}) - the char before the current position must not be a diacritic char (without it, a match can occur within a single word)
\p{L} - any base Unicode letter
\p{M}* - 0+ diacritic marks
\. - a dot.

See the PHP demo online:
$s = "क. ಕ. के. ಕೆ. ";
echo preg_replace('~\b(?<!\p{M})\p{L}\p{M}*+\.~u', '<pre>$0</pre>', $s);
// => <pre>क.</pre> <pre>ಕ.</pre> <pre>के.</pre> <pre>ಕೆ.</pre>
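
If you need the initials themselves rather than a replaced string, the same pattern can be passed to preg_match_all. The sketch below is only an illustration: find_initials is a hypothetical helper name, not something from the question, and the second sample string ("काक. क.") is an assumed test input chosen so that a letter follows a vowel sign inside a single word.

```php
<?php
// Hypothetical helper (the name is an assumption, not part of the original answer):
// returns every initial found in $text, each with its trailing dot.
function find_initials(string $text): array
{
    // Same pattern as above: a base letter, its combining marks, then a dot.
    $pattern = '~\b(?<!\p{M})\p{L}\p{M}*\.~u';

    // preg_match_all returns false on error, so fall back to an empty list.
    if (preg_match_all($pattern, $text, $matches) === false) {
        return [];
    }

    // $matches[0] holds the full-pattern matches in order of appearance.
    return $matches[0];
}

// The demo string from above: each initial comes back as its own element.
print_r(find_initials("क. ಕ. के. ಕೆ. "));

// Assumed extra input: the second क of "काक." follows a vowel sign,
// so the lookbehind rejects it and only the standalone "क." is returned.
print_r(find_initials("काक. क."));
```

Returning $matches[0] keeps the helper independent of how the initials are rendered later, so wrapping them in markup, as the preg_replace call does, remains a separate step.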