Detecting CJK characters in a string (C#)

最后都变了 - submitted on 2019-12-03 14:12:41

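using iTextSharp.text;      // Phrase, Font
using iTextSharp.text.pdf;  // FontSelector

FontSelector selector = new FontSelector();

// Add both fonts to the FontSelector. openSansfont and chinesefont are
// iTextSharp.text.Font instances created elsewhere.
selector.AddFont(openSansfont);
selector.AddFont(chinesefont);

// Process returns a Phrase built from the text and the fonts added above.
Phrase phrase = selector.Process(yourTxt);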

FontSelector will use the correct font for you!

The detailed description in the source file FontSelector.cs reads:

Selects the appropriate fonts that contain the glyphs needed to render text correctly. The fonts are checked in order until the character is found.

At first I could not remember which font is tried first. Edit: the fonts are checked in the order they were added, from the first AddFont call to the last.

http://itextpdf.com/examples/iia.php?id=214

Just in case anyone stumbles across this question: I've found another solution that uses the Unicode named blocks listed here (http://msdn.microsoft.com/en-us/library/20bw873z.aspx#SupportedNamedBlocks) in a regex.

// requires: using System.Text.RegularExpressions;
var name = "Joe Bloggs";
var cjkRegex = new Regex(@"\p{IsCJKUnifiedIdeographs}");

if (cjkRegex.IsMatch(name))
{
    // switch to CJK font
}
else
{
    // keep calm and carry on
}

EDIT:

You'll probably need to match more than just the Unified Ideographs block, which does not include Hangul. Try using this as the regex:

string r = 
@"\p{IsHangulJamo}|"+
@"\p{IsCJKRadicalsSupplement}|"+
@"\p{IsCJKSymbolsandPunctuation}|"+
@"\p{IsEnclosedCJKLettersandMonths}|"+
@"\p{IsCJKCompatibility}|"+
@"\p{IsCJKUnifiedIdeographsExtensionA}|"+
@"\p{IsCJKUnifiedIdeographs}|"+
@"\p{IsHangulSyllables}|"+
@"\p{IsCJKCompatibilityForms}"; 

That works for all the Korean text I tried it on.
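
A minimal sketch of how the block list above could be wrapped into a reusable check. The class and method names (CjkDetector, ContainsCjk) are hypothetical, and folding the nine blocks into a single character class instead of an alternation is an illustrative choice, not part of the answer. Note that the named blocks supported by .NET only cover the Basic Multilingual Plane, so ideographs in the supplementary planes (CJK Extension B and later) would not be matched.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: the class and method names are illustrative only.
public static class CjkDetector
{
    // The same nine Unicode blocks as the pattern above,
    // folded into one character class.
    private static readonly Regex CjkPattern = new Regex(
        @"[\p{IsHangulJamo}" +
        @"\p{IsCJKRadicalsSupplement}" +
        @"\p{IsCJKSymbolsandPunctuation}" +
        @"\p{IsEnclosedCJKLettersandMonths}" +
        @"\p{IsCJKCompatibility}" +
        @"\p{IsCJKUnifiedIdeographsExtensionA}" +
        @"\p{IsCJKUnifiedIdeographs}" +
        @"\p{IsHangulSyllables}" +
        @"\p{IsCJKCompatibilityForms}]",
        RegexOptions.Compiled);

    // Returns true if at least one character of text falls in one of the blocks.
    public static bool ContainsCjk(string text)
    {
        return !string.IsNullOrEmpty(text) && CjkPattern.IsMatch(text);
    }
}
```

With this sketch, CjkDetector.ContainsCjk("Joe Bloggs") is false and CjkDetector.ContainsCjk("도형이") is true. Japanese kana are not in any of the nine blocks; .NET also provides IsHiragana and IsKatakana, which could be added to the class if kana-only text needs to be detected.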

Well, I did edit Dave's answer to make it work, but apparently only I can see the edit until it's peer reviewed, so I will post the solution as my own answer. Basically, Dave just needs to extend his regex a bit, to this:

string regex = 
@"\p{IsHangulJamo}|"+
@"\p{IsCJKRadicalsSupplement}|"+
@"\p{IsCJKSymbolsandPunctuation}|"+
@"\p{IsEnclosedCJKLettersandMonths}|"+
@"\p{IsCJKCompatibility}|"+
@"\p{IsCJKUnifiedIdeographsExtensionA}|"+
@"\p{IsCJKUnifiedIdeographs}|"+
@"\p{IsHangulSyllables}|"+
@"\p{IsCJKCompatibilityForms}"; 

which will detect Korean characters when used like this:

string subject = "도형이";

Match match = Regex.Match(subject, regex);

if(match.Success)
{
    //change to Korean font
}
else
{
    //keep calm and carry on
}
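
Match.Success only tells you that the string contains such a character somewhere. For mixed text, such as a Latin name next to a Korean one, a sketch along the following lines splits the string into alternating runs so that each run can be given its own font. CjkRunSplitter and Split are hypothetical names, and the run-splitting approach is an assumption about what a caller might want, not something taken from the answers above.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: the class and method names are illustrative only.
public static class CjkRunSplitter
{
    // The same nine Unicode blocks as the regex above, without the alternation.
    private const string CjkBlocks =
        @"\p{IsHangulJamo}" +
        @"\p{IsCJKRadicalsSupplement}" +
        @"\p{IsCJKSymbolsandPunctuation}" +
        @"\p{IsEnclosedCJKLettersandMonths}" +
        @"\p{IsCJKCompatibility}" +
        @"\p{IsCJKUnifiedIdeographsExtensionA}" +
        @"\p{IsCJKUnifiedIdeographs}" +
        @"\p{IsHangulSyllables}" +
        @"\p{IsCJKCompatibilityForms}";

    // First alternative: a maximal run of CJK characters (named group "cjk").
    // Second alternative: a maximal run of everything else.
    private static readonly Regex RunPattern = new Regex(
        "(?<cjk>[" + CjkBlocks + "]+)|[^" + CjkBlocks + "]+");

    // Returns the runs in order; Key is the run text, Value is true for CJK runs.
    public static List<KeyValuePair<string, bool>> Split(string text)
    {
        var runs = new List<KeyValuePair<string, bool>>();
        foreach (Match m in RunPattern.Matches(text ?? string.Empty))
        {
            runs.Add(new KeyValuePair<string, bool>(m.Value, m.Groups["cjk"].Success));
        }
        return runs;
    }
}
```

For "Joe 도형이 Bloggs" this yields three runs: "Joe " (not CJK), "도형이" (CJK) and " Bloggs" (not CJK). If you are already using iTextSharp, FontSelector does comparable per-character work for you, so a splitter like this is mainly useful outside of it.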