Localization: How to map culture info to a script name or Unicode character range?

限于喜欢 提交于 2019-11-29 15:59:43

In native code there's LOCALE_SSCRIPTS for GetLocaleInfoEx() (Vista & above) that shows you what scripts are expected for a locale. There isn't a similar concept for .Net at this time.

Chinese has thousands of characters, so it might not be feasible to show all the characters in their character set. There's no native concept of 'alphabet' in Chinese, and I don't think Chinese has a syllabary like Japanese does.

Pinyin (Chinese written in roman alphabet) can be used to represent the Chinese characters, and that might help you index them. I know this doesn't answer your question, but I hope it's helpful.

I fully agree with mikiemacman. In addition, a given laguage doesn't necessarily uses all the letters of a script.

Anyway, the closest I can think of is CultureInfo.TextInfo.ANSICodePage -> There are only a handful of ANSI code pages. You could have create a table (or a switch() statement, whatever) that lists the script for each ANSI codepage.

Proto, wait! There's a much more accurate solution. It's an unmanaged on hance you may have to P/Invoke.

GetLocaleInfoW(MAKELCID(wLangId, SORT_DEFAULT), LOCALE_FONTSIGNATURE, wcBuf, MAXWCBUF);

This gives you a LOCALESIGNATURE stucture. The anwer is in the lsUsb field: Unicode subsets bitfield. Rats! the MS page for this structure is empty. But look it up in your MSDN copy. It's fully documented there: A whole set of flags that describe which scripts are spported. And yes, there's a flag for Tamil ;-)

HTH.

EDIT: Oops! Hadn't seen Shawne's answer. Wow! Answer from an in-house expert! ;-) Anyway, you may still be interested in a Pre-Vista compatible answer.

Fascinating topic. While it might not answer your question, Omniglot is a good resource.

The correct answer is likely to be complex, and depend on the exact problem you're solving. Assuming your goal showing only letters used in a particular language to separate phonebook sections (as in Outlook), few of the issues are:

  • People who have contact names spanning several scripts/languages.
  • 2-glyph letters (e.g. 'Lj' in Serbian). It is one phoneme, always treated as a single letter although it has 2 Unicode symbols. 'It would have its own section in the phonebook (separate from 'L').
  • Too many glyphs to list (e.g. Chinese)
  • Unorthodox ordering (e.g. Thai -- a phone book would be separated by consonants only, ignoring the vowels).
  • Uppercase / lowercase distinction (presumably you'd only want one case for languages that support it -- which breaks down in minor ways Turkish 'i').
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!