Localization: How to map culture info to a script name or Unicode character range?

Question

I need some information about localization. I am using .NET 2.0 with C# 2.0, which takes care of most localization-related issues. However, on one particular screen I need to manually draw the letters of the alphabet that corresponds to the current culture.

This would be similar to the Contacts screen in Microsoft Outlook (the Address Cards or Detailed Address Cards view under Contacts), so it needs a column of buttons along the right edge, one for each letter.

I am trying to emulate that, but I don't want to ask the user to choose the script. If the current culture is, say, Chinese, I want to draw Chinese characters. When the user changes the culture to English (and restarts the application), I want to draw English letters instead. I hope that makes clear what I am after.

I can determine the culture of the current user (Application.CurrentCulture or System.Globalization.CultureInfo.CurrentCulture gives the culture-related information). I also have all the scripts needed to render the letters. The problem is that I don't know how to map the culture info to the name of a script.

In other words, is there a way to determine the script name corresponding to a culture? Or is it possible to determine the range of Unicode code points corresponding to a culture? Either would let me render the correct letters on the buttons.

Any suggestions or guidance are truly appreciated. If there is something fundamentally wrong with my approach (or with what I am trying to achieve), please point that out as well. Thanks for your time.

PS: I know the easiest solution is either to make the script name part of the user preferences or to display a list of languages for the user to choose from (à la Contacts in Outlook 2007). But I want to see whether I can render the letters corresponding to the culture without the user having to do anything.

Answer 1:

In native code there's LOCALE_SSCRIPTS for GetLocaleInfoEx() (Vista and above), which tells you which scripts are expected for a locale. There is no similar concept in .NET at this time.
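If you can target Vista and later, the LOCALE_SSCRIPTS value is documented as a string of four-letter ISO 15924 script codes in alphabetical order, each followed by a semicolon. The sketch below assumes you have already fetched that string (for example with GetLocaleInfoEx(localeName, LOCALE_SSCRIPTS, buf, bufLen) through P/Invoke; the Windows call itself is left out so the code compiles anywhere) and only splits it into individual codes. The function name split_script_list is hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

/* Splits a LOCALE_SSCRIPTS-style list such as L"Cyrl;Latn;" into
   4-letter ISO 15924 codes. Each code is stored NUL-terminated in out[i].
   Entries that are not exactly 4 characters long are skipped.
   Returns the number of codes written (at most max_out). */
size_t split_script_list(const wchar_t *list, wchar_t out[][5], size_t max_out)
{
    size_t count = 0;
    const wchar_t *p = list;

    while (*p != L'\0' && count < max_out) {
        const wchar_t *semi = wcschr(p, L';');
        size_t len = semi ? (size_t)(semi - p) : wcslen(p);

        if (len == 4) {
            wmemcpy(out[count], p, 4);
            out[count][4] = L'\0';
            count++;
        }
        if (semi == NULL)
            break;          /* last entry had no trailing semicolon */
        p = semi + 1;       /* continue after the separator */
    }
    return count;
}
```

Each resulting code (Cyrl, Latn and so on) can then be matched against the scripts you already know how to draw.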

Answer 2:

Chinese has thousands of characters, so it might not be feasible to show every character in the set. There's no native concept of an 'alphabet' in Chinese, and I don't think Chinese has a syllabary like Japanese does.

Pinyin (Chinese written in roman alphabet) can be used to represent the Chinese characters, and that might help you index them. I know this doesn't answer your question, but I hope it's helpful.

Answer 3:

I fully agree with mikiemacman. In addition, a given language doesn't necessarily use all the letters of a script.

Anyway, the closest I can think of is CultureInfo.TextInfo.ANSICodePage. There are only a handful of ANSI code pages, so you could create a table (or a switch() statement, whatever) that lists the script for each ANSI code page.
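A sketch of that lookup table as a switch on the ANSI code page number (the value TextInfo.ANSICodePage returns). The groupings are the usual scripts behind the Windows ANSI code pages; the enum and function names are hypothetical. Unicode-only locales (Hindi or Tamil, for example) have no ANSI code page and typically report 0, so they land in SCRIPT_UNKNOWN.

```c
/* Script families reachable from a Windows ANSI code page
   (hypothetical names, for illustration only). */
typedef enum {
    SCRIPT_UNKNOWN,
    SCRIPT_LATIN,
    SCRIPT_CYRILLIC,
    SCRIPT_GREEK,
    SCRIPT_HEBREW,
    SCRIPT_ARABIC,
    SCRIPT_THAI,
    SCRIPT_JAPANESE,        /* kana plus kanji */
    SCRIPT_HAN_SIMPLIFIED,
    SCRIPT_HAN_TRADITIONAL,
    SCRIPT_HANGUL
} script_id;

script_id script_from_ansi_code_page(int code_page)
{
    switch (code_page) {
    case 1250:              /* Central European */
    case 1252:              /* Western European */
    case 1254:              /* Turkish */
    case 1257:              /* Baltic */
    case 1258:              /* Vietnamese */
        return SCRIPT_LATIN;
    case 1251: return SCRIPT_CYRILLIC;
    case 1253: return SCRIPT_GREEK;
    case 1255: return SCRIPT_HEBREW;
    case 1256: return SCRIPT_ARABIC;
    case 874:  return SCRIPT_THAI;
    case 932:  return SCRIPT_JAPANESE;
    case 936:  return SCRIPT_HAN_SIMPLIFIED;
    case 950:  return SCRIPT_HAN_TRADITIONAL;
    case 949:  return SCRIPT_HANGUL;
    default:   return SCRIPT_UNKNOWN;   /* includes 0 for Unicode-only locales */
    }
}
```

The result only narrows things down to a script; the letters a particular language actually uses (Turkish vs. English within SCRIPT_LATIN, say) still need their own per-language list.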

Answer 4:

Proto, wait! There's a much more accurate solution. It's an unmanaged one, hence you may have to P/Invoke.

GetLocaleInfoW(MAKELCID(wLangId, SORT_DEFAULT), LOCALE_FONTSIGNATURE, wcBuf, MAXWCBUF);

This gives you a LOCALESIGNATURE structure. The answer is in the lsUsb field: the Unicode subset bitfield. Rats! The MS page for this structure is empty, but look it up in your MSDN copy. It's fully documented there: a whole set of flags that describe which scripts are supported. And yes, there's a flag for Tamil ;-)
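Decoding lsUsb is plain bit testing: the field is four 32-bit DWORDs forming a 128-bit set, so bit n lives in lsUsb[n / 32]. On Windows the structure would be filled with something like GetLocaleInfoW(lcid, LOCALE_FONTSIGNATURE, (LPWSTR)&sig, sizeof(sig) / sizeof(WCHAR)), with sig a LOCALESIGNATURE; that call is left out so the sketch compiles anywhere, and uint32_t stands in for DWORD. The bit numbers are a handful taken from the Unicode subset bitfield table (the same numbering as the OpenType OS/2 ulUnicodeRange fields); the USB_* names are hypothetical, and the values are worth double-checking against the documentation.

```c
#include <stddef.h>
#include <stdint.h>

/* Selected Unicode subset bit numbers (hypothetical names, values from the
   Unicode subset bitfield table). Bit n is stored in lsUsb[n / 32]. */
enum {
    USB_BASIC_LATIN      = 0,
    USB_GREEK            = 7,
    USB_CYRILLIC         = 9,
    USB_ARMENIAN         = 10,
    USB_HEBREW           = 11,
    USB_ARABIC           = 13,
    USB_DEVANAGARI       = 15,
    USB_TAMIL            = 20,
    USB_THAI             = 24,
    USB_GEORGIAN         = 26,
    USB_HIRAGANA         = 49,
    USB_KATAKANA         = 50,
    USB_HANGUL_SYLLABLES = 56,
    USB_CJK_UNIFIED      = 59
};

/* Returns 1 if bit `bit` (0..127) is set in the 128-bit lsUsb array. */
int usb_has(const uint32_t ls_usb[4], unsigned bit)
{
    if (bit >= 128)
        return 0;
    return (int)((ls_usb[bit / 32] >> (bit % 32)) & 1u);
}

/* Copies every bit from the table above that is set in ls_usb into out[],
   in table order. Returns how many were written (at most max_out). */
size_t usb_known_scripts(const uint32_t ls_usb[4], unsigned out[], size_t max_out)
{
    static const unsigned known[] = {
        USB_BASIC_LATIN, USB_GREEK, USB_CYRILLIC, USB_ARMENIAN, USB_HEBREW,
        USB_ARABIC, USB_DEVANAGARI, USB_TAMIL, USB_THAI, USB_GEORGIAN,
        USB_HIRAGANA, USB_KATAKANA, USB_HANGUL_SYLLABLES, USB_CJK_UNIFIED
    };
    size_t n = 0;
    for (size_t i = 0; i < sizeof known / sizeof known[0] && n < max_out; i++) {
        if (usb_has(ls_usb, known[i]))
            out[n++] = known[i];
    }
    return n;
}
```

A signature can have several bits set at once, so deciding which of the reported scripts drives the button column is still up to you.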

HTH.

EDIT: Oops! Hadn't seen Shawne's answer. Wow! An answer from an in-house expert! ;-) Anyway, you may still be interested in a pre-Vista-compatible answer.

Answer 5:

Fascinating topic. While it might not answer your question, Omniglot is a good resource.

The correct answer is likely to be complex and to depend on the exact problem you're solving. Assuming your goal is to show only the letters used in a particular language to separate phonebook sections (as in Outlook), a few of the issues are:

- People whose contact names span several scripts/languages.
- Two-character letters (e.g. 'Lj' in Serbian). It is one phoneme, always treated as a single letter, although it is usually typed as 2 Unicode characters. It would have its own section in the phonebook (separate from 'L').
- Too many characters to list (e.g. Chinese).
- Unorthodox ordering (e.g. Thai: a phone book would be divided by consonants only, ignoring the vowels).
- Uppercase/lowercase distinction (presumably you'd want only one case for languages that have it, which breaks down in minor ways, e.g. Turkish 'i').
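To make the two-character-letter point concrete, here is a sketch that files a name under the right section of the Serbian (Gaj's) Latin alphabet, treating Lj, Nj and Dž as letters of their own by always taking the longest matching prefix. It assumes UTF-8 strings and names written in title case ("Ljubica", not "LJUBICA"); case folding is out of scope. The alphabet table is written out by hand, not obtained from any .NET or Windows API, and the names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Gaj's Latin alphabet in collation order, UTF-8 encoded.
   "Lj", "Nj" and "D\xC5\xBE" (Dž) are single letters with their own section. */
static const char *const SERBIAN_LATIN[] = {
    "A", "B", "C", "\xC4\x8C", "\xC4\x86", "D", "D\xC5\xBE", "\xC4\x90",
    "E", "F", "G", "H", "I", "J", "K", "L", "Lj", "M", "N", "Nj",
    "O", "P", "R", "S", "\xC5\xA0", "T", "U", "V", "Z", "\xC5\xBD"
};
#define SERBIAN_LATIN_COUNT (sizeof SERBIAN_LATIN / sizeof SERBIAN_LATIN[0])

/* Returns the section label for a title-case UTF-8 name, or NULL if the
   name starts with no letter of the alphabet. The longest match wins,
   so "Ljubica" is filed under "Lj", not "L". */
const char *serbian_section(const char *name)
{
    const char *best = NULL;
    size_t best_len = 0;

    for (size_t i = 0; i < SERBIAN_LATIN_COUNT; i++) {
        size_t len = strlen(SERBIAN_LATIN[i]);
        if (len > best_len && strncmp(name, SERBIAN_LATIN[i], len) == 0) {
            best = SERBIAN_LATIN[i];
            best_len = len;
        }
    }
    return best;
}
```

The same table doubles as the list of 30 button labels, which is why it is stored in alphabet order rather than sorted by length.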

Source: https://stackoverflow.com/questions/252662/localization-how-to-map-culture-info-to-a-script-name-or-unicode-character-rang

Tags

localization

.net-2.0

globalization

culture