Could string comparisons really differ based on culture when the string is guaranteed not to change?

醉酒成梦 2020-12-13 17:02

I'm reading encrypted credentials/connection strings from a config file. ReSharper tells me "String.IndexOf(string) is culture-specific here" on this line:



        
3 Answers
  •  长情又很酷
    2020-12-13 17:48

    Absolutely. Per MSDN (http://msdn.microsoft.com/en-us/library/d93tkzah.aspx),

    This method performs a word (case-sensitive and culture-sensitive) search using the current culture.

    So you may get different results if you run it under a different culture (via regional and language settings in Control Panel).

    In this particular case you probably won't have a problem, but put an i in the search string, run it under a Turkish culture, and it will probably ruin your day.
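For the config-parsing case in the question, the usual remedy is to pass an explicit `StringComparison`. The sketch below assumes a hypothetical decrypted connection string (the question's actual line and value are not shown) and uses the `Ordinal` and `OrdinalIgnoreCase` overloads of `String.IndexOf`, whose results do not depend on the current culture. The helper `ExtractValue` is hypothetical, written only for this illustration.

```csharp
using System;

// Hypothetical decrypted value; the real config string from the question is not shown.
string connection = "Data Source=db01;User ID=app;Password=s3cret;";

// Ordinal compares UTF-16 code units, so the current culture cannot affect the result.
int start = connection.IndexOf("Password=", StringComparison.Ordinal);

// OrdinalIgnoreCase uses culture-independent case mapping, so Turkish casing rules never apply.
int startIgnoreCase = connection.IndexOf("password=", StringComparison.OrdinalIgnoreCase);

string password = ExtractValue(connection, start, "Password=".Length);
Console.WriteLine($"{start} {startIgnoreCase} {password}"); // prints "29 29 s3cret"

// Hypothetical helper: returns the text between the key and the next ';' (or end of string).
static string ExtractValue(string s, int keyStart, int keyLength)
{
    if (keyStart < 0) return string.Empty; // key not found
    int valueStart = keyStart + keyLength;
    int end = s.IndexOf(';', valueStart); // the char overload always searches ordinally
    return end < 0 ? s.Substring(valueStart) : s.Substring(valueStart, end - valueStart);
}
```

For an ASCII key like this the culture-sensitive overload will usually agree, so the main gains are predictable behavior on every machine and a clear statement of intent.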

    See MSDN: http://msdn.microsoft.com/en-us/library/ms973919.aspx

    These new recommendations and APIs exist to alleviate misguided assumptions about the behavior of default string APIs. The canonical example of bugs emerging where non-linguistic string data is interpreted linguistically is the "Turkish-I" problem.

    For nearly all Latin alphabets, including U.S. English, the character i (\u0069) is the lowercase version of the character I (\u0049). This casing rule quickly becomes the default for someone programming in such a culture. However, in Turkish ("tr-TR"), there exists a capital "i with a dot," character (\u0130), which is the capital version of i. Similarly, in Turkish, there is a lowercase "i without a dot," or (\u0131), which capitalizes to I. This behavior occurs in the Azeri culture ("az") as well.
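A minimal sketch of the casing rules just described, passing the culture explicitly to `ToUpper`/`ToLower` instead of changing the thread culture. It assumes the runtime has full culture data available (for example, that it is not running in .NET's globalization-invariant mode, where `tr-TR` may be unavailable).

```csharp
using System;
using System.Globalization;

CultureInfo turkish = CultureInfo.GetCultureInfo("tr-TR");

// Turkish: dotted lowercase i uppercases to U+0130, and capital I lowercases to dotless U+0131.
string upperTr = "i".ToUpper(turkish);
string lowerTr = "I".ToLower(turkish);

// Invariant culture: the familiar English-style mapping between i and I.
string upperInv = "i".ToUpperInvariant();
string lowerInv = "I".ToLowerInvariant();

Console.WriteLine($"tr-TR:     i -> U+{(int)upperTr[0]:X4}, I -> U+{(int)lowerTr[0]:X4}");
Console.WriteLine($"Invariant: i -> U+{(int)upperInv[0]:X4}, I -> U+{(int)lowerInv[0]:X4}");
```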

    Therefore, assumptions normally made about capitalizing i or lowercasing I are not valid among all cultures. If the default overloads for string comparison routines are used, they will be subject to variance between cultures. For non-linguistic data, as in the following example, this can produce undesired results:

        using System;
        using System.Globalization;
        using System.Threading;

        // String.Compare(a, b, true): case-insensitive, using the current culture's casing rules.
        Thread.CurrentThread.CurrentCulture = new CultureInfo("en-US");
        Console.WriteLine("Culture = {0}",
           Thread.CurrentThread.CurrentCulture.DisplayName);
        Console.WriteLine("(file == FILE) = {0}",
           (String.Compare("file", "FILE", true) == 0));

        Thread.CurrentThread.CurrentCulture = new CultureInfo("tr-TR");
        Console.WriteLine("Culture = {0}",
           Thread.CurrentThread.CurrentCulture.DisplayName);
        Console.WriteLine("(file == FILE) = {0}",
           (String.Compare("file", "FILE", true) == 0));


    Because I is compared differently in the two cultures, the results of the comparison change when the thread culture changes. This is the output:

        Culture = English (United States)
        (file == FILE) = True
        Culture = Turkish (Turkey)
        (file == FILE) = False
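One way to make such a comparison immune to the thread culture is an ordinal comparison. The sketch below repeats the test under both cultures from the example, but with `StringComparison.OrdinalIgnoreCase`. Setting `CultureInfo.CurrentCulture` rather than `Thread.CurrentThread.CurrentCulture` is a stylistic choice made here, not something the quoted article uses, and the sketch assumes full culture data is available so that `tr-TR` can be created.

```csharp
using System;
using System.Globalization;

string[] cultures = { "en-US", "tr-TR" };
bool[] results = new bool[cultures.Length];

for (int n = 0; n < cultures.Length; n++)
{
    CultureInfo.CurrentCulture = new CultureInfo(cultures[n]);

    // Ordinal ignore-case uses culture-independent case mapping, so "file" and "FILE"
    // match no matter which culture the thread is running under.
    results[n] = String.Compare("file", "FILE", StringComparison.OrdinalIgnoreCase) == 0;

    Console.WriteLine("Culture = {0}", CultureInfo.CurrentCulture.DisplayName);
    Console.WriteLine("(file == FILE) = {0}", results[n]);
}
```

Ordinal comparisons suit identifiers, keys, file names and other non-linguistic tokens; culture-aware comparisons are meant for text displayed to users.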
    

    Here is an example that does not involve casing:

        var s1 = "\u00E9";  // é as a single precomposed character (U+00E9, ALT+0233)
        var s2 = "e\u0301"; // 'e' followed by the combining acute accent U+0301 (two characters)

        Console.WriteLine(s1.IndexOf(s2, StringComparison.Ordinal));          // -1
        Console.WriteLine(s1.IndexOf(s2, StringComparison.InvariantCulture)); // 0
        Console.WriteLine(s1.IndexOf(s2, StringComparison.CurrentCulture));   // 0
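If ordinal comparison is required but the input may mix precomposed and decomposed forms, one option (an addition here, not part of the original answer) is to normalize both strings first with `String.Normalize`, here using Unicode Normalization Form C. After that, the ordinal search finds the match. This assumes the runtime has Unicode normalization support, which is normally the case outside globalization-invariant mode.

```csharp
using System;
using System.Text;

string s1 = "\u00E9";   // precomposed é
string s2 = "e\u0301";  // 'e' + combining acute accent

// NFC composes e + U+0301 into the single code point U+00E9; s1 is already in NFC.
string n1 = s1.Normalize(NormalizationForm.FormC);
string n2 = s2.Normalize(NormalizationForm.FormC);

int before = s1.IndexOf(s2, StringComparison.Ordinal);
int after = n1.IndexOf(n2, StringComparison.Ordinal);

Console.WriteLine($"{before} {after} {n2.Length}"); // prints "-1 0 1"
```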
    
