someString.IndexOf(someString) returns 1 instead of 0 under .NET 4

暖寄归人 2020-12-07 23:55

We have recently upgraded all our projects from .NET 3.5 to .NET 4. I have come across a rather strange issue with respect to string.IndexOf().

My code

3 Answers
  •  生来不讨喜
    2020-12-08 00:47

    Your string consists of two characters: a soft hyphen (Unicode code point 173) and a hyphen (Unicode code point 45).

    Wiki: According to the Unicode standard, a soft hyphen is not displayed if the line is not broken at that point.
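
    Because the soft hyphen is normally invisible, printing the numeric code points makes it easier to see what the string really contains. This is only a minimal sketch; the class name CodePointDump and the helper Describe are made up for illustration and are not part of the framework:

    ```csharp
    using System;
    using System.Linq;

    static class CodePointDump
    {
        // Returns the UTF-16 code units of a string as decimal numbers
        // separated by spaces.
        public static string Describe(string s)
        {
            return string.Join(" ", s.Select(c => ((int)c).ToString()).ToArray());
        }

        static void Main()
        {
            // Soft hyphen (U+00AD) followed by hyphen-minus (U+002D).
            string text = "\u00AD\u002D";

            Console.WriteLine(text.Length);    // 2
            Console.WriteLine(Describe(text)); // 173 45
        }
    }
    ```

    The \u escapes are used here instead of \x only because \x takes a variable number of hex digits, which is easy to get wrong when more text follows the escape.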

    When using "\xAD\x2D".IndexOf("\xAD\x2D") in .NET 4, it seems to ignore that you're looking for the soft hyphen, returning a starting index of 1 (the index of \x2D). In .NET 3.5, this returns 0.

    More fun, if you run this code (so when only looking for the soft hyphen):

    string text = "\xAD\x2D";
    string shy = "\xAD";
    int i1 = text.IndexOf(shy);
    

    then i1 becomes 0, regardless of the .NET version used. The result of text.IndexOf(text), however, does vary between versions, which at a glance looks like a bug to me.

    As far as I can trace through the framework, older .NET versions use an InternalCall to IndexOfString() (I can't figure out which API call that maps to), while from .NET 4 onward a QCall to InternalFindNLSStringEx() is made, which in turn calls FindNLSStringEx().

    The issue (I really can't figure out if this is intended behaviour) indeed occurs when calling FindNLSStringEx:

    LPCWSTR lpStringSource = L"\xAD\x2D";
    LPCWSTR lpStringValue = L"\xAD";
    
    int length;
    
    int i = FindNLSStringEx(
        LOCALE_NAME_SYSTEM_DEFAULT,
        FIND_FROMSTART,
        lpStringSource,
        -1,
        lpStringValue,
        -1,
        &length,
        NULL,
        NULL,
        1);
    
    Console::WriteLine(i);
    
    i = FindNLSStringEx(
        LOCALE_NAME_SYSTEM_DEFAULT,
        FIND_FROMSTART,
        lpStringSource,
        -1,
        lpStringSource,
        -1,
        &length,
        NULL,
        NULL,
        1);
    
    Console::WriteLine(i);
    
    Console::ReadLine();
    

    Prints 0 and then 1. Note that length, an out parameter indicating the length of the found string, is 0 after the first call and 1 after the second; the soft hyphen is counted as having a length of 0.

    The workaround is to use text.IndexOf(text, StringComparison.OrdinalIgnoreCase), as you've noted. This makes a QCall to InternalCompareStringOrdinalIgnoreCase(), which in turn calls FindStringOrdinal(), and that returns 0 in both cases.
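
    A minimal sketch of that workaround, wrapped in a helper so the comparison mode cannot be forgotten. OrdinalSearch and IndexOfOrdinal are hypothetical names; only string.IndexOf(string, StringComparison) comes from the framework. Note that the sketch uses StringComparison.Ordinal, the case-sensitive sibling of OrdinalIgnoreCase, on the assumption that case-insensitivity is not actually needed. An ordinal search compares UTF-16 code units one by one, so the soft hyphen is not skipped:

    ```csharp
    using System;

    static class OrdinalSearch
    {
        // Searches for value in text by comparing raw UTF-16 code units,
        // bypassing culture-sensitive rules that treat U+00AD as ignorable.
        public static int IndexOfOrdinal(string text, string value)
        {
            return text.IndexOf(value, StringComparison.Ordinal);
        }

        static void Main()
        {
            string text = "\u00AD\u002D"; // soft hyphen + hyphen-minus
            string shy = "\u00AD";        // soft hyphen only

            Console.WriteLine(IndexOfOrdinal(text, shy));      // 0
            Console.WriteLine(IndexOfOrdinal(text, text));     // 0
            Console.WriteLine(IndexOfOrdinal(text, "\u002D")); // 1
        }
    }
    ```

    If the search has to ignore case, passing StringComparison.OrdinalIgnoreCase to the same overload should behave the same way with respect to the soft hyphen.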
