String interning?

Posted by 我的梦境 on 2019-11-29 02:23:24
Scott Dorman

The string in s4 is interned. However, when you execute s4 += "m";, you have created a new string that will not be interned as its value is not a string literal but the result of a string concatenation operation. As a result, s3 and s4 are two different string instances in two different memory locations.

For more information on string interning, look here, specifically at the last example. When you do String.Intern(s4), you are indeed interning the string, but you are still not performing a reference equality test between those two interned strings. The String.Intern method returns the interned string, so you would need to do this:

string s1 = "tom";
string s2 = "tom";

Console.Write(object.ReferenceEquals(s2, s1)); //true 

string s3 = "tom";
string s4 = "to";
s4 += "m";

Console.Write(object.ReferenceEquals(s3, s4)); //false

string s5 = String.Intern(s4);

Console.Write(object.ReferenceEquals(s3, s5)); //true
Jim Schubert

Strings are immutable. This means their contents can't be changed.

When you do s4 += "m";, the CLR does not modify the existing string. It allocates a new string in another memory location that contains the original text plus the appended part, and s4 then refers to that new string.

See MSDN string reference.
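
To see the immutability described in this answer, here is a minimal sketch that keeps a second reference to the original string before appending. The class, method, and variable names are made up for illustration.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Returns true when += produced a new object and left the original intact.
    public static bool AppendCreatesNewString()
    {
        string s4 = "to";
        string original = s4;   // second reference to the same "to" object

        s4 += "m";              // allocates a new string and reassigns s4

        return original == "to"                         // old object unchanged
            && s4 == "tom"                              // s4 now holds the new text
            && !object.ReferenceEquals(original, s4);   // two distinct objects
    }

    public static void Main()
    {
        Console.WriteLine(AppendCreatesNewString()); // True
    }
}
```

The variable original still reads "to" after the append, which is only possible because the append built a separate object rather than editing the existing one.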

Source: https://blogs.msdn.microsoft.com/ericlippert/2009/09/28/string-interning-and-string-empty/

String interning is an optimization technique used by the compiler. If you have two identical string literals in one compilation unit, the generated code ensures that only one string object is created for all instances of that literal (characters enclosed in double quotes) within the assembly.

I come from a C# background, so I can explain with an example from that language:

object obj = "Int32";
string str1 = "Int32";
string str2 = typeof(int).Name;

Output of the following comparisons:

Console.WriteLine(obj == str1); // true
Console.WriteLine(str1 == str2); // true    
Console.WriteLine(obj == str2); // false !?

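Note 1: Objects are compared by reference. When either operand of == has the static type object, references are compared; when both operands are of type string, the contents are compared.

Note 2: typeof(int).Name is produced by reflection at run time, so it is not a compile-time literal. Which == operator is used (reference or value comparison) is decided at compile time from the static types of the operands.

Analysis of the results:

1) true, because obj and str1 hold the same literal, so the generated code has only one "Int32" object and both refer to it. See Note 1.

2) true, because both operands are strings, so their contents are compared, and the contents are the same.

3) false, because obj and str2 are compared by reference, and str2 does not refer to the interned literal. See Note 2.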
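
As a rough sketch of two ways to make the third comparison come out true: cast obj to string so that the value-comparing == is chosen, or intern both operands before comparing references. The class and method names here are hypothetical, added only for this example.

```csharp
using System;

public static class InternComparisonDemo
{
    // Casting to string selects the string == operator, which compares contents.
    public static bool EqualByValue()
    {
        object obj = "Int32";
        string str2 = typeof(int).Name;   // built at run time by reflection
        return (string)obj == str2;
    }

    // Interning both sides yields the single pooled "Int32" object for each,
    // so a reference comparison succeeds.
    public static bool EqualByReferenceAfterIntern()
    {
        object obj = "Int32";
        string str2 = typeof(int).Name;
        return object.ReferenceEquals(string.Intern((string)obj),
                                      string.Intern(str2));
    }

    public static void Main()
    {
        Console.WriteLine(EqualByValue());                 // True
        Console.WriteLine(EqualByReferenceAfterIntern());  // True
    }
}
```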

Oleg

First of all, everything written so far about immutable strings is correct. However, some important points have not been mentioned. The code

string s1 = "tom";
string s2 = "tom";
Console.Write(object.ReferenceEquals(s2, s1)); //true

really does display "True", but only because of a small compiler optimization or, as here, because the CLR ignores the C# compiler's attributes (see the book "CLR via C#") and places only one "tom" string on the heap.

Second, you can fix the situation with the following lines:

s3 = String.Intern(s3);
s4 = String.Intern(s4);
Console.Write(object.ReferenceEquals(s3, s4)); //true

String.Intern computes a hash code for the string and looks it up in an internal hash table. If an equal string is already there, it returns the reference to that existing String object. If not, the string is added to the table and a reference to it is returned. The garbage collector does not free the memory of an interned string, because the hash table still references it.
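
The lookup described above can be imitated with an ordinary dictionary. This is only a sketch of the idea: SimpleInternPool is a hypothetical class written for this example, not the CLR's actual implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of an intern pool; not how the CLR implements it.
public sealed class SimpleInternPool
{
    // Dictionary hashes keys by content, so equal strings map to one entry.
    private readonly Dictionary<string, string> _pool =
        new Dictionary<string, string>(StringComparer.Ordinal);

    public string Intern(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        // An equal string is already pooled: hand back that instance.
        if (_pool.TryGetValue(value, out string existing))
            return existing;

        // First time this content is seen: pool this instance and return it.
        _pool[value] = value;
        return value;
    }

    public static void Main()
    {
        var pool = new SimpleInternPool();
        string a = pool.Intern("tom");

        string b = "to";
        b += "m"; // new object with the same content as a

        Console.WriteLine(object.ReferenceEquals(a, b));              // False
        Console.WriteLine(object.ReferenceEquals(a, pool.Intern(b))); // True
    }
}
```

Because the dictionary holds a reference to every pooled string, none of them becomes eligible for garbage collection while the pool is alive, which mirrors the lifetime behaviour mentioned above.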

In C#, a string object cannot be edited. Operations that appear to modify a string produce a new, distinct string object, and your variables only hold references to these objects. The behaviour is consistent and easy to understand.

Might I suggest examining the StringBuilder class for manipulating strings without creating new instances? It should be sufficient for anything you want to do with strings.
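
A small sketch of the StringBuilder suggestion, with made-up class and method names. It also shows that the string produced by ToString() equals "tom" by value but is not the interned literal.

```csharp
using System;
using System.Text;

public static class StringBuilderDemo
{
    public static string BuildTom()
    {
        var sb = new StringBuilder("to");
        sb.Append("m");        // changes the builder's buffer in place
        return sb.ToString();  // one new string is created here
    }

    public static void Main()
    {
        string s3 = "tom";
        string built = BuildTom();

        Console.WriteLine(built == s3);                        // True
        Console.WriteLine(object.ReferenceEquals(built, s3));  // False
    }
}
```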
