c# string interning

我的梦境 提交于 2019-12-18 04:45:10

问题


I am trying to understand string interning and why is doesn't seem to work in my example. The point of the example is to show Example 1 uses less (a lot less memory) as it should only have 10 strings in memory. However, in the code below both example use roughly the same amount of memory (virtual size and working set).

Please advice why example 1 isn't using a lot less memory? Thanks

Example 1:

        IList<string> list = new List<string>(10000);

        for (int i = 0; i < 10000; i++)
        {
            for (int k = 0; k < 10; k++)
            {
                list.Add(string.Intern(k.ToString()));
            }

        }

        Console.WriteLine("intern Done");
        Console.ReadLine();

Example 2:

        IList<string> list = new List<string>(10000);

        for (int i = 0; i < 10000; i++)
        {
            for (int k = 0; k < 10; k++)
            {
                list.Add(k.ToString());
            }

        }

        Console.WriteLine("intern Done");
        Console.ReadLine();

回答1:


From the msdn Second, to intern a string, you must first create the string. The memory used by the String object must still be allocated, even though the memory will eventually be garbage collected.




回答2:


The problem is that ToString() will still allocate a new string, and then intern it. If the garbage collector doesn't run to collect those "temporary" strings, then the memory usage will be the same.

Also, the length of your strings are pretty short. 10,000 strings that are mostly only one character long is a memory difference of about 20KB which you're probably not going to notice. Try using longer strings (or a lot more of them) and doing a garbage collect before you check the memory usage.

Here is an example that does show a difference:

class Program
{
    static void Main(string[] args)
    {
        int n = 100000;

        if (args[0] == "1")
            WithIntern(n);
        else
            WithoutIntern(n);
    }

    static void WithIntern(int n)
    {
        var list = new List<string>(n);

        for (int i = 0; i < n; i++)
        {
            for (int k = 0; k < 10; k++)
            {
                list.Add(string.Intern(new string('x', k * 1000)));
            }
        }

        GC.Collect();
        Console.WriteLine("Done.");
        Console.ReadLine();
    }

    static void WithoutIntern(int n)
    {
        var list = new List<string>(n);

        for (int i = 0; i < n; i++)
        {
            for (int k = 0; k < 10; k++)
            {
                list.Add(new string('x', k * 1000));
            }
        }

        GC.Collect();
        Console.WriteLine("Done.");
        Console.ReadLine();
    }
}



回答3:


Remember, the CLR manages memory on behalf of your process, so it is really hard to figure out the managed memory footprint from looking at virtual size and working set. The CLR will generally allocate and free memory in chunks. The size of these varies according to implementation details, but due to this it is next to impossible to measure managed heap usage based on memory counters for the process.

However, if you look at the actual memory usage for the examples you'll see a difference.

Example 1

0:005>!dumpheap -stat
...
00b6911c      137         4500 System.String
0016be60        8       480188      Free
00b684c4       14       649184 System.Object[]
Total 316 objects
0:005> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x01592dcc
generation 1 starts at 0x01592dc0
generation 2 starts at 0x01591000
ephemeral segment allocation context: none
 segment    begin allocated     size
01590000 01591000  01594dd8 0x00003dd8(15832)
Large object heap starts at 0x02591000
 segment    begin allocated     size
02590000 02591000  026a49a0 0x001139a0(1128864)
Total Size  0x117778(1144696)
------------------------------
GC Heap Size  0x117778(1144696)

Example 2

0:006> !dumpheap -stat
...
00b684c4       14       649184 System.Object[]
00b6911c   100137      2004500 System.String
Total 100350 objects
0:006> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x0179967c
generation 1 starts at 0x01791038
generation 2 starts at 0x01591000
ephemeral segment allocation context: none
 segment    begin allocated     size
01590000 01591000  0179b688 0x0020a688(2139784)
Large object heap starts at 0x02591000
 segment    begin allocated     size
02590000 02591000  026a49a0 0x001139a0(1128864)
Total Size  0x31e028(3268648)
------------------------------
GC Heap Size  0x31e028(3268648)

As you can see from the output above the second example does use more memory on the managed heap.




回答4:


Source: https://blogs.msdn.microsoft.com/ericlippert/2009/09/28/string-interning-and-string-empty/

String interning is an optimization technique by the compiler. If you have two identical string literals in one compilation unit then the code generated ensures that there is only one string object created for all the instance of that literal(characters enclosed in double quotes) within the assembly.

Example:

object obj = "Int32";
string str1 = "Int32";
string str2 = typeof(int).Name;

output of the following comparisons:

Console.WriteLine(obj == str1); // true
Console.WriteLine(str1 == str2); // true    
Console.WriteLine(obj == str2); // false !?

Note1: Objects are compared by reference.

Note2: typeof(int).Name is evaluated by reflection method so it does not gets evaluated at compile time. Here these comparisons are made at compile time.

Analysis of the Results:

  1. true because they both contain same literal and so the code generated will have only one object referencing "Int32". See Note 1.

  2. true because the content of both the value is checked which is same.

  3. false because str2 and obj does not have the same literal. See Note 2.



来源:https://stackoverflow.com/questions/2506588/c-sharp-string-interning

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!