String interning in .Net Framework - What are the benefits and when to use interning

Posted by 孤街浪徒 on 2019-12-17 05:04:50

Question


I want to understand the process and internals of string interning in the .NET Framework. I would also like to know the benefits of interning and the scenarios where it should be used to improve performance. I have read about interning in Jeffrey Richter's CLR book, but I am still confused and would like to understand it in more detail.

[Edit] To ask a specific question, here is some sample code:

private void MethodA()
{
    string s = "String"; // line 1 - interned literal as explained in the answer

    //s = string.Intern(s); // line 2 - what would happen in line 3 if we uncomment this line, will it make any difference?
}

private bool MethodB(string compareThis)
{
    if (compareThis == "String") // line 3 - will this line use interning (with and without uncommenting line 2 above)?
    {
        return true;
    }
    return false;
}

Answer 1:


Interning is an internal implementation detail. Unlike boxing, I do not think there is any benefit in knowing more than what you have read in Richter's book.

The micro-optimisation benefits of interning strings manually are minimal, so it is generally not recommended.

This probably describes it:

using System;

class Program
{
    const string SomeString = "Some String"; // compile-time constant: gets interned

    static void Main(string[] args)
    {
        var s1 = SomeString; // refers to the interned string
        var s2 = SomeString; // refers to the same interned string
        var s = "String";
        var s3 = "Some " + s; // concatenated at run time: no interning

        Console.WriteLine(s1 == s2); // True: same reference, so the comparison succeeds immediately
        Console.WriteLine(s1 == s3); // True: different references, so the characters are compared
    }
}
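To make the difference between value equality and reference equality concrete, here is a minimal sketch. The Copy helper and the InternDemo class are illustrative names, not part of the original answer; Copy simply produces a fresh object with the same characters, the way a string built at run time would be. It also suggests why line 3 in the question behaves the same with or without line 2: == compares by value either way.

```csharp
using System;

static class InternDemo
{
    // Hypothetical helper: returns a new string object with the same characters.
    public static string Copy(string s) => new string(s.ToCharArray());

    public static void Main()
    {
        string a = Copy("Some String");
        string b = Copy("Some String");

        Console.WriteLine(a == b);                         // True: equal contents
        Console.WriteLine(object.ReferenceEquals(a, b));   // False: two separate objects

        // string.Intern returns the pooled instance for a given value.
        string ia = string.Intern(a);
        string ib = string.Intern(b);
        Console.WriteLine(object.ReferenceEquals(ia, ib)); // True: both are the pooled instance
    }
}
```

Only the last comparison depends on interning; the first two behave the same whether or not anything was ever interned.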



Answer 2:


In general, interning is something that just happens, automatically, when you use literal string values. Interning provides the benefit of only having one copy of the literal in memory, no matter how often it's used.

That being said, there is rarely a reason to intern strings that you generate at runtime, or even to think about string interning during normal development.

There are potentially some benefits if you're going to be doing a lot of work with comparisons of potentially identical runtime generated strings (as interning can speed up comparisons via ReferenceEquals). However, this is a highly specialized usage, and would require a fair amount of profiling and testing, and wouldn't be an optimization I'd consider unless there was a measured problem in place.




Answer 3:


This is an "old" question, but I have a different angle on it.

If you're going to have a lot of long-lived strings from a small pool, interning can improve memory efficiency.

In my case, I was interning another type of object in a static dictionary because they were reused frequently, and this served as a fast cache before persisting them to disk.

Most of the fields in these objects are strings, and the pool of values is fairly small (much smaller than the number of instances, anyway).

If these were transient objects, it wouldn't matter because the string fields would be garbage collected often. But because references to them were being held, their memory usage started to accumulate (even when no new unique values were being added).

So interning the objects reduced the memory usage substantially, and so did interning their string values while they were being interned.
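A rough sketch of that arrangement, under stated assumptions: the Tag type, its two fields, and the TagPool class are hypothetical stand-ins, since the answer does not show its own types. The pool is a plain static dictionary, so it is not thread-safe and its entries live as long as the process does.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical object whose fields are mostly strings from a small set of values.
sealed class Tag : IEquatable<Tag>
{
    public string Name { get; }
    public string Category { get; }

    public Tag(string name, string category)
    {
        // Interning the fields lets equal values share one string object.
        Name = string.Intern(name);
        Category = string.Intern(category);
    }

    public bool Equals(Tag other) =>
        other != null && Name == other.Name && Category == other.Category;

    public override bool Equals(object obj) => Equals(obj as Tag);

    public override int GetHashCode() =>
        unchecked(Name.GetHashCode() * 397 ^ Category.GetHashCode());
}

static class TagPool
{
    private static readonly Dictionary<Tag, Tag> Pool = new Dictionary<Tag, Tag>();

    // Returns the canonical instance equal to candidate, adding it if unseen.
    public static Tag Intern(Tag candidate)
    {
        if (Pool.TryGetValue(candidate, out Tag existing)) return existing;
        Pool.Add(candidate, candidate);
        return candidate;
    }
}

static class Program
{
    static void Main()
    {
        Tag a = TagPool.Intern(new Tag("red", "color"));
        Tag b = TagPool.Intern(new Tag("red", "color"));
        Console.WriteLine(object.ReferenceEquals(a, b)); // True: the second call returns the first instance
    }
}
```

The duplicate Tag created for the second call becomes garbage immediately, so only one long-lived object remains per distinct value.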




Answer 4:


Interning strings affects memory consumption.

For example, suppose you read strings and keep them in a list for caching, and the exact same string occurs 10 times. With string.Intern, the string is stored only once in memory. Without it, the string is stored 10 times.

In the example below, the string.Intern variant consumes about 44 MB, while the variant without interning (the commented-out line) consumes 1195 MB.

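using System;
using System.Collections.Generic;
using System.Diagnostics;

static void Main(string[] args)
{
    var list = new List<string>();

    for (int i = 0; i < 5 * 1000 * 1000; i++)
    {
        var s = ReadFromDb();
        list.Add(string.Intern(s)); // interned variant: every entry refers to one shared instance
        //list.Add(s);              // non-interned variant: every entry is a separate instance
    }

    Console.WriteLine(Process.GetCurrentProcess().PrivateMemorySize64 / 1024 / 1024 + " MB");
}

// Stands in for a database read. Appending 1 forces a concatenation at run time,
// so each call returns a new string object rather than an interned literal.
private static string ReadFromDb()
{
    return "abcdefghijklmnopqrstuvyxz0123456789abcdefghijklmnopqrstuvyxz0123456789abcdefghijklmnopqrstuvyxz0123456789" + 1;
}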

Interning also improves the performance of equality comparisons. In the example below, the interned version takes about 1 time unit, while the non-interned version takes about 7.

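// Needs using System; and using System.Diagnostics; and reuses ReadFromDb() from the memory example.
static void Main(string[] args)
{
    var a = string.Intern(ReadFromDb()); // interned variant: a and b are the same object
    var b = string.Intern(ReadFromDb());
    //var a = ReadFromDb();              // non-interned variant: equal contents, separate objects
    //var b = ReadFromDb();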

    int equals = 0;
    var stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < 250 * 1000 * 1000; i++)
    {
        if (a == b) equals++;
    }
    stopwatch.Stop();

    Console.WriteLine(stopwatch.Elapsed + ", equals: " + equals);
}



Answer 5:


Interned strings have the following characteristics:

  • Two interned strings that are identical will have the same address in memory.
  • Memory occupied by interned strings is not freed until your application terminates.
  • Interning a string involves calculating a hash and looking it up in a dictionary which consumes CPU cycles.
  • If multiple threads intern strings at the same time they will block each other because accesses to the dictionary of interned strings are serialized.
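The first characteristic and the pool lookup can be observed with string.IsInterned, which looks a value up in the intern pool without adding it. The following is a small sketch; the PoolProbe class name and the "probe-" prefix are arbitrary, and a GUID is used only so that the value cannot already be in the pool.

```csharp
using System;

static class PoolProbe
{
    public static void Main()
    {
        // Built at run time from a fresh GUID, so this value is not in the pool yet.
        string fresh = "probe-" + Guid.NewGuid();
        Console.WriteLine(string.IsInterned(fresh) == null);        // True: not pooled

        string pooled = string.Intern(fresh);                       // adds the value to the pool

        // A separate object with the same characters.
        string copy = new string(fresh.ToCharArray());
        Console.WriteLine(object.ReferenceEquals(copy, pooled));    // False: distinct objects
        Console.WriteLine(object.ReferenceEquals(string.IsInterned(copy), pooled)); // True: lookup finds the pooled instance
    }
}
```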

The consequences of these characteristics are:

  • You can test two interned strings for equality by just comparing their addresses, which is a lot faster than comparing each character in the string. This is especially true if the strings are very long and start with the same characters. You can compare interned strings with the Object.ReferenceEquals method, but it is safer to use the string == operator: it first checks whether both operands refer to the same object and otherwise compares their contents, so it still gives the right answer when one of the strings is not interned.

  • If you use the same string many times in your application, your application will only store one copy of the string in memory reducing the memory required to run your application.

  • If you intern many different strings this will allocate memory for those strings that will never be freed, and your application will consume ever increasing amounts of memory.

  • If you have a very large number of interned strings, string interning can become slow, and threads will block each other when accessing the interned string dictionary.

You should use string interning only if:

  1. The set of strings you are interning is fairly small.
  2. You compare these strings many times for each time that you intern them.
  3. You really care about minute performance optimizations.
  4. You don't have many threads aggressively interning strings.


Source: https://stackoverflow.com/questions/8054471/string-interning-in-net-framework-what-are-the-benefits-and-when-to-use-inter
