Boxing and Unboxing in String.Format(…) … is the following rationalized?

Question

I was doing some reading regarding boxing/unboxing, and it turns out that if you do an ordinary String.Format() where you have a value type in your list of object[] arguments, it will cause a boxing operation. For instance, if you're trying to print out the value of an integer and do string.Format("My value is {0}",myVal), it will stick your myVal int in a box and run the ToString function on it.

Browsing around, I found this article.

It appears you can avoid the boxing penalty simply by doing the .ToString on the value type before handing it on to the string.Format function: string.Format("My value is {0}",myVal.ToString())

Is this really true? I'm inclined to believe the author's evidence.
If this is true, why doesn't the compiler simply do this for you? Maybe it's changed since 2006? Does anybody know? (I don't have the time/experience to do the whole IL analysis)

Answer 1:

The compiler doesn't do this for you because string.Format takes a params Object[]. The boxing happens because of the conversion to Object.

I don't think the compiler tends to special case methods, so it won't remove boxing in cases like this.

Yes, in many cases it is true that no boxing happens if you call ToString() first. If the value type doesn't override ToString() and so uses the implementation from Object, I think it would still have to box.

Ultimately the string.Format parsing of the format string itself is going to be much slower than any boxing operation, so the overhead is negligible.

Answer 2:

1: yes, as long as the value-type overrides ToString(), which all the inbuilt types do.

2: because no such behaviour is defined in the spec, and the correct handling of a params object[] (wrt value-types) is: boxing

string.Format is just like any other opaque method; the fact that it is going to call ToString() on its arguments is invisible to the compiler. The rewrite would also be functionally incorrect if the pattern included a format like {0:n2} (which requires a specific transformation, not just ToString()). Trying to understand the pattern is undesirable and unreliable, since the pattern may not be known until runtime.
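To make the {0:n2} point concrete, here is a small sketch. The class and method names are made up for illustration, and the invariant culture is an assumption chosen so the output does not depend on the machine's locale. Once the argument has been turned into a string, string.Format has nothing left to apply the format specifier to:

```csharp
using System;
using System.Globalization;

public static class FormatSpecDemo
{
    // string.Format receives the (boxed) double and applies "n2" to it.
    public static string FormatWithSpec(double value) =>
        string.Format(CultureInfo.InvariantCulture, "{0:n2}", value);

    // The argument is already a string here, so the "n2" specifier is ignored.
    public static string FormatAfterToString(double value) =>
        string.Format(CultureInfo.InvariantCulture, "{0:n2}",
                      value.ToString(CultureInfo.InvariantCulture));

    public static void Main()
    {
        Console.WriteLine(FormatWithSpec(1234.5));      // 1,234.50
        Console.WriteLine(FormatAfterToString(1234.5)); // 1234.5
    }
}
```

So a compiler that silently inserted ToString() would change the program's output whenever a format specifier is present.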

Answer 3:

It would be better to avoid the boxing by constructing the string with StringBuilder or StringWriter and using the typed overloads.
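A minimal sketch of what that might look like with StringBuilder (the class and method names are hypothetical). Append has an overload that takes an int, so no conversion to object is needed at the call site:

```csharp
using System;
using System.Text;

public static class TypedAppendDemo
{
    public static string Describe(int myVal)
    {
        var sb = new StringBuilder("My value is ");
        // Overload resolution picks StringBuilder.Append(int),
        // so myVal is passed as an int rather than as an object.
        sb.Append(myVal);
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Describe(6)); // My value is 6
    }
}
```

Depending on the runtime version, Append(int) may still create a temporary string internally, so this avoids the box but not necessarily every allocation.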

Most of the time the boxing should be of little concern and not worth you even being aware of it.

Answer 4:

The easy one first. The reason that the compiler doesn't turn string.Format("{0}", myVal) into string.Format("{0}", myVal.ToString()) is that there's no reason why it should. Should it turn BlahFooBlahBlah(myVal) into BlahFooBlahBlah(myVal.ToString())? Maybe that'll have the same effect with better performance, but chances are it'll introduce a bug. Bad compiler! No biscuit!

Unless something can be reasoned about from general principles, the compiler should leave it alone.

Now for the interesting bit, IMO: why does the former cause boxing and the latter not?

For the former, since the only matching signature is string.Format(string, object) the integer has to be turned into an object (boxed) to be passed to the method, which expects to receive a string and an object.

The other half of this though, is why doesn't myVal.ToString() box too?

When the compiler comes to this bit of code it has the following knowledge:

myVal is an Int32.
ToString() is defined by Int32
Int32 is a value-type and therefore:
myVal cannot possibly be a null reference* and:
There cannot possibly be a more derived override - Int32.ToString() is effectively sealed.

Now, generally the C# compiler uses callvirt for all method calls for two reasons. The first is that sometimes you do want it to be a virtual call after all. The second is that (more controversially) they decided to ban any method call on a null reference, and callvirt has a built-in test for that.

In this case though, neither of those apply. There can't be a more derived class that overrides Int32.ToString(), and myVal cannot be null. It can therefore introduce a call to the ToString() method that passes the Int32 without boxing.

This combination (value can't be null, method can't be overridden elsewhere) comes up much less often with reference types, so the compiler can't take as much advantage of it there (it also wouldn't cost as much, since they wouldn't have to be boxed).

This isn't the case if Int32 inherits a method implementation. For instance, myVal.GetType() would box myVal, as there is no Int32 override (there can't be, since GetType() is not virtual), so it can only be accessed by treating myVal as an object, that is, by boxing it.

The fact that this means that the C# compiler will use callvirt for non-virtual methods and sometimes call for virtual methods, is not without a degree of irony.

*Note that even a nullable integer set to null is not the same as a null reference in this regard.
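A rough sketch of the difference described above; the class and method names are invented for illustration. Boxing cannot be observed directly from C#, but its effects can: each conversion to object produces a separate heap object holding a copy of the value.

```csharp
using System;

public static class BoxingDemo
{
    // Each conversion of a value type to object allocates its own box,
    // so the two references never point at the same object.
    public static bool BoxesAreDistinct(int myVal)
    {
        object first = myVal;
        object second = myVal;
        return !object.ReferenceEquals(first, second);
    }

    // The box holds a copy, so later changes to the variable don't affect it.
    public static int BoxKeepsCopy()
    {
        int myVal = 6;
        object boxed = myVal;
        myVal = 7;
        return (int)boxed;
    }

    public static void Main()
    {
        int myVal = 6;
        string s = myVal.ToString(); // Int32 overrides ToString(): called directly, no box
        Type t = myVal.GetType();    // GetType() is inherited from Object: myVal is boxed first
        Console.WriteLine("{0} {1}", s, t);
        Console.WriteLine(BoxesAreDistinct(myVal));
        Console.WriteLine(BoxKeepsCopy());
    }
}
```

Inspecting the compiled IL with a disassembler should show a box instruction before the GetType() call and none before the ToString() call.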

Answer 5:

Why not try each approach a hundred million times or so and see how long it takes:

using System;
using System.Diagnostics;

class Program
{
    static void Main(string[] args)
    {
        const int Iterations = 100000000;
        int myVal = 6;

        // Time the original version: myVal is boxed to be passed as an object.
        Stopwatch sw = Stopwatch.StartNew();

        for (int i = 0; i < Iterations; i++)
        {
            string string1 = string.Format("My value is {0}", myVal);
        }

        sw.Stop();

        Console.WriteLine("Original method - {0} milliseconds", sw.ElapsedMilliseconds);

        // Time the version that calls ToString() first, so a string is passed instead.
        sw.Restart();

        for (int i = 0; i < Iterations; i++)
        {
            string string2 = string.Format("My value is {0}", myVal.ToString());
        }

        sw.Stop();

        Console.WriteLine("ToStringed method - {0} milliseconds", sw.ElapsedMilliseconds);

        Console.ReadLine();
    }
}

On my machine I'm finding that the .ToStringed version is running in about 95% of the time that the original version takes, so some empirical evidence for a slight performance benefit.

Answer 6:

string.Format("My value is {0}", myVal)
myVal is an object

string.Format("My value is {0}",myVal.ToString())
myVal.ToString() is a string

ToString is overloaded and therefore the compiler cannot decide for you.

Answer 7:

I've found a StringFormatter project on GitHub. Its description sounds very promising:

The built-in string formatting facilities in .NET are robust and quite usable. Unfortunately, they also perform a ridiculous number of GC allocations. Mostly these are short lived, and on the desktop GC they generally aren't noticeable. On more constrained systems however, they can be painful. Additionally, if you're trying to track your GC usage via live reporting in your program, you might quickly notice that attempts to print out the current GC state cause additional allocations, defeating the entire attempt at instrumentation.

Thus the existence of this library. It's not completely allocation free; there are several one-time setup costs. The steady state though is entirely allocation-free. You can freely use the string formatting utilities in the main loop of a game without it causing a steady churn of garbage.

I've had a quick look at the library's interface. Instead of a params array, the author uses functions with manually defined generic arguments, which makes complete sense to me if you care about garbage.
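To illustrate the general idea only, here is a hypothetical sketch of a generic formatting helper. It is not the StringFormatter library's actual API, and the names are made up. Because the argument stays typed as T0 instead of being converted to object, a value type that overrides ToString() is not boxed:

```csharp
using System;
using System.Text;

public static class GenericFormatSketch
{
    // Hypothetical helper, not taken from any library.
    // The struct constraint keeps the sketch simple (no null handling needed).
    public static string Format<T0>(string prefix, T0 arg0) where T0 : struct
    {
        var sb = new StringBuilder(prefix);
        // For a generic T0 the compiler emits a constrained call, so a value
        // type that overrides ToString() is called without being boxed.
        sb.Append(arg0.ToString());
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Format("My value is ", 6)); // My value is 6
    }
}
```

Unlike the library described above, this sketch still allocates the intermediate string and the StringBuilder; it only shows how generic parameters sidestep the object[] conversion.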

Source: https://stackoverflow.com/questions/8477322/boxing-and-unboxing-in-string-format-is-the-following-rationalized

Tags

.net

boxing