The 32 bytes limit for inline function … not too small?

问题

I have a very small c# code marked as inline, but dont work. I have seen that the longest function generates more than 32 bytes of IL code. Does the limit of 32 bytes too short ?

// inlined
[MethodImpl(MethodImplOptions.AggressiveInlining)] 
static public bool INL_IsInRange (this byte pValue, byte pMin) {
  return(pValue>=pMin);
}

// NOT inlined
[MethodImpl(MethodImplOptions.AggressiveInlining)] 
static public bool INL_IsInRange (this byte pValue, byte pMin, byte pMax) {
  return(pValue>=pMin&&pValue<=pMax);
}

Is it possible to change that limit?

回答1:

I am looking for inline function criteria also. In your case, I believe that JIT optimization timed out before it could reach the decision to inline your second function. For JIT, it's not a priority to inline a function, so it was busy analyzing your long code. However, if you place your calls inside tight loops, JIT will probably inline them, as inner calls gain priority to inline. If you really care about this type of micro-optimization, it's time to switch to C++. It's a whole new brave world out there for you to explore and exploit!

I noticed that the question had been edited right after this answer had been posted, meaning a high level of interactivity. Well, I don't know why there is a limit of 32 bytes, but that seems to be exactly the size of a CPU cache block, conservatively speaking. What a coincidence! In any case, code optimization must be done with a particular hardware configuration, better saved in an extra file side by side with its assembly. The timeout policy is stupid, because optimization is not supposed to be done at run-time, competing against the precious code execution time. Optimization is supposed to be done at application load-time, only the first time it's run on the machine, once for all. It can be triggered again when hardware configuration change is detected. Again, if you really need performance, just go with C/C++. C# is not designed for performance and will never make performance its top priority. Like Java, C# is designed for safety, with a much stronger caution against possible negative performance impacts.

回答2:

Up to the "32-bytes of IL" limit, there are a number of other factors which affect whether a method would be inlined or not. There are at least a couple of articles that describe these factors.

One article explains that a scoring heuristic is used to adjust an initial guess about the relative size of the code when inlined vs not (i.e. whether the call site is larger or smaller than the inlined code itself):

If inlining makes code smaller then the call it replaces, it is ALWAYS good. Note that we are talking about the NATIVE code size, not the IL code size (which can be quite different).

The more a particular call site is executed, the more it will benefit from inlning. Thus code in loops deserves to be inlined more than code that is not in loops.

If inlining exposes important optimizations, then inlining is more desirable. In particular methods with value types arguments benefit more than normal because of optimizations like this and thus having a bias to inline these methods is good.

Thus the heuristic the X86 JIT compiler uses is, given an inline candidate.

Estimate the size of the call site if the method were not inlined.

Estimate the size of the call site if it were inlined (this is an estimate based on the IL, we employ a simple state machine (Markov Model), created using lots of real data to form this estimator logic)

Compute a multiplier. By default it is 1

Increase the multiplier if the code is in a loop (the current heuristic bumps it to 5 in a loop)

Increase the multiplier if it looks like struct optimizations will kick in.

If InlineSize <= NonInlineSize * Multiplier do the inlining.

Another article explains several conditions that will prevent a method from being inlined based on their mere existence (including the "32-bytes of IL" limit):

These are some of the reasons for which we won't inline a method:

Method is marked as not inline with the CompilerServices.MethodImpl attribute.

Size of inlinee is limited to 32 bytes of IL: This is a heuristic, the rationale behind it is that usually, when you have methods bigger than that, the overhead of the call will not be as significative compared to the work the method does. Of course, as a heuristic, it fails in some situations. There have been suggestions for us adding an attribute to control these threshold. For Whidbey, that attribute has not been added (it has some very bad properties: it's x86 JIT specific and it's longterm value, as compilers get smarter, is dubious).

Virtual calls: We don't inline across virtual calls. The reason for not doing this is that we don't know the final target of the call. We could potentially do better here (for example, if 99% of calls end up in the same target, you can generate code that does a check on the method table of the object the virtual call is going to execute on, if it's not the 99% case, you do a call, else you just execute the inlined code), but unlike the J language, most of the calls in the primary languages we support, are not virtual, so we're not forced to be so aggressive about optimizing this case.

Valuetypes: We have several limitations regarding value types an inlining. We take the blame here, this is a limitation of our JIT, we could do better and we know it. Unfortunately, when stack ranked against other features of Whidbey, getting some statistics on how frequently methods cannot be inlined due to this reason and considering the cost of making this area of the JIT significantly better, we decided that it made more sense for our customers to spend our time working in other optimizations or CLR features. Whidbey is better than previous versions in one case: value types that only have a pointer size int as a member, this was (relatively) not expensive to make better, and helped a lot in common value types such as pointer wrappers (IntPtr, etc).

MarshalByRef: Call targets that are in MarshalByRef classes won't be inlined (call has to be intercepted and dispatched). We've got better in Whidbey for this scenario

VM restrictions: These are mostly security, the JIT must ask the VM for permission to inline a method (see CEEInfo::canInline in Rotor source to get an idea of what kind of things the VM checks for).

Complicated flowgraph: We don't inline loops, methods with exception handling regions, etc...

If basic block that has the call is deemed as it won't execute frequently (for example, a basic block that has a throw, or a static class constructor), inlining is much less aggressive (as the only real win we can make is code size)

Other: Exotic IL instructions, security checks that need a method frame, etc...

来源：https://stackoverflow.com/questions/41091926/the-32-bytes-limit-for-inline-function-not-too-small

标签

inline