.NET CIL manipulation of evaluation stack

£可爱£侵袭症+ 提交于 2019-12-05 10:51:37

You IL-Code looks ok, but i think the CLR might not be able to check if the stack is corrupted after the method completes. When something is pushed onto the stack, the CLR checks if the value are also popped from the stack.

So if you push 3 values onto the stack the CLR not might be able to check if your loop is running three times, so the CLR doesn't know if there are still values onto the stack when the method is returning.

Very interesting question. You are attempting to use the IL execution stack for the purpose of storing a queue of arbitrary data items. This introduces an unusual condition, versus conventional IL code, where the correct balance of the stack critically depends on the number runtime loop iterations exactly matching the number of ILAsm-time data items that were burned-into the IL. As you noted, the program (repeated here) does not work.

(in fact, in my build which uses link.exe with /LTCG, the linker fails to even produce the assembly, giving "fatal error C1352: Invalid or corrupt MSIL in function.")

.method public static void ExecStackResidual()      // !!! FAILS - BAD EXAMPLE - NO !!!
{
    .locals init (int32 i, int32 cur)

    ldc.i4.6        // enqueue item  -- NO!
    ldc.i4.5        // enqueue item  -- NO!
    ldc.i4.4        // enqueue item  -- NO!

    ldc.i4.0
    stloc i
    br _next

_more:
    stloc cur       // de-queue item  -- NO!

    ldloc cur
    box int32
    call void Debug::WriteLine(object)

    ldloc i
    ldc.i4.1
    add
    stloc i

_next:
    ldloc i
    ldc.i4.3
    blt _more
    ret 
}

The problem is n̲o̲t̲ due a simple logical flaw or off-by-one bug in your code. This can be shown by the fact that commenting out the controversial parts as follows works fine, printing 3 zeros.

    //-- ldc.i4.6       // commented out
    //-- ldc.i4.5       // commented out
    //-- ldc.i4.4       // commented out

    ldc.i4.0
    stloc i
    br _next

_more:
    //-- stloc cur      // commented out

    ldloc cur
    box int32
    call void Debug::WriteLine(object)

    ldloc i
    ldc.i4.1
    add
    stloc i

_next:
    ldloc i
    ldc.i4.3
    blt _more
    ret 

The OP did some sleuthing and discovered ECMA-335, III.1.7.5, which seems like it could be relevant here, since the main difference between the working and failing example is that the latter existentially requires having a non-empty evaluation stack (a.k.a. "sequence point") at location _more, and that location does indeed, to quote the spec...

"...immediately follow an unconditional branch [here, br _next], and where [_more] is not the target of an earlier branch instruction."

Unfortunately however, this doesn't seem to be quite the full explanation, because as long as you remove the queued items in an balanced way that can be discerned statically, the evaluation stack apparently does not have to be empty at location _more. This is demonstrated by the following code, which also works fine, printing the 3 zeros, despite having several items on the execution stack at the ECMA-335, III.1.7.5-vulnerable location _more.

    ldc.i4.6       // enqueue item  -- ok
    ldc.i4.5       // enqueue item  -- ok
    ldc.i4.4       // enqueue item  -- ok

    ldc.i4.0
    stloc i
    br _next

_more:
    //-- stloc cur      // de-queue item  -- still commented out

    ldloc cur
    box int32
    call void Debug::WriteLine(object)

    ldloc i
    ldc.i4.1
    add
    stloc i

_next:
    ldloc i
    ldc.i4.3
    blt _more

    pop         // de-queue item  -- required
    pop         // de-queue item  -- required
    pop         // de-queue item  -- required
    ret 

The OP also uses the terminology "backward branch constraint," but it's unclear whether the phrase was found in a spec or perhaps an original contribution. It seems possibly entailed by the appearance of the phrase "...earlier branch instruction" in the spec. Either way it raises a tantalizing question of whether the error can be avoided by rearranging the code so that there are no locations that (technically) match the "earlier" (technicality) in the ECMA-335, III.1.7.5 constraint.

A related idea is that "unconditional branch" in the spec means only br family instructions. To get around br, we can instead embed a ret instruction within the method body as shown next. As you likely guessed, this is to no avail. Although the spec doesn't explicitly say so, it obviously intends ret to be included as an "unconditional branch." This is common sense, and accordingly the following example still does not work:

    // !!! FAILS - BAD EXAMPLE - NO
    ldc.i4.6        // enqueue item  -- NO!
    ldc.i4.5        // enqueue item  -- NO!
    ldc.i4.4        // enqueue item  -- NO!

    ldc.i4.0
    stloc i

_next:
    ldloc i
    ldc.i4.3
    blt _more
    ret 

_more:
    stloc cur       // de-queue item  -- NO! -- still follows an "unconditional branch"

    ldloc cur
    box int32
    call void Debug::WriteLine(object)

    ldloc i
    ldc.i4.1
    add
    stloc i
    br _next

All-in-all, I don't think you're going to get this technique to work, due to its fundamental requirement that (a.) facts that get hard-coded into the IL (i.e., number of enqueued data items) must correspond perfectly with (b.) facts that require runtime interpretation (i.e., number of loop iterations).

I think a more basic summary of the problem, as opposed to the ECMA description, is that all of the failure examples entail that the number of items on the execution stack experienced by one (or more) of the method's instructions is not fixed, instead obtaining different values at different times as the method executes, and this is the underlying situation that will always be strictly disallowed, no matter how you achieve it. To me, that seems like the more general inviolable constraint.

For example, at instruction _more in the demo method—and all within the scope of a single invocation—there would first be 2, then 1, then 0 "excess" items on the execution stack (notice that I subtracted one for each iteration from what you might have expected, and this is because I used the word "excess" just prior as an attempt to highlight the fact that within each individual loop iteration, one item is properly expected—and required—at location _more, namely for the putative dequeuing operation stloc cur).

I'd speculate that for an IL method to be valid, there must be no way for any of its instructions to experience variation in the prevailing depth of the execution stack. In other words, there must be one-and-exactly-only-one stack depth value which can be statically determined and assigned to each of the method's instructions. Intuitively, the situation otherwise would likely render the JIT task either wildly intractable or perhaps even provably impossible.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!