PHP: Copy On Write and Assign By Reference behave differently on PHP 5 and PHP 7

迷失自我 2021-01-05 18:01

We have a piece of simple code:

1    

On PHP5, it outputs 8, because:

2 answers
  •  独厮守ぢ
    2021-01-05 18:18

    Disclaimer: I'm not a PHP Internals expert (yet?) so this is all from my understanding, and not guaranteed to be 100% correct or complete. :)

    So, firstly, the PHP 7 behaviour - which, I note, is also followed by HHVM - appears to be correct, and PHP 5 has a bug here. There should be no extra assign by reference behaviour here, because regardless of execution order, the result of the two calls to ++$i should never be the same.
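
    For concreteness, here is a snippet consistent with what this answer describes ($i starting at 2, $j bound to it by reference, and two pre-increments added together). It is a reconstruction from the discussion, not a quote of the question's code, so treat the exact shape as an assumption:

```php
<?php
// Reconstructed from the answer's description; not the asker's verbatim code.
$i = 2;
$j =& $i;                    // bind $j to $i by reference

// Two pre-increments of the same variable inside one expression.
$result = (++$i) + (++$i);

echo $result, "\n";          // 7 on PHP 7; the question reports 8 on PHP 5
```

    Removing the `$j =& $i;` line is the interesting control case: per the investigation further down, PHP 5 then separates the zval on each increment and the two results no longer alias each other.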

    The opcodes look fine; crucially, we have two temp variables $2 and $3, to hold the two increment results. But somehow, PHP 5 is acting as though we'd written this:

    $i = 2;
    $i++; $temp1 =& $i;
    $i++; $temp2 =& $i;
    echo $temp1 + $temp2; 
    

    Rather than this:

    $i = 2;
    $i++; $temp1 = $i;
    $i++; $temp2 = $i;
    echo $temp1 + $temp2; 
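
    Both serializations use only explicit statements, so they give the same answers on any PHP version, which makes the difference easy to check. A minimal sketch, with function names made up for illustration:

```php
<?php
// What PHP 5 effectively does: both temporaries are bound to $i by reference.
function sumWithBoundTemps(): int
{
    $i = 2;
    $i++; $temp1 =& $i;
    $i++; $temp2 =& $i;      // $temp1, $temp2 and $i are now one variable
    return $temp1 + $temp2;  // 4 + 4
}

// What the expression should mean: each temporary is a snapshot of $i.
function sumWithCopiedTemps(): int
{
    $i = 2;
    $i++; $temp1 = $i;       // snapshot: 3
    $i++; $temp2 = $i;       // snapshot: 4
    return $temp1 + $temp2;  // 3 + 4
}

echo sumWithBoundTemps(), "\n";   // 8
echo sumWithCopiedTemps(), "\n";  // 7
```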
    

    Edit: It was pointed out on the PHP Internals mailing list that using multiple operations that modify a variable within a single statement is generally considered "undefined behaviour", and ++ is used as an example of this in C/C++.

    As such, it's reasonable for PHP 5 to return the value it does for implementation / optimisation reasons, even if it is logically inconsistent with a sane serialization into multiple statements.

    The (relatively new) PHP language specification contains similar language and examples:

    Unless stated explicitly in this specification, the order in which the operands in an expression are evaluated relative to each other is unspecified. [...] (For example,[...] in the full expression $j = $i + $i++, whether the value of $i is the old or new $i, is unspecified.)

    Arguably, this is a weaker claim than "undefined behaviour", since it implies they are evaluated in some particular order, but we're into nit-picking now.
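
    To make the "unspecified order" point concrete, here is a small sketch built on the specification's own $i + $i++ example. The variable names are illustrative; the idea is only that the one-line form must match one of the two explicit orderings, and that writing the ordering out removes the ambiguity:

```php
<?php
// Order-dependent form: which value of $i the left operand sees is unspecified.
$i = 1;
$ambiguous = $i + $i++;

// Explicit ordering A: the left operand is read before the increment.
$i = 1;
$left = $i;
$right = $i++;
$oldOnLeft = $left + $right;   // 1 + 1

// Explicit ordering B: the left operand is read after the increment.
$i = 1;
$right = $i++;
$left = $i;
$newOnLeft = $left + $right;   // 2 + 1

// The one-liner has to agree with one of the two explicit orderings.
var_dump($ambiguous === $oldOnLeft || $ambiguous === $newOnLeft);
```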

    phpdbg investigation (PHP 5)

    I was curious and wanted to learn more about the internals, so I did some experimenting with phpdbg.

    No references

    Running the code with $j = $i in place of $j =& $i, we start with 2 variables sharing an address, with a refcount of 2 (but no is_ref flag):

    Address         Refs    Type            Variable
    0x7f3272a83be8  2       (integer)       $i
    0x7f3272a83be8  2       (integer)       $j
    

    But as soon as you pre-increment, the zvals are separated, and only one temp var is sharing with $i, giving a refcount of 2:

    Address         Refs    Type            Variable
    0x7f189f9ecfc8  2       (integer)       $i
    0x7f189f859be8  1       (integer)       $j
    

    With reference assignment

    When the variables have been bound together, they share an address, with a refcount of 2, and a by-ref marker:

    Address         Refs    Type            Variable
    0x7f9e04ee7fd0  2       (integer)       &$i
    0x7f9e04ee7fd0  2       (integer)       &$j
    

    After the pre-increments (but before the addition), the same address has a refcount of 4, showing the 2 temp vars erroneously bound by reference:

    Address         Refs    Type            Variable
    0x7f9e04ee7fd0  4       (integer)       &$i
    0x7f9e04ee7fd0  4       (integer)       &$j
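
    The refcounts themselves are only visible through a debugger, but the user-level difference between the two setups can be checked directly. A minimal sketch (the array names are just for illustration):

```php
<?php
// By value: $j may share storage with $i, but a write to $i separates them.
$i = 2;
$j = $i;
++$i;
$byValue = [$i, $j];       // [3, 2]
unset($j);

// By reference: $i and $j stay bound, so the write is visible through both.
$i = 2;
$j =& $i;
++$i;
$byReference = [$i, $j];   // [3, 3]

var_dump($byValue, $byReference);
```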
    

    The source of the issue

    Digging into the source on http://lxr.php.net, we can find the implementation of the ZEND_PRE_INC opcode:

    • PHP 5.6
    • PHP 7.0

    PHP 5

    The crucial line is this:

     SEPARATE_ZVAL_IF_NOT_REF(var_ptr);
    

    So we create a new zval for the result value only if it is not currently a reference. Further down, we have this:

    if (RETURN_VALUE_USED(opline)) {
        PZVAL_LOCK(*var_ptr);
        EX_T(opline->result.var).var.ptr = *var_ptr;
    }
    

    So if the return value of the increment is actually used, we need to "lock" the zval, which following a whole series of macros basically means "increment its refcount", before assigning it as the result.

    If we created a new zval earlier, that's fine - our refcount is now 2, 1 for the actual variable, plus 1 for the operation result. But if we decided not to, because we needed to hold a reference, we're just incrementing the existing reference count, and pointing at a zval which may be about to be changed again.
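
    The logic above can be imitated in userland PHP. This is a toy model under my own assumptions, not engine code: ToyZval, toyPreInc5 and toySum5 are invented names, and an object handle stands in for a zval pointer.

```php
<?php
// Toy stand-in for a PHP 5 zval; not the real engine structure.
final class ToyZval
{
    public $value;
    public $refcount;
    public $isRef;

    public function __construct(int $value, int $refcount, bool $isRef)
    {
        $this->value = $value;
        $this->refcount = $refcount;
        $this->isRef = $isRef;
    }
}

// Models the PHP 5 handler; $slot plays the variable's zval pointer.
function toyPreInc5(ToyZval &$slot): ToyZval
{
    if (!$slot->isRef && $slot->refcount > 1) {
        // SEPARATE_ZVAL_IF_NOT_REF: leave the shared zval, take a private copy.
        $slot->refcount--;
        $slot = new ToyZval($slot->value, 1, false);
    }
    $slot->value++;
    $slot->refcount++;   // PZVAL_LOCK: the result now also points here
    return $slot;        // the result is the same zval, not a copy
}

// Runs (++$i) + (++$i) against the model.
// $isRef = true models $j =& $i, false models $j = $i.
function toySum5(bool $isRef): int
{
    $i = new ToyZval(2, 2, $isRef);  // refcount 2: shared by $i and $j
    $temp1 = toyPreInc5($i);
    $temp2 = toyPreInc5($i);
    return $temp1->value + $temp2->value;
}

echo toySum5(false), "\n";  // 7: each increment separates first
echo toySum5(true), "\n";   // 8: both results alias the one zval
```

    In the by-reference run the model's refcount ends at 4, which lines up with the phpdbg output shown earlier.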

    PHP 7

    So what's different in PHP 7? Several things!

    Firstly, the phpdbg output is rather boring, because integers are no longer reference counted in PHP 7; instead, a reference assignment creates an extra pointer, which itself has a refcount of 1, to the same address in memory, which is the actual integer. The phpdbg output looks like this:

    Address            Refs    Type      Variable
    0x7f175ca660e8     1       integer   &$i
    int (2)
    0x7f175ca660e8     1       integer   &$j
    int (2)
    

    Secondly, there is a special code path in the source for integers:

    if (EXPECTED(Z_TYPE_P(var_ptr) == IS_LONG)) {
        fast_long_increment_function(var_ptr);
        if (UNEXPECTED(RETURN_VALUE_USED(opline))) {
            ZVAL_COPY_VALUE(EX_VAR(opline->result.var), var_ptr);
        }
        ZEND_VM_NEXT_OPCODE();
    }
    

    So if the variable is an integer (IS_LONG) and not a reference to an integer (IS_REFERENCE) then we can just increment it in place. If we then need the return value, we can copy its value into the result (ZVAL_COPY_VALUE).

    If it's a reference, we won't hit that code, but rather than keeping references bound together, we have these two lines:

    ZVAL_DEREF(var_ptr);
    SEPARATE_ZVAL_NOREF(var_ptr);
    

    The first line says "if it's a reference, follow it to its target"; this takes us from our "reference to an integer" to the integer itself. The second - I think - says "if it's something refcounted, and has more than one reference, create a copy of it"; in our case, this will do nothing, because the integer doesn't care about refcounts.

    So now we have an integer we can increment, that will affect all by-reference associations, but not by-value ones for refcounted types. Finally, if we want the return value of the increment, we again copy it, rather than just assigning it; and this time with a slightly different macro which will increase the refcount of our new zval if necessary:

    ZVAL_COPY(EX_VAR(opline->result.var), var_ptr);
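
    The PHP 7 reference path can be imitated in the same toy style. Again this is a sketch under assumptions, not engine code: ToyReference and toyPreInc7 are invented names, and the object stands in for the extra reference wrapper that $i and $j both point at.

```php
<?php
// Toy stand-in for the PHP 7 reference wrapper shared by $i and $j.
final class ToyReference
{
    public $value;

    public function __construct(int $value)
    {
        $this->value = $value;
    }
}

// Models the PHP 7 reference path: dereference, increment in place, copy out.
function toyPreInc7(ToyReference $ref): int
{
    $ref->value++;        // after ZVAL_DEREF: the shared integer itself changes
    return $ref->value;   // ZVAL_COPY: the result is an independent value
}

$shared = new ToyReference(2);   // what $i and $j both refer to
$temp1 = toyPreInc7($shared);    // 3
$temp2 = toyPreInc7($shared);    // 4
echo $temp1 + $temp2, "\n";      // 7
```

    The point of the model is that each result is a plain copied value, so a later increment cannot change it the way the aliased PHP 5 results could be changed.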
    
