Nested generator expression - unexpected result [duplicate]

Question

Here's the test code:

units = [1, 2]
tens = [10, 20]
nums = (a + b for a in units for b in tens)
units = [3, 4]
tens = [30, 40]
[x for x in nums]

Assuming the generator expression on line 3 (nums = ...) only creates an iterator, I would expect the final result to reflect the last values assigned to units and tens. OTOH, if that generator expression were evaluated eagerly at line 3, producing all results up front, then I'd expect the first definitions of units and tens to be used.

What I see is a MIX; i.e., the result is [31, 41, 32, 42]!?

Can anyone explain this behavior?

Answer 1:

A generator expression creates a function of sorts, one with just one argument: the outermost iterable.

Here that's units, and that is bound as an argument to the generator expression when the generator expression is created.

All other names are either locals (such as a and b), globals, or closures. tens is looked up as a global, so it is looked up each time you advance the generator.

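As a result, units is bound to the generator on line 3, while tens is looked up only when you iterate over the generator on the last line. By then tens is [30, 40], so the values produced are 1 + 30, 1 + 40, 2 + 30 and 2 + 40, i.e. [31, 41, 32, 42].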
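The early/late split can be reproduced with a hand-written generator function. This is only a sketch: the helper name gen and its parameter bound_iter are made up for illustration and are not part of the question's code.

```python
units = [1, 2]
tens = [10, 20]

# Generator expression: iter(units) is taken right here; tens is not touched yet.
nums = (a + b for a in units for b in tens)

# Hand-written equivalent (hypothetical helper name): only the outermost
# iterable is passed in as an argument; tens is resolved when the body runs.
def gen(bound_iter):
    for a in bound_iter:
        for b in tens:
            yield a + b

nums_equiv = gen(iter(units))

# Rebinding the names afterwards only affects the late lookup of tens.
units = [3, 4]
tens = [30, 40]

result = list(nums)
result_equiv = list(nums_equiv)
print(result)        # [31, 41, 32, 42]
print(result_equiv)  # [31, 41, 32, 42]
```

Both generators still iterate over the original [1, 2], because that iterator was captured before units was rebound.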

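You can see this by compiling the generator expression and disassembling the resulting bytecode (the listing below is from an older Python 3 release; opcodes and offsets differ in newer versions):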

>>> import dis
>>> genexp_bytecode = compile('(a + b for a in units for b in tens)', '<file>', 'single')
>>> dis.dis(genexp_bytecode)
  1           0 LOAD_CONST               0 (<code object <genexpr> at 0x10f013ae0, file "<file>", line 1>)
              3 LOAD_CONST               1 ('<genexpr>')
              6 MAKE_FUNCTION            0
              9 LOAD_NAME                0 (units)
             12 GET_ITER
             13 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             16 PRINT_EXPR
             17 LOAD_CONST               2 (None)
             20 RETURN_VALUE

The MAKE_FUNCTION bytecode turns the generator expression code object into a function, which is then called immediately with iter(units) as its only argument. The tens name is not referenced at all here; it only appears inside the nested <genexpr> code object.
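Since the exact disassembly changes between Python versions, a complementary check is to look at the attributes of the generator's code object. The sketch below assumes CPython (gi_code and the implicit '.0' argument are CPython details); the scratch dictionary called namespace is just an illustrative way of making units and tens plain globals.

```python
# Build the generator expression at module level inside a scratch namespace,
# so that units and tens are plain globals as in the question.
namespace = {"units": [1, 2], "tens": [10, 20]}
exec("nums = (a + b for a in units for b in tens)", namespace)

code = namespace["nums"].gi_code

# Local variables of the hidden function; '.0' is the implicit argument
# that receives iter(units).
local_names = code.co_varnames

# Names the body resolves by lookup each time it runs.
global_names = code.co_names

print(local_names)
print(global_names)
```

units appears in neither tuple, because the generator body never looks it up by name; tens shows up among the names resolved at run time.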

This is documented in the original generator expressions PEP (PEP 289):

Only the outermost for-expression is evaluated immediately, the other expressions are deferred until the generator is run:
g = (tgtexp  for var1 in exp1 if exp2 for var2 in exp3 if exp4)
is equivalent to:
def __gen(bound_exp):
    for var1 in bound_exp:
        if exp2:
            for var2 in exp3:
                if exp4:
                    yield tgtexp
g = __gen(iter(exp1))
del __gen

and in the generator expressions reference:

Variables used in the generator expression are evaluated lazily when the __next__() method is called for generator object (in the same fashion as normal generators). However, the leftmost for clause is immediately evaluated, so that an error produced by it can be seen before any other possible error in the code that handles the generator expression. Subsequent for clauses cannot be evaluated immediately since they may depend on the previous for loop. For example: (x*y for x in range(10) for y in bar(x)).
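The "immediately evaluated" part is easy to observe with an undefined name. In this sketch, missing is a deliberately undefined, hypothetical name: used as the leftmost iterable it fails when the generator expression is created, while in a later for clause it fails only once the generator is advanced.

```python
# 'missing' is intentionally never defined anywhere.

# Leftmost for clause: evaluated immediately, so the error surfaces here.
try:
    eager = (x for x in missing)
    eager_error = None
except NameError as exc:
    eager_error = exc

# Inner for clause: creating the generator succeeds without touching 'missing'.
lazy = (x + y for x in [1] for y in missing)

# The lookup, and therefore the error, happens on the first next() call.
try:
    next(lazy)
    lazy_error = None
except NameError as exc:
    lazy_error = exc

print(type(eager_error).__name__)
print(type(lazy_error).__name__)
```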

The PEP has an excellent section motivating why names (other than the outermost iterable) are bound late; see Early Binding vs. Late Binding.

Source: https://stackoverflow.com/questions/22694321/nested-generator-expression-unexpected-result

Tags

python, python-3.x, nested, generator, generator-expression