How is precedence grouping implemented in SQLAlchemy?

轮回少年 2021-01-13 15:40

I've been looking through the SQLAlchemy API and it's incredibly complex, so I thought I'd ask here to see if anyone can explain this to me in a somewhat digestible format.

1 answer
  • 2021-01-13 16:27

    That finally brings me to my question. SQLAlchemy is capable of understanding precedence groups somehow, but I can't for the life of me understand how it does it.

    SQLAlchemy does not have to do much work here. Most of the work is done by Python, which parses expressions according to its rules of operator precedence and so evaluates the combined expressions in a specific order. If that order of precedence is correct for your application, and you don't mind always grouping nested expressions, you are set. That's not always the case in SQL, and SQLAlchemy wants to output valid SQL expressions with minimal use of extraneous parentheses, so SQLAlchemy does consult a precedence table of its own. That way it can decide when (...) grouping is required in the output.

    SQLAlchemy returns dedicated *Clause* expression objects representing the operation on its operands (each of which can be further expressions), and then combines those further when those operation objects are also used in operations. In the end, you'd have a tree of objects, and traversal of that tree during compilation to SQL then produces the grouped output you see, as needed. Where precedence requires it, SQLAlchemy does insert sqlalchemy.sql.elements.Grouping() objects, and it is up to the SQL dialect to produce the right syntax for grouping.

    If you are looking at the SQLAlchemy source code, you'll want to look at the sqlalchemy.sql.operators.ColumnOperators class and its parent class, sqlalchemy.sql.operators.Operators, which implements __or__ as a call to self.operate(or_, other) (passing in the operator.or_() function). This appears complicated in SQLAlchemy because it has to delegate to different types of comparisons for different types of objects and SQL dialects.

    But at the base is the sqlalchemy.sql.default_comparator module, where or_ and and_ are (indirectly) mapped to classmethods of sqlalchemy.sql.elements.BooleanClauseList, producing an instance of that class.
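
    The delegation described above can be sketched with nothing but the standard library. This is a toy model, not SQLAlchemy's code: ToyOperators, ToyColumn, ToyClauseList and TOY_DISPATCH are hypothetical names invented for illustration, and the lookup table only plays the role that the default_comparator module plays in the real library.

```python
import operator


class ToyOperators:
    """Hypothetical stand-in for an operator-overloading base class."""

    def __or__(self, other):
        # Python calls this for `self | other`; pass the operator function on.
        return self.operate(operator.or_, other)

    def __and__(self, other):
        # Same idea for `self & other`.
        return self.operate(operator.and_, other)

    def operate(self, op, other):
        # Look up a constructor for this operator in the toy dispatch table.
        return TOY_DISPATCH[op](self, other)


class ToyColumn(ToyOperators):
    """A leaf operand identified by a name."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


class ToyClauseList(ToyOperators):
    """An operator together with the operands it joins."""

    def __init__(self, op, clauses):
        self.op = op
        self.clauses = tuple(clauses)

    def __repr__(self):
        return f"ToyClauseList({self.op.__name__}, {self.clauses!r})"


# Assumption of this sketch: a plain dict maps operator functions to constructors.
TOY_DISPATCH = {
    operator.or_: lambda left, right: ToyClauseList(operator.or_, (left, right)),
    operator.and_: lambda left, right: ToyClauseList(operator.and_, (left, right)),
}

a, b = ToyColumn("a"), ToyColumn("b")
expr = a | b
print(expr)  # ToyClauseList(or_, (a, b))
```

    The point of routing everything through operate() is that the overloaded dunder methods stay trivial, while the decision about what object to build lives in one replaceable table.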

    The BooleanClauseList._construct() method is responsible for handling grouping there, by delegating to .self_group() methods on the two clauses:

    convert_clauses = [
        c.self_group(against=operator) for c in convert_clauses
    ]
    

    This passes in operator.or_ or operator.and_, and so lets each operand decide whether it needs to be wrapped in a Grouping() instance, based on precedence. For BooleanClauseList objects (the result of ... | ... or ... & ..., which is then combined with another | or & operator), the ClauseList.self_group() method will produce a Grouping() if self.operator has a lower or equal precedence compared to against:

    def self_group(self, against=None):
        # type: (Optional[Any]) -> ClauseElement
        if self.group and operators.is_precedent(self.operator, against):
            return Grouping(self)
        else:
            return self
    

    where sqlalchemy.sql.operators.is_precedent() consults an expression precedence table:

    _PRECEDENCE = {
        # ... many lines elided
    
        and_: 3,
        or_: 2,
    
        # ... more lines elided
    }
    
    def is_precedent(operator, against):
        if operator is against and is_natural_self_precedent(operator):
            return False
        else:
            return _PRECEDENCE.get(
                operator, getattr(operator, "precedence", _smallest)
            ) <= _PRECEDENCE.get(against, getattr(against, "precedence", _largest))
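
    To see what that comparison does for the two operators discussed here, a reduced version can be run on its own. This is a sketch under stated assumptions: TOY_PRECEDENCE contains only the two entries quoted above, TOY_SELF_PRECEDENT is a simplified stand-in for the is_natural_self_precedent() check, and the fallback defaults for unknown operators are left out.

```python
from operator import and_, or_

# Only the two entries quoted in the answer; the real table has many more.
TOY_PRECEDENCE = {and_: 3, or_: 2}

# Assumption of this sketch: AND and OR may chain with themselves ungrouped.
TOY_SELF_PRECEDENT = {and_, or_}


def toy_is_precedent(operator, against):
    """Return True when an expression built with `operator` needs
    parentheses inside an expression built with `against`."""
    if operator is against and operator in TOY_SELF_PRECEDENT:
        return False
    return TOY_PRECEDENCE[operator] <= TOY_PRECEDENCE[against]


print(toy_is_precedent(or_, and_))  # True:  OR nested inside AND needs (...)
print(toy_is_precedent(and_, or_))  # False: AND nested inside OR does not
print(toy_is_precedent(or_, or_))   # False: OR nested inside OR does not
```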
    

    So what happens for your two expressions? Python has picked up the () parentheses grouping. Let's first simplify the expressions to their base components; you basically have:

    A | B & C
    (A | B) & C
    

    Python parses these two expressions according to its own precedence rules, and produces its own abstract syntax tree:

    >>> import ast
    >>> ast.dump(ast.parse('A | B & C', mode='eval').body)
    "BinOp(left=Name(id='A', ctx=Load()), op=BitOr(), right=BinOp(left=Name(id='B', ctx=Load()), op=BitAnd(), right=Name(id='C', ctx=Load())))"
    >>> ast.dump(ast.parse('(A | B) & C', mode='eval').body)
    "BinOp(left=BinOp(left=Name(id='A', ctx=Load()), op=BitOr(), right=Name(id='B', ctx=Load())), op=BitAnd(), right=Name(id='C', ctx=Load()))"
    

    These come down to

    BinOp(
        left=A,
        op=or_,
        right=BinOp(left=B, op=and_, right=C)
    )
    

    and

    BinOp(
        left=BinOp(left=A, op=or_, right=B),
        op=and_,
        right=C
    )
    

    which changes the order in which objects are combined! So the first leads to:

    # process A, then B & C
    
    leftop = A
    rightop = BooleanClauseList(and_, (B, C))
    
    # combine into A | (B & C)
    final = BooleanClauseList(or_, (leftop, rightop))
    
    # which is
    BooleanClauseList(or_, (A, BooleanClauseList(and_, (B, C))))
    

    Because the second clause here is a BooleanClauseList(and_, ...) instance, the .self_group() call for that clause doesn't return a Grouping(); there self.operator is and_, which has a precedence of 3, and that is higher than (not lower than or equal to) the precedence of 2 that or_ has for the parent clause.

    The other expression is executed by Python in a different order:

    # process A | B, then C
    
    leftop = BooleanClauseList(or_, (A, B))
    rightop = C
    
    # combine into (A | B) & C
    final = BooleanClauseList(and_, (leftop, rightop))
    
    # which is
    BooleanClauseList(and_, (BooleanClauseList(or_, (A, B)), C))
    

    Now the first clause is a BooleanClauseList(or_, ...) instance, and it actually produces a Grouping instance because self.operator is or_, which has a lower precedence than the and_ of the parent clause list, and so the object tree becomes:

    BooleanClauseList(and_, (Grouping(BooleanClauseList(or_, (A, B))), C))
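
    Both walkthroughs can be reproduced with a small standard-library model. It is a minimal sketch, not SQLAlchemy's implementation: ToyNode, ToyName, ToyGrouping and ToyClauseList are hypothetical names, the precedence table holds only the two quoted entries, and the rule that an operator never groups against itself is a simplification.

```python
from operator import and_, or_

TOY_PRECEDENCE = {and_: 3, or_: 2}  # only the two entries quoted above


class ToyNode:
    def __or__(self, other):
        return ToyClauseList(or_, self, other)

    def __and__(self, other):
        return ToyClauseList(and_, self, other)

    def self_group(self, against=None):
        # Leaves and existing groupings never need extra parentheses.
        return self


class ToyName(ToyNode):
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


class ToyGrouping(ToyNode):
    def __init__(self, element):
        self.element = element

    def __repr__(self):
        return f"ToyGrouping({self.element!r})"


class ToyClauseList(ToyNode):
    def __init__(self, op, *clauses):
        self.op = op
        # Let every operand decide whether it must be wrapped.
        self.clauses = tuple(c.self_group(against=op) for c in clauses)

    def self_group(self, against=None):
        if against is None or against is self.op:
            return self
        if TOY_PRECEDENCE[self.op] <= TOY_PRECEDENCE[against]:
            return ToyGrouping(self)
        return self

    def __repr__(self):
        return f"ToyClauseList({self.op.__name__}, {self.clauses!r})"


A, B, C = ToyName("A"), ToyName("B"), ToyName("C")
first = A | B & C      # Python evaluates B & C first
second = (A | B) & C   # Python evaluates A | B first
print(first)   # ToyClauseList(or_, (A, ToyClauseList(and_, (B, C))))
print(second)  # ToyClauseList(and_, (ToyGrouping(ToyClauseList(or_, (A, B))), C))
```

    Note that the toy never inspects source text. The only thing that differs between the two runs is the order in which Python calls __or__ and __and__.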
    

    Now, if all you want to do is ensure you have your expressions grouped in the right order, then you don't really need to inject your own Grouping() objects. It doesn't really matter if you process and_(or_(A, B), C) or and_((or_(A, B)), C) when you are processing the object tree by traversal, but if you need to output text again (like SQLAlchemy must, to send to the database) then the Grouping() objects are very helpful to record where you need to add (...) text.

    In SQLAlchemy, that happens in the SQL compiler, which uses a visitor pattern to call the sqlalchemy.sql.compiler.SQLCompiler.visit_grouping() method:

    def visit_grouping(self, grouping, asfrom=False, **kwargs):
        return "(" + grouping.element._compiler_dispatch(self, **kwargs) + ")"
    

    That expression simply means: place ( before, and ) after, whatever the compilation output for grouping.element is. While each SQL dialect does provide a subclass of the base compiler, none override the visit_grouping() method.
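
    The compilation step can be illustrated with a toy visitor over a hand-built tree. Everything here is hypothetical and chosen for illustration (ToyName, ToyBoolList, ToyGrouping, ToyCompiler, and dispatching on the lowercased class name); it only mirrors the idea that the grouping node contributes nothing but the surrounding parentheses.

```python
from dataclasses import dataclass


@dataclass
class ToyName:
    name: str


@dataclass
class ToyBoolList:
    keyword: str    # "AND" or "OR"
    clauses: tuple  # operand nodes


@dataclass
class ToyGrouping:
    element: object  # the wrapped node


class ToyCompiler:
    def process(self, node):
        # Visitor dispatch: call visit_<lowercased class name>.
        method = getattr(self, "visit_" + type(node).__name__.lower())
        return method(node)

    def visit_toyname(self, node):
        return node.name

    def visit_toyboollist(self, node):
        separator = f" {node.keyword} "
        return separator.join(self.process(c) for c in node.clauses)

    def visit_toygrouping(self, node):
        # Place ( before and ) after whatever the wrapped element compiles to.
        return "(" + self.process(node.element) + ")"


a, b, c = ToyName("a"), ToyName("b"), ToyName("c")
plain = ToyBoolList("OR", (a, ToyBoolList("AND", (b, c))))
grouped = ToyBoolList("AND", (ToyGrouping(ToyBoolList("OR", (a, b))), c))

compiler = ToyCompiler()
print(compiler.process(plain))    # a OR b AND c
print(compiler.process(grouped))  # (a OR b) AND c
```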
