I've been looking through the SQLAlchemy API and it's incredibly complex, so I thought I'd ask here to see if anyone can explain this to me in a somewhat digestible format.
That finally brings me to my question. SQLAlchemy is capable of understanding precedence groups somehow, but I can't for the life of me understand how it does it.
SQLAlchemy does not have to do much work here. Most of the work is done by Python, which parses expressions according to its rules of operator precedence and so evaluates the combined expressions in a specific order. If that order of precedence is correct for your application, and you don't mind always grouping nested expressions, you are set. That's not always the case in SQL, and SQLAlchemy wants to output valid SQL expressions with minimal extraneous parentheses, so SQLAlchemy does consult a precedence table of its own. That way it can decide when (...) grouping is required in the output.
SQLAlchemy returns dedicated *Clause* expression objects representing the operation on its operands (each of which can be further expressions), and then combines those further when those operation objects are also used in operations. In the end, you'd have a tree of objects, and traversal of that tree during compilation to SQL then produces the grouped output you see, as needed. Where precedence requires it, SQLAlchemy does insert sqlalchemy.sql.elements.Grouping() objects, and it is up to the SQL dialect to produce the right syntax for grouping.
If you are looking at the SQLAlchemy source code, you'll want to look at the sqlalchemy.sql.operators.ColumnOperators class and its parent class, sqlalchemy.sql.operators.Operators, which implements __or__ as a call to self.operate(or_, other) (passing in the operator.or_() function). In SQLAlchemy this appears complicated, because this has to delegate to different types of comparisons for different types of objects and SQL dialects!
But at the base is the sqlalchemy.sql.default_comparator module, where or_ and and_ are (indirectly) mapped to classmethods of sqlalchemy.sql.elements.BooleanClauseList, producing an instance of that class.
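To make that dispatch less abstract, here is a minimal stand-alone sketch of the same pattern. It is not SQLAlchemy's code: the Expr, Column and BoolOp names are made up for illustration. The __or__ and __and__ hooks both forward to a single operate() method, passing the operator module function along, and Python's own precedence decides which hook runs first.

```python
import operator


class Expr:
    """Hypothetical base class, standing in for the Operators idea."""

    def operate(self, op, other):
        # Every overloaded operator funnels through this one method.
        return BoolOp(op, self, other)

    def __or__(self, other):
        return self.operate(operator.or_, other)

    def __and__(self, other):
        return self.operate(operator.and_, other)


class Column(Expr):
    """Hypothetical leaf expression that only carries a name."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


class BoolOp(Expr):
    """Hypothetical node recording which operator combined two operands."""

    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def __repr__(self):
        return f"{self.op.__name__}({self.left!r}, {self.right!r})"


A, B, C = Column("A"), Column("B"), Column("C")

# & binds tighter than | in Python, so B & C is built first.
ungrouped = repr(A | B & C)
# Explicit parentheses make Python build A | B first.
grouped = repr((A | B) & C)

print(ungrouped)  # or_(A, and_(B, C))
print(grouped)    # and_(or_(A, B), C)
```

Note that the classes never see the parentheses themselves; they only see the order in which Python calls the hooks.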
The BooleanClauseList._construct() method is responsible for handling grouping there, by delegating to .self_group() methods on the two clauses:
convert_clauses = [
    c.self_group(against=operator) for c in convert_clauses
]
This passes in operator.or_ or operator.and_, and so lets each operand decide whether it needs to use a Grouping() instance, based on precedence. For BooleanClauseList objects (so the result of ... | ... or ... & ... that is then combined with another | or & operator), the ClauseList.self_group() method will produce a Grouping() if self.operator has a lower or equal precedence compared to against:
def self_group(self, against=None):
    # type: (Optional[Any]) -> ClauseElement
    if self.group and operators.is_precedent(self.operator, against):
        return Grouping(self)
    else:
        return self
where sqlalchemy.sql.operators.is_precedent() consults an expression precedence table:
_PRECEDENCE = {
    # ... many lines elided
    and_: 3,
    or_: 2,
    # ... more lines elided
}

def is_precedent(operator, against):
    if operator is against and is_natural_self_precedent(operator):
        return False
    else:
        return _PRECEDENCE.get(
            operator, getattr(operator, "precedence", _smallest)
        ) <= _PRECEDENCE.get(against, getattr(against, "precedence", _largest))
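If you want to play with that comparison without importing SQLAlchemy, a rough stand-alone approximation could look like the following. The table holds only the two entries quoted above, the SMALLEST and LARGEST values are placeholder stand-ins for the library's fallbacks, and treating and_ and or_ as "naturally self-precedent" is an assumption of this sketch.

```python
import operator

# Toy table with only the two entries quoted above.
PRECEDENCE = {operator.and_: 3, operator.or_: 2}

# Placeholder fallbacks for operators missing from the table (assumed values).
SMALLEST, LARGEST = -100, 100

# Assumption: chaining the same boolean operator never needs parentheses.
NATURAL_SELF_PRECEDENT = {operator.and_, operator.or_}


def is_precedent(op, against):
    """Return True when a clause using `op` needs grouping inside `against`."""
    if op is against and op in NATURAL_SELF_PRECEDENT:
        return False
    return PRECEDENCE.get(op, SMALLEST) <= PRECEDENCE.get(against, LARGEST)


or_inside_and = is_precedent(operator.or_, operator.and_)
and_inside_or = is_precedent(operator.and_, operator.or_)
or_inside_or = is_precedent(operator.or_, operator.or_)

print(or_inside_and)  # True: 2 <= 3, so an OR nested in an AND is grouped
print(and_inside_or)  # False: 3 <= 2 does not hold, no grouping needed
print(or_inside_or)   # False: same operator, short-circuited
```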
So what happens for your two expressions? Python has picked up the () parentheses grouping. Let's first simplify the expressions to their base components. You basically have:
A | B & C
(A | B) & C
Python parses these two expressions according to its own precedence rules, and produces its own abstract syntax tree:
>>> import ast
>>> ast.dump(ast.parse('A | B & C', mode='eval').body)
"BinOp(left=Name(id='A', ctx=Load()), op=BitOr(), right=BinOp(left=Name(id='B', ctx=Load()), op=BitAnd(), right=Name(id='C', ctx=Load())))"
>>> ast.dump(ast.parse('(A | B) & C', mode='eval').body)
"BinOp(left=BinOp(left=Name(id='A', ctx=Load()), op=BitOr(), right=Name(id='B', ctx=Load())), op=BitAnd(), right=Name(id='C', ctx=Load()))"
These come down to
BinOp(
    left=A,
    op=or_,
    right=BinOp(left=B, op=and_, right=C)
)
and
BinOp(
    left=BinOp(left=A, op=or_, right=B),
    op=and_,
    right=C
)
which changes the order in which objects are combined! So the first leads to:
# process A, then B & C
leftop = A
rightop = BooleanClauseList(and_, (B, C))
# combine into A | (B & C)
final = BooleanClauseList(or_, (leftop, rightop))
# which is
BooleanClauseList(or_, (A, BooleanClauseList(and_, (B, C))))
Because the second clause here is a BooleanClauseList(and_, ...) instance, the .self_group() call for that clause doesn't return a Grouping(); there self.operator is and_, which has a precedence of 3, and that is higher than, not lower than or equal to, the precedence of or_ == 2 for the parent clause.
The other expression is executed by Python in a different order:
# process A | B, then C
leftop = BooleanClauseList(or_, (A, B))
rightop = C
# combine into (A | B) & C
final = BooleanClauseList(and_, (leftop, rightop))
# which is
BooleanClauseList(and_, (BooleanClauseList(or_, (A, B)), C))
Now the first clause is a BooleanClauseList(or_, ...) instance, and it actually produces a Grouping instance because self.operator is or_, which has a lower precedence than the and_ of the parent clause list, and so the object tree becomes:
BooleanClauseList(and_, (Grouping(BooleanClauseList(or_, (A, B))), C))
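The two walk-throughs can be reproduced with a small toy model. This is a simplified sketch, not the real classes: Clause, BoolList and the construct() helper are hypothetical names, the precedence table holds only the and_ and or_ values quoted earlier, and the self_group() check is reduced to the case discussed here.

```python
import operator
from operator import and_, or_


class Clause:
    """Hypothetical leaf clause, for example a single column comparison."""

    def __init__(self, name):
        self.name = name

    def self_group(self, against=None):
        # A leaf never needs parentheses.
        return self

    def __repr__(self):
        return self.name


class Grouping:
    """Marks a sub-expression that must be parenthesised on output."""

    def __init__(self, element):
        self.element = element

    def __repr__(self):
        return f"Grouping({self.element!r})"


class BoolList:
    """Toy counterpart of a boolean clause list."""

    PRECEDENCE = {operator.and_: 3, operator.or_: 2}

    def __init__(self, op, clauses):
        self.operator = op
        self.clauses = tuple(clauses)

    @classmethod
    def construct(cls, op, *clauses):
        # Let each operand decide whether it needs a Grouping wrapper.
        return cls(op, [c.self_group(against=op) for c in clauses])

    def self_group(self, against=None):
        if self.operator is against:
            return self
        mine = self.PRECEDENCE[self.operator]
        theirs = self.PRECEDENCE.get(against, 100)  # assumed fallback value
        return Grouping(self) if mine <= theirs else self

    def __repr__(self):
        return f"BoolList({self.operator.__name__}, {self.clauses!r})"


A, B, C = Clause("A"), Clause("B"), Clause("C")

# A | B & C: the inner and_ list binds tighter, so it stays unwrapped.
first = BoolList.construct(or_, A, BoolList.construct(and_, B, C))
# (A | B) & C: the inner or_ list binds looser, so it gets a Grouping.
second = BoolList.construct(and_, BoolList.construct(or_, A, B), C)

print(first)   # BoolList(or_, (A, BoolList(and_, (B, C))))
print(second)  # BoolList(and_, (Grouping(BoolList(or_, (A, B))), C))
```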
Now, if all you want to do is ensure you have your expressions grouped in the right order, then you don't really need to inject your own Grouping() objects. It doesn't really matter if you process and_(or_(A, B), C) or and_((or_(A, B)), C) when you are processing the object tree by traversal, but if you need to output text again (like SQLAlchemy must, to send to the database) then the Grouping() objects are very helpful to record where you need to add (...) text.
In SQLAlchemy, that happens in the SQL compiler, which uses a visitor pattern to call the sqlalchemy.sql.compiler.SQLCompiler.visit_grouping() method:
def visit_grouping(self, grouping, asfrom=False, **kwargs):
    return "(" + grouping.element._compiler_dispatch(self, **kwargs) + ")"
That expression simply means: place ( before, and ) after, whatever the compilation output for grouping.element is. While each SQL dialect does provide a subclass of the base compiler, none override the visit_grouping() method.
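As a rough illustration of that visitor idea, here is a toy compiler that walks a hand-built tree and only emits parentheses where it meets a Grouping node. The class names, the keyword attribute and the process() helper are all invented for this sketch; the visit_name attribute merely imitates the idea of dispatching on a per-class name.

```python
class Column:
    visit_name = "column"

    def __init__(self, name):
        self.name = name


class BoolList:
    visit_name = "bool_list"

    def __init__(self, keyword, *clauses):
        self.keyword = keyword  # "AND" or "OR" in this toy model
        self.clauses = clauses


class Grouping:
    visit_name = "grouping"

    def __init__(self, element):
        self.element = element


class ToyCompiler:
    """Dispatches each node to a visit_<name> method, visitor style."""

    def process(self, element):
        visitor = getattr(self, "visit_" + element.visit_name)
        return visitor(element)

    def visit_column(self, column):
        return column.name

    def visit_bool_list(self, bool_list):
        parts = (self.process(clause) for clause in bool_list.clauses)
        return f" {bool_list.keyword} ".join(parts)

    def visit_grouping(self, grouping):
        # The only place where parentheses are written.
        return "(" + self.process(grouping.element) + ")"


A, B, C = Column("A"), Column("B"), Column("C")
compiler = ToyCompiler()

with_grouping = compiler.process(
    BoolList("AND", Grouping(BoolList("OR", A, B)), C)
)
without_grouping = compiler.process(BoolList("OR", A, BoolList("AND", B, C)))

print(with_grouping)     # (A OR B) AND C
print(without_grouping)  # A OR B AND C
```

The compiler itself knows nothing about precedence; that decision was already recorded in the tree when the Grouping node was inserted.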