How is the Python grammar used internally?

后端未结

关注

 4  1535

难免孤独 2020-12-09 16:59

I\'m trying to get a deeper understanding of how Python works, and I\'ve been looking at the grammar shown at http://docs.python.org/3.3/reference/grammar.html.

I no

4条回答

天命终不由人 (楼主)

2020-12-09 17:18
A grammar is used to describe all possible strings in a language. It is also useful in specifying how a parser should parse the language.

In this grammar it seems like they are using their own version of EBNF, where a non-terminal is any lowercase word and a terminal is all uppercase or surrounded by quotes. For example, NEWLINE is a terminal, arith_expr is a non-terminal and 'if' is also a terminal. Any non-terminal can be replaced by anything to the right of the colon of it's respective production rule. For example, if you look at the first rule:

single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE

We can replace single_input with one of either a NEWLINE, a simple_stmt or a compound_stmt followed by a NEWLINE. Suppose we replaced it with "compound_stmt NEWLINE", then we would look for the production rule for compound_stmt:

compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated

and choose which of these we want to use, and substitute it for "compound_stmt" (Keeping NEWLINE in it's place)

Suppose we wanted to generate the valid python program:
```
if 5 < 2 + 3 or not 1 == 5:
    raise
```
We could use the following derivation:
1. single_input
2. compound_stmt NEWLINE
3. if_stmt NEWLINE
4. 'if' test ':' suite NEWLINE
5. 'if' or_test ':' NEWLINE INDENT stmt stmt DEDENT NEWLINE
6. 'if' and_test 'or' and_test ':' NEWLINE INDENT simple_stmt DEDENT NEWLINE
7. 'if' not_test 'or' not_test ':' NEWLINE INDENT small_stmt DEDENT NEWLINE
8. 'if' comparison 'or' 'not' not_test ':' NEWLINE INDENT flow_stmt DEDENT NEWLINE
9. 'if' expr comp_op expr 'or' 'not' comparison ':' NEWLINE INDENT raise_stmt DEDENT NEWLINE
10. 'if' arith_expr '<' arith_expr 'or' 'not' arith_expr comp_op arith_expr ':' NEWLINE INDENT 'raise' DEDENT NEWLINE
11. 'if' term '<' term '+' term 'or' 'not' arith_expr == arith_expr ':' NEWLINE INDENT 'raise' DEDENT NEWLINE
12. 'if' NUMBER '<' NUMBER '+' NUMBER 'or' 'not' NUMBER == NUMBER ':' NEWLINE INDENT 'raise' DEDENT NEWLINE
A couple of notes here, firstly, we must start with one of the non-terminals which is listed as a starting non-terminal. In that page, they list them as single_input, file_input, or eval_input. Secondly, a derivation is finished once all the symbols are terminal (hence the name). Thirdly, it is more common to do one substitution per line, for the sake of brevity I did all possible substitutions at once and started skipping steps near the end.

Given a string in the language, how do we find it's derivation? This is the job of a parser. A parser reverse-engineers a production sequence to first check that it is indeed a valid string, and furthermore how it can be derived from the grammar. It's worth noting that many grammars can describe a single language. However, for a given string, it's derivation will of course be different for each grammar. So technically we write a parser for a grammar not a language. Some grammars are easier to parse, some grammars are easier to read/understand. This one belongs in the former.

Also this doesn't specify the entire language, just what it looks like. A grammar says nothing about semantics.

If you're interested in more about parsing and grammar I recommend Grune, Jacobs - Parsing Techniques. It's free and good for self-study.
0 讨论(0)

查看其它4个回答
发布评论:

提交评论
- 加载中...