How is the Python grammar used internally?

后端 未结 4 1535
难免孤独
难免孤独 2020-12-09 16:59

I\'m trying to get a deeper understanding of how Python works, and I\'ve been looking at the grammar shown at http://docs.python.org/3.3/reference/grammar.html.

I no

4条回答
  •  天命终不由人
    2020-12-09 17:18

    A grammar is used to describe all possible strings in a language. It is also useful in specifying how a parser should parse the language.

    In this grammar it seems like they are using their own version of EBNF, where a non-terminal is any lowercase word and a terminal is all uppercase or surrounded by quotes. For example, NEWLINE is a terminal, arith_expr is a non-terminal and 'if' is also a terminal. Any non-terminal can be replaced by anything to the right of the colon of it's respective production rule. For example, if you look at the first rule:

    single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE

    We can replace single_input with one of either a NEWLINE, a simple_stmt or a compound_stmt followed by a NEWLINE. Suppose we replaced it with "compound_stmt NEWLINE", then we would look for the production rule for compound_stmt:

    compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated

    and choose which of these we want to use, and substitute it for "compound_stmt" (Keeping NEWLINE in it's place)

    Suppose we wanted to generate the valid python program:

    if 5 < 2 + 3 or not 1 == 5:
        raise
    

    We could use the following derivation:

    1. single_input
    2. compound_stmt NEWLINE
    3. if_stmt NEWLINE
    4. 'if' test ':' suite NEWLINE
    5. 'if' or_test ':' NEWLINE INDENT stmt stmt DEDENT NEWLINE
    6. 'if' and_test 'or' and_test ':' NEWLINE INDENT simple_stmt DEDENT NEWLINE
    7. 'if' not_test 'or' not_test ':' NEWLINE INDENT small_stmt DEDENT NEWLINE
    8. 'if' comparison 'or' 'not' not_test ':' NEWLINE INDENT flow_stmt DEDENT NEWLINE
    9. 'if' expr comp_op expr 'or' 'not' comparison ':' NEWLINE INDENT raise_stmt DEDENT NEWLINE
    10. 'if' arith_expr '<' arith_expr 'or' 'not' arith_expr comp_op arith_expr ':' NEWLINE INDENT 'raise' DEDENT NEWLINE
    11. 'if' term '<' term '+' term 'or' 'not' arith_expr == arith_expr ':' NEWLINE INDENT 'raise' DEDENT NEWLINE
    12. 'if' NUMBER '<' NUMBER '+' NUMBER 'or' 'not' NUMBER == NUMBER ':' NEWLINE INDENT 'raise' DEDENT NEWLINE

    A couple of notes here, firstly, we must start with one of the non-terminals which is listed as a starting non-terminal. In that page, they list them as single_input, file_input, or eval_input. Secondly, a derivation is finished once all the symbols are terminal (hence the name). Thirdly, it is more common to do one substitution per line, for the sake of brevity I did all possible substitutions at once and started skipping steps near the end.

    Given a string in the language, how do we find it's derivation? This is the job of a parser. A parser reverse-engineers a production sequence to first check that it is indeed a valid string, and furthermore how it can be derived from the grammar. It's worth noting that many grammars can describe a single language. However, for a given string, it's derivation will of course be different for each grammar. So technically we write a parser for a grammar not a language. Some grammars are easier to parse, some grammars are easier to read/understand. This one belongs in the former.

    Also this doesn't specify the entire language, just what it looks like. A grammar says nothing about semantics.

    If you're interested in more about parsing and grammar I recommend Grune, Jacobs - Parsing Techniques. It's free and good for self-study.

提交回复
热议问题