Building a parser (Part I)

轻奢々 2020-12-12 16:24

I'm making my own JavaScript-based programming language (yeah, it is crazy, but it's for learning only... maybe?). Well, I'm reading about parsers and the first pas

4 answers
  •  萌比男神i
    2020-12-12 17:11

    Most toolkits split the complete process into two separate parts:

    • lexer (aka. tokenizer)
    • parser (aka. grammar)

    The tokenizer will split the input data into tokens. The parser will only operate on the token "stream" and build the structure.

    Your question seems to focus on the tokenizer, but your second solution mixes the grammar parser and the tokenizer into one step. In theory this is also possible, but for a beginner it is much easier to do it the same way as most other tools and frameworks: keep the steps separate.

    To your first solution: I would tokenize your example like this:

    T_KEYWORD_IF   "if"
    T_LPAREN       "("
    T_IDENTIFIER   "x"
    T_GT           ">"
    T_LITERAL      "5"
    T_RPAREN       ")"
    T_KEYWORD_RET  "return"
    T_KEYWORD_TRUE "true"
    T_TERMINATOR   ";"
    

    In most languages keywords cannot be used as method names, variable names and so on. This is already reflected at the tokenizer level (T_KEYWORD_IF, T_KEYWORD_RET, T_KEYWORD_TRUE).
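
    As a rough illustration of the tokenizer step, here is a minimal hand-written sketch in JavaScript. It is not taken from any particular toolkit. The function name `tokenize`, the `{ type, text }` token shape and the lookup tables are assumptions made for this example, and the input string is reconstructed from the token list above because the question text is cut off.

```javascript
// Hypothetical lookup tables; the names follow the T_* convention used above.
const KEYWORDS = new Map([
  ["if", "T_KEYWORD_IF"],
  ["return", "T_KEYWORD_RET"],
  ["true", "T_KEYWORD_TRUE"],
]);

const PUNCTUATION = new Map([
  ["(", "T_LPAREN"],
  [")", "T_RPAREN"],
  [">", "T_GT"],
  [";", "T_TERMINATOR"],
]);

// Splits the source text into a flat array of { type, text } tokens.
function tokenize(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    const ch = source[pos];
    if (/\s/.test(ch)) {
      pos++; // whitespace only separates tokens
      continue;
    }
    if (PUNCTUATION.has(ch)) {
      tokens.push({ type: PUNCTUATION.get(ch), text: ch });
      pos++;
      continue;
    }
    const rest = source.slice(pos);
    let match;
    if ((match = /^[0-9]+/.exec(rest))) {
      tokens.push({ type: "T_LITERAL", text: match[0] });
    } else if ((match = /^[A-Za-z_$][A-Za-z0-9_$]*/.exec(rest))) {
      // A word is a keyword if it is in the table, otherwise an identifier.
      const type = KEYWORDS.get(match[0]) || "T_IDENTIFIER";
      tokens.push({ type, text: match[0] });
    } else {
      throw new SyntaxError(`Unexpected character '${ch}' at offset ${pos}`);
    }
    pos += match[0].length;
  }
  return tokens;
}
```

    Calling `tokenize("if (x > 5) return true;")` yields the nine tokens listed above. Because keywords are looked up in a table before falling back to T_IDENTIFIER, a word such as `if` can never reach the parser as an identifier.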

    The next level takes this stream and, by applying a formal grammar, builds a data structure (often called an AST, for Abstract Syntax Tree), which might look like this:

    IfStatement:
        Expression:
            BinaryOperator:
                Operator:     T_GT
                LeftOperand:
                    IdentifierExpression:
                        "x"
                RightOperand:
                    LiteralExpression:
                        5
        IfBlock:
            ReturnStatement:
                ReturnExpression:
                    LiteralExpression:
                        "true"
        ElseBlock: (empty)
    

    The parser is usually not written by hand but generated by a framework. Implementing something like that by hand, and efficiently, is the kind of project that takes the better part of a semester at a university. So you really should use some kind of framework.

    The input for a grammar parser framework is usually a formal grammar in some kind of BNF. Your "if" part might look like this:

    IfStatement: T_KEYWORD_IF T_LPAREN Expression T_RPAREN Statement ;
    
    Expression: LiteralExpression | BinaryExpression | IdentifierExpression | ... ;
    
    BinaryExpression: LeftOperand BinaryOperator RightOperand;
    
    ....
    

    That's only to give you the idea. Parsing a real-world language like JavaScript correctly is not an easy task. But it is fun.
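
    To get a feel for what a parser framework produces from rules like the ones above, here is a minimal hand-written recursive-descent sketch in JavaScript. It is an illustration only, not the output of any real tool. The function `parse`, the `{ type, text }` token shape, the `ReturnStatement` rule and the AST field names are assumptions made for this example. As a simplification, the operands of a binary expression are limited to identifiers and literals, which avoids the left recursion that the BinaryExpression rule would otherwise cause in a parser of this style.

```javascript
// Parses a token array into a small AST. One function per grammar rule.
function parse(tokens) {
  let pos = 0;
  const peek = () => tokens[pos];

  // Consumes the next token if it has the given type, otherwise fails.
  function expect(type) {
    const tok = tokens[pos];
    if (!tok || tok.type !== type) {
      const found = tok ? tok.type : "end of input";
      throw new SyntaxError(`Expected ${type} but found ${found}`);
    }
    pos++;
    return tok;
  }

  // Operand: T_IDENTIFIER | T_LITERAL | T_KEYWORD_TRUE
  function parseOperand() {
    const tok = peek();
    if (tok && tok.type === "T_IDENTIFIER") {
      pos++;
      return { type: "IdentifierExpression", name: tok.text };
    }
    if (tok && (tok.type === "T_LITERAL" || tok.type === "T_KEYWORD_TRUE")) {
      pos++;
      return { type: "LiteralExpression", value: tok.text };
    }
    throw new SyntaxError(`Unexpected ${tok ? tok.type : "end of input"}`);
  }

  // Expression: Operand (T_GT Operand)?
  function parseExpression() {
    const left = parseOperand();
    const next = peek();
    if (next && next.type === "T_GT") {
      pos++;
      const right = parseOperand();
      return { type: "BinaryExpression", operator: "T_GT", left, right };
    }
    return left;
  }

  // IfStatement: T_KEYWORD_IF T_LPAREN Expression T_RPAREN Statement
  function parseIfStatement() {
    expect("T_KEYWORD_IF");
    expect("T_LPAREN");
    const condition = parseExpression();
    expect("T_RPAREN");
    const ifBlock = parseStatement();
    return { type: "IfStatement", condition, ifBlock, elseBlock: null };
  }

  // ReturnStatement: T_KEYWORD_RET Expression T_TERMINATOR
  function parseReturnStatement() {
    expect("T_KEYWORD_RET");
    const expression = parseExpression();
    expect("T_TERMINATOR");
    return { type: "ReturnStatement", expression };
  }

  // Statement: IfStatement | ReturnStatement
  function parseStatement() {
    const tok = peek();
    if (tok && tok.type === "T_KEYWORD_IF") return parseIfStatement();
    if (tok && tok.type === "T_KEYWORD_RET") return parseReturnStatement();
    throw new SyntaxError(`Unexpected ${tok ? tok.type : "end of input"}`);
  }

  const ast = parseStatement();
  if (pos !== tokens.length) throw new SyntaxError("Unexpected trailing tokens");
  return ast;
}
```

    Fed the nine tokens from the example, `parse` returns an `IfStatement` node whose `condition` is a `BinaryExpression` and whose `ifBlock` is a `ReturnStatement`, which mirrors the tree shown earlier. Keeping the tokenizer separate means this code only ever compares token types and never looks at raw characters.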
