Semantic type checking analysis in Bison

↘锁芯ラ posted on 2019-12-04 12:34:18

This isn't the answer you're hoping for. I think the reason that you haven't seen examples of what you want is that it's impractical to enforce typing rules in the grammar file (the .y); rather, developers accomplish this in procedural .c or .cpp code. Generally, you will have to do some analysis of the parsed input anyway, so enforcing the semantic rules becomes a byproduct of that analysis.
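To make the "procedural code" option concrete, here is a minimal sketch in C of a type check that walks a tree built by the parser's actions. Everything in it is an assumption for illustration: the Node layout, the Type and Kind tags, the one-letter symbol table, and the rule that both operands of an operator must have the same type. None of it comes from your grammar.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags. TY_ERROR is 0, so undeclared identifiers fail the check. */
typedef enum { TY_ERROR, TY_INT, TY_FLT } Type;
typedef enum { N_INT_CONST, N_FLT_CONST, N_IDENT, N_NEG, N_BINOP } Kind;

/* Hypothetical tree node that the grammar actions would build. */
typedef struct Node {
    Kind kind;
    char name;              /* N_IDENT: one letter, 'A'..'Z' */
    struct Node *lhs, *rhs; /* N_NEG uses lhs only; N_BINOP uses both */
} Node;

/* Declared type of each one-letter identifier (zero-initialized to TY_ERROR). */
Type symtab[26];

/* Compute the type of an expression bottom-up.
   Assumed rule: both operands of a binary operator must have the same type. */
Type check_expr(const Node *n)
{
    assert(n != NULL);
    switch (n->kind) {
    case N_INT_CONST:
        return TY_INT;
    case N_FLT_CONST:
        return TY_FLT;
    case N_IDENT:
        if (n->name < 'A' || n->name > 'Z')
            return TY_ERROR;
        return symtab[n->name - 'A'];
    case N_NEG:
        return check_expr(n->lhs);
    case N_BINOP: {
        Type l = check_expr(n->lhs);
        Type r = check_expr(n->rhs);
        return (l == r) ? l : TY_ERROR;
    }
    }
    return TY_ERROR;
}

/* An assignment is well typed when the target's declared type matches the expression's. */
int check_assign(char target, const Node *expr)
{
    Type t = check_expr(expr);
    if (t == TY_ERROR || target < 'A' || target > 'Z')
        return 0;
    return symtab[target - 'A'] == t;
}
```

The grammar stays as small as the untyped version; the action for the assignment rule would call something like check_assign on the finished tree and report an error if it returns 0.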

As an aside, I don't quite understand how you are parsing expressions, given the fragment of your grammar that you reproduce in your question.

Here's why I claim that it's impractical. (1) Your type information has to percolate all through the non-terminals of the grammar. (2) Worse, it has to be reflected in variable names.

Consider this toy example of parsing simple assignment statements that can use identifiers, numeric constants, and the four desk calculator operators. The NUMBER token can be an integer like 42 or a float like 3.14. And let's say that an IDENTIFIER is one letter, A-Z.

%token IDENTIFIER NUMBER

%%

stmt : IDENTIFIER '=' expr
     ;

expr : expr '+' term
     | expr '-' term
     | term
     ;

term : term '*' factor
     | term '/' factor
     | factor
     ;

factor : '(' expr ')'
       | '-' factor
       | NUMBER
       | IDENTIFIER
       ;

Now let's try to introduce typing rules. We'll separate the NUMBER token into FLT_NUMBER and INT_NUMBER. Our expr, term, and factor non-terminals split into two as well:

%token IDENTIFIER FLT_NUMBER INT_NUMBER

%%

stmt : IDENTIFIER '=' int_expr
     | IDENTIFIER '=' flt_expr
     ;

int_expr : int_expr '+' int_term
         | int_expr '-' int_term
         | int_term
         ;

flt_expr : flt_expr '+' flt_term
         | flt_expr '-' flt_term
         | flt_term
         ;

int_term : int_term '*' int_factor
         | int_term '/' int_factor
         | int_factor
         ;

flt_term : flt_term '*' flt_factor
         | flt_term '/' flt_factor
         | flt_factor
         ;

int_factor : '(' int_expr ')'
           | '-' int_factor
           | INT_NUMBER
           | int_identifier
           ;

flt_factor : '(' flt_expr ')'
           | '-' flt_factor
           | FLT_NUMBER
           | flt_identifier
           ;

int_identifier : IDENTIFIER ;

flt_identifier : IDENTIFIER ;

As our grammar stands at this point, there's a reduce/reduce conflict: the parser can't tell whether to recognize an IDENTIFIER as an int_identifier or a flt_identifier. So it doesn't know whether to reduce A = B as IDENTIFIER = int_expr or IDENTIFIER = flt_expr.

(Here's where my understanding of Ruby is a little soft:) Ruby (like most languages) doesn't provide a way at the lexical level to determine the numeric type of an identifier. Contrast this with old-school BASIC, where A denotes a number and A$ denotes a string. In other words, if you invented a language where, say, A# denotes an integer and A@ denotes a float, then you could make this work.
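For that invented A# / A@ language, the lexer could settle the type before the parser ever sees the identifier. The following is only a sketch in C: the token names INT_IDENTIFIER and FLT_IDENTIFIER, their numeric values, BAD_TOKEN, and the function lex_identifier are all hypothetical (in a real build the token codes would come from the header that bison generates).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes, standing in for the ones bison would generate. */
enum { INT_IDENTIFIER = 258, FLT_IDENTIFIER, BAD_TOKEN };

/* Classify the identifier at *p: one letter A-Z followed by '#' (integer)
   or '@' (float). On success, store the letter and advance *p past the
   two characters; on failure, leave *p unchanged and return BAD_TOKEN. */
int lex_identifier(const char **p, char *letter)
{
    assert(p != NULL && *p != NULL && letter != NULL);
    const char *s = *p;

    if (s[0] < 'A' || s[0] > 'Z')
        return BAD_TOKEN;           /* also covers the end of the input */
    if (s[1] != '#' && s[1] != '@')
        return BAD_TOKEN;           /* a bare letter carries no type suffix */

    *letter = s[0];
    *p = s + 2;
    return (s[1] == '#') ? INT_IDENTIFIER : FLT_IDENTIFIER;
}
```

With two distinct tokens, int_factor could use INT_IDENTIFIER and flt_factor could use FLT_IDENTIFIER directly, so the parser would no longer have to guess which way to reduce a bare IDENTIFIER.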

If you wanted to permit limited mixed-type expressions, like an int_term '*' flt_factor, then your grammar would get even more complicated.
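By contrast, in procedural code a mixed-type rule can be a single small function. This sketch in C assumes the usual promotion rule (if either operand is a float, the result is a float); the names NumType and binary_result_type are hypothetical, and your language may well want a different rule.

```c
#include <assert.h>

/* Hypothetical type tags for this sketch. */
typedef enum { NUM_INT, NUM_FLT } NumType;

/* Assumed promotion rule: a float on either side makes the result a float;
   otherwise the result is an integer. */
NumType binary_result_type(NumType lhs, NumType rhs)
{
    assert(lhs == NUM_INT || lhs == NUM_FLT);
    assert(rhs == NUM_INT || rhs == NUM_FLT);
    return (lhs == NUM_FLT || rhs == NUM_FLT) ? NUM_FLT : NUM_INT;
}
```

Changing the policy (say, forbidding mixing for '/' only) means editing this one function rather than adding another family of non-terminals.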

There might be ways to work around these issues. A parser built from technology other than yacc/bison might make it easier. At the least, perhaps my sketch will give you some ideas to pursue further.
