How does flex support bison-location exactly?

I'm trying to use flex and bison to create a filter, because I want to get certain grammar elements from a complex language. My plan is to use flex + bison to recognise the gr

8 Answers

忘掉有多难

2020-12-24 08:06
Neither bison nor flex updates yylloc automatically, but it's actually not difficult to do it yourself—if you know the trick.

The trick to implementing yylloc support is that, even though yyparse() declares yylloc, it never changes it. That means that if you modify yylloc in one call to the lexer, you'll find the same values in it at the next call. Thus, yylloc will contain the position of the last token. Since the last token's end is the same as the current token's start, you can use the old yylloc value to help you determine the new value.

In other words, yylex() should not calculate yylloc; it should update yylloc.

To update yylloc, we must first copy the last_ values to first_, and then update the last_ values to reflect the length of the just-matched token. (This is not the strlen() of the token; it's the lines-and-columns length.) We can do this in the YY_USER_ACTION macro, which is called just before any lexer action is performed; that ensures that if a rule matches but it doesn't return a value (for instance, a rule skipping whitespace or comments), the location of that non-token is skipped, rather than being included at the beginning of the actual token, or lost in a way that makes the location tracking inaccurate.

Here's a version meant for a reentrant parser; you could modify it for a non-reentrant parser by swapping the -> operators for .:
```
#define YY_USER_ACTION \
    yylloc->first_line = yylloc->last_line; \
    yylloc->first_column = yylloc->last_column; \
    for(int i = 0; yytext[i] != '\0'; i++) { \
        if(yytext[i] == '\n') { \
            yylloc->last_line++; \
            yylloc->last_column = 0; \
        } \
        else { \
            yylloc->last_column++; \
        } \
    }
```
If you'd prefer, you could instead put that code in a function and make the macro call the function, but the two techniques are equivalent.
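As a rough sketch of that function form: `update_location` is a made-up name, and the `YYLTYPE` struct below is only a stand-in that assumes Bison's default four-field layout (in a real scanner you would include the Bison-generated header instead).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the YYLTYPE that Bison generates with %locations
   (assumption: the default layout). Do not redefine it in a real
   scanner; include the generated header instead. */
typedef struct {
    int first_line;
    int first_column;
    int last_line;
    int last_column;
} YYLTYPE;

/* Hypothetical helper: advance *loc past the text that was just matched. */
void update_location(YYLTYPE *loc, const char *text)
{
    assert(loc != NULL && text != NULL);

    /* The new match starts where the previous one ended. */
    loc->first_line = loc->last_line;
    loc->first_column = loc->last_column;

    /* Walk the match, counting lines and columns. */
    for (size_t i = 0; text[i] != '\0'; i++) {
        if (text[i] == '\n') {
            loc->last_line++;
            loc->last_column = 0;
        } else {
            loc->last_column++;
        }
    }
}
```

With a reentrant scanner the macro would then shrink to `#define YY_USER_ACTION update_location(yylloc, yytext);` (pass `&yylloc` in the non-reentrant case).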

温柔的废话

2020-12-24 08:06
So, I got this to "work", but with a couple of extra steps (I may have overlooked them here ... apologies in that case):
1. In parser.y, I had to say:
```
#define YYLEX_PARAM &yylval, &yylloc
```
  even with %locations and bison --locations, to get it to pass the data.
2. In lexer.l I had to use -> instead of . for yylloc
3. Also in lexer.l, I reset the column in the action:
```
[\n] { yycolumn = 1; }
```
Obviously a bit more complex, for \r etc, but at least I got it to work.
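For what it's worth, the `->` in step 2 is what flex's bison-bridge mode gives you: there, `yylval` and `yylloc` are pointers handed to `yylex()`. Below is a minimal sketch of the directive pair that usually sets this up without `YYLEX_PARAM`. It assumes Bison 3.0 or later (where, as far as I know, `YYLEX_PARAM` is no longer supported) and a non-reentrant flex scanner; the file names are the ones used above.

```
/* parser.y: ask for a pure parser with location tracking, so that
   yyparse() calls yylex(&yylval, &yylloc) on its own */
%define api.pure full
%locations

/* lexer.l: make yylex() accept those two pointers */
%option bison-bridge bison-locations
```

Note that with `api.pure full` and `%locations`, `yyerror()` also receives the location as its first argument, so its declaration has to change to match.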

没有蜡笔的小新

2020-12-24 08:09
I like Shlomi's answer.

In addition, I was looking for a way to update the column location as well. I found http://oreilly.com/linux/excerpts/9780596155971/error-reporting-recovery.html, which made more sense after reading Shlomi's answer.

Unfortunately there is a typo on that page for yylloc. I've simplified it below a bit.

In your parser add:
```
%locations
```
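In your lexer:
```
%{

#include <string.h>      /* strdup */
#include "parser.tab.h"

int yycolumn = 1;        /* 1-based column of the next character */

/* Runs before every rule action: record where the match starts and ends.
   The strdup() assumes %union has a `str` member; it copies the text of
   every match, so the copies need to be freed by whoever consumes them. */
#define YY_USER_ACTION yylloc.first_line = yylloc.last_line = yylineno; \
    yylloc.first_column = yycolumn; yylloc.last_column = yycolumn + yyleng - 1; \
    yycolumn += yyleng; \
    yylval.str = strdup(yytext);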

%}

%option yylineno
```
Note that yycolumn, as written, is never reset, so it keeps increasing across lines instead of tracking the column within a line. To get real column numbers, set yycolumn back to 1 in the rule that matches a newline. I apologize if that confuses anyone; I currently use it as a character offset into the file, which in my case is more useful than a column position.

Hope that helps.

借酒劲吻你

2020-12-24 08:12
I think I managed to make it work (credit goes to the writer of the ltcalc lexical analyzer in the Bison manual). By default, Bison creates a yylloc that contains
```
{ first_line, first_column , last_line , last_column }
```
We only need to update those values in our lexical analyzer. For example:
```
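[ \t]     { ++yylloc.last_column; }
[\n]      { /* new line: bump the line count, restart the column */
            ++yylloc.last_line; yylloc.last_column = 0; return EOL; }
[a-zA-Z]+ { 
            yylloc.last_column += strlen(yytext);  /* same value as yyleng */
            return IDENTIFIER;
          }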
```
Now in bison, to retrieve those fields:
```
statement : IDENTIFIER '=' expression 
            { printf("%d - %d\n", @1.last_line, @1.last_column); }
```
By default these fields are initialized to 1. Initialize the column fields to 0 instead; otherwise the reported columns will be off by one.
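One way to do that zero-initialization, sketched here on the assumption that the grammar uses the default `YYLTYPE`, is Bison's `%initial-action` directive. Its code runs at the start of every `yyparse()` call, and `@$` refers to the initial location:

```
%locations

%initial-action
{
  /* Start columns at 0 so that adding token lengths gives 1-based end columns. */
  @$.first_column = 0;
  @$.last_column = 0;
};
```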

没有蜡笔的小新

2020-12-24 08:14
The yylex declaration probably changed because you used a reentrant or pure parser. Many documents around the web suggest that this is required for Bison locations to work, but it is not.

I needed line numbers too and found the Bison documentation confusing in that regard. The simple solution uses the global variable yylloc. In your Bison file, just add the %locations directive:
```
%{
...
%}
%locations
...
%%
...
```
In your lexer:
```
%{
...
#include "yourparser.tab.h"  /* This is where it gets the definition for yylloc from */
#define YY_USER_ACTION yylloc.first_line = yylloc.last_line = yylineno;
%}
%option yylineno
...
%%
...
```
The YY_USER_ACTION macro is "called" before each of your token actions and updates yylloc. Now you can use @N/@$ in your rules like this:
```
statement : error ';'   { fprintf(stderr, "Line %d: Bad statement.\n", @1.first_line); }
```
Or use the global variable yylloc:
```
void yyerror(const char *s)
{
  fprintf(stderr, "ERROR line %d: %s\n", yylloc.first_line, s);
}
```
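If the scanner also fills in the column fields (the YY_USER_ACTION above sets only the lines), the message can carry the full span. Here is a rough sketch of that idea: `format_error` is a made-up helper, and the `YYLTYPE` struct is a stand-in assuming Bison's default layout rather than the generated definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for Bison's default YYLTYPE (assumption); a real parser
   gets this type from the generated header. */
typedef struct {
    int first_line;
    int first_column;
    int last_line;
    int last_column;
} YYLTYPE;

/* Hypothetical helper: write "<where>: <msg>" into buf and return
   whatever snprintf returns. */
int format_error(char *buf, size_t size, const YYLTYPE *loc, const char *msg)
{
    assert(buf != NULL && loc != NULL && msg != NULL);

    if (loc->first_line == loc->last_line) {
        /* Single-line span: report the column range. */
        return snprintf(buf, size, "line %d, columns %d-%d: %s",
                        loc->first_line, loc->first_column,
                        loc->last_column, msg);
    }
    /* Multi-line span: report the line range only. */
    return snprintf(buf, size, "lines %d-%d: %s",
                    loc->first_line, loc->last_line, msg);
}
```

Inside a rule action you could pass `&@1` as the location, and inside `yyerror` the address of the global `yylloc`.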

粉色の甜心

2020-12-24 08:25

Take a look at section 3.6 of the Bison manual - that seems to cover locations in some detail. Combined with what you found in the Flex manual, that may be sufficient.
