Question
I have a reentrant parser which takes input from a string and has a structure to maintain context. A function is called with different input strings to be parsed. The relevant code of that function is:
int parseMyString(char *inputToBeParsed) {
    //LEXICAL COMPONENT - INITIATE LEX PROCESSING
    yyscan_t scanner;
    YY_BUFFER_STATE buffer;
    yylex_init_extra(&parseSupportStruct, &scanner);
    //yylex_init(&scanner);
    // yy_scan_buffer requires the last two bytes of the buffer to be NUL;
    // i is defined in the redacted code.
    buffer = yy_scan_buffer(inputToBeParsed, i+2, scanner);
    if (buffer == NULL) {
        strcpy(errorStrings, "YY_BUFFER_STATE returned NULL pointer\n");
        return -1;
    }
    //BISON PART - THE ACTUAL PARSER
    yyparse(scanner, &parseSupportStruct);
    ...
    yylex_destroy(scanner);
    ...
}
My .l options are:
%option noinput nounput noyywrap 8bit nodefault
%option yylineno
%option reentrant bison-bridge bison-locations
%option extra-type="parseSupportStructType *"
Relevant lines from .y are:
%define api.pure full
%locations
%param { yyscan_t scanner }
%parse-param { parseSupportStructType* parseSupportStruct}
%code {
int yylex(YYSTYPE* yylvalp, YYLTYPE* yyllocp, yyscan_t scanner);
void yyerror(YYLTYPE* yyllocp, yyscan_t unused, parseSupportStructType* parseSupportStruct, const char* msg);
char *yyget_text (yyscan_t);
char *strcpy(char *, const char *);
}
%union {
int numval;
char *strval;
double floatval;
}
In some of my parser rules I access yyllocp->first_line. In the first call to parseMyString(...) I get the correct value; the second time I get an uninitialized value. Do I need to initialize yyllocp->first_line in each call to parseMyString? If so, how and where? I know I have given only partial, redacted code to explain the situation; I'll be happy to provide further details.
Using valgrind, I have removed memory leaks to the best of my abilities, but some third-party library issues are beyond my control.
Answer 1:
Nothing in flex or bison will maintain the value of yylloc.
Bison parsers (other than push parsers) will initialise that variable. (If you accept the default location type -- that is, you don't #define YYLTYPE -- yylloc will be initialised to {1, 1, 1, 1}. Otherwise, it will be zero-initialised, whatever that means for whatever type it is.) Bison also produces code which computes the location of a non-terminal based on the locations of the non-terminal's first and last children. Flex's generated code doesn't touch the location object at all.
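The child-to-parent location computation can be sketched as plain C. This is a standalone imitation of the default YYLLOC_DEFAULT behaviour described in the Bison manual, not code copied from a generated parser; the struct name `loc_t` and the function `default_rule_location` are hypothetical. `rhs[0]` stands for the location just before the rule (used only for empty rules) and `rhs[1..n]` for the rule's children.

```c
/* Hypothetical stand-in for Bison's default YYLTYPE. */
typedef struct {
    int first_line, first_column;
    int last_line, last_column;
} loc_t;

/* Imitates the default YYLLOC_DEFAULT(Cur, Rhs, N):
   rhs[0] is the location preceding the rule, rhs[1..n] are its children. */
loc_t default_rule_location(const loc_t *rhs, int n)
{
    loc_t cur;
    if (n > 0) {
        /* Span from the start of the first child to the end of the last child. */
        cur.first_line   = rhs[1].first_line;
        cur.first_column = rhs[1].first_column;
        cur.last_line    = rhs[n].last_line;
        cur.last_column  = rhs[n].last_column;
    } else {
        /* Empty rule: zero-width location at the end of the preceding symbol. */
        cur.first_line   = cur.last_line   = rhs[0].last_line;
        cur.first_column = cur.last_column = rhs[0].last_column;
    }
    return cur;
}
```

With children at 1.1-1.4 and 2.1-2.3, the parent gets 1.1-2.3. Note that nothing here reads a line counter: the parser only combines whatever locations the scanner hands it, which is why the scanner side has to keep them correct.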
A flex scanner does automatically maintain yylineno if you enable this feature with
%option yylineno
Flex can usually do that more efficiently than you can, and it handles all the corner cases (yyless, yymore, input(), REJECT). So if you want to track line numbers, I strongly recommend letting flex do it.
But there is one important issue with flex's yylineno support. In a reentrant scanner, the line number is stored in each flex buffer, not in the scanner state object. That's almost certainly the correct place to store it, IMHO, because if you are using multiple buffers, they probably represent multiple input streams, and normally you'll want to cite the number of a line within its file. But yy_scan_buffer does not initialise this field. (And therefore neither do yy_scan_string and yy_scan_bytes, which are just wrappers around yy_scan_buffer.)
So if you are using one of the yy_scan_* interfaces, you should reset yylineno by calling yyset_lineno immediately after yy_scan_*. In your case, this would be:
buffer = yy_scan_buffer(inputToBeParsed, i+2, scanner);
yyset_lineno(1, scanner);
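To illustrate why the reset matters, here is a toy model in plain C, not flex's actual code: the line counter lives in a per-buffer struct and is advanced by counting the newlines in each matched token, much as flex does when %option yylineno is on. A buffer only starts at line 1 if something sets it. The names `toy_buffer`, `toy_scan_string` and `toy_advance` are hypothetical; `toy_scan_string` takes the starting line explicitly, playing the role of yy_scan_buffer followed by yyset_lineno.

```c
#include <stddef.h>

/* Toy model: the line number belongs to the buffer, as in a reentrant flex scanner. */
typedef struct {
    const char *text;   /* input being scanned */
    int lineno;         /* current line number for this buffer only */
} toy_buffer;

/* Analogue of yy_scan_buffer(...) followed by yyset_lineno(first_line, scanner). */
toy_buffer toy_scan_string(const char *text, int first_line)
{
    toy_buffer b;
    b.text = text;
    b.lineno = first_line;   /* explicit initialisation; nothing else sets it */
    return b;
}

/* Count the newlines in a matched token, like flex's yylineno bookkeeping. */
void toy_advance(toy_buffer *b, const char *token, size_t len)
{
    for (size_t i = 0; i < len; ++i)
        if (token[i] == '\n')
            ++b->lineno;
}
```

Because each new buffer is explicitly started at line 1, two consecutive "parses" of the same input end on the same line number, which is exactly the behaviour the yyset_lineno call restores in the real scanner.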
Once you've got yylineno, it's easy to maintain the yylloc object. Flex has a hook which lets you inject code just before the action for any pattern is executed (even if the action is empty), and this hook can be used to maintain yylloc automatically. In a related answer, I provided a simple example of this technique (which depends on yylineno being maintained by the flex-generated scanner):
#define YY_USER_ACTION \
yylloc->first_line = yylloc->last_line; \
yylloc->first_column = yylloc->last_column; \
if (yylloc->last_line == yylineno) \
yylloc->last_column += yyleng; \
else { \
yylloc->last_line = yylineno; \
yylloc->last_column = yytext + yyleng - strrchr(yytext, '\n'); \
}
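The macro's arithmetic can be checked outside flex by writing the same steps as an ordinary function. This is a sketch under the assumption that `text`, `leng` and `lineno` stand in for yytext, yyleng and the value of yylineno after flex has counted the token's newlines; `loc_t` and `update_location` are hypothetical names. Like the macro, it relies on `text` being NUL-terminated at `text[leng]`, which flex guarantees for yytext during an action.

```c
#include <string.h>

/* Hypothetical stand-in for Bison's default YYLTYPE. */
typedef struct {
    int first_line, first_column;
    int last_line, last_column;
} loc_t;

/* Same steps as the YY_USER_ACTION macro, written as a function. */
void update_location(loc_t *loc, const char *text, int leng, int lineno)
{
    /* The new token starts where the previous token ended. */
    loc->first_line = loc->last_line;
    loc->first_column = loc->last_column;
    if (loc->last_line == lineno) {
        /* No newline in the token: move right by its length. */
        loc->last_column += leng;
    } else {
        /* The token contains a newline: measure the end column
           from the last newline in the token. */
        loc->last_line = lineno;
        loc->last_column = (int)(text + leng - strrchr(text, '\n'));
    }
}
```

Starting from Bison's initial {1, 1, 1, 1}, the token `abc` on line 1 gets 1.1-1.4 (the end column is one past the last character), a following token consisting of a newline and two spaces gets 1.4-2.3, and the next one-character token gets 2.3-2.4.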
As the notes in that answer indicate, the above is not fully general, but it will work in many circumstances:
This YY_USER_ACTION macro should work for any scanner which does not use yyless(), yymore(), input() or REJECT. Correctly coping with these features is not too difficult but it seemed out of scope here.
You cannot handle yyless(), yymore() or REJECT before the action (since before the action it's not possible to know whether they will be executed), so a more robust location tracker in an application which used those features would have to include code to fix yylloc:
For yyless(), the above code for setting last_line and last_column can be re-executed after the yyless() call, since the flex scanner will fix yyleng and yylineno.
For REJECT, it is not possible to insert code after REJECT. The only way to handle it is to keep a backup of yylloc and restore it immediately before the REJECT macro. (I strongly advise against using REJECT. It's extremely inefficient and can almost always be replaced with the combination of a call to yyless() and a start condition.)
For yymore(), yylloc is still correct, but the next action must not overwrite the token start position. Getting that right would probably require maintaining a flag to indicate whether or not yymore() had been called.
For input(), if you want the characters read to be considered part of the current token, you could advance the end location in yylloc after the call to input() (which requires distinguishing between input() returning a newline, an end-of-file indicator, or a regular character). Alternatively, if you want the characters read with input() not to be considered part of any token, you would need to abandon the idea of using the end position of the previous token as the start position of the current token, which would require keeping a separate position value to be used as the start position of the next token.
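For the yyless() case, "re-executing the code for setting last_line and last_column" means resetting the end of the token to its start and redoing the end computation with the new yyleng and yylineno. A minimal sketch, assuming a hypothetical helper `set_end_from_start` and the same hypothetical `loc_t` layout as Bison's default YYLTYPE; in a real scanner you would call it on yylloc right after yyless(n), passing yytext, yyleng and yylineno.

```c
#include <string.h>

/* Hypothetical stand-in for Bison's default YYLTYPE. */
typedef struct {
    int first_line, first_column;
    int last_line, last_column;
} loc_t;

/* Recompute the end of the current token from its (unchanged) start.
   text/leng/lineno are yytext/yyleng/yylineno as they stand after yyless(),
   which shortens the token, re-terminates yytext and fixes yylineno. */
void set_end_from_start(loc_t *loc, const char *text, int leng, int lineno)
{
    loc->last_line = loc->first_line;
    loc->last_column = loc->first_column;
    if (loc->last_line == lineno) {
        /* The shortened token has no newline. */
        loc->last_column += leng;
    } else {
        /* The shortened token still contains a newline. */
        loc->last_line = lineno;
        loc->last_column = (int)(text + leng - strrchr(text, '\n'));
    }
}
```

For example, a token `ab\nd` starting at 1.1 ends at 2.2; after yyless(2) the token is `ab`, yylineno is back to 1, and the recomputed end is 1.3. The REJECT backup mentioned above needs nothing more than copying the location struct before YY_USER_ACTION modifies it and copying it back before REJECT.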
Source: https://stackoverflow.com/questions/59754253/yyllocp-first-line-returns-uninitialized-value-in-second-iteration-of-a-reentra