How to translate token names in bison

Submitted by 只愿长相守 on 2019-12-11 07:54:13

Question


I have a bison parser that works sufficiently well for my purpose. It even prints localized error messages, but the token names are not translated. Looking at the source code, I found that I can define YY_ as my own gettext function in order to provide my own translation of the error messages. But this does not work for token names.

Is there some switch or hidden feature that I could use to extract the token names from the parser and to translate them?

So far I have found yytnamerr, which could be overridden to format the token names. Because it does more than just reformat names, I would rather not touch this function: I would have to keep it in sync with future Bison releases. On the other hand, I also need a way to extract the token names from the parser in order to add them to the language definition file.

How do you implement user friendly error reporting with Bison?


Answer 1:


If you specify %token-table, then bison will generate the yytname table. This table includes all bison symbols: internal symbols ($end, error and $undefined), terminals (named tokens, single-quoted characters and double-quoted strings) and non-terminals, which also include the generated names for mid-rule actions.

With yytname visible, it's easy to extract the tokens in a format recognizable by the gettext package. For example, you could add to your .y file something like this:

#ifdef MAKE_TOKEN
int main(void) {
  puts("#include <libintl.h>");
  puts("#include <stdio.h>");
  puts("int main() {");
  for (const char* const* p = yytname; *p; ++p) {
    // See Note 1 below
    printf("  printf(\"%%s: %%s\\n\", \"%s\", gettext (\"%s\"));\n", *p, *p);
  }
  puts("}");
}
#endif

and then add a stanza to your Makefile (making appropriate substitutions for file names):

messages.pot: my_parser.c
	$(CC) $(CFLAGS) -DMAKE_TOKEN -o token_lister $<
	./token_lister > my_parser.tokens.c
	# See Note 2 below
	$(CC) -o my_parser.tokens my_parser.tokens.c
	xgettext -o $@ my_parser.tokens.c

Once you have the translations, you still need to figure out how to use them, since bison does not offer an interface for inserting translated token names into its generated error messages. Probably the simplest way is to insert the translations directly into yytname by iterating through that array and substituting each token name with its translation (that would have to be done at parser startup). That presents the annoyance that yytname is declared const by the bison skeleton; however, a very simple sed or awk invocation can be used to remove the offending const. [Note 3]
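The startup substitution described above could look roughly like the sketch below. The names translate_fn, translate_token_names and demo_translate are hypothetical and do not come from bison; demo_translate is only a stand-in for a real gettext lookup. The sketch assumes the table is NULL-terminated and that the const has already been stripped from the generated yytname declaration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: maps an original token name to its
 * translation. A thin wrapper around gettext() would fit here. */
typedef const char *(*translate_fn)(const char *name);

/* Stand-in translator for illustration only: it knows one token name
 * and returns every other name unchanged, the way gettext() returns
 * its argument when no translation is found. */
const char *demo_translate(const char *name)
{
    if (strcmp(name, "NUMBER") == 0)
        return "Zahl";
    return name;
}

/* Replace each entry of a NULL-terminated name table with its
 * translation, in place. Assumes the table is writable, i.e. the
 * const qualifier has been removed from the generated declaration.
 * Returns the number of entries that were replaced. */
size_t translate_token_names(const char **names, translate_fn translate)
{
    size_t changed = 0;
    for (const char **p = names; *p != NULL; ++p) {
        const char *translated = translate(*p);
        if (translated != NULL && translated != *p) {
            *p = translated;
            ++changed;
        }
    }
    return changed;
}
```

In a real parser this would be called once before the first yyparse() call, passing the de-const'ed yytname. Since gettext() returns char * rather than const char *, a one-line wrapper function is needed to match the callback type used in this sketch.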

Having said that, it's not at all clear to me that these automatically generated error messages are "user friendly", unless the user is surprisingly familiar with the language's formal grammar. And a user who is familiar with the grammar might well prefer the original token name, in order to find it in the grammar, rather than a non-expert translation which only coincidentally resembles the original concept. Not that I'm pointing fingers at anyone in particular.

You might enjoy this fascinating essay by Russ Cox, about how he implemented actually friendly error messages for Go.


NOTES:

  1. The direct use of the token name in a C string won't work in the case of the tokens whose representation includes a " or a \. In particular, any keyword token ("and" or "<=") will fail, as will the single character tokens '"' and '\\'. These don't show up very often in grammars; if you're substituting internationalized keywords in your scanner, you're very unlikely to use bison's quoted string feature at all.

    If you do want to use such tokens, you'll have to output code for the gettext generator which escapes " and \ characters in the token name.

  2. Actually, it would be better to use several stanzas, but that one is enough to get you going, I think. You probably want to mark some or all of the intermediate results as .INTERMEDIATE. The generated executable my_parser.tokens can be used to verify the translations, but that's totally optional, so you might want to remove that line. On the other hand, it does verify that the strings are compilable.

  3. See Russ Cox's gc (link provided above) for an example. His Makefile modifies the bison output to remove the const from yytname, so that the generated parser can substitute his preferred token names for error messages, so you can see the general idea at work.
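The escaping mentioned in Note 1 can be sketched as a small helper. The name escape_for_c_literal is hypothetical; the function only handles the two characters the note names (" and \), which is an assumption about what token names can contain.

```c
#include <stddef.h>

/* Copy src into dst, prefixing each '"' and '\' with a backslash so
 * the result can be pasted between double quotes in generated C source.
 * Returns the length the full escaped string needs (excluding the
 * terminating NUL), like snprintf; a return value >= dst_size means
 * the output was truncated and should not be used. */
size_t escape_for_c_literal(const char *src, char *dst, size_t dst_size)
{
    size_t needed = 0;
    for (; *src != '\0'; ++src) {
        if (*src == '"' || *src == '\\') {
            if (needed + 1 < dst_size)
                dst[needed] = '\\';
            ++needed;
        }
        if (needed + 1 < dst_size)
            dst[needed] = *src;
        ++needed;
    }
    if (dst_size > 0)
        dst[needed < dst_size ? needed : dst_size - 1] = '\0';
    return needed;
}
```

In the token lister, each *p would be passed through this helper into a local buffer, and the buffer printed in place of *p inside the generated gettext ("...") call.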



Source: https://stackoverflow.com/questions/18552338/how-to-translate-token-names-in-bison
