ANTLRWorks 1.4.3 can't properly read extended-ASCII characters

Posted by 与世无争的帅哥 on 2020-01-04 04:11:08

Question


I'm working on a fairly standard compiler project for which I picked ANTLR as the parser generator. While updating an existing grammar from v2 to v3, I noticed that ANTLRWorks, the official IDE for ANTLR, wasn't displaying any of the extended-ASCII characters in the file properly. Even after I used Notepad++ to convert the file from ASCII to UTF-8, ANTLRWorks still displayed those characters as squares. In Notepad++ they display fine.

Because of this glitch, ANTLRWorks mauls the file whenever I save it, so I cannot use it as an editor any more, which is rather annoying. Has anyone else here encountered this issue and maybe solved it? Much obliged.

[edit]: The issue occurs with the latest version of ANTLRWorks (downloaded yesterday) and with the vams.g grammar file I got from http://www.antlr.org/grammar/1086696923011/vhdlams/index.html


Answer 1:


I cannot reproduce this with ANTLRWorks 1.4.3.

If I create a dummy grammar:

grammar T;
parse : . ;
Any   : . ;

and paste the complete extended ASCII set in a multi-line comment:

grammar T;

/*
€

‚
ƒ

...

ÿ
*/

parse : . ;
Any   : . ;

there's no problem. It doesn't matter whether I paste the characters directly in ANTLRWorks, or paste them in a normal editor and then edit the existing grammar with ANTLRWorks: the characters all stay the same after saving inside ANTLRWorks.

On a related note: ANTLR versions 3.0 to 3.3 still have some dependencies on ANTLR 2.7 classes, which might cause org.antlr.Tool to trip over certain characters outside the ASCII set. Use ANTLR 3.4 in that case, which no longer has these old dependencies.

EDIT

I suspect there's some odd byte somewhere in the original grammar that is causing all the mayhem. I quickly copied only the rules from the original grammar, changed all v2.7 syntax to v3 syntax (changed double-quoted literals to single-quoted ones, changed protected to fragment, and commented out some custom code) and saved the result in a new file. This file could be opened (and saved) by ANTLRWorks or a plain text editor without the extended ASCII chars being mangled.

Here is the ANTLR v3 version of said grammar: http://pastebin.com/zU4xcvXt (the grammar is too big to post on SO...)
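To hunt for the kind of odd byte suspected above, a small script can list every byte in the grammar file that is not plain printable ASCII. This is only a rough sketch: the function names find_suspect_bytes and is_valid_utf8 are made up for illustration, and it assumes the file is small enough to read into memory in one go.

```python
def find_suspect_bytes(data: bytes):
    """Return (offset, line, byte) for every byte that is not plain ASCII text.

    "Suspect" here means: any byte >= 0x80, DEL (0x7F), or a control
    character other than tab, line feed and carriage return.
    """
    suspects = []
    line = 1
    for offset, b in enumerate(data):
        if b == 0x0A:
            line += 1  # line feed: count it, but it is not suspect
        elif b >= 0x7F or (b < 0x20 and b not in (0x09, 0x0D)):
            suspects.append((offset, line, b))
    return suspects


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

One way to use it is to read the grammar with open("vams.g", "rb").read() and pass the bytes to both functions. If is_valid_utf8 returns False while find_suspect_bytes reports bytes at 0x80 or above, the file is probably still in a single-byte encoding rather than UTF-8, and the reported line numbers show where to look.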

EDIT II

Is the grammar name useful for anything beyond just giving it a label?

No, it's not. As you mentioned, it's only used to give the generated parser or lexer a name.

There are 4 types of grammars in ANTLR:

  • combined grammar, which looks like grammar T;, generating TLexer.java and TParser.java source files;
  • parser grammar, looking like parser grammar TP;, generating a TP.java source file;
  • lexer grammar, looking like lexer grammar TL;, generating a TL.java source file;
  • tree grammar, looking like tree grammar TWalker;, generating a TWalker.java source file.
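The naming rule in the list above can be written down as a small lookup. This is an illustrative sketch only: generated_sources is a hypothetical helper, not part of ANTLR, and it encodes just the file names listed in this answer (ANTLR may emit other files as well).

```python
import re

# Matches headers such as "grammar T;" or "lexer grammar TL;".
_HEADER = re.compile(r"^\s*(?:(lexer|parser|tree)\s+)?grammar\s+([A-Za-z_]\w*)\s*;")


def generated_sources(header: str):
    """Map a grammar header line to the Java source names listed above."""
    match = _HEADER.match(header)
    if match is None:
        raise ValueError(f"not a grammar header: {header!r}")
    kind, name = match.group(1), match.group(2)
    if kind is None:
        # Combined grammar: one lexer and one parser source file.
        return [f"{name}Lexer.java", f"{name}Parser.java"]
    # Lexer, parser and tree grammars: one file named after the grammar.
    return [f"{name}.java"]
```

For example, generated_sources("grammar T;") gives TLexer.java and TParser.java, while generated_sources("tree grammar TWalker;") gives only TWalker.java. In every case the name after the grammar keyword does nothing more than label the generated classes.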


Source: https://stackoverflow.com/questions/8371956/antlrworks-1-4-3-cant-properly-read-extended-ascii-characters
