Lisp grammar in yacc

 ̄綄美尐妖づ 提交于 2019-11-28 06:24:56

The error is really in the lexer. Your parentheses end up as the last "." in the lexer, and don't show up as parentheses in the parser.

Add rules like

\)     { return RPAREN; }
\(     { return LPAREN; }

to the lexer and change all occurences of '(', ')' to LPAREN and RPAREN respectively in the parser. (also, you need to #define LPAREN and RPAREN where you define your token list)

Note: I'm not sure about the syntax, could be the backslashes are wrong.

Lisp grammar can not be represented as context-free grammar, and yacc can not parse all lisp code. It is because of lisp features such as read-evaluation and programmable reader. So, in order just to read an arbitrary lisp code, you need to have a full lisp running. This is not some obscure, non-used feature, but it is actually used. E.g., CL-INTERPOL, CL-SQL.

If the goal is to parse a subset of lisp, then the program text is a sequence of sexprs.

You are correct in that you need to define a non-terminal. That would be defined as a set of sexpr. I'm not sure of the YACC syntax for that. I'm partial to ANTLR for parser generators and the syntax would be:

program: sexpr*

Indicating 0 or more sexpr.

Update with YACC syntax:

program :  /* empty */
        | program sexpr
        ;

Not in YACC, but might be helpful anyway, here's a full grammar in ANTLR v3 that works for the cases you described(excludes strings in the lexer because it's not important for this example, also uses C# console output because that's what I tested it with):

program: (sexpr)*;

sexpr: list
    |  atom            {Console.WriteLine("matched sexpr");}
    ;

list:     
   '('')'              {Console.WriteLine("matched empty list");}
   | '(' members ')'   {Console.WriteLine("matched list");}

    ;

members: (sexpr)+      {Console.WriteLine("members 1");};

atom: Id               {Console.WriteLine("ID");}
    | Num              {Console.WriteLine("NUM");}
    ;


Num: ( '0' .. '9')+;
Id: ('a' .. 'z' | 'A' .. 'Z')+;
Whitespace : ( ' ' | '\r' '\n' | '\n' | '\t' ) {Skip();};

This won't work exactly as is in YACC because YACC generates and LALR parser while ANTLR is a modified recursive descent. There is a C/C++ output target for ANTLR if you wanted to go that way.

Do you neccesarily need a yacc/bison parser? A "reads a subset of lisp syntax" reader isn't that hard to implement in C (start with a read_sexpr function, dispatch to a read_list when you see a '(', that in turn builds a list of contained sexprs until a ')' is seen; otherwise, call a read_atom that collects an atom and returns it when it can no longer read atom-constituent characters).

However, if you want to be able to read arbritary Common Lisp, you'll need to (at the worst) implement a Common Lisp, as CL can modify the reader run-time (and even switch between different read-tables run-time under program control; quite handy when you're wanting to load code written in another language or dialect of lisp).

It's been a long time since I worked with YACC, but you do need a top-level non-terminal. Could you be more specific about "tried it" and "it didn't seem to work"? Or, for that matter, what the errors are?

I'd also suspect that YACC might be overkill for such a syntax-light language. Something simpler (like recursive descent) might work better.

You could try this grammar here.

I just tried it, my "yacc lisp grammar" works fine :

%start exprs

exprs:
    | exprs expr
    /// if you prefer right recursion :
    /// | expr exprs
    ;

list:
    '(' exprs ')'
    ;

expr:
    atom
    | list
    ;

atom:
    IDENTIFIER
    | CONSTANT
    | NIL
    | '+'
    | '-'
    | '*'
    | '^'
    | '/'
    ;
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!