Multiple error reporting with Menhir: which token?

Submitted by ぃ、小莉子 on 2019-12-10 14:05:21

Question


I am writing a small parser with Menhir + Ocamllex, and I have two requirements that I cannot seem to meet at the same time:

  • I would like to keep parsing after an error (to report more errors).
  • I would like to print the token at which the error occurred.

I can easily do only the first, by using the error token. I can also easily do only the second, using the approach suggested for this question. However, I don't know of an easy way to achieve both.

The way I handle errors right now goes something like this:

pair:
| left = prodA SEPARATOR right = prodA
    { (* happy case *) }
| error SEPARATOR right = prodA
    { print_error_report $startpos;
      (* would like to continue after the first error, just in case
         there is a second error, so I report both *) }

One thing that would help me is access to the lexbuf itself, so I could get the token directly. That would mean that instead of $startpos I pass something like $lexbuf. But as far as I can tell, there is no official way to access the lexbuf. The linked solution works only at the level of the parser's caller, where the caller itself passes the lexbuf to the parser, but not within semantic actions.

Does anyone know if it is actually available somehow? Or is there perhaps a workaround?


Answer 1:


Thanks to combined work by Frédéric Bour and François Pottier, there is a new version of Menhir available that supports incremental parsing. See the announcement email sent on December 17.

The idea of this incremental API is to reverse control: instead of the parser calling the lexer to process the input, you have a lower-level API where you manipulate the parser state, and each step returns an updated state after a token is consumed (in fact it is slightly more fine-grained, as you can also observe internal reductions that do not require new tokens). In particular, you can observe whether the resulting parser state is an error, and choose to backtrack and provide a different input (depending on your error-recovery strategy) to get farther along in your input.
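
To make the inversion of control concrete, here is a minimal, self-contained toy sketch in plain OCaml. It does not use Menhir's actual incremental API: the types `token`, `state`, `checkpoint` and the functions `offer`, `skip_statement` and `parse_all` are all hypothetical names invented for this illustration, and the grammar (a sequence of `id : id ;` pairs) is an assumption standing in for the `pair` rule from the question. The point is only that, once the driver feeds tokens itself, it knows exactly which token was rejected and can decide how to resume.

```ocaml
(* Toy model of a control-inverted parser. None of these names come from
   Menhir; they are made up for illustration. *)

type token = Id of string | Sep | Semi | Eof

let show = function
  | Id s -> s
  | Sep -> ":"
  | Semi -> ";"
  | Eof -> "<eof>"

(* Parser states for the toy grammar: (Id Sep Id Semi)* Eof *)
type state =
  | Start                          (* expecting Id or Eof *)
  | AfterLeft of string            (* expecting Sep *)
  | AfterSep of string             (* expecting Id *)
  | AfterRight of string * string  (* expecting Semi *)

(* What the parser reports back after being offered one token. *)
type checkpoint =
  | Input_needed of state          (* token accepted, feed another one *)
  | Reduced of (string * string)   (* a complete pair was recognised *)
  | Error_at of token              (* this token was rejected *)
  | Accepted                       (* end of input reached cleanly *)

(* One parser step: the caller supplies the token, not a lexer callback. *)
let offer state tok =
  match state, tok with
  | Start, Id l -> Input_needed (AfterLeft l)
  | Start, Eof -> Accepted
  | AfterLeft l, Sep -> Input_needed (AfterSep l)
  | AfterSep l, Id r -> Input_needed (AfterRight (l, r))
  | AfterRight (l, r), Semi -> Reduced (l, r)
  | _, bad -> Error_at bad

(* Panic-mode recovery: drop tokens up to and including the next Semi,
   but never drop Eof. *)
let rec skip_statement = function
  | [] -> []
  | Semi :: rest -> rest
  | Eof :: _ as toks -> toks
  | _ :: rest -> skip_statement rest

(* The driver owns the token stream, so it can record the offending token
   and then resume from a fresh state to look for further errors. *)
let parse_all tokens =
  let rec loop state toks pairs errors =
    match toks with
    | [] -> (List.rev pairs, List.rev errors)
    | tok :: rest ->
      begin match offer state tok with
      | Input_needed s -> loop s rest pairs errors
      | Reduced p -> loop Start rest (p :: pairs) errors
      | Accepted -> (List.rev pairs, List.rev errors)
      | Error_at bad ->
        loop Start (skip_statement toks) pairs (bad :: errors)
      end
  in
  loop Start tokens [] []

let () =
  let input =
    [ Id "a"; Sep; Id "b"; Semi;   (* well formed *)
      Sep; Id "c"; Semi;           (* missing left operand *)
      Id "d"; Id "e"; Semi;        (* missing separator *)
      Id "f"; Sep; Id "g"; Semi;   (* well formed *)
      Eof ]
  in
  let pairs, errors = parse_all input in
  List.iter (fun (l, r) -> Printf.printf "pair: %s, %s\n" l r) pairs;
  List.iter (fun t -> Printf.printf "error at token: %s\n" (show t)) errors
```

In this sketch both requirements from the question are met by the driver loop rather than by semantic actions: the rejected token is in hand at the moment the error is observed, and skipping to the next `;` lets parsing continue so that a second error is reported as well. A real Menhir-based driver would make the same kind of decision on the parser states returned by the generated incremental interface.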

The general idea is that this will allow you to implement good error-recovery and error-reporting strategies on the parser-user side, and to slowly deprecate the rather inflexible "error token" mechanism.

This is already usable, but work on these features is still ongoing, and you should expect more robust support for them in releases over the following months.



Source: https://stackoverflow.com/questions/27350899/multiple-error-reporting-with-menhir-which-token
