Are regular expressions used to build parsers?

Happy的楠姐 2020-12-16 15:58

This is just a question out of curiosity, since I have been needing to get more and more into parsing and using regex lately. It seems, for questions I come across in my sea

8 answers
  •  轮回少年
    2020-12-16 16:32

    Regular expressions are defined over arbitrary tokens, but most programmers encounter them only in the context of strings of characters, and so it is easy to believe they are only useful for strings.

    As a pure capability, regular expressions (actually, a single regular expression) cannot parse any language that requires a context-free grammar.
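
    To make that concrete, here is a minimal Python sketch (not part of the original answer; the names `DEPTH_TWO` and `balanced` are made up for illustration). Balanced parentheses are the textbook language that needs a context-free grammar: a classical regular expression can be written for any fixed nesting depth, but not for arbitrary depth, while one recursive rule, S = ( '(' S ')' )*, covers every depth.

```python
import re

# A classical regular expression can only hard-code a nesting limit.
# This one accepts balanced parentheses nested at most two levels deep.
DEPTH_TWO = re.compile(r"(?:\((?:\(\))*\))*")


def balanced(text: str) -> bool:
    """Recognize S = ( '(' S ')' )* with a recursive function (illustrative only)."""

    def parse_s(pos: int) -> int:
        # Returns the position just past the longest match of S starting at pos.
        while pos < len(text) and text[pos] == "(":
            inner = parse_s(pos + 1)          # the recursive reference to S
            if inner >= len(text) or text[inner] != ")":
                return pos                    # group cannot be closed; S ends here
            pos = inner + 1
        return pos

    return parse_s(0) == len(text)
```

    With these definitions, `DEPTH_TWO.fullmatch("((()))")` finds no match because the input nests three levels deep, whereas `balanced("((()))")` succeeds. Raising the limit in the regex means writing a longer regex each time; the recursive rule never changes.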

    What makes context-free grammars different from regular expressions is that you can define a large set of named "recognizers" of subgrammars of a language, which can refer to one another recursively. These rules can all be limited to just the simple form of:

     LHS =  RHS1 RHS2 ... RHSn ;
    

    (so-called "Backus-Naur form" or BNF) where the LHS and each RHSi are names of primitive language elements or nonterminals in the language. (I build a very complex language processing tool that uses just this form; you need more rules but it is very usable).

    But most people writing grammars want a more expressive form, and so use an "extended BNF". If you examine these EBNFs closely, what they generally do is add the ideas from regular expressions (alternation, Kleene star/plus) to the BNF formalism. Thus you can find EBNFs with "*" and "+".
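
    As a rough illustration of why the extended operators add convenience rather than power, each `X*` or `X+` can be rewritten into plain BNF by introducing a fresh nonterminal with a recursive rule. The helper below is a hypothetical sketch: the function name, and the convention that several rules sharing one LHS act as alternatives, are assumptions for this example and not taken from any particular tool.

```python
from typing import List


def desugar_repetition(item: str, op: str, fresh: str) -> List[str]:
    """Rewrite `item op` (op is '*' or '+') as plain BNF rules for `fresh`.

    Assumption: two rules with the same left-hand side are alternatives.
    """
    if op == "*":
        # Zero or more: an empty rule plus a recursive rule.
        return [f"{fresh} = ;", f"{fresh} = {fresh} {item} ;"]
    if op == "+":
        # One or more: a single item plus a recursive rule.
        return [f"{fresh} = {item} ;", f"{fresh} = {fresh} {item} ;"]
    raise ValueError(f"unsupported repetition operator: {op!r}")
```

    For instance, `desugar_repetition("RULE", "+", "RULES")` yields the two rules `RULES = RULE ;` and `RULES = RULES RULE ;`, which is the "you need more rules" trade-off of staying in pure BNF.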

    So, what follows is an EBNF for itself, using regexp ideas:

     EBNF = RULE+ ;
     RULE = IDENTIFIER '=' ALTERNATIVES ';' ;
     ALTERNATIVES = RHS ( '|' RHS )* ;
     RHS = ITEM* ;
     ITEM = IDENTIFIER | QUOTEDTOKEN | '(' ALTERNATIVES ')' | ITEM ( '*' | '+' ) ;
    

    So, regular expression ideas can be used to express grammars. To get a parser from a grammar written in such notation, you still need a parser generator that accepts the notation (or you do the generator's job by hand).
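
    As a sketch of the "by hand" route, here is a small recursive-descent parser in Python for the EBNF above. It is an illustration, not the tool mentioned earlier: the token definitions (what counts as IDENTIFIER and QUOTEDTOKEN), the tuple-based tree shape, and all class and function names are assumptions. It also shows the usual division of labor: a regular expression does the tokenizing, and recursive functions handle the nesting. Because the ITEM rule is left-recursive, the postfix '*' and '+' are handled with a loop instead of a recursive call.

```python
import re

# Assumed token shapes: identifiers, single-quoted literals, EBNF punctuation.
TOKEN = re.compile(
    r"\s*(?:(?P<IDENT>[A-Za-z_]\w*)|(?P<QUOTED>'[^']*')|(?P<PUNCT>[=;|()*+]))"
)


def tokenize(text):
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def kind(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def peek(self):
        return self.tokens[self.pos][1] if self.pos < len(self.tokens) else None

    def expect(self, value):
        if self.peek() != value:
            raise SyntaxError(f"expected {value!r}, found {self.peek()!r}")
        self.pos += 1

    def ebnf(self):  # EBNF = RULE+ ;
        rules = [self.rule()]
        while self.pos < len(self.tokens):
            rules.append(self.rule())
        return rules

    def rule(self):  # RULE = IDENTIFIER '=' ALTERNATIVES ';' ;
        if self.kind() != "IDENT":
            raise SyntaxError(f"expected a rule name, found {self.peek()!r}")
        name = self.peek()
        self.pos += 1
        self.expect("=")
        body = self.alternatives()
        self.expect(";")
        return (name, body)

    def alternatives(self):  # ALTERNATIVES = RHS ( '|' RHS )* ;
        alts = [self.rhs()]
        while self.peek() == "|":
            self.pos += 1
            alts.append(self.rhs())
        return ("alt", alts)

    def rhs(self):  # RHS = ITEM* ;
        items = []
        while self.kind() in ("IDENT", "QUOTED") or self.peek() == "(":
            items.append(self.item())
        return ("seq", items)

    def item(self):  # ITEM, with the left-recursive postfix part as a loop
        if self.peek() == "(":
            self.pos += 1
            node = self.alternatives()
            self.expect(")")
        else:
            node = (self.kind().lower(), self.peek())
            self.pos += 1
        while self.peek() in ("*", "+"):
            node = ("repeat", self.peek(), node)
            self.pos += 1
        return node


GRAMMAR = """
EBNF = RULE+ ;
RULE = IDENTIFIER '=' ALTERNATIVES ';' ;
ALTERNATIVES = RHS ( '|' RHS )* ;
RHS = ITEM* ;
ITEM = IDENTIFIER | QUOTEDTOKEN | '(' ALTERNATIVES ')' | ITEM ( '*' | '+' ) ;
"""

rules = Parser(tokenize(GRAMMAR)).ebnf()
```

    Running this on the grammar's own text gives five (name, tree) pairs, one per rule, so the EBNF really does describe itself. A parser generator would go one step further and turn such trees into a parser for the described language.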
