Making a Lexical Analyzer

独厮守ぢ 2020-12-12 16:34

I'm working on a lexical analyzer program in Java right now. I've been searching for an answer to this problem, but so far I haven't found one. Here's my

6 Answers
  •  独厮守ぢ
    2020-12-12 16:57

    You can use generator tools such as Lex (a lexer generator) and Bison (a parser generator) for C, or ANTLR for Java. You can also do lexical analysis by hand by building a finite automaton. Here's a small example:

    Suppose you need to tokenize a string where the keywords are {'echo', '.', ' ', 'end'}, meaning the language consists of these keywords only. So if I input

    echo .
    end .
    

    My lexer should output

    echo ECHO
     SPACE
    . DOT
    end END
     SPACE
    . DOT
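    For comparison, the Lex-style approach, where each token is described by a regular expression, can be approximated in plain Java with `java.util.regex`. This is a minimal sketch of a different technique from the hand-built automaton: Lex compiles its rules into a DFA and prefers the longest match, whereas this version simply tries the alternatives in order. The class name `RegexLexer` is made up, and skipping the newline is an assumption, since the expected output above has no token for it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLexer {
    // One named group per rule, like the rule list in a Lex specification.
    // Caveat: Java's alternation takes the first alternative that matches,
    // whereas Lex picks the longest match; that is harmless for these tokens.
    private static final Pattern RULES = Pattern.compile(
            "(?<ECHO>echo)|(?<END>end)|(?<DOT>\\.)|(?<SPACE> )|(?<NEWLINE>\\n)");
    // Token types that are actually emitted (NEWLINE is matched but skipped; an assumption).
    private static final String[] EMITTED = {"ECHO", "END", "DOT", "SPACE"};

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = RULES.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) { // no rule matches at pos
                throw new IllegalArgumentException(
                        "Unexpected character '" + input.charAt(pos) + "' at index " + pos);
            }
            for (String type : EMITTED) {
                if (m.group(type) != null) {
                    tokens.add(m.group() + " " + type); // lexeme followed by token type
                    break;
                }
            }
            // A NEWLINE match falls through without emitting anything.
            pos = m.end(); // every rule consumes at least one character, so this advances
        }
        return tokens;
    }

    public static void main(String[] args) {
        tokenize("echo .\nend .").forEach(System.out::println);
    }
}
```

    Any character not covered by a rule (for example `x`) raises an `IllegalArgumentException`.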
    

    To build an automaton for such a tokenizer, I can start with something like this:

             ' '
        +----------> (SPACE) -> back to S
        |
       (S) --e--> (e) --c--> (ec) --h--> (ech) --o--> (ECHO) -> back to S
        |          |
        |          +--n--> (en) --d--> (END) -> back to S
        |
        +----------> (DOT) -> back to S
             '.'
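    The diagram can also be written down as a transition table. Here is a minimal sketch, assuming state names of my own choosing ("S" for start, "e"/"ec"/"ech"/"en" for the partial prefixes, upper-case names for the emit states) and a hypothetical `classify` helper that checks whether a single lexeme is a complete token.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class EchoEndDfa {
    // Transition table: state -> (input character -> next state).
    // State names are illustrative, not from any library.
    private static final Map<String, Map<Character, String>> DELTA = new HashMap<>();

    // Emit (accepting) states; each one is named after the token it produces.
    private static final Set<String> ACCEPTING =
            new HashSet<>(Arrays.asList("SPACE", "DOT", "ECHO", "END"));

    private static void edge(String from, char c, String to) {
        DELTA.computeIfAbsent(from, k -> new HashMap<>()).put(c, to);
    }

    static {
        edge("S", ' ', "SPACE");
        edge("S", '.', "DOT");
        edge("S", 'e', "e");
        edge("e", 'c', "ec");
        edge("ec", 'h', "ech");
        edge("ech", 'o', "ECHO");
        edge("e", 'n', "en");
        edge("en", 'd', "END");
    }

    // One step of the automaton; null means "no transition" (a lexical error).
    public static String step(String state, char c) {
        Map<Character, String> out = DELTA.get(state);
        return out == null ? null : out.get(c);
    }

    // Runs the automaton from S over one lexeme and returns the token type
    // if it ends in an emit state, or null if the lexeme is not a token.
    public static String classify(String lexeme) {
        String state = "S";
        for (char c : lexeme.toCharArray()) {
            state = step(state, c);
            if (state == null) {
                return null;
            }
        }
        return ACCEPTING.contains(state) ? state : null;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"echo", "end", ".", " ", "ec", "echo."}) {
            System.out.println("'" + w + "' -> " + classify(w));
        }
    }
}
```

    `classify` only recognizes one lexeme at a time; "echo." is rejected because no edge leaves the ECHO state. Splitting a whole input stream into lexemes is the job of the emit-and-return-to-start loop.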
    

    The diagram above is rough, but the idea is this: you have a start state S. From S you consume 'e' and move to another state, where you expect either 'n' (for END) or 'c' (for ECHO). You keep consuming characters and moving between the states of this simple finite-state machine until you reach an emit state. For example, after consuming 'e', 'n', 'd' you reach the emit state for END, which emits the token and returns to the start state. This cycle repeats for as long as characters keep arriving in the input stream. On an invalid character you can either throw an error or ignore it, depending on your design.
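    The loop described above (consume characters, emit on reaching an emit state, return to S, and either throw or ignore on an invalid character) might look like this as a hand-coded state machine. It is a sketch rather than a definitive implementation; `StateMachineLexer`, `OnError`, and `Token` are hypothetical names. With the SKIP policy the newline in the sample input is ignored, so it should produce the same token sequence as the expected output; with THROW the newline is reported as an error.

```java
import java.util.ArrayList;
import java.util.List;

public class StateMachineLexer {
    // Non-accepting states of the diagram. Accepting states emit immediately
    // and jump back to S, so they need no constant of their own.
    private enum State { S, E, EC, ECH, EN }

    // What to do on a character with no transition.
    public enum OnError { THROW, SKIP }

    public static final class Token {
        public final String lexeme;
        public final String type;

        Token(String lexeme, String type) {
            this.lexeme = lexeme;
            this.type = type;
        }

        @Override
        public String toString() {
            return lexeme + " " + type; // "lexeme TYPE", as in the sample output
        }
    }

    public static List<Token> tokenize(String input, OnError policy) {
        List<Token> tokens = new ArrayList<>();
        State state = State.S;
        int start = 0; // index where the current lexeme began
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            State next = null;  // next non-accepting state, if c leads to one
            String emit = null; // token type, if c completes a token
            switch (state) {
                case S:
                    if (c == ' ') emit = "SPACE";
                    else if (c == '.') emit = "DOT";
                    else if (c == 'e') next = State.E;
                    break;
                case E:
                    if (c == 'c') next = State.EC;
                    else if (c == 'n') next = State.EN;
                    break;
                case EC:
                    if (c == 'h') next = State.ECH;
                    break;
                case ECH:
                    if (c == 'o') emit = "ECHO";
                    break;
                case EN:
                    if (c == 'd') emit = "END";
                    break;
            }
            if (emit != null) {
                tokens.add(new Token(input.substring(start, i + 1), emit));
                state = State.S; // back to start after emitting
                start = i + 1;
            } else if (next != null) {
                state = next;
            } else if (policy == OnError.THROW) {
                throw new IllegalArgumentException("Unexpected '" + c + "' at index " + i);
            } else {
                // SKIP: if we were mid-token, drop the partial lexeme and retry c from S
                // (so "ex." still yields DOT); if we were already at S, drop c itself.
                if (state != State.S) {
                    i--; // the loop's i++ brings us back to c
                }
                state = State.S;
                start = i + 1;
            }
        }
        if (state != State.S && policy == OnError.THROW) {
            throw new IllegalArgumentException("Input ended in the middle of a token");
        }
        return tokens; // with SKIP, a trailing partial token is silently dropped
    }

    public static void main(String[] args) {
        for (Token t : tokenize("echo .\nend .", OnError.SKIP)) {
            System.out.println(t);
        }
    }
}
```

    Returning to S right after an emit works here because no keyword is a prefix of another. A language with overlapping tokens (say, a hypothetical `end` and `ending`) would need longest-match handling, which this sketch does not attempt.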
