Making a Lexical Analyzer


独厮守ぢ 2020-12-12 16:34

I'm working on a lexical analyzer program in Java. I've been researching this problem, but so far I haven't found any answers. Here's my

6 answers

独厮守ぢ (original poster)

2020-12-12 16:57
You can use tools such as Lex and Bison in C, or ANTLR in Java. Lexical analysis can also be done by building an automaton by hand. Here is a small example:

Suppose you need to tokenize a string in a language whose keywords are {'echo', '.', ' ', 'end'}. By keywords I mean that the language consists of these four tokens only. So if I input
```
echo .
end .
```
My lexer should output
```
echo ECHO
 SPACE
. DOT
end END
 SPACE
. DOT
```
To build an automaton for such a tokenizer, I can start with
```
  ->(SPACE) (Back)
 |   
(S)-------------E->C->H->O->(ECHO) (Back)
 |              |
 .->(DOT)(Back)  ->N->D ->(END) (Back to Start)
```
The diagram above is probably rough, but the idea is this: you have a start state, represented by S. You consume E and move to another state, where you expect either N (on the way to END) or C (on the way to ECHO). You keep consuming characters and moving between states in this simple finite state machine. Eventually you reach an emit state. For example, after consuming E, N, D you reach the emit state for END, which outputs the token, and then you go back to the start state. This cycle continues for as long as characters keep arriving at your tokenizer. On an invalid character you can either throw an error or ignore it, depending on your design.
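
As a rough illustration, the automaton could be hand-coded in Java along these lines. This is only a sketch under a few assumptions: the class name `EchoLexer`, the `TokenType` and `State` enums, and the `skipInvalid` flag are made up for this example, keywords are matched in lowercase as in the sample input, and an invalid character in skip mode also discards any partially read keyword.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the automaton above; all names are illustrative.
class EchoLexer {
    // Token kinds the lexer can emit.
    enum TokenType { ECHO, END, SPACE, DOT }

    // START is (S) in the diagram; the others record the letters consumed so far.
    private enum State { START, E, EC, ECH, EN }

    // Tokenizes the input. If skipInvalid is true, a character with no
    // transition is dropped (along with any partial keyword) and the machine
    // returns to START; otherwise an IllegalArgumentException is thrown.
    static List<TokenType> tokenize(String input, boolean skipInvalid) {
        List<TokenType> tokens = new ArrayList<>();
        State state = State.START;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            State next = null; // stays null when no transition exists for c
            switch (state) {
                case START:
                    if (c == ' ') { tokens.add(TokenType.SPACE); next = State.START; }
                    else if (c == '.') { tokens.add(TokenType.DOT); next = State.START; }
                    else if (c == 'e') { next = State.E; }
                    break;
                case E:
                    if (c == 'c') { next = State.EC; }
                    else if (c == 'n') { next = State.EN; }
                    break;
                case EC:
                    if (c == 'h') { next = State.ECH; }
                    break;
                case ECH:
                    // Emit state for ECHO, then back to start.
                    if (c == 'o') { tokens.add(TokenType.ECHO); next = State.START; }
                    break;
                case EN:
                    // Emit state for END, then back to start.
                    if (c == 'd') { tokens.add(TokenType.END); next = State.START; }
                    break;
            }
            if (next == null) {
                if (!skipInvalid) {
                    throw new IllegalArgumentException(
                        "Unexpected character '" + c + "' at index " + i);
                }
                next = State.START; // ignore the character and reset
            }
            state = next;
        }
        if (state != State.START && !skipInvalid) {
            throw new IllegalArgumentException("Input ended inside a keyword");
        }
        return tokens;
    }
}
```

With `skipInvalid` set to `true`, the line break in the sample input is simply ignored, so `EchoLexer.tokenize("echo .\nend .", true)` yields `[ECHO, SPACE, DOT, END, SPACE, DOT]`, matching the expected output earlier in this answer. For a larger language, a transition table or a generated lexer would likely scale better than a hand-written `switch`.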