Ply Lex parsing problem

慢半拍i 2021-01-18 04:17

I'm using ply as my lexer. My token specifications are the following:

t_WHILE = r'while'  
t_THEN = r'then'  
t_ID = r'[a-zA-Z_][a-zA-Z0-9_]*'  
t_NUM         


        
2 Answers
  •  不要未来只要你来
    2021-01-18 04:51

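    This doesn't work because of the way ply prioritises token matches: tokens defined as strings are sorted by decreasing regex length, so the longest regex is tested first. The t_ID regex is longer than the ones for t_WHILE and t_THEN, so reserved words get matched as identifiers.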

    The easiest way to prevent this problem is to match identifiers and reserved words at the same time, and select the appropriate token type based on the matched text. The following code is similar to an example in the ply documentation:

    import ply.lex
    
    tokens = ['ID', 'NUMBER', 'LESSEQUAL', 'ASSIGN']
    # Reserved words: matched text -> token type
    reserved = {
        'while': 'WHILE',
        'then': 'THEN',
    }
    tokens += list(reserved.values())
    
    # Simple tokens, written as raw strings so backslashes reach the regex engine
    t_ignore    = ' \t'
    t_NUMBER    = r'\d+'
    t_LESSEQUAL = r'<='
    t_ASSIGN    = r'='
    
    def t_ID(t):
        r'[a-zA-Z_][a-zA-Z0-9_]*'
        # Reclassify identifiers that are actually reserved words
        if t.value in reserved:
            t.type = reserved[t.value]
        return t
    
    def t_error(t):
        print('Illegal character')
        t.lexer.skip(1)
    
    lexer = ply.lex.lex()
    lexer.input("while n <= 0 then h = 1")
    while True:
        tok = lexer.token()
        if not tok:
            break
        print(tok)
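
To see why the reserved-word lookup helps, here is a rough sketch that imitates the "longest regex first" ordering using only the standard `re` module. It is a simplified model, not ply's actual implementation: the helpers `build_master` and `tokenize` are made up for illustration, and whitespace is simply skipped.

```python
import re

# Simplified model of the ordering described above, NOT ply's real code.
ID_REGEX = r"[a-zA-Z_][a-zA-Z0-9_]*"

def build_master(rules):
    """Join the rules into one alternation, longest regex string first."""
    ordered = sorted(rules.items(), key=lambda kv: len(kv[1]), reverse=True)
    return re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in ordered))

def tokenize(text, master, reserved=None):
    """Return (type, value) pairs; 'reserved' remaps ID matches by their text."""
    reserved = reserved or {}
    out, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = master.match(text, pos)
        if m is None:
            raise ValueError(f"Illegal character {text[pos]!r}")
        kind, value = m.lastgroup, m.group()
        if kind == "ID":
            # Reserved words are reclassified after the match
            kind = reserved.get(value, kind)
        out.append((kind, value))
        pos = m.end()
    return out

# Separate string rules: the ID regex is the longest, so it is tried first
naive = tokenize(
    "while x then y",
    build_master({"WHILE": r"while", "THEN": r"then", "ID": ID_REGEX}),
)

# Single ID rule plus a lookup table, as in the answer above
fixed = tokenize(
    "while x then y",
    build_master({"ID": ID_REGEX}),
    reserved={"while": "WHILE", "then": "THEN"},
)

print(naive)
print(fixed)
```

In this model the first call labels every word as `ID`, because the identifier pattern sits first in the alternation, while the second call yields `WHILE` and `THEN` for the reserved words. A side benefit of the lookup approach is that a word such as `whilex` is still matched whole as an identifier rather than being split after `while`.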
    
