lexical-analysis

Multiple attributes in bison

Submitted by 徘徊边缘 on 2019-12-11 18:21:58
Question: I am doing semantic analysis in bison and I want to use multiple attributes associated with a token. A related part of my code is:

%union semrec { int Type; char *id; }
%start prog
%token <id> tIDENT

Here, I can only use the "id" attribute with the tIDENT token. I also want to associate the "Type" attribute with the tIDENT token. To do this, I tried the following:

%token <id> tIDENT
%token <Type> tIDENT

But it gives me a redeclaration warning for token tIDENT. I also tried the following: %token

How to create a lexical analyzer in ANTLR 4 that can catch different types of lexical errors

Submitted by 狂风中的少年 on 2019-12-11 18:08:22
Question: I am using ANTLR 4 to create my lexer, but I don't know how to create a lexical analyzer that catches different types of lexical errors. For example: if I have an unrecognized symbol like ^, the lexical analyzer should report an error like "Unrecognized symbol '^'". If I have an invalid identifier like 2n, the lexical analyzer should report an error like "identifier '2n' must begin with a letter". Can you please help me?
Answer 1: Create an error token rule for each known error and an

Antlr: how to match everything between the other recognized tokens?

Submitted by ぃ、小莉子 on 2019-12-11 10:39:39
Question: How do I match all of the leftover text between the other tokens in my lexer? Here's my code:

grammar UserQuery;
expr: expr AND expr
    | expr OR expr
    | NOT expr
    | TEXT+
    | '(' expr ')'
    ;
OR : 'OR';
AND : 'AND';
NOT : 'NOT';
LPAREN : '(';
RPAREN : ')';
TEXT: .+?;

When I run the lexer on "xx AND yy", I get these tokens:

x type:TEXT
x type:TEXT
  type:TEXT
AND type:'AND'
  type:TEXT
y type:TEXT
y type:TEXT

This sort-of works, except that I don't want each character to be a token. I'd like to

Simple C Program

Submitted by 故事扮演 on 2019-12-11 03:57:16
Question: This program is based on the program in K&R in the input/output section:

#include <stdio.h>
main(){
    double sum, v;
    sum = 0;
    while (scanf("%1f",&v)==1)
        printf("\t%.2f\n",sum+=v);
    return 0;
}

It compiles OK, but when I try to run it, any input produces the output "-NAN", presumably NOT A NUMBER. I have no idea why. Any advice would be appreciated.
Answer 1: The format code in scanf is wrong. It should be %lf (with a lowercase L), not %1f (with the digit one):

while (scanf("%lf",&v)==1)

This is because %lf scans for a double

Could not load main class in JavaCC

Submitted by 时光毁灭记忆、已成空白 on 2019-12-11 02:46:09
Question: I am an AI student and we work with JavaCC. I am new to it. I was trying a simple example and got some errors. 1) I downloaded JavaCC 0.6 from its website. 2) I extracted it to disk C. 3) I wrote this code in a file with the extension ".jj":

PARSE_BEGIN(Test)
import java.io.*;
class Test {
    public static void main(string [] args) {
        new Test(new InputStreamReader(System.in));
        start();
    }
}
PARSE_END(Test);
Token: {
    <number: (["0"-"9"])+("." (["0"-"9"])+)?(("e"|"E")(["0"-"9"])+)?>
    | <plus: "+">
}
void

Python3.0 - tokenize and untokenize

Submitted by ﹥>﹥吖頭↗ on 2019-12-11 01:49:34
Question: I am using something similar to the following simplified script to parse snippets of Python from a larger file:

import io
import tokenize

src = 'foo="bar"'
src = bytes(src.encode())
src = io.BytesIO(src)
src = list(tokenize.tokenize(src.readline))
for tok in src:
    print(tok)
src = tokenize.untokenize(src)

Although the code is not the same in Python 2.x, it uses the same idiom and works just fine. However, running the above snippet with Python 3.0, I get this output: (57, 'utf-8', (0, 0), (0, 0)

Responsibilities of the Lexer and the Parser

Submitted by 柔情痞子 on 2019-12-10 22:04:31
Question: I'm currently implementing a lexer for a simple programming language. So far, I can tokenize identifiers, assignment symbols, and integer literals correctly; in general, whitespace is insignificant. For the input foo = 42, three tokens are recognized:

foo (identifier)
= (symbol)
42 (integer literal)

So far, so good. However, consider the input foo = 42bar, which is invalid due to the (significant) missing space between 42 and bar. My lexer incorrectly recognizes the following tokens: foo

lex : How to override YY_BUF_SIZE

Submitted by 这一生的挚爱 on 2019-12-10 21:27:18
Question: According to the manual, YY_BUF_SIZE is 16K and we need to override it. However, the manual does not specify how to override it, nor could I find any command-line option for this. Can someone please indicate how to change it? In the generated source, YY_BUF_SIZE is defined as follows:

#ifndef YY_BUF_SIZE
#define YY_BUF_SIZE 16384
#endif

so there may be a way to override it before this.
Answer 1: In your own code, simply #define YY_BUF_SIZE to whatever value you want. As long as you compile your

Recommendations for a good C#/ .NET based lexical analyser

Submitted by 狂风中的少年 on 2019-12-10 20:21:28
Question: Can anyone recommend a good .NET-based lexical analyser, preferably written in C#?
Answer 1: ANTLR has a C# target.
Answer 2: Download the Visual Studio SDK; it includes a managed parser/lexer generator. (Edit: it was written on my university campus, apparently :D)
Answer 3: gplex and cs_lex
Source: https://stackoverflow.com/questions/131920/recommendations-for-a-good-c-net-based-lexical-analyser

How can I have a function that returns different types in F#?

Submitted by 心已入冬 on 2019-12-10 18:25:51
Question: I've made a scanner in F#. Currently it returns a list of tuples of type (Token, string). Ideally I'd like to return a list of tuples that might contain different types. For example:

(Token, string) // if it's an identifier
(Token, float)  // if it's a float
(Token, int)    // etc.

So basically I'd like to return the type (Token, _), but I'm not sure how to specify this. Right now it just has errors complaining of mismatched types. I'm looking through my book, and wikibooks, but I'm not