lexical-analysis

Removing nested comments with lex

半世苍凉 submitted on 2019-12-10 14:54:42
Question: How should I write a lex (or flex) program that removes nested comments from text and prints only the text outside comments? I should probably track a state for when I am inside a comment, along with a count of how many block-comment openers have been seen. Let's have these rules: 1. block comments: /* block comment */ 2. line comments: // line comment 3. comments can be nested. Example 1: the input show /* comment /* comment */ comment */ show should output: show show. Example 2: the input show /* // comment comment */ show should output: show show
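The usual flex answer is a start condition plus a nesting counter; the core idea can be sketched language-neutrally. Below is a minimal Python sketch (not flex, and assuming comments are the only special syntax, i.e. no string literals that could contain comment delimiters):

```python
def strip_comments(text):
    """Remove nested /* */ block comments and // line comments.
    The depth counter plays the role a counter inside a flex
    start condition would play."""
    out = []
    depth = 0
    i = 0
    n = len(text)
    while i < n:
        two = text[i:i + 2]
        if two == "/*":
            depth += 1          # entering (or nesting deeper into) a block comment
            i += 2
        elif two == "*/" and depth > 0:
            depth -= 1          # closing one level of nesting
            i += 2
        elif depth == 0 and two == "//":
            while i < n and text[i] != "\n":
                i += 1          # skip the rest of the line comment
        elif depth == 0:
            out.append(text[i]) # ordinary text, keep it
            i += 1
        else:
            i += 1              # inside a comment, discard
    return "".join(out)

print(" ".join(strip_comments("show /* comment /* comment */ comment */ show").split()))
print(" ".join(strip_comments("show /* // comment comment */ show").split()))
```

Both examples from the question print `show show`. In flex proper, the same logic is expressed with `%x COMMENT`, a `depth` variable incremented on `/*` and decremented on `*/`, and `BEGIN(INITIAL)` when the depth returns to zero.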

Matching multiple regex groups and removing them

喜你入骨 submitted on 2019-12-10 10:27:13
Question: I have been given a file that I would like to extract the useful data from. The format of the file goes something like this: LINE: 1 TOKENKIND: somedata TOKENKIND: somedata LINE: 2 TOKENKIND: somedata LINE: 3 etc... What I would like to do is remove LINE: and the line number, as well as TOKENKIND:, so I am just left with a string that consists of 'somedata somedata somedata...'. I'm using Python to do this, with regular expressions (that I'm not sure are correct) to match the bits of the file I'd like removing. My question is: how can I get Python to match multiple regex groups and ignore them,
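For the format as shown, there is no need for multiple capture groups at all: one alternation passed to `re.sub` can delete both kinds of prefix in a single pass. A short sketch, assuming the file looks exactly like the sample:

```python
import re

raw = """LINE: 1
TOKENKIND: somedata
TOKENKIND: somedata
LINE: 2
TOKENKIND: somedata
LINE: 3"""

# Drop "LINE: <n>" entirely and strip the "TOKENKIND: " prefix,
# then collapse the leftover whitespace into single spaces.
cleaned = re.sub(r"LINE:\s*\d+\s*|TOKENKIND:\s*", " ", raw)
result = " ".join(cleaned.split())
print(result)  # somedata somedata somedata
```

If the data values can themselves contain spaces, capturing what follows `TOKENKIND:` with `re.findall(r"TOKENKIND:\s*(.+)", raw)` and joining the matches is the safer variant.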

Analyzer to autocomplete names

独自空忆成欢 submitted on 2019-12-08 06:32:47
Question: I want to be able to autocomplete names. For example, if we have the name John Smith, I want searches for Jo, Sm, and John Sm to return the document. In addition, I do not want jo sm to match the document. I currently have this analyzer: return array( 'settings' => array( 'index' => array( 'analysis' => array( 'analyzer' => array( 'autocomplete' => array( 'tokenizer' => 'autocompleteEngram', 'filter' => array('lowercase', 'whitespace') ) ), 'tokenizer' => array(
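The prefix-per-word behavior the question asks for is what an edge n-gram tokenizer provides at index time (with the search side left un-n-grammed). A small Python sketch of just that part — the names `edge_ngrams` and `matches` are illustrative, not Elasticsearch API, and the extra ordering constraint from the question is not modeled here:

```python
def edge_ngrams(word, min_len=1, max_len=10):
    # Edge n-grams of "smith" -> ["s", "sm", "smi", "smit", "smith"]
    return [word[:i] for i in range(min_len, min(len(word), max_len) + 1)]

def matches(query, name):
    """Every query term must be a prefix (edge n-gram) of some word
    in the indexed name — mimicking an edge_ngram tokenizer plus a
    lowercase filter at index time, with the query analyzed only by
    whitespace/lowercase (no n-gramming on the search side)."""
    name_grams = {g for w in name.lower().split() for g in edge_ngrams(w)}
    return all(term in name_grams for term in query.lower().split())

print(matches("Jo", "John Smith"))       # True
print(matches("John Sm", "John Smith"))  # True
print(matches("xy", "John Smith"))       # False
```

The key design point this models: if the *query* were also run through the edge n-gram analyzer, far too many partial matches would score, which is why Elasticsearch mappings typically set a separate `search_analyzer`.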

Bison does not appear to recognize C string literals appropriately

前提是你 submitted on 2019-12-08 05:36:18
My problem is that I am trying to run a program that I coded using a flex-bison scanner-parser. The program is supposed to take user input (in my case, queries for a database system I'm designing), lex and parse it, and then execute the corresponding actions. What actually happens is that my parser code does not correctly interpret the string literals that I feed it. Here's my grammar rule: insertexpr : "INSERT" expr '(' expr ')' { $$ = new QLInsert( $2, $4 ); } ; And my input, following the "Query: " prompt: Query: INSERT abc(5); input:1.0-5: syntax error, unexpected

What is the meaning of yytext[0]?

十年热恋 submitted on 2019-12-07 10:33:11
Question: What is the meaning of yytext[0]? And why do we use it in lex and yacc programs? I'm a learner, so don't mind if it is a silly question. Answer 1: yytext holds the text matched by the current token, so yytext[0] holds the first character of that text. Sometimes you have a rule which can match different texts, so you need to get the actual text matched — for example with variable names, or a single rule that matches all arithmetic operators. A good source is the Flex manual. For
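The "one rule, many operators" case from the answer has a direct analogue in Python's `re` module, which makes the role of yytext[0] concrete: in flex, `yytext` is the C string of the current match, so a single rule like `[-+*/]` uses `yytext[0]` in its action to see which operator was actually matched.

```python
import re

# Analogue of a single lex rule matching all arithmetic operators.
operator = re.compile(r"[-+*/]")

m = operator.match("+ 3 4")
first_char = m.group(0)[0]  # what yytext[0] would hold in the lex action
print(first_char)  # +
```

Here `m.group(0)` plays the role of `yytext` (the full matched text), and indexing it gives the first character, just as `yytext[0]` does in a flex action.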

Is it possible to call C# lexical/syntactic analyzers without compilation?

拈花ヽ惹草 submitted on 2019-12-07 10:08:20
Question: Consider this SO question, where the whole C# in-memory compiler is invoked. Here only lexical and syntactic analysis is required: parse the text as a stream of lexemes, check them, and exit. Is this possible in the current version of System.CodeDom.Compiler, and if not, will it be? Answer 1: If you can use Mono, I believe it has a C# parser/lexer you may be able to use. Here's a link to look into. As for what the MS C# team is planning to do, there is some talk of at some point making the C# compiler

Algorithms for Natural Language Understanding

廉价感情. submitted on 2019-12-06 14:29:12
I wanted to know what algorithms I could use for NLU. For example, let's say I want to start a program, and I have these sentences: "Let us start" and "Let him start". Obviously, the first sentence should start the program, but not the second one (since it doesn't make sense). Right now, I am using Stanford's NLP API and have implemented the TokenRegexAnnotator class: CoreMapExpressionExtractor<MatchedExpression> extractor = CoreMapExpressionExtractor.createExtractorFromFile(env, "tr.txt"); So my code "knows" what "Start" should do, that is, "Start" should trigger/start the program. But "Start"

How do I lex this input?

浪尽此生 submitted on 2019-12-05 23:34:32
I currently have a working, simple language implemented in Java using ANTLR. What I want to do is embed it in plain text, in a similar fashion to PHP. For example: Lorem ipsum dolor sit amet <% print('consectetur adipiscing elit'); %> Phasellus volutpat dignissim sapien. I anticipate that the resulting token stream would look something like: CDATA OPEN PRINT OPAREN APOS STRING APOS CPAREN SEMI CLOSE CDATA How can I achieve this, or is there a better way? There is no restriction on what might be outside the <% block. I assumed something like <% print('%>'); %>, as per Michael Mrozek's answer,
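One way to see the shape of the problem before committing to ANTLR lexer modes is a naive split into CDATA and code segments. A Python sketch, with the important caveat from the question made explicit: this breaks on `%>` appearing inside a string literal (the `<% print('%>'); %>` case), which is exactly why real lexer modes (ANTLR) or start conditions (flex) are needed.

```python
import re

# Split into alternating CDATA / code tokens. This naive split
# treats the first '%>' as the terminator, so it mis-lexes a
# '%>' inside a string literal — lexer modes fix that.
segment = re.compile(r"<%(.*?)%>", re.DOTALL)

def tokenize(text):
    tokens, pos = [], 0
    for m in segment.finditer(text):
        if m.start() > pos:
            tokens.append(("CDATA", text[pos:m.start()]))
        tokens.append(("CODE", m.group(1).strip()))
        pos = m.end()
    if pos < len(text):
        tokens.append(("CDATA", text[pos:]))
    return tokens

doc = "Lorem ipsum <% print('consectetur'); %> Phasellus."
for kind, value in tokenize(doc):
    print(kind, repr(value))
```

In ANTLR terms, the equivalent is a default mode that collects CDATA until `<%` (then `pushMode`), and a code mode whose string-literal rule consumes `%>` safely before the `%>` rule can `popMode`.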

How do I write a parser in C or Objective-C without a parser generator?

别说谁变了你拦得住时间么 submitted on 2019-12-05 21:15:03
Question: I am trying to make a calculator in C or Objective-C that accepts a string along the lines of 8/2+4(3*9)^2 and returns the answer 2920. I would prefer not to use a generator like Lex or Yacc, so I want to code it from the ground up. How should I go about doing this? Other than the Dragon book, are there any recommended texts that cover this subject matter? Answer 1: Try this: http://en.wikipedia.org/wiki/Shunting-yard_algorithm Answer 2: Dave DeLong's DDMathParser class may save you a lot of time and
