grammar

How do I get a set of grammar rules from Penn Treebank using python & NLTK?

半腔热情 submitted on 2019-12-20 08:49:55
Question: I'm fairly new to NLTK and Python. I've been creating sentence parses using the toy grammars given in the examples, but I would like to know whether it's possible to use a grammar learned from a portion of the Penn Treebank, say, instead of writing my own or using the toy grammars. (I'm using Python 2.7 on Mac.) Many thanks.
Answer 1: If you want a grammar that precisely captures the Penn Treebank sample that comes with NLTK, you can do this, assuming you've downloaded the Treebank data for NLTK
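To make the idea of reading a grammar off a treebank concrete, here is a minimal, dependency-free sketch. It does not use NLTK or the Treebank data: it parses one bracketed tree string and collects one rule per internal node. The helper names (`parse_tree`, `productions`) and the sample sentence are hypothetical, chosen only for illustration; in NLTK, tree objects expose a `productions()` method that plays the same role.

```python
import re
from collections import Counter


def parse_tree(text):
    """Parse a Penn Treebank-style bracketed string into (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())      # nested constituent
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def productions(tree):
    """Yield one (lhs, rhs) rule per internal node; leaf words are quoted."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else repr(c) for c in children)
    yield (label, rhs)
    for child in children:
        if isinstance(child, tuple):
            yield from productions(child)


# Hypothetical sample tree, not taken from the Penn Treebank.
sample = "(S (NP (DT the) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))"
rule_counts = Counter(productions(parse_tree(sample)))

for (lhs, rhs), count in sorted(rule_counts.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{count}]")
```

Counting the rules, as done here, is also the starting point if you later want rule probabilities rather than a plain rule set.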

How to define a grammar for a programming language

女生的网名这么多〃 submitted on 2019-12-20 08:41:24
Question: How do you define a (context-free) grammar for a new imperative programming language that you want to design from scratch? In other words, how do you proceed when you want to create a new programming language from scratch?
Answer 1: One step at a time. No, seriously: start with expressions and operators, work upwards to statements, then to functions/classes, etc. Keep a list of what punctuation is used for what. In parallel, define syntax for referring to variables, arrays, hashes,

Antlr3: Could not match token in parser rules which is used in lexer rule

旧城冷巷雨未停 submitted on 2019-12-20 07:27:58
Question: I have lexer rules in Antlr3 as: HYPHEN : '-'; TOKEN : HYPHEN CHARS; CHARS : 'a' ..'z'; The parser rules are: exp : CHARS | some complex expression; parser_rule : exp HYPHEN exp; If I try to match 'abc-abc' with parser_rule, it fails, because the lexer creates a TOKEN for HYPHEN exp. How can I match it correctly with parser_rule?
Answer 1: In an ANTLR lexer, the lexer rule that can match the longest sub-sequence of input is used. So your input abc-abc will be tokenized as CHARS("abc") TOKEN("-abc") and
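The longest-match behaviour described in the answer can be imitated in a few lines of Python. This is only a sketch using the `re` module, not ANTLR itself. It assumes CHARS matches a run of lowercase letters (as the answer's CHARS("abc") suggests), and that ties go to the rule listed first. The `RULES` table and `tokenize` function are hypothetical names.

```python
import re

# Lexer rules in declaration order; the patterns are assumptions made for this sketch.
RULES = [
    ("HYPHEN", re.compile(r"-")),
    ("TOKEN", re.compile(r"-[a-z]+")),
    ("CHARS", re.compile(r"[a-z]+")),
]


def tokenize(text):
    """Split text into (rule, lexeme) pairs, always taking the longest match."""
    pos, out = 0, []
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strictly longer wins, so on a tie the earlier rule is kept.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        out.append(best)
        pos += len(best[1])
    return out


tokens = tokenize("abc-abc")
print(tokens)
```

At the hyphen, both HYPHEN and TOKEN match, but TOKEN consumes more input, so the parser never sees a standalone HYPHEN between the two words.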

Why does not ANTLR4 match “of” as a word and “,” as punctuation?

自闭症网瘾萝莉.ら submitted on 2019-12-20 06:38:44
Question: I have a Hello.g4 grammar file with a grammar definition: definition : wordsWithPunctuation ; words : (WORD)+ ; wordsWithPunctuation : word ( word | punctuation word | word punctuation | '(' wordsWithPunctuation ')' | '"' wordsWithPunctuation '"' )* ; NUMBER : [0-9]+ ; word : WORD ; WORD : [A-Za-z-]+ ; punctuation : PUNCTUATION ; PUNCTUATION : (','|'!'|'?'|'\''|':'|'.') ; WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines Now, if I am trying to build a parse tree from the following

Left recursion elimination

送分小仙女□ submitted on 2019-12-20 04:13:18
Question: I'm attempting to eliminate left recursion from a CFG by eliminating indirect recursion and then direct recursion, as this algorithm shows. I'll be using this grammar: A = A a | A B C | B C | D D When i = 1 and j = 1, we are looking at replacing all productions of the form A -> A r with: A -> δ1 γ | δ2 γ | ... | δk γ So when I look at A -> A a, which matches, I should replace it with A -> A a a | A B C a a | B C a | D D a, which I'm sure is wrong. Can anyone point me in the right direction for how
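For reference, the direct (immediate) left-recursion step on its own can be sketched as below. In the usual textbook formulation of the algorithm, the substitution step runs only for j < i, so for the first nonterminal only this direct step applies. The function name and the list-of-symbols representation are hypothetical, and the sketch assumes no alternative consists of the nonterminal alone.

```python
def eliminate_direct_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A α | β as A -> β A' and A' -> α A' | ε.

    Each alternative is a list of symbols; the empty list stands for ε.
    """
    new_nt = nonterminal + "'"
    # α parts: left-recursive alternatives with the leading nonterminal removed.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # β parts: alternatives that do not start with the nonterminal.
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}
    return {
        nonterminal: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }


# The grammar from the question: A = A a | A B C | B C | D D
result = eliminate_direct_left_recursion(
    "A", [["A", "a"], ["A", "B", "C"], ["B", "C"], ["D", "D"]]
)
for lhs, alts in result.items():
    print(lhs, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```

Applied to the question's grammar, this gives A -> B C A' | D D A' and A' -> a A' | B C A' | ε.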

ANTLR on a noisy data stream

对着背影说爱祢 submitted on 2019-12-20 03:35:13
Question: I'm very new to the ANTLR world and I'm trying to figure out how I can use this parsing tool to interpret a set of "noisy" strings. What I would like to achieve is the following. Let's take this phrase as an example: It's 10PM and the Lazy CAT is currently SLEEPING heavily on the SOFA in front of the TV What I would like to extract is CAT, SLEEPING and SOFA, and to have a grammar that easily matches the following pattern: SUBJECT - VERB - INDIRECT OBJECT... where I could define VERB : 'SLEEPING' |

Combine free-form dictation and semantic in a srgs grammar

让人想犯罪 __ submitted on 2019-12-19 11:53:08
Question: I'm trying to combine the results of a semantic request and a dictation request in the semantic value of an SRGS document. For example, I would say "Search potato" and the output would be something like out="Search Potato", where Potato is a random word spoken by the user. I thought about using the garbage special rule, but it doesn't seem to work. So far, this is what I have: <rule id="rule1" scope="public"> <one-of> <item xml:lang="en-us">Search</item> <item>Cherche</item> </one-of> <tag>out

Parsing numbers with multiple digits in Prolog

被刻印的时光 ゝ submitted on 2019-12-19 05:29:45
Question: I have the following simple expression parser: expr(+(T,E))-->term(T),"+",expr(E). expr(T)-->term(T). term(*(F,T))-->factor(F),"*",term(T). term(F)-->factor(F). factor(N)-->nat(N). factor(E)-->"(",expr(E),")". nat(0)-->"0". nat(1)-->"1". nat(2)-->"2". nat(3)-->"3". nat(4)-->"4". nat(5)-->"5". nat(6)-->"6". nat(7)-->"7". nat(8)-->"8". nat(9)-->"9". However, this only supports 1-digit numbers. How can I parse numbers with multiple digits in this case?
Answer 1: Use accumulator variables, and pass
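The accumulator idea is language-independent, so here is a small Python sketch of it rather than a Prolog DCG. A running value is carried from digit to digit, multiplied by 10, and extended with the next digit. The name `parse_nat` and the `(value, next_position)` return shape are hypothetical choices for this illustration.

```python
DIGITS = "0123456789"


def parse_nat(text, pos=0):
    """Parse a run of decimal digits starting at pos.

    Returns (value, next_pos), or None if no digit starts at pos.
    """
    if pos >= len(text) or text[pos] not in DIGITS:
        return None
    acc = 0  # the accumulator: value of the digits read so far
    while pos < len(text) and text[pos] in DIGITS:
        acc = acc * 10 + DIGITS.index(text[pos])
        pos += 1
    return acc, pos


parsed = parse_nat("123+4")
print(parsed)
```

The returned position lets a surrounding expression parser continue with the "+" that follows the number.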

Is sizeof(int()) a legal expression?

限于喜欢 submitted on 2019-12-19 05:06:43
Question: This question is inspired by "Is sizeof(void()) a legal expression?" but with an important difference, as explained below. The expression in question is: sizeof( int() ) In the C++ grammar there appears:
unary-expression:
    sizeof unary-expression
    sizeof ( type-id )
However, ( int() ) can match both of these cases, with different meanings: as a unary-expression, it is a value-initialized int prvalue, surrounded by redundant parentheses; as a type-id, it is the type of a function with no parameters

Parsing a possibly nested braced item using a grammar

纵饮孤独 submitted on 2019-12-19 03:37:24
Question: I am starting to write a BibTeX parser. The first thing I would like to do is to parse a braced item. A braced item could be an author field or a title, for example. There might be nested braces within the field. The following code does not handle nested braces: use v6; my $str = q:to/END/; author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.}, END $str .= chomp; grammar ExtractBraced { rule TOP { 'author=' <braced-item> .* } rule braced-item { '{' <-[}]>* '}' } } ExtractBraced.parse(
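Nested braces need a recursive rule: a braced item is an opening brace, then any mix of non-brace characters and further braced items, then a closing brace. The sketch below expresses that rule as a recursive-descent function in Python rather than as a Raku grammar. The name `parse_braced` is hypothetical, and backslash-escaped braces are not handled.

```python
def parse_braced(text, pos):
    """Parse a braced item starting at text[pos].

    Returns (content, next_pos), where content excludes the outer braces.
    Assumption: braces are never escaped with a backslash.
    """
    if pos >= len(text) or text[pos] != "{":
        raise ValueError(f"expected '{{' at position {pos}")
    pos += 1
    parts = []
    while pos < len(text) and text[pos] != "}":
        if text[pos] == "{":
            inner, pos = parse_braced(text, pos)  # recurse into the nested item
            parts.append("{" + inner + "}")
        else:
            parts.append(text[pos])
            pos += 1
    if pos >= len(text):
        raise ValueError("unbalanced braces")
    return "".join(parts), pos + 1  # skip the closing brace


line = r'author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},'
content, end = parse_braced(line, line.index("{"))
print(content)
```

The non-recursive pattern in the question stops at the first closing brace, which is why it cuts the field short at the inner `{a}`.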