context-free-grammar

If we know a CFG only generates a regular language, can we get the corresponding regular expression?

↘锁芯ラ submitted on 2019-12-04 02:50:16
As we know, given a regular grammar, we have an algorithm to get its regular expression. But if the given grammar is a context-free grammar (that nevertheless only generates a regular language), like

    S -> aAb
    A -> bB
    B -> cB | d

is there any existing algorithm that can get the regular expression in general? Thanks!

In the most general sense, there is no solution. The problem of determining whether a CFG is regular is undecidable (Greibach's theorem; see the last 3 pages of http://www.cis.upenn.edu/~jean/gbooks/PCPh04.pdf). If we could convert CFGs to regular expressions, we could use that algorithm on any grammar and use its
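For the specific grammar in the question, though, the conversion can be done by hand, because no nonterminal derives a string containing itself with material on both sides (B -> cB only recurses at the right end). A worked sketch: solve the equations bottom-up, using Arden's rule (an equation X = αX | β with ε not in α has the unique solution X = α*β), then substitute upward. This is a hand derivation for this one grammar, not a general algorithm, which is consistent with the answer quoted above.

```latex
\begin{aligned}
B &= cB \mid d &&\Rightarrow\quad B = c^{*}d \qquad\text{(Arden's rule)}\\
A &= bB        &&\Rightarrow\quad A = b\,c^{*}d\\
S &= aAb       &&\Rightarrow\quad S = a\,b\,c^{*}d\,b
\end{aligned}
```

So this grammar generates exactly the language of the regular expression abc*db.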

Why do on-line parsers seem to stop at regexps?

落爺英雄遲暮 submitted on 2019-12-03 14:36:26
I've long wondered why there don't seem to be any parsers for, say, BNF that behave like the regexp engines in various libraries. Sure, there are tools like ANTLR, Yacc and many others that generate code which, in turn, can parse a CFG, but there doesn't seem to be a library that can do that without the intermediate step. I'm interested in writing a Packrat parser, to do away with all those nested-parenthesis quirks associated with regexps (and, perhaps even more so, for the sport of it), but somehow I have this feeling that I'm just walking into another halting-problem-like class of swamps. Is
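Nothing forces a code-generation step: a grammar can be handed to a library as plain data and interpreted at run time, much as a regexp engine interprets a pattern. The sketch below is a minimal packrat recognizer in that style, assuming PEG semantics (ordered choice, greedy repetition) rather than full, possibly ambiguous BNF. The types and function names (`PExpr`, `packrat`, `matches`, `parens`) are hypothetical. Memoization comes from a lazily filled table with one entry per (rule, position) pair, which is what makes it packrat rather than plain backtracking. Left-recursive rules are not supported by this sketch and would loop.

```haskell
import Control.Monad (msum)
import Data.Array (Array, listArray, (!))
import qualified Data.Map as M

-- A parsing expression (PEG), supplied at run time as ordinary data.
data PExpr
  = T Char       -- a literal terminal
  | N String     -- reference to a named rule
  | Seq [PExpr]  -- sequence
  | Alt [PExpr]  -- ordered choice: the first alternative that succeeds wins
  | Star PExpr   -- greedy repetition, zero or more times

type Grammar = M.Map String PExpr

-- | Returns the end position of the prefix matched by rule @start@,
-- or Nothing. Each (rule, position) result is computed at most once.
packrat :: Grammar -> String -> String -> Maybe Int
packrat g start input = ruleAt start 0
  where
    n = length input
    inp = listArray (0, n - 1) input :: Array Int Char
    -- Memo table: one lazy array per rule, indexed by positions 0..n.
    table = M.map (\e -> listArray (0, n) [eval e i | i <- [0 .. n]]) g
    ruleAt r i = (table M.! r) ! i
    eval (T c) i
      | i < n && inp ! i == c = Just (i + 1)
      | otherwise = Nothing
    eval (N r) i = ruleAt r i
    eval (Seq es) i = foldl (\acc e -> acc >>= eval e) (Just i) es
    eval (Alt es) i = msum [eval e i | e <- es]
    eval (Star e) i = case eval e i of
      Just j | j > i -> eval (Star e) j  -- keep going only while progressing
      _ -> Just i

-- Example grammar:  S <- ('(' S ')')*   (balanced parentheses)
parens :: Grammar
parens = M.fromList [("S", Star (Seq [T '(', N "S", T ')']))]

-- | True when rule @start@ matches the whole input.
matches :: Grammar -> String -> String -> Bool
matches g start s = packrat g start s == Just (length s)
```

For example, `matches parens "S" "(()())"` is True, while `packrat parens "S" "(()"` returns `Just 0` (only the empty prefix matches).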

Context-free grammar for non-palindromes

。_饼干妹妹 submitted on 2019-12-03 14:14:05
I need a CFG that generates the strings that are not palindromes. The solution has been provided and is as below (Introduction to the Theory of Computation, Sipser):

    R -> XRX | S
    S -> aTb | bTa
    T -> XTX | X | ε
    X -> a | b

I get the general idea of how this grammar works. Through the production S -> aTb | bTa, it mandates the insertion of a substring with unequal symbols at corresponding positions on either end, thus ensuring that a palindrome can never be generated. I will write down the semantics of the first two productions as I have understood them: S generates strings which cannot be
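One way to convince yourself that the grammar is correct is a brute-force check: list every string of length n that each nonterminal derives and compare R's strings with the non-palindromes of length n. A minimal sketch (the `gen*` function names are hypothetical, one per nonterminal). It also makes the role of R -> XRX visible: it wraps arbitrary, not necessarily matching, outer pairs around the mismatched pair that S introduces, while T fills the middle with anything at all.

```haskell
import Control.Monad (replicateM)
import qualified Data.Set as Set

-- Each genN n lists the strings of length exactly n derivable from N in
--   R -> XRX | S,  S -> aTb | bTa,  T -> XTX | X | ε,  X -> a | b
genX :: Int -> [String]
genX 1 = ["a", "b"]
genX _ = []

-- Surround every string with an arbitrary symbol on each side (X _ X).
wrap :: [String] -> [String]
wrap ms = [[p] ++ m ++ [q] | p <- "ab", m <- ms, q <- "ab"]

genT :: Int -> [String]
genT n
  | n < 0 = []
  | n == 0 = [""]                              -- T -> ε
  | otherwise = genX n ++ wrap (genT (n - 2))  -- T -> X | XTX

genS :: Int -> [String]
genS n = ['a' : m ++ "b" | m <- genT (n - 2)] ++ ['b' : m ++ "a" | m <- genT (n - 2)]

genR :: Int -> [String]
genR n
  | n < 2 = []
  | otherwise = genS n ++ wrap (genR (n - 2))  -- R -> S | XRX

nonPalindromes :: Int -> [String]
nonPalindromes n = [w | w <- replicateM n "ab", w /= reverse w]

-- | True when R derives exactly the non-palindromes of length n.
agrees :: Int -> Bool
agrees n = Set.fromList (genR n) == Set.fromList (nonPalindromes n)
```

`all agrees [0 .. 10]` evaluates to True; the grammar is ambiguous (a string with several mismatched pairs has several derivations), which is why the comparison uses sets.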

Horizontal Markovization

▼魔方 西西 submitted on 2019-12-03 14:10:24
I have to implement horizontal markovization (an NLP concept) and I'm having a little trouble understanding what the trees will look like. I've been reading the Klein and Manning paper, but they don't explain what the trees with horizontal markovization of order 2 or order 3 will look like. Could someone shed some light on the algorithm and what the trees are SUPPOSED to look like? I'm relatively new to NLP.

So, let's say you have a bunch of flat rules like:

    NP -> NNP NNP NNP NNP
    VP -> V Det NP

When you binarize these you want to keep the context (i.e. this isn't just a Det but specifically a Det
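A sketch of what binarization with horizontal markovization of order h produces for the flat rules above, assuming right-branching binarization that generates children left to right. Each intermediate symbol records the parent and only the last h siblings generated so far, so with small h several intermediate symbols collapse into one (more rule sharing, less sparsity). The label format `@NP->NNP_NNP`, the `Rule` type and the `binarize` function are hypothetical and not necessarily the notation used by Klein and Manning or any toolkit; vertical markovization (parent annotation) is not shown.

```haskell
import Data.List (intercalate)

-- | A binarized production: left-hand side and at most two children.
type Rule = (String, [String])

-- | Right-branching binarization of  parent -> kids  with horizontal
-- markovization of order h: intermediate labels keep the parent plus
-- the last h sibling labels already generated.
binarize :: Int -> String -> [String] -> [Rule]
binarize h parent = go [] parent
  where
    go seen lhs kids = case kids of
      k : rest@(_ : _ : _) ->  -- three or more children left: peel one off
        let seen' = seen ++ [k]
            label = "@" ++ parent ++ "->" ++ intercalate "_" (lastN h seen')
        in (lhs, [k, label]) : go seen' label rest
      _ -> [(lhs, kids)]       -- two or fewer: already binary
    lastN n xs = drop (length xs - n) xs
```

With h = 2, `binarize 2 "NP" ["NNP","NNP","NNP","NNP"]` yields NP -> NNP @NP->NNP, then @NP->NNP -> NNP @NP->NNP_NNP, then @NP->NNP_NNP -> NNP NNP. With h = 1 both intermediate symbols become @NP->NNP, so the tree is the same shape but the grammar contains one recursive rule instead of two distinct ones.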

Why is bottom-up parsing more common than top-down parsing?

血红的双手。 submitted on 2019-12-03 09:33:22
It seems that recursive-descent parsers are not only the simplest to explain, but also the simplest to design and maintain. They aren't limited to LALR(1) grammars, and the code itself can be understood by mere mortals. In contrast, bottom-up parsers have limits on the grammars they are able to recognize, and need to be generated by special tools (because the tables that drive them are next to impossible to generate by hand). Why, then, is bottom-up (i.e. shift-reduce) parsing more common than top-down (i.e. recursive-descent) parsing?

Ira Baxter: If you choose a powerful parser generator, you
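One concrete difference between the two styles is left recursion. A rule like E -> E '+' 'n' cannot be transcribed directly into a recursive-descent function (it would call itself before consuming input), whereas a shift-reduce parser handles it naturally. The sketch below is a hand-written shift-reduce recognizer for that one grammar; it is not a table-driven LR parser, and a generator such as Yacc would derive these shift/reduce decisions from the grammar automatically. The `Sym` type and `shiftReduce` name are hypothetical.

```haskell
-- Grammar (left-recursive):  E -> E '+' 'n' | 'n'
data Sym = Tok Char | E deriving (Eq, Show)

-- | Hand-written shift-reduce recognizer. The stack is a list with its
-- top at the head: reduce whenever a handle is on top, otherwise shift.
shiftReduce :: String -> Bool
shiftReduce = go []
  where
    go (Tok 'n' : Tok '+' : E : st) inp = go (E : st) inp  -- reduce E -> E + n
    go [Tok 'n'] inp = go [E] inp                          -- reduce E -> n
    go st (c : cs) = go (Tok c : st) cs                    -- shift
    go [E] [] = True                                       -- accept
    go _ [] = False                                        -- reject
```

`shiftReduce "n+n+n"` is True and each reduction groups to the left, matching the grammar's left associativity; `"n+"`, `"+n"` and `"nn"` are rejected.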

Context-free grammar for C

China☆狼群 submitted on 2019-12-03 08:46:36
Question: I'm working on a parser for C. I'm trying to find a list of all of the context-free grammar rules for C, ideally in BNF or similar. I'm sure such a thing is out there, but googling around hasn't turned up much. Reading the source code of existing parsers/compilers has proven to be far more confusing than helpful, as most that I've found are much more ambitious and complicated than the one I'm building.

Answer 1: You could always use Annex A of the C11 standard itself. The freely
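Annex A writes the binary-operator levels as left-recursive productions, e.g. additive-expression: multiplicative-expression | additive-expression + multiplicative-expression | additive-expression - multiplicative-expression. Transcribed literally into a top-down parser these loop, but Parsec's `chainl1` parses the same language with the same left associativity. A minimal sketch, assuming cast-expression is simplified to a decimal constant or a parenthesized expression (real C has many more cases); the names `primary`, `multiplicative`, `additive`, `evalC`, `lexeme` and `sym` are hypothetical. It evaluates instead of building a tree, using `quot`/`rem` because C's `/` and `%` truncate toward zero.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Simplified stand-in for cast-expression: a decimal constant or a
-- parenthesized additive-expression.
primary :: Parser Integer
primary = lexeme (read <$> many1 digit) <|> between (sym '(') (sym ')') additive

-- multiplicative-expression ::= multiplicative-expression ('*' | '/' | '%') cast-expression
multiplicative :: Parser Integer
multiplicative = primary `chainl1` op
  where
    op = (*) <$ sym '*' <|> quot <$ sym '/' <|> rem <$ sym '%'

-- additive-expression ::= additive-expression ('+' | '-') multiplicative-expression
additive :: Parser Integer
additive = multiplicative `chainl1` ((+) <$ sym '+' <|> (-) <$ sym '-')

-- Skip whitespace after every token.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

sym :: Char -> Parser Char
sym = lexeme . char

evalC :: String -> Either ParseError Integer
evalC = parse (spaces *> additive <* eof) "expr"
```

For instance, `evalC "10 - 4 - 3"` gives `Right 3`, grouping as (10 - 4) - 3 just as the left-recursive production specifies.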

Algorithm to generate context free grammar from any regex

让人想犯罪 __ submitted on 2019-12-03 08:18:19
Can anyone outline for me an algorithm that can convert any given regex into an equivalent set of CFG rules? I know how to tackle the elementary stuff such as (a|b)*:

    S -> a A
    S -> a B
    S -> b A
    S -> b B
    A -> a A
    A -> a B
    A -> epsilon
    B -> b A
    B -> b B
    B -> epsilon
    S -> epsilon (end of string)

However, I'm having some trouble formalizing it into a proper algorithm, especially for more complex expressions that can have many nested operations.

If you are just talking about regular expressions from a theoretical point of view, there are these three constructs:

    ab    # concatenation
    a|b   # alternation
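Because every regex is built from a few constructs, one standard approach is a structural recursion that gives each regex node a fresh nonterminal N and emits: N -> c for a symbol c, N -> ε for the empty string, N -> A B for concatenation, N -> A | N -> B for alternation, and N -> ε | A N for the Kleene star. Nesting then takes care of itself. A minimal sketch; the `Regex`, `Symbol` and `toCFG` names are hypothetical, and the output is not simplified (it introduces more nonterminals than the hand-written grammar in the question).

```haskell
-- Regular expressions over Char (Eps matches the empty string).
data Regex = Eps | Sym Char | Cat Regex Regex | Alt Regex Regex | Star Regex

-- Grammar symbols: terminals, or nonterminals numbered 0, 1, 2, ...
data Symbol = T Char | N Int deriving (Eq, Show)

-- A production  N_i -> rhs ; an empty rhs means  N_i -> epsilon.
type Production = (Int, [Symbol])

-- | Structural translation with one fresh nonterminal per regex node.
-- Returns the start nonterminal and the list of productions.
toCFG :: Regex -> (Int, [Production])
toCFG re = let (start, _, ps) = go re 0 in (start, ps)
  where
    -- go r k: translate r using fresh nonterminals numbered from k;
    -- returns (nonterminal for r, next free number, productions).
    go :: Regex -> Int -> (Int, Int, [Production])
    go r k = case r of
      Eps -> (k, k + 1, [(k, [])])
      Sym c -> (k, k + 1, [(k, [T c])])
      Cat a b ->
        let (na, k1, pa) = go a (k + 1)
            (nb, k2, pb) = go b k1
        in (k, k2, (k, [N na, N nb]) : pa ++ pb)          -- N -> A B
      Alt a b ->
        let (na, k1, pa) = go a (k + 1)
            (nb, k2, pb) = go b k1
        in (k, k2, (k, [N na]) : (k, [N nb]) : pa ++ pb)  -- N -> A | B
      Star a ->
        let (na, k1, pa) = go a (k + 1)
        in (k, k1, (k, []) : (k, [N na, N k]) : pa)       -- N -> ε | A N
```

For (a|b)*, `toCFG (Star (Alt (Sym 'a') (Sym 'b')))` produces N0 -> ε | N1 N0, N1 -> N2 | N3, N2 -> a, N3 -> b, with N0 as the start symbol.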

Using Parsec to parse regular expressions

浪子不回头ぞ submitted on 2019-12-03 07:45:19
I'm trying to learn Parsec by implementing a small regular expression parser. In BNF, my grammar looks something like:

    EXP : EXP *
        | LIT EXP
        | LIT

I've tried to implement this in Haskell as:

    import Text.Parsec
    import Text.Parsec.String (Parser)

    expr :: Parser String
    expr = try star <|> try litE <|> lit

    litE :: Parser String
    litE = do
      c <- noneOf "*"
      rest <- expr
      return (c : rest)

    lit :: Parser String
    lit = do
      c <- noneOf "*"
      return [c]

    star :: Parser String
    star = do
      content <- expr
      char '*'
      return (content ++ "*")

There are some infinite loops here, though (e.g. expr -> star -> expr without consuming any tokens), which make the parser loop forever. I'm not really sure how to fix it, though, because the very nature of star is that it
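The usual cure is to eliminate the left recursion: instead of EXP ::= EXP '*', parse a literal and then any number of postfix stars, and make an expression one or more of those. Every alternative then consumes a literal before recursing, so the loop disappears and no `try` is needed. A minimal sketch; the `RE` type and the `atom`/`parseRegex` names are hypothetical, and returning a small AST instead of a String makes the binding of `*` visible. Note one change in meaning: here `*` binds to the preceding literal only (the usual regex precedence), whereas the original EXP * could apply a star to a whole sequence; supporting (ab)* would need a parenthesized atom.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A tiny regex AST, so the binding of '*' shows up in the result.
data RE = Lit Char | Star RE | Seq [RE] deriving (Eq, Show)

-- | ATOM ::= LIT '*'*   (a literal followed by any number of stars)
atom :: Parser RE
atom = do
  c <- noneOf "*"
  stars <- many (char '*')
  return (foldl (\r _ -> Star r) (Lit c) stars)

-- | EXP ::= ATOM+   (replaces the left-recursive EXP ::= EXP '*')
expr :: Parser RE
expr = Seq <$> many1 atom

parseRegex :: String -> Either ParseError RE
parseRegex = parse (expr <* eof) "regex"
```

`parseRegex "ab*"` returns `Right (Seq [Lit 'a', Star (Lit 'b')])`, and a leading `*` is reported as a parse error instead of looping.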

chomsky hierarchy and programming languages

最后都变了- submitted on 2019-12-03 07:14:54
Question: I'm trying to learn some aspects of the Chomsky hierarchy that are related to programming languages, and I still have to read the Dragon Book. I've read that most programming languages can be parsed with a context-free grammar (CFG). In terms of computational power, a CFG is equivalent to a nondeterministic pushdown automaton. Am I right? If that's true, then how could a CFG hold an unrestricted grammar (UG), which is Turing-complete? I'm asking because, even if programming languages are described
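The premise in the question is right: context-free grammars and nondeterministic pushdown automata recognize the same class of languages. A toy illustration of what the stack adds over a finite automaton: the language { a^n b^n | n >= 0 } is context-free but not regular, since a finite automaton (and hence a regular expression in the formal sense) cannot count an unbounded n, while one stack can. This particular language happens to need no nondeterminism; the `anbn` name is hypothetical.

```haskell
-- | Recognizes { a^n b^n | n >= 0 } the way a pushdown automaton does:
-- push one marker per 'a', then pop one marker per 'b'.
anbn :: String -> Bool
anbn = pushing []
  where
    pushing st ('a' : rest) = pushing ('A' : st) rest  -- phase 1: push
    pushing st rest = popping st rest                  -- switch to phase 2
    popping ('A' : st) ('b' : rest) = popping st rest  -- phase 2: pop
    popping [] [] = True                               -- accept: stack and input empty
    popping _ _ = False                                -- anything else rejects
```

`anbn "aabb"` is True; `anbn "aab"` and `anbn "abab"` are False.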

How do Java, C++, C#, etc. get around this particular syntactic ambiguity with < and >?

老子叫甜甜 submitted on 2019-12-03 03:10:12
Question: I used to think C++ was the "weird" one with all the ambiguities with < and >, but after trying to implement a parser I think I found an example which breaks just about every language that uses < and > for generic types:

    f(g<h, i>(j));

This could be syntactically interpreted either as a generic method call (g), or as giving f the results of two comparisons. How do such languages (especially Java, which I thought was supposed to be LALR(1)-parsable?) get around this
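Java sidesteps the problem in its syntax: explicit type arguments on a method call must follow a qualifier, as in `this.<h, i>g(j)`, so an unqualified `g<h, i>(j)` can only be two comparisons. C# instead resolves it with a lookahead rule: if the tokens can be read as a name followed by a type-argument list, the token right after the closing `>` decides. The sketch below implements a simplified version of that rule. The follow-token list is our recollection of the C# 5 specification and may differ in other versions, type arguments are assumed to be bare identifiers (no nesting), and the `Tok`, `followSet` and `isTypeArgList` names are hypothetical.

```haskell
data Tok = Ident String | Punct String deriving (Eq, Show)

-- Tokens that, after a tentative '<' ... '>', keep it as a type-argument
-- list (assumed from the C# 5 spec's disambiguation rule).
followSet :: [String]
followSet = ["(", ")", "]", "}", ":", ";", ",", ".", "?", "==", "!=", "|", "^"]

-- | Given tokens starting at an identifier, decide whether "name<...>"
-- is a generic name (True) rather than the start of a comparison (False).
isTypeArgList :: [Tok] -> Bool
isTypeArgList (Ident _ : Punct "<" : rest) = args rest
  where
    args (Ident _ : Punct "," : more) = args more
    args (Ident _ : Punct ">" : next : _) = next `elem` map Punct followSet
    args _ = False
isTypeArgList _ = False
```

Under this rule the tokens of `g<h, i>(j)` are read as a generic call, because `(` follows the `>`, whereas `g<h, i>j` falls back to comparisons.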