grammar

Chomsky hierarchy in plain English

China☆狼群 submitted on 2019-12-02 17:30:35
I'm trying to find a plain (i.e. non-formal) explanation of the 4 levels of formal grammars (unrestricted, context-sensitive, context-free, regular) as set out by Chomsky. It's been an age since I studied formal grammars, and the various definitions are now confusing for me to visualize. To be clear, I'm not looking for the formal definitions you'll find everywhere (e.g. here and here -- I can google as well as anyone else), or really even formal definitions of any sort. Instead, what I was hoping to find was clean and simple explanations that don't sacrifice clarity for the sake of

How do I get a set of grammar rules from Penn Treebank using Python & NLTK?

不打扰是莪最后的温柔 submitted on 2019-12-02 17:18:50
I'm fairly new to NLTK and Python. I've been creating sentence parses using the toy grammars given in the examples, but I would like to know whether it's possible to use a grammar learned from a portion of the Penn Treebank, say, as opposed to just writing my own or using the toy grammars. (I'm using Python 2.7 on Mac.) Many thanks.
If you want a grammar that precisely captures the Penn Treebank sample that comes with NLTK, you can do this, assuming you've downloaded the Treebank data for NLTK (see comment below):
    import nltk
    from nltk.corpus import treebank
    from nltk.grammar import ContextFreeGrammar
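The idea behind learning a grammar from a treebank is to read one rule off every internal node of every parsed sentence. The sketch below shows that idea with the standard library only, so it runs without NLTK or the Treebank data; `parse_tree` and `productions` are hypothetical helper names, and the two bracketed trees are made-up examples rather than real Penn Treebank sentences. It assumes clean bracketing and does not handle traces or function tags. In NLTK itself, `Tree.productions()` plays the role of the `productions` helper here.

```python
from collections import Counter


def parse_tree(text):
    """Parse one bracketed tree into nested tuples.

    "(NP (DT the) (NN dog))" becomes ("NP", ("DT", "the"), ("NN", "dog")).
    Leaf words are plain strings.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '('")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, *children)

    return node()


def productions(tree):
    """Yield one (lhs, rhs) rule per internal node, top-down."""
    label, *children = tree
    # Quote terminals so that the word 'the' cannot be confused with a category.
    rhs = tuple(repr(c) if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


# Made-up sample trees (an assumption, not taken from the Penn Treebank).
trees = [
    "(S (NP (DT the) (NN dog)) (VP (VBD barked)))",
    "(S (NP (DT the) (NN cat)) (VP (VBD slept)))",
]
rule_counts = Counter(p for t in trees for p in productions(parse_tree(t)))

for (lhs, rhs), count in sorted(rule_counts.items()):
    print(f"{lhs} -> {' '.join(rhs)}    [seen {count}x]")
```

Keeping the counts rather than a plain set leaves the door open to estimating rule probabilities later; dropping them gives an ordinary context-free grammar.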

How to define a grammar for a programming language

隐身守侯 submitted on 2019-12-02 17:09:48
How do you define a (context-free) grammar for a new imperative programming language that you want to design from scratch? In other words: how do you proceed when you want to create a new programming language from scratch?
One step at a time. No, seriously: start with expressions and operators, work upwards to statements, then to functions, classes, etc. Keep a list of what punctuation is used for what. In parallel, define syntax for referring to variables, arrays, hashes, number literals, string literals, and other built-in literals. Also in parallel, define your data naming model and
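As an illustration of the "start with expressions and operators" step, here is a minimal sketch of a grammar for integer arithmetic together with a hand-written recursive-descent parser for it. The grammar, the token set, and every name below (`tokenize`, `Parser`, `parse`) are assumptions chosen for the example, not part of the original answer.

```python
import re

# Assumed toy grammar, one method per rule:
#   expr   := term   (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(src):
    """Split the source into number and punctuation tokens."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())  # builds a left-associative tree
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.take()
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(src):
    """Parse a whole expression and reject leftover tokens."""
    parser = Parser(tokenize(src))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing token {parser.peek()!r}")
    return tree


print(parse("1 + 2 * 3"))
print(parse("(1 + 2) * 3"))
```

Operator precedence falls out of the rule layering: `term` sits below `expr`, so `*` binds tighter than `+`. Statements and function definitions can then be added as further rules that call `expr`.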

Combining a Tokenizer into a Grammar and Parser with NLTK

你。 submitted on 2019-12-02 16:41:05
I am making my way through the NLTK book and I can't seem to do something that would appear to be a natural first step for building a decent grammar. My goal is to build a grammar for a particular text corpus. (Initial question: Should I even try to start a grammar from scratch, or should I start with a predefined grammar? If I should start with another grammar, which is a good one to start with for English?) Suppose I have the following simple grammar:
    simple_grammar = nltk.parse_cfg("""
    S -> NP VP
    PP -> P NP
    NP -> Det N | Det N PP
    VP -> V NP | VP PP
    Det -> 'a' | 'A'
    N -> 'car' | 'door'
    V ->

How do Java, C++, C#, etc. get around this particular syntactic ambiguity with < and >?

末鹿安然 submitted on 2019-12-02 16:39:04
I used to think C++ was the "weird" one with all the ambiguities with < and >, but after trying to implement a parser I think I found an example which breaks just about every language that uses < and > for generic types: f(g<h, i>(j)); This could be interpreted syntactically either as a generic method call (g), or as giving f the results of two comparisons. How do such languages (especially Java, which I thought was supposed to be LALR(1)-parsable?) get around this syntactic ambiguity? I just can't imagine any non-hacky/context-free way of dealing with this, and I'm

Looking for a complete Delphi (Object Pascal) syntax

痴心易碎 submitted on 2019-12-02 15:50:59
I need a complete Object Pascal syntax (preferably Delphi 2009). Some of the syntax is given in the help files, but not all of the information is provided, so I started collecting loose bits of information. Recently I added these to a more or less complete syntax description (EBNF-like). Although it looks extensive, there are still bugs and I'm sure parts are missing (especially in the .NET syntax). So I'm asking the SO Delphi community: do you have any information, or can you correct the errors? In return, I will provide the complete syntax to the community. It will probably save you some time ;-). In the

How to implement JavaScript automatic semicolon insertion in JavaCC?

折月煮酒 submitted on 2019-12-02 15:34:27
Question: I am finishing my ECMAScript 5.1/JavaScript grammar for JavaCC. I've done all the tokens and productions according to the specification. Now I'm facing a big question which I don't know how to solve. JavaScript has this nice feature of automatic semicolon insertion: What are the rules for JavaScript's automatic semicolon insertion (ASI)? To quote the specification, the rules are: There are three basic rules of semicolon insertion: When, as the program is parsed from left to right, a

Separate word lists for nouns, verbs, adjectives, etc

本秂侑毒 submitted on 2019-12-02 15:13:00
Usually word lists are one file that contains everything, but are there separately downloadable noun lists, verb lists, adjective lists, etc.? I need them for English specifically.
See Kevin's word lists, particularly the "Part Of Speech Database." You'll have to do some minimal text processing on your own in order to get the database into multiple files, but that can be done very easily with a few grep commands. The license terms are available on the "readme" page. Chilly
If you download just the database files from wordnet.princeton.edu/download/current-version you can extract the

Haskell - How best to represent a programming language's grammar?

给你一囗甜甜゛ submitted on 2019-12-02 14:56:23
I've been looking at Haskell and I'd quite like to write a compiler in it (as a learning exercise), since a lot of its innate features can be readily applied to a compiler (particularly a recursive descent compiler). What I can't quite get my head around is how to represent a language's grammar in a Haskell-ian way. My first thought was to use recursive data type definitions, but I can't see how I would use them to match against keywords in the language ("if"), for example. Thoughts and suggestions greatly appreciated, Pete
A recursive data type is fine for this. For example, given the language: expr
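A minimal sketch of the recursive-data-type idea, written in Python dataclasses rather than Haskell. The node names `Num`, `Add`, and `If` and the `evaluate` function are hypothetical; the answer's own example language is cut off above, so this is not a reconstruction of it. A roughly equivalent Haskell declaration would be `data Expr = Num Int | Add Expr Expr | If Expr Expr Expr`. The point it tries to show is that a keyword such as "if" never appears in the data type: the parser consumes the keyword and emits an `If` node, and later stages dispatch on the node type.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class If:
    cond: "Expr"       # treated as true when it evaluates to non-zero (an assumption)
    then: "Expr"
    otherwise: "Expr"


# An expression is any of the node types, and nodes contain expressions:
# that self-reference is what makes the type recursive.
Expr = Union[Num, Add, If]


def evaluate(e: Expr) -> int:
    """Walk the tree, dispatching on the node type."""
    if isinstance(e, Num):
        return e.value
    if isinstance(e, Add):
        return evaluate(e.left) + evaluate(e.right)
    if isinstance(e, If):
        branch = e.then if evaluate(e.cond) != 0 else e.otherwise
        return evaluate(branch)
    raise TypeError(f"not an expression node: {e!r}")


# Tree for something like "if 1 then 2 + 3 else 0" in an imagined surface syntax.
tree = If(Num(1), Add(Num(2), Num(3)), Num(0))
print(evaluate(tree))
```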

Is D's grammar really context-free?

微笑、不失礼 submitted on 2019-12-02 14:21:38
I posted this on the D newsgroup some months ago, but for some reason the answer never really convinced me, so I thought I'd ask it here. The grammar of D is apparently context-free. The grammar of C++, however, isn't (even without macros). (Please read this carefully!) Now granted, I know nothing (officially) about compilers, lexers, and parsers. All I know is what I've learned on the web. And here is what (I believe) I have understood regarding context, in not-so-technical lingo: The grammar of a language is context-free if and only if you can always understand the meaning