nlp

Preventing tokens from containing a space in Stanford CoreNLP

微笑、不失礼 submitted on 2019-12-11 11:58:35
Question: Is there an option in Stanford CoreNLP's tokenizer to prevent tokens from containing a space? E.g. if the sentence is "my phone is 617 1555-6644", the substring "617 1555" should be split into two different tokens. I am aware of the option normalizeSpace:

normalizeSpace: Whether any spaces in tokens (phone numbers, fractions) get turned into U+00A0 (non-breaking space). It's dangerous to turn this off for most of our Stanford NLP software, which assumes no spaces in tokens.

but I don't want tokens …
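One workaround (a post-processing sketch, not a CoreNLP option) is to leave normalizeSpace on and split the tokens afterwards: any multi-word token CoreNLP produces then contains U+00A0, which marks exactly the positions to break on.

```python
def split_nbsp_tokens(tokens):
    """Split any token containing U+00A0 (non-breaking space) into
    separate tokens, as CoreNLP emits for phone numbers/fractions
    when normalizeSpace is enabled."""
    out = []
    for tok in tokens:
        out.extend(tok.split("\u00a0"))
    return out

print(split_nbsp_tokens(["my", "phone", "is", "617\u00a01555-6644"]))
# → ['my', 'phone', 'is', '617', '1555-6644']
```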

Cast from GrammaticalStructure to Tree

我们两清 submitted on 2019-12-11 11:54:10
Question: I am trying out the new NN Dependency Parser from Stanford. According to the demo they have provided, this is how the parsing is done:

import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.trees.GrammaticalStructure;
import edu.stanford.nlp.parser.nndep.DependencyParser;
...
GrammaticalStructure gs = null;
DocumentPreprocessor tokenizer = new DocumentPreprocessor(new StringReader(sentence));
for (List<HasWord> sent : tokenizer) {
    List<TaggedWord> tagged = tagger…
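The real conversion goes through the Java GrammaticalStructure API; purely as an illustration of the data-structure change involved (Python, with made-up dependencies), a flat list of (head, dependent) pairs can be folded into a nested tree like this:

```python
from collections import defaultdict

def deps_to_tree(deps, root):
    """Turn a list of (head, dependent) pairs into a nested dict,
    mirroring what a dependency-to-tree conversion produces."""
    children = defaultdict(list)
    for head, dep in deps:
        children[head].append(dep)
    def build(node):
        return {node: [build(c) for c in children[node]]}
    return build(root)

deps = [("parses", "parser"), ("parses", "sentences"), ("parser", "the")]
print(deps_to_tree(deps, "parses"))
# → {'parses': [{'parser': [{'the': []}]}, {'sentences': []}]}
```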

Implementing a Top Down Parser in C#

耗尽温柔 submitted on 2019-12-11 11:49:37
Question: I am a student and I want to implement a top-down parser in my language-translation project, which is developed in C#. For example, if I need to construct a parse tree for the sentence "My Name is Husni and i am a Student", how can I do it in C#?

Answer 1: After the book, you may also find it interesting to read about a compiler generator such as ANTLR, which can help you write the compiler (also in C#) and browse the AST, even visually.

Answer 2: I highly recommend this book: Basics of …
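As a sketch of what a top-down (recursive-descent) parser does — shown here in Python rather than C# for brevity, and over a toy arithmetic grammar rather than English sentences — each grammar rule becomes one function that consumes tokens:

```python
import re

# Toy grammar: expr -> term (('+'|'-') term)* ; term -> NUMBER
def tokenize(s):
    return re.findall(r"\d+|[+\-]", s)

def parse_expr(tokens, pos=0):
    """Top-down entry rule: parse and evaluate; returns (value, next_pos)."""
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in "+-":
        op = tokens[pos]
        rhs, pos = parse_term(tokens, pos + 1)
        value = value + rhs if op == "+" else value - rhs
    return value, pos

def parse_term(tokens, pos):
    # Leaf rule: a single number token
    return int(tokens[pos]), pos + 1

print(parse_expr(tokenize("2+3-1"))[0])  # → 4
```

For a natural-language grammar the same shape applies, with one function per nonterminal (sentence, noun phrase, verb phrase, …) returning tree nodes instead of numbers.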

Saving python NLTK parse tree to image file [duplicate]

一个人想着一个人 submitted on 2019-12-11 11:40:58
Question: This question already has answers here: Saving nltk drawn parse tree to image file (3 answers). Closed 3 years ago. This might replicate this Stack Overflow question. However, I'm facing a different problem. This is my working code:

import nltk
from textblob import TextBlob

with open('test.txt', 'rU') as ins:
    array = []
    for line in ins:
        array.append(line)
for i in array:
    wiki = TextBlob(i)
    a = wiki.tags
    sentence = a
    pattern = """NP: {<DT>?<JJ>*<NN>} VBD: {<VBD>} IN: {<IN>}"""
    NPChunker = nltk…

Use RDF API (Jena, OpenRDF or Protege) to convert OpenIE outputs

醉酒当歌 submitted on 2019-12-11 11:26:23
Question: I was recommended to use one of the APIs (Jena, OpenRDF or Protege) to convert the outputs I generated from the OpenIE 4.1 jar file (downloadable from http://knowitall.github.io/openie/). The following is the sample OpenIE 4.1 output format — a confidence score followed by a (subject; predicate; object) triple:

The rail launchers are conceptually similar to the underslung SM-1
0.93 (The rail launchers; are; conceptually similar to the underslung SM-1)

I planned to produce triples that follow this …
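Before reaching for a full RDF API, the string side of the conversion is simple. A minimal sketch (the `http://example.org/` base URI and the naive word-joining scheme are made up for illustration; a real pipeline would mint proper URIs and use Jena/OpenRDF to serialize):

```python
import re

def openie_to_ntriples(line, base="http://example.org/"):
    """Convert one OpenIE line like
    '0.93 (The rail launchers; are; conceptually similar ...)'
    into a rough N-Triples statement."""
    m = re.match(r"([\d.]+)\s+\((.+?);\s*(.+?);\s*(.+)\)", line)
    conf, subj, pred, obj = m.groups()  # confidence is parsed but not emitted here
    def uri(text):
        return "<" + base + text.strip().replace(" ", "_") + ">"
    return f"{uri(subj)} {uri(pred)} {uri(obj)} ."

line = "0.93 (The rail launchers; are; conceptually similar to the underslung SM-1)"
print(openie_to_ntriples(line))
```

The confidence score could be attached via RDF reification or a named graph, which is where a library such as Jena becomes genuinely useful.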

NLP- Sentiment Processing for Junk Data takes time

柔情痞子 submitted on 2019-12-11 11:24:15
Question: I am trying to find the sentiment for some input text. This test is a junk sentence, and when I try to find its sentiment, the Annotation to parse the sentence takes around 30 seconds; for normal text it takes less than a second. If I need to process around a million records, this will add up. Any solution to this?

String text = "Nm n n 4 n n bkj nun4hmnun Onn njnb hm5bn nm55m nbbh n mnrrnut but n rym4n nbn 4nn65 m nun m n nn nun 4nm 5 gm n my b bb b b rtmrt55tmmm5tttn b b …
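CoreNLP has no built-in junk filter that I know of, so one common mitigation is to screen out gibberish before running the expensive parse. A heuristic sketch (the 0.5 threshold and the "contains a vowel" test are arbitrary choices, not a CoreNLP feature):

```python
import re

def looks_like_junk(text, threshold=0.5):
    """Flag text as junk if too few of its alphabetic tokens contain
    a vowel — a cheap proxy for 'not real words'."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return True
    wordlike = sum(1 for w in words if re.search(r"[aeiouAEIOU]", w))
    return wordlike / len(words) < threshold

print(looks_like_junk("Nm n n bkj nun4hmnun Onn njnb hm5bn"))  # → True
print(looks_like_junk("This movie was surprisingly good."))    # → False
```

Sentences flagged as junk can be skipped or given a neutral default sentiment, so only plausible text pays the 30-second parsing cost.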

How to keep punctuation in Stanford dependency parser

拜拜、爱过 submitted on 2019-12-11 11:05:57
Question: I am using Stanford CoreNLP (the 01.2016 version) and I would like to keep the punctuation in the dependency relations. I have found some ways to do this when running from the command line, but I didn't find anything regarding the Java code which extracts the dependency relations. Here is my current code. It works, but no punctuation is included:

Annotation document = new Annotation(text);
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse…

Set of rules for textual analysis - Natural language processing

巧了我就是萌 submitted on 2019-12-11 10:34:06
Question: Does there exist a guide with a set of rules for textual analysis / natural language processing? Do you have a specific package (e.g. in Python) for textual sentiment analysis? Here is the application I am faced with: let's say I have two dictionaries, A and B. A contains "negative" words, and B contains "positive" words. What I can do is count the negative and the positive numbers of words. This created some issues, such as the following: let's suppose that "exceptionally" is …
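A minimal sketch of the two-dictionary approach, extended with an intensifier list to handle words like "exceptionally" that modify the next word rather than carrying sentiment themselves (all three word lists here are illustrative stand-ins, not a published lexicon):

```python
NEGATIVE = {"bad", "awful", "poor"}
POSITIVE = {"good", "great", "excellent"}
INTENSIFIERS = {"exceptionally", "very", "extremely"}  # scale the next word

def sentiment_score(text):
    """Count positive minus negative words; an intensifier doubles
    the weight of the word that follows it."""
    score, boost = 0, 1
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word in INTENSIFIERS:
            boost = 2
            continue
        if word in POSITIVE:
            score += boost
        elif word in NEGATIVE:
            score -= boost
        boost = 1
    return score

print(sentiment_score("The food was exceptionally good but the service was bad"))  # → 1
```

Negation ("not good") needs the same look-ahead trick with a sign flip, which is exactly where plain word counting starts to break down.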

Synonyms offline dictionary for a search application

主宰稳场 submitted on 2019-12-11 10:26:24
Question: I'm trying to build a smart search engine application that gets synonyms of the words in the question and queries my database with each of the generated synonyms. The problem is that I'm searching for a way to get all synonyms of the words in the question using a dictionary or something that could, in the end, offer:
1. direct synonyms, like: file > movie, football > soccer
2. a matching string, like: population size > number of citizens (optional)
3. something that is fast and reliable …
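The expansion step itself is straightforward once a synonym table exists; a sketch with a hand-rolled table (in practice an offline resource such as the WordNet database files would supply the mappings, including the multi-word ones in point 2):

```python
SYNONYMS = {
    "file": ["movie", "document"],
    "football": ["soccer"],
}

def expand_query(question):
    """Return the query variants produced by substituting each
    known synonym, one word at a time."""
    words = question.lower().split()
    variants = [" ".join(words)]
    for i, w in enumerate(words):
        for syn in SYNONYMS.get(w, []):
            variants.append(" ".join(words[:i] + [syn] + words[i + 1:]))
    return variants

print(expand_query("watch the file"))
# → ['watch the file', 'watch the movie', 'watch the document']
```

Each variant can then be run against the database; deduplicating and capping the variant count keeps the query fan-out bounded.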

How to identify the end of a sentence

跟風遠走 submitted on 2019-12-11 10:19:53
Question:

String x = " i am going to the party at 6.00 in the evening. are you coming with me?";

If I have the above string, I need it to be broken into sentences using sentence-boundary punctuation (like . and ?), but it should not split the sentence at "6.00" because of the period there. Is there a way to identify the correct sentence boundary in Java? I have tried using StringTokenizer in the java.util package, but it always breaks the sentence whenever it finds a period. Can someone suggest …
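In Java, java.text.BreakIterator.getSentenceInstance is the usual stdlib answer. As a quick regex sketch (Python here for brevity): splitting only where the punctuation is followed by whitespace already leaves "6.00" intact, because its period has no space after it:

```python
import re

def split_sentences(text):
    """Split on ., ! or ? only when followed by whitespace, so a
    period inside '6.00' never triggers a break."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]

x = " i am going to the party at 6.00 in the evening. are you coming with me?"
print(split_sentences(x))
# → ['i am going to the party at 6.00 in the evening.', 'are you coming with me?']
```

This simple rule still mis-splits after abbreviations like "Dr.", which is why locale-aware tools such as BreakIterator (or a trained sentence splitter) are preferable for real text.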