stanford-nlp

Stanford NER Tagger in NLTK

我与影子孤独终老i submitted on 2019-12-01 10:57:02
Question: I am trying to import the Stanford Named Entity Recognizer in Python. It is already bundled with the NLTK package. However, my code below is not working:

from nltk.tag.stanford import NERTagger
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name NERTagger

What could be the cause? In all the articles I have read, it works by default. Thank you.

Answer 1: That class was renamed to StanfordNERTagger in version 3.0.3 (commit 190673c7). So for nltk >= 3.0.3 …
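Since the excerpt is cut off, here is a minimal sketch of the fix the answer describes: pick the class name by NLTK version. The helper below is illustrative only (it is not part of NLTK); the actual fix is simply importing StanfordNERTagger instead of NERTagger on nltk >= 3.0.3.

```python
def stanford_ner_class_name(nltk_version: str) -> str:
    """Return the Stanford NER tagger class name for an NLTK version string.

    NERTagger was renamed to StanfordNERTagger in NLTK 3.0.3, so any
    version at or above that uses the new name. (Illustrative helper,
    not part of NLTK itself.)
    """
    parts = tuple(int(p) for p in nltk_version.split(".")[:3])
    parts += (0,) * (3 - len(parts))  # pad short versions like "3.1"
    return "StanfordNERTagger" if parts >= (3, 0, 3) else "NERTagger"

# In practice the import then looks like (requires nltk installed):
#   from nltk.tag.stanford import StanfordNERTagger   # nltk >= 3.0.3
#   from nltk.tag.stanford import NERTagger           # nltk <  3.0.3
```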

Stanford CoreNLP gives NullPointerException

空扰寡人 submitted on 2019-12-01 10:39:28
Question: I'm trying to get my head around the Stanford CoreNLP API. I want to tokenize a simple sentence using the following code:

Properties props = new Properties();
props.put("annotators", "tokenize");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
// read some text into the text variable
String text = "I wish this code would run.";
// create an empty Annotation just with the given text
Annotation document = new Annotation(text);
// run all Annotators on this text
pipeline.annotate …
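A common cause of a NullPointerException with a pipeline like this is later requesting an annotation whose annotator never ran: with only "tokenize" in the annotators property, there is no "ssplit", so asking the resulting document for its sentences yields null. The checker below is a simplified, illustrative subset of CoreNLP's annotator prerequisites, not its actual registry:

```python
# Simplified subset of CoreNLP annotator prerequisites (illustrative only).
ANNOTATOR_PREREQS = {
    "tokenize": [],
    "ssplit": ["tokenize"],
    "pos": ["tokenize", "ssplit"],
    "lemma": ["tokenize", "ssplit", "pos"],
    "ner": ["tokenize", "ssplit", "pos", "lemma"],
}

def missing_prereqs(requested):
    """Return prerequisite annotators absent from a requested list."""
    present = set(requested)
    missing = []
    for ann in requested:
        for dep in ANNOTATOR_PREREQS.get(ann, []):
            if dep not in present and dep not in missing:
                missing.append(dep)
    return missing
```

If annotate succeeds but the document's sentence list is null, the usual fix is props.put("annotators", "tokenize, ssplit").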

How to work around 100K character limit for the StanfordNLP server?

折月煮酒 submitted on 2019-12-01 09:34:10
I am trying to parse book-length blocks of text with StanfordNLP. The HTTP requests work great, but there is a non-configurable 100,000-character limit on the text length, MAX_CHAR_LENGTH in StanfordCoreNLPServer.java. For now, I am chopping up the text before I send it to the server, but even if I try to split between sentences and paragraphs, some useful coreference information gets lost between these chunks. Presumably, I could parse chunks with large overlap and link them together, but that seems (1) inelegant and (2) like quite a bit of maintenance. Is there a better way to configure …
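One workaround along the lines the question describes is to pack whole sentences into chunks that stay under the server limit (newer server versions also accept a -maxCharLength startup flag, though that should be verified against your CoreNLP version). A minimal pure-Python chunker sketch, using a naive regex in place of a real sentence splitter:

```python
import re

def chunk_text(text: str, max_chars: int = 100_000):
    """Greedily pack sentences into chunks of at most max_chars characters.

    Sentence splitting here is a naive regex (., ?, or ! followed by
    whitespace); a real pipeline would use a proper sentence tokenizer.
    A single sentence longer than max_chars is emitted on its own.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sent in sentences:
        if not current:
            current = sent
        elif len(current) + 1 + len(sent) <= max_chars:
            current += " " + sent
        else:
            chunks.append(current)
            current = sent
    if current:
        chunks.append(current)
    return chunks
```

Coreference links that cross a chunk boundary are still lost, which is exactly the trade-off the question points out.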

Stanford NLP named entities of more than one token

我与影子孤独终老i submitted on 2019-12-01 09:29:13
I'm experimenting with Stanford CoreNLP for named entity recognition. Some named entities consist of more than one token, for example, Person: "Bill Smith". I can't figure out which API calls to use to determine when "Bill" and "Smith" should be considered a single entity and when they should be two different entities. Is there decent documentation somewhere that explains this? Here's my current code:

InputStream is = getClass().getResourceAsStream(MODEL_NAME);
if (MODEL_NAME.endsWith(".gz")) {
    is = new GZIPInputStream(is);
}
is = new BufferedInputStream(is);
Properties props = new …
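For raw CRF output without explicit mention boundaries, the usual heuristic (and roughly what CoreNLP's entitymentions annotator automates) is to merge consecutive tokens that share the same non-O label. A sketch over (token, label) pairs; note the assumption that adjacent same-label tokens form one mention, which can fail when two distinct names appear back to back:

```python
def merge_entities(tagged):
    """Merge consecutive tokens carrying the same non-O NER label.

    tagged: list of (token, label) pairs, e.g. [("Bill", "PERSON"), ...]
    Returns a list of (entity_text, label) pairs.
    """
    entities = []
    current_tokens, current_label = [], None
    for token, label in tagged + [("", "O")]:  # sentinel flushes the tail
        if label != "O" and label == current_label:
            current_tokens.append(token)
        else:
            if current_tokens:
                entities.append((" ".join(current_tokens), current_label))
            current_tokens = [token] if label != "O" else []
            current_label = label if label != "O" else None
    return entities
```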

Convert words into their noun / adjective / verb form in Java

℡╲_俬逩灬. submitted on 2019-12-01 09:14:14
Is it possible to have a Java alternative to NLTK in order to 'verbify' words, as can be seen in this question: Convert words between verb/noun/adjective forms? For example, I would like to convert "born" to "birth", since when using WordNet similarity the algorithm does not show that "born" and "birth" are very similar. I would therefore like to convert "born" to "birth" or vice versa, in order to get much more similar words. What do you suggest? I found some tools but I'm not sure whether they can do this:
- NLTK (Python only, I guess)
- OpenNLP
- Stanford NLP
- SimpleNLG
Thank you. A quick and dirty …
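Since the excerpt cuts off before the answer, here is the common approach in sketch form: WordNet stores "derivationally related forms" pointers that link born to birth (NLTK exposes them via lemma.derivationally_related_forms(); on the Java side, libraries such as JWI or extJWNL expose the same pointers). The lookup table below is a tiny hand-made stand-in for that WordNet query, purely illustrative:

```python
# Tiny hand-made stand-in for WordNet's "derivationally related forms"
# pointers; a real implementation would query WordNet itself.
DERIVATIONAL = {
    "born": "birth",
    "destroy": "destruction",
    "decide": "decision",
}

def derivational_noun(word: str) -> str:
    """Return a derivationally related noun if known, else the word itself."""
    return DERIVATIONAL.get(word.lower(), word)
```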

Cannot Initialize CoreNLP in R

北战南征 submitted on 2019-12-01 08:33:09
Question: I am unable to access CoreNLP in R on a Mac running High Sierra. I am uncertain what the problem is, but it seems that every time I try again to get coreNLP to work, I am faced with a different error. I have JDK 9.0.4. Please see my code below for what I am attempting to do, and the error that stops me. On my previous attempt I was able to get initCoreNLP() to run and load some elements of the package, but it would fail on others. When I then attempted to run annotateString(), it would throw …

Using Stanford Tregex in Python

懵懂的女人 submitted on 2019-12-01 06:18:36
I'm a newbie in NLP and Python. I'm trying to extract a subset of noun phrases from trees parsed by StanfordCoreNLP, using the Tregex tool and the Python subprocess library. In particular, I'm trying to find and extract noun phrases that match the following Tregex pattern:

'(NP[$VP]>S)|(NP[$VP]>S\n)|(NP\n[$VP]>S)|(NP\n[$VP]>S\n)'

For example, below is the original text, saved in a string named "text":

text = ('Pusheen and Smitha walked along the beach. "I want to surf", said Smitha, the CEO of Tesla. However, she fell off the surfboard')

After running the StanfordCoreNLP …
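The usual setup for this is invoking the Tregex main class via subprocess over a file of parse trees. The helper below only builds the command line; the main-class name is the standard Tregex entry point to the best of my knowledge, and the jar path is an assumption to adapt to your install:

```python
def tregex_command(pattern, tree_file, tregex_jar="stanford-tregex.jar"):
    """Build the argv list for running a Tregex pattern over a tree file.

    edu.stanford.nlp.trees.tregex.TregexPattern is Tregex's command-line
    entry point; tregex_jar is a placeholder path for your distribution.
    """
    return [
        "java", "-cp", tregex_jar,
        "edu.stanford.nlp.trees.tregex.TregexPattern",
        pattern, tree_file,
    ]

# With subprocess (not executed here):
#   import subprocess
#   out = subprocess.run(tregex_command("NP $ VP > S", "parses.txt"),
#                        capture_output=True, text=True).stdout
```

Passing the pattern as a single argv element (rather than through a shell string) avoids quoting problems with characters like $ and >.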

Stanford parser java error

冷暖自知 submitted on 2019-12-01 03:24:29
Question: I am working on NLP research. I would like to use the Stanford parser to extract noun phrases from text; the parser version I used is 3.4.1. This is the sample code I used:

package stanfordparser;

import java.util.Collection;
import java.util.List;
import java.io.StringReader;

import edu.stanford.nlp.process.Tokenizer;
import edu.stanford.nlp.process.TokenizerFactory;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford …
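Since the Java excerpt is cut off, here is the core idea of extracting noun phrases from a constituency parse, sketched over a toy nested-tuple tree (on the Java side this corresponds to walking the parser's Tree and keeping subtrees labeled "NP"):

```python
def extract_nps(tree):
    """Collect the token spans of all NP subtrees, outermost first.

    tree: either a token string (leaf) or a (label, child, ...) tuple.
    """
    results = []

    def leaves(node):
        if isinstance(node, str):
            return [node]
        return [tok for child in node[1:] for tok in leaves(child)]

    def walk(node):
        if isinstance(node, str):
            return
        if node[0] == "NP":
            results.append(" ".join(leaves(node)))
        for child in node[1:]:
            walk(child)

    walk(tree)
    return results

# Toy parse: (S (NP (DT the) (NN dog)) (VP (VBD chased) (NP (DT a) (NN cat))))
tree = ("S",
        ("NP", ("DT", "the"), ("NN", "dog")),
        ("VP", ("VBD", "chased"),
               ("NP", ("DT", "a"), ("NN", "cat"))))
```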

NER model to recognize Indian names

独自空忆成欢 submitted on 2019-12-01 01:16:28
I am planning to use Named Entity Recognition (NER) to identify person names (most of which are Indian names) in a given text. I have already explored the CRF-based NER model from Stanford NLP; however, it is not very accurate at recognizing Indian names. Hence I decided to create my own custom NER model via supervised training. I have a fair idea of how to create my own NER model using the Stanford NER CRF, but creating a large training corpus with manual annotation is something I would like to avoid, as it is a humongous effort for an individual, and secondly, obtaining diverse people …
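One way to avoid full manual annotation, sketched below, is to bootstrap a corpus by slotting gazetteer names into template sentences and emitting the token-per-line, tab-separated format that Stanford's CRFClassifier trains on. The templates and names here are made-up examples; real coverage would need many varied templates and a large name gazetteer:

```python
def make_training_rows(templates, names):
    """Generate token<TAB>label training text for Stanford's CRFClassifier.

    templates: sentences containing a {name} placeholder (tokens
               pre-separated by spaces, including final punctuation).
    names: gazetteer of person names, possibly multi-token.
    Sentences are separated by a blank line, as CRFClassifier expects.
    """
    lines = []
    for name in names:
        for template in templates:
            for word in template.format(name=name).split():
                label = "PERSON" if word in name.split() else "O"
                lines.append(f"{word}\t{label}")
            lines.append("")  # blank line between sentences
    return "\n".join(lines)

templates = ["{name} joined the meeting .", "I spoke with {name} yesterday ."]
names = ["Rajesh Koothrappali", "Priya Sharma"]  # made-up gazetteer entries
```

The resulting text can be written to a file and passed to CRFClassifier's trainFile property; the labeling rule here (any token occurring in the name string is PERSON) is a simplification that assumes template words never collide with name tokens.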