stanford-nlp

StanfordNLP custom model in Java

Submitted by ↘锁芯ラ on 2020-07-22 06:12:29
Question: I am using Stanford NLP for the first time. Here is my code as of now:

```java
Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
props.setProperty("ner.additional.regexner.mapping", "additional.rules");
// props.setProperty("ner.applyFineGrained", "false");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

String content = "request count for www.abcd.com";
CoreDocument doc = new CoreDocument(content);
// annotate the document
pipeline.annotate(doc);
```
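The custom behavior here comes from the additional.rules file passed to ner.additional.regexner.mapping. For reference, a minimal sketch of a TokensRegexNER mapping file is below; the columns are tab-separated (pattern, NER tag, overwritable tags, priority), and the WEBSITE/KPI tags and priorities are made-up values for illustration, not anything from the question:

```
# additional.rules — one rule per line, tab-separated columns:
# token pattern	NER tag	overwritable tags	priority
www\.abcd\.com	WEBSITE	MISC	1.0
request count	KPI	O,MISC	1.0
```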

Text tokenization with Stanford NLP: filter out unwanted words and characters

Submitted by 痴心易碎 on 2020-07-18 11:23:12
Question: I use Stanford NLP for string tokenization in my classification tool. I want to get only meaningful words, but I also get non-word tokens (such as ---, >, and .) and unimportant stop words such as am, is, and to. Does anybody know a way to solve this problem?

Answer 1: This is a very domain-specific task that we don't perform for you in CoreNLP. You should be able to make this work with a regular expression filter and a stopword filter on top of the CoreNLP tokenizer. Here's an example list of …
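For reference, here is a minimal sketch of the kind of post-filter the answer suggests. The stopword list and the regex are illustrative only (in practice you would use a fuller list, e.g. NLTK's stopwords corpus), not part of CoreNLP itself:

```python
import re

# Small illustrative stopword list; replace with a full list in practice.
STOPWORDS = {"am", "is", "are", "to", "the", "a", "an", "and", "of"}

# Keep only tokens made entirely of letters (drops "---", ">", ".", etc.)
WORD_RE = re.compile(r"^[A-Za-z]+$")

def filter_tokens(tokens):
    """Drop non-word tokens and stopwords from a token list."""
    return [t for t in tokens
            if WORD_RE.match(t) and t.lower() not in STOPWORDS]

print(filter_tokens(["I", "am", "---", "happy", ">", "to", "learn", "."]))
# ['I', 'happy', 'learn']
```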

StanfordCoreNLP server listening indefinitely using Stanza

Submitted by 雨燕双飞 on 2020-07-16 04:19:50
Question: I am trying to run the Java Stanford CoreNLP package using a Python wrapper called Stanza. I am simply trying to run the example provided on the website; I am not using any virtual environment. Whenever I run the Java server, it just hangs at "listening" or eventually times out. I followed the instructions provided with the example and set the CORENLP_HOME variable. I was initially running the code through PyCharm, but then I also observed the same behavior when …
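For reference, a minimal sketch of the documented Stanza client usage follows. CORENLP_HOME must point at a local, unzipped CoreNLP distribution (the path below is a placeholder), and using the client as a context manager ensures the Java server is shut down even if annotation fails:

```python
import os
from stanza.server import CoreNLPClient

# Placeholder path: point this at your unzipped CoreNLP distribution.
os.environ.setdefault("CORENLP_HOME", "/path/to/stanford-corenlp")

# timeout is in milliseconds; the context manager stops the Java server
# on exit instead of leaving it listening indefinitely.
with CoreNLPClient(annotators=["tokenize", "ssplit", "pos"],
                   timeout=30000, memory="4G") as client:
    ann = client.annotate("Stanza wraps the CoreNLP server.")
    for sentence in ann.sentence:
        for token in sentence.token:
            print(token.word, token.pos)
```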

GloVe word embeddings: supported languages

Submitted by 梦想与她 on 2020-06-26 13:44:26
Question: I started experimenting with word embeddings, and I found some results which I don't know how to interpret. I first used an English corpus for both training and testing; afterwards, I used the English corpus for training and a small French corpus for testing (all corpora were annotated for the same binary classification task). In both cases, I used the GloVe embeddings pre-trained on tweets. As the results in the case where I also used the French corpus improved (by almost 5%, …
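One thing worth checking when interpreting such results is vocabulary coverage: pre-trained GloVe files are plain text, one word and its vector per line, so it is easy to measure how many test-set tokens actually receive a vector. A minimal sketch follows; the file name is the standard Twitter GloVe download, assumed here for illustration:

```python
import numpy as np

def load_glove(path):
    """Load GloVe vectors from a text file: one 'word v1 v2 ...' per line."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

# Placeholder for the pre-trained Twitter GloVe file.
glove = load_glove("glove.twitter.27B.100d.txt")

# Tokens missing from the pre-trained vocabulary get no vector and
# typically fall back to a zero/unknown embedding, which matters when
# testing on a different language than the one trained on.
for word in ["hello", "bonjour"]:
    print(word, "in vocab:", word in glove)
```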

Extract Noun Phrases with Stanza and CoreNLPClient

Submitted by 吃可爱长大的小学妹 on 2020-06-17 13:29:27
Question: I am trying to extract noun phrases from sentences using Stanza (with Stanford CoreNLP). This can only be done with the CoreNLPClient module in Stanza:

```python
# Import the client module
from stanza.server import CoreNLPClient

# Construct a CoreNLPClient with some basic annotators,
# a memory allocation of 4GB, and port number 9001
client = CoreNLPClient(
    annotators=['tokenize', 'ssplit', 'pos', 'lemma', 'ner', 'parse'],
    memory='4G',
    endpoint='http://localhost:9001')
```

Here is an example of a sentence, and I am …
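For reference, Stanza's CoreNLPClient exposes a tregex helper that matches patterns against the constituency parse, and the pattern "NP" selects every noun phrase. A sketch follows; the shape of the returned dictionary (keyed by sentence, then by match index, with a spanString field) is assumed from the Stanza documentation:

```python
from stanza.server import CoreNLPClient

text = "Stanford NLP provides a set of human language technology tools."

with CoreNLPClient(annotators=["tokenize", "ssplit", "pos", "parse"],
                   memory="4G", endpoint="http://localhost:9001") as client:
    # Match every noun-phrase node in the parse tree.
    matches = client.tregex(text, "NP")
    for sentence in matches["sentences"]:
        for key, match in sentence.items():
            if key.isdigit():  # match entries are keyed "0", "1", ...
                print(match["spanString"])
```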

"NLTK was unable to find the java file!" error for the Stanford POS Tagger

Submitted by 徘徊边缘 on 2020-05-26 05:06:19
Question: I have been stuck trying to get the Stanford POS Tagger to work for a while. From an old Stack Overflow post I found the following (slightly modified) code:

```python
stanford_dir = 'C:/Users/.../stanford-postagger-2017-06-09/'
from nltk.tag import StanfordPOSTagger
# from nltk.tag.stanford import StanfordPOSTagger  # I tried it both ways
from nltk import word_tokenize

# Add the jar and model via their path (instead of setting environment variables):
jar = stanford_dir + 'stanford-postagger.jar'
model = stanford_dir + …
```
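The error in the title usually means NLTK cannot locate the Java binary itself. A common remedy is to point the JAVAHOME environment variable at a local Java executable before constructing the tagger. A sketch under those assumptions follows; both paths and the model file name are placeholders for a standard stanford-postagger-2017-06-09 layout, not values from the question:

```python
import os
from nltk.tag import StanfordPOSTagger

# NLTK looks up the Java binary via JAVAHOME; placeholder path to a local JRE.
os.environ["JAVAHOME"] = r"C:\Program Files\Java\jre1.8.0_251\bin\java.exe"

stanford_dir = "C:/Users/.../stanford-postagger-2017-06-09/"  # placeholder
jar = stanford_dir + "stanford-postagger.jar"
model = stanford_dir + "models/english-bidirectional-distsim.tagger"

tagger = StanfordPOSTagger(model, jar)
print(tagger.tag("What is the airspeed of an unladen swallow".split()))
```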