stanford-nlp

StanfordNLP custom model in Java

Submitted by ↘锁芯ラ on 2020-07-22 06:12:29
Question: I am using Stanford NLP for the first time. Here is my code as of now:

```java
Properties props = new Properties();
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
props.setProperty("ner.additional.regexner.mapping", "additional.rules");
// props.setProperty("ner.applyFineGrained", "false");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

String content = "request count for www.abcd.com";
CoreDocument doc = new CoreDocument(content);
// annotate the document
pipeline.annotate(doc);
```
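The custom behavior here comes from the additional.rules file passed to ner.additional.regexner.mapping. For reference, a minimal sketch of a TokensRegexNER mapping file is below; the columns are tab-separated (pattern, NER tag, overwritable tags, priority), and the WEBSITE/KPI tags and priorities are made-up values for illustration, not anything from the question:

```
# additional.rules — one rule per line, tab-separated columns:
# token pattern	NER tag	overwritable tags	priority
www\.abcd\.com	WEBSITE	MISC	1.0
request count	KPI	O,MISC	1.0
```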

Text tokenization with Stanford NLP: filter out unwanted words and characters

Submitted by 痴心易碎 on 2020-07-18 11:23:12
Question: I use Stanford NLP for string tokenization in my classification tool. I want to get only meaningful words, but I also get non-word tokens (such as ---, >, and .) and unimportant stop words such as am, is, and to. Does anybody know a way to solve this problem?

Answer 1: This is a very domain-specific task that we don't perform for you in CoreNLP. You should be able to make this work with a regular expression filter and a stopword filter on top of the CoreNLP tokenizer. Here's an example list of …
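For reference, here is a minimal sketch of the kind of post-filter the answer suggests. The stopword list and the regex are illustrative only (in practice you would use a fuller list, e.g. NLTK's stopwords corpus), not part of CoreNLP itself:

```python
import re

# Small illustrative stopword list; replace with a full list in practice.
STOPWORDS = {"am", "is", "are", "to", "the", "a", "an", "and", "of"}

# Keep only tokens made entirely of letters (drops "---", ">", ".", etc.)
WORD_RE = re.compile(r"^[A-Za-z]+$")

def filter_tokens(tokens):
    """Drop non-word tokens and stopwords from a token list."""
    return [t for t in tokens
            if WORD_RE.match(t) and t.lower() not in STOPWORDS]

print(filter_tokens(["I", "am", "---", "happy", ">", "to", "learn", "."]))
# ['I', 'happy', 'learn']
```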

StanfordCoreNLP server listening indefinitely using Stanza

Submitted by 雨燕双飞 on 2020-07-16 04:19:50
Question: I am trying to run the Java Stanford CoreNLP package using a Python wrapper called Stanza. I am simply trying to run the example provided on the website; I am not using any virtual environment. Whenever I run the Java server, it just hangs at "listening" or eventually times out. I followed the instructions provided with the example and set the CORENLP_HOME variable. I was initially running the code through PyCharm, but then I also observed the same behavior when …
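For reference, a minimal sketch of the documented Stanza client usage follows. CORENLP_HOME must point at a local, unzipped CoreNLP distribution (the path below is a placeholder), and using the client as a context manager ensures the Java server is shut down even if annotation fails:

```python
import os
from stanza.server import CoreNLPClient

# Placeholder path: point this at your unzipped CoreNLP distribution.
os.environ.setdefault("CORENLP_HOME", "/path/to/stanford-corenlp")

# timeout is in milliseconds; the context manager stops the Java server
# on exit instead of leaving it listening indefinitely.
with CoreNLPClient(annotators=["tokenize", "ssplit", "pos"],
                   timeout=30000, memory="4G") as client:
    ann = client.annotate("Stanza wraps the CoreNLP server.")
    for sentence in ann.sentence:
        for token in sentence.token:
            print(token.word, token.pos)
```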

GloVe word embeddings: supported languages

Submitted by 梦想与她 on 2020-06-26 13:44:26
Question: I started experimenting with word embeddings, and I found some results which I don't know how to interpret. I first used an English corpus for both training and testing; afterwards, I used the English corpus for training and a small French corpus for testing (all corpora were annotated for the same binary classification task). In both cases, I used the GloVe embeddings pre-trained on tweets. As the results in the case where I also used the French corpus improved (by almost 5%, …
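One thing worth checking when interpreting such results is vocabulary coverage: pre-trained GloVe files are plain text, one word and its vector per line, so it is easy to measure how many test-set tokens actually receive a vector. A minimal sketch follows; the file name is the standard Twitter GloVe download, assumed here for illustration:

```python
import numpy as np

def load_glove(path):
    """Load GloVe vectors from a text file: one 'word v1 v2 ...' per line."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

# Placeholder for the pre-trained Twitter GloVe file.
glove = load_glove("glove.twitter.27B.100d.txt")

# Tokens missing from the pre-trained vocabulary get no vector and
# typically fall back to a zero/unknown embedding, which matters when
# testing on a different language than the one trained on.
for word in ["hello", "bonjour"]:
    print(word, "in vocab:", word in glove)
```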

Extract Noun Phrases with Stanza and CoreNLPClient

Submitted by 吃可爱长大的小学妹 on 2020-06-17 13:29:27
Question: I am trying to extract noun phrases from sentences using Stanza (with Stanford CoreNLP). This can only be done with the CoreNLPClient module in Stanza:

```python
# Import the client module
from stanza.server import CoreNLPClient

# Construct a CoreNLPClient with some basic annotators,
# a memory allocation of 4GB, and port number 9001
client = CoreNLPClient(
    annotators=['tokenize', 'ssplit', 'pos', 'lemma', 'ner', 'parse'],
    memory='4G',
    endpoint='http://localhost:9001')
```

Here is an example of a sentence, and I am …
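For reference, Stanza's CoreNLPClient exposes a tregex helper that matches patterns against the constituency parse, and the pattern "NP" selects every noun phrase. A sketch follows; the shape of the returned dictionary (keyed by sentence, then by match index, with a spanString field) is assumed from the Stanza documentation:

```python
from stanza.server import CoreNLPClient

text = "Stanford NLP provides a set of human language technology tools."

with CoreNLPClient(annotators=["tokenize", "ssplit", "pos", "parse"],
                   memory="4G", endpoint="http://localhost:9001") as client:
    # Match every noun-phrase node in the parse tree.
    matches = client.tregex(text, "NP")
    for sentence in matches["sentences"]:
        for key, match in sentence.items():
            if key.isdigit():  # match entries are keyed "0", "1", ...
                print(match["spanString"])
```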

"NLTK was unable to find the java file!" error for the Stanford POS Tagger

Submitted by 徘徊边缘 on 2020-05-26 05:06:19
Question: I have been stuck trying to get the Stanford POS Tagger to work for a while. From an old Stack Overflow post I found the following (slightly modified) code:

```python
stanford_dir = 'C:/Users/.../stanford-postagger-2017-06-09/'
from nltk.tag import StanfordPOSTagger
# from nltk.tag.stanford import StanfordPOSTagger  # I tried it both ways
from nltk import word_tokenize

# Add the jar and model via their path (instead of setting environment variables):
jar = stanford_dir + 'stanford-postagger.jar'
model = stanford_dir + …
```
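The error in the title usually means NLTK cannot locate the Java binary itself. A common remedy is to point the JAVAHOME environment variable at a local Java executable before constructing the tagger. A sketch under those assumptions follows; both paths and the model file name are placeholders for a standard stanford-postagger-2017-06-09 layout, not values from the question:

```python
import os
from nltk.tag import StanfordPOSTagger

# NLTK looks up the Java binary via JAVAHOME; placeholder path to a local JRE.
os.environ["JAVAHOME"] = r"C:\Program Files\Java\jre1.8.0_251\bin\java.exe"

stanford_dir = "C:/Users/.../stanford-postagger-2017-06-09/"  # placeholder
jar = stanford_dir + "stanford-postagger.jar"
model = stanford_dir + "models/english-bidirectional-distsim.tagger"

tagger = StanfordPOSTagger(model, jar)
print(tagger.tag("What is the airspeed of an unladen swallow".split()))
```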