stanford-nlp

How to get the root node in Stanford Parse-Tree?

Submitted by 自古美人都是妖i on 2019-12-02 04:48:01
I have this parse tree here: What I want is to get all words from a common parent, given a word in the set of children of a subtree. For example, if you take the word "bottles" then I want to get "the Voss bottles" or maybe even "the Voss bottles of water", but I don't know how to do that.

    Annotation document = new Annotation(sentenceText);
    this.pipeline.annotate(document);
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        Tree tree = sentence.get(TreeAnnotation.class);
        List<Tree> leaves = new ArrayList<>();
        leaves = tree.getLeaves

Stanford NLP Tagger via NLTK - tag_sents splits everything into chars

Submitted by 谁都会走 on 2019-12-02 02:42:26
I'm hoping someone has experience with this, as I'm unable to find any comments online besides a bug report from 2015 regarding the NER tagger, which is probably the same issue. Anyway, I'm trying to batch-process text to get around the poorly performing base tagger. From what I understand, tag_sents should help.

    from nltk.tag.stanford import StanfordPOSTagger
    from nltk import word_tokenize
    import nltk

    stanford_model = 'stanford-postagger/models/english-bidirectional-distsim.tagger'
    stanford_jar = 'stanford-postagger/stanford-postagger.jar'
    tagger = StanfordPOSTagger(stanford_model, stanford_jar)
    tagger
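The usual cause of the characters-instead-of-tokens symptom is passing raw strings to tag_sents: it expects a list of already-tokenised sentences (a list of lists of strings), and iterating over a string yields its characters. A minimal sketch of the expected call shape; the example sentences are assumptions:

    from nltk.tag.stanford import StanfordPOSTagger
    from nltk import word_tokenize

    stanford_model = 'stanford-postagger/models/english-bidirectional-distsim.tagger'
    stanford_jar = 'stanford-postagger/stanford-postagger.jar'
    tagger = StanfordPOSTagger(stanford_model, stanford_jar)

    texts = ["This is the first sentence.", "Here is another one."]

    # tag_sents wants a list of token lists, one inner list per sentence.
    # Passing the strings themselves makes it tag character by character.
    tokenized = [word_tokenize(t) for t in texts]
    tagged = tagger.tag_sents(tokenized)  # one (word, tag) list per sentence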

Get the K best parses of a sentence with Stanford Parser

Submitted by 那年仲夏 on 2019-12-02 01:32:22
I want to have the K best parses of a sentence. I figured that this can be done with the ExhaustivePCFGParser class; the problem is that I don't know how to use this class, more precisely how I can instantiate it (the constructor is: ExhaustivePCFGParser(BinaryGrammar bg, UnaryGrammar ug, Lexicon lex, Options op, Index stateIndex, Index wordIndex, Index tagIndex)), but I don't know how to fit all these parameters. Is there an easier way to get the K best parses? Christopher Manning: In general you do things via a LexicalizedParser object, which is a "grammar" which provides all
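Following that pointer, the K-best list is normally obtained from a LexicalizedParserQuery rather than by constructing ExhaustivePCFGParser directly. A sketch assuming the stock English PCFG model path; the sentence is an illustrative assumption:

    import java.util.List;
    import edu.stanford.nlp.ling.HasWord;
    import edu.stanford.nlp.ling.Sentence;
    import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
    import edu.stanford.nlp.parser.lexparser.LexicalizedParserQuery;
    import edu.stanford.nlp.trees.Tree;
    import edu.stanford.nlp.util.ScoredObject;

    public class KBestDemo {
        public static void main(String[] args) {
            LexicalizedParser lp = LexicalizedParser.loadModel(
                    "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
            List<? extends HasWord> sentence =
                    Sentence.toWordList("The", "Voss", "bottles", "are", "full", ".");

            // Parse once, then ask the query object for the k highest-scoring trees.
            LexicalizedParserQuery query = lp.lexicalizedParserQuery();
            query.parse(sentence);
            List<ScoredObject<Tree>> kBest = query.getKBestPCFGParses(10);
            for (ScoredObject<Tree> scored : kBest) {
                System.out.println(scored.score() + "\t" + scored.object());
            }
        }
    }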

Mute Stanford coreNLP logging

Submitted by 蹲街弑〆低调 on 2019-12-02 01:26:39
First of all, Java is not my usual language, so I'm quite basic at it. I need to use it for this particular project, so please be patient, and if I have omitted any relevant information, please ask for it; I will be happy to provide it. I have been able to implement coreNLP and, seemingly, have it working right, but it is generating lots of messages like:

    ene 20, 2017 10:38:42 AM edu.stanford.nlp.process.PTBLexer next
    ADVERTENCIA: Untokenizable: 【 (U+3010, decimal: 12304)

(ADVERTENCIA is the Spanish-locale WARNING.) After some research (documentation, Google, other threads here), I think (sorry, I don't know how I can tell for sure)
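Two fixes are commonly suggested for this: the Untokenizable warnings specifically can be switched off through the tokenizer options, and CoreNLP's Redwood logger can be cleared entirely. A sketch; the annotator list is an illustrative assumption:

    import java.util.Properties;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.util.logging.RedwoodConfiguration;

    public class QuietPipeline {
        public static void main(String[] args) {
            // Option 1: silence CoreNLP's Redwood log channels
            // before the pipeline is built.
            RedwoodConfiguration.current().clear().apply();

            Properties props = new Properties();
            props.setProperty("annotators", "tokenize, ssplit, pos");
            // Option 2: keep untokenizable characters without reporting
            // them, which removes the PTBLexer warnings shown above.
            props.setProperty("tokenize.options", "untokenizable=noneKeep");

            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        }
    }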

Date Extraction from Text

Submitted by 烂漫一生 on 2019-12-01 19:41:40
I am trying to use the Stanford NLP tool to extract dates (8/11/2012) from text. Here's a link for the demo of this tool. Can you help me with how to train the classifier to identify a date (8/11/2012)? I tried using training data such as

    Woodhouse PERS
    8/18/2012 Date
    , O
    handsome O

but it does not work even for the same test data. tysonjh: Using the NLP tool to extract dates from text seems like overkill if this is all you are trying to accomplish. You should consider other options, like a simple Java regular expression (e.g. here). If you are doing something that requires more features from the Stanford NLP tool,
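In the spirit of that answer, a plain regular expression already covers the M/D/YYYY shape; the pattern and sample text below are assumptions for illustration (Stanford's SUTime, bundled with CoreNLP's ner annotator, is the heavier-weight alternative for general date recognition):

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class DateRegexDemo {
        // Matches dates like 8/11/2012 or 08/11/2012 (no validity checking).
        private static final Pattern DATE =
                Pattern.compile("\\b(\\d{1,2})/(\\d{1,2})/(\\d{4})\\b");

        public static void main(String[] args) {
            String text = "Woodhouse, 8/18/2012, handsome; met again on 8/11/2012.";
            Matcher m = DATE.matcher(text);
            while (m.find()) {
                System.out.println("Found date: " + m.group());
            }
        }
    }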

Entities on my gazette are not recognized

Submitted by 江枫思渺然 on 2019-12-01 18:02:40
I would like to create a custom NER model. That's what I did:

TRAINING DATA (stanford-ner.tsv):

    Hello O
    ! O
    My O
    name O
    is O
    Damiano PERSON
    . O

PROPERTIES (stanford-ner.prop):

    trainFile = stanford-ner.tsv
    serializeTo = ner-model.ser.gz
    map = word=0,answer=1
    maxLeft=1
    useClassFeature=true
    useWord=true
    useNGrams=true
    noMidNGrams=true
    maxNGramLeng=6
    usePrev=true
    useNext=true
    useDisjunctive=true
    useSequences=true
    usePrevSequences=true
    useTypeSeqs=true
    useTypeSeqs2=true
    useTypeySequences=true
    wordShape=chris2useLC
    useGazettes=true
    gazette=gazzetta.txt
    cleanGazette=true

GAZETTE (gazzetta.txt):
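For reference, training with a prop file like this is usually driven through CRFClassifier's main entry point, either from the shell or programmatically; a sketch, assuming stanford-ner.jar is on the classpath:

    import edu.stanford.nlp.ie.crf.CRFClassifier;

    public class TrainNer {
        public static void main(String[] args) throws Exception {
            // Equivalent to:
            //   java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier \
            //        -prop stanford-ner.prop
            // Reads trainFile/serializeTo and the feature flags from the prop
            // file, trains the CRF, and writes ner-model.ser.gz.
            CRFClassifier.main(new String[]{"-prop", "stanford-ner.prop"});
        }
    }

Note that gazette entries act as extra features rather than a lookup list, so entity types that never occur in the training data generally will not be predicted from the gazette alone.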

Spark Scala - java.util.NoSuchElementException & Data Cleaning

Submitted by 不想你离开。 on 2019-12-01 17:59:27
I have had a similar problem before, but I am looking for a generalizable answer. I am using spark-corenlp to get sentiment scores on e-mails. Sometimes, sentiment() crashes on some input (maybe it's too long, maybe it has an unexpected character). It does not tell me it crashes on some instances, and just returns the Column sentiment('email). Thus, when I try to show() beyond a certain point or save() my data frame, I get a java.util.NoSuchElementException, because sentiment() must have returned nothing at that row. My initial code is loading the data and applying sentiment() as shown in

Identify prepositions and individual POS

Submitted by 落爺英雄遲暮 on 2019-12-01 17:46:03
I am trying to find the correct part of speech for each word in a paragraph. I am using the Stanford POS Tagger. However, I am stuck at a point. I want to identify prepositions in the paragraph. The Penn Treebank tagset says: "IN: Preposition or subordinating conjunction". How can I be sure whether the current word is a preposition or a subordinating conjunction? How can I extract only prepositions from the paragraph in this case? You can't be sure. The reason for this somewhat strange PoS is that it's really hard to automatically determine if, for example, "for" is a preposition or a subordinate conjunction. So in
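To at least pull out everything the tagger labels IN (which, per the tagset, conflates prepositions and subordinating conjunctions), filtering on the tag works; a sketch assuming the left3words model path and an illustrative sentence:

    import edu.stanford.nlp.tagger.maxent.MaxentTagger;

    public class PrepositionDemo {
        public static void main(String[] args) {
            MaxentTagger tagger = new MaxentTagger(
                    "stanford-postagger/models/english-left3words-distsim.tagger");

            // tagString returns tokens joined with their tags, e.g. "on_IN".
            String tagged = tagger.tagString("He sat on the chair because he was tired .");
            for (String token : tagged.split("\\s+")) {
                if (token.endsWith("_IN")) {
                    // Still ambiguous: "because" (a conjunction) lands here too.
                    System.out.println(token.substring(0, token.lastIndexOf('_')));
                }
            }
        }
    }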

Name Extraction - CV/Resume - Stanford NER/OpenNLP

Submitted by |▌冷眼眸甩不掉的悲伤 on 2019-12-01 14:51:53
I'm currently on a learning project to extract an individual's name from their CV/Resume. Currently I'm working with Stanford-NER and OpenNLP, which both perform with a degree of success out of the box, tending to struggle on "non-western" type names (no offence intended towards anybody). My question is: given the general lack of sentence structure or context in relation to an individual's name in a CV/Resume, am I likely to gain any significant improvement in name identification by creating something akin to a CV corpus? My initial thoughts are that I'd probably have more success by