named-entity-recognition

Natural Language Processing Using Elasticsearch and Google Cloud API

Question: I want to use NLP with Elasticsearch. I have achieved a first level by using the OpenNLP plugin, as mentioned in the comments of this question. Entities such as person, organization, and location are indexed when documents are inserted. My doubt concerns searching the same information, since I need to process the terms entered by the user at query time. This is what I have thought of: process the query entered by the user using Apache NLP as specified here. Extract Person,

How to call the ClassifierBasedTagger() in NLTK

Question: I have followed the documentation in the NLTK book (chapters 6 and 7) and other ideas to train my own model for named entity recognition. After building a feature function and a ClassifierBasedTagger like this:
    class NamedEntityChunker(ChunkParserI):
        def __init__(self, train_sents, feature_detector=features, **kwargs):
            assert isinstance(train_sents, Iterable)
            tagged_sents = [[((w,t),c) for (w,t,c) in tree2conlltags(sent)] for sent in train_sents]
            #other possible option: self.feature_detector =

Difference between IOB Accuracy and Precision

Question: I'm doing some work in NLTK with named entity recognition and chunkers. I retrained a classifier using nltk/chunk/named_entity.py and got the following measures:
    ChunkParse score:
        IOB Accuracy:  96.5%
        Precision:     78.0%
        Recall:        91.9%
        F-Measure:     84.4%
But I don't understand the exact difference between IOB Accuracy and Precision in this case. In the docs (here) I found the following for a specific example: The IOB tag accuracy indicates that more than a third of the
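To make the distinction raised in this question concrete, here is a minimal sketch, not NLTK's own scoring code. It assumes that "IOB accuracy" is the fraction of tokens whose IOB tag is correct, and that precision and recall are computed over whole chunks, so a chunk counts only when both its span and its label match exactly. The helper names `extract_chunks` and `score`, and the toy tag sequences, are hypothetical and exist only for illustration.

```python
def extract_chunks(tags):
    """Return a set of (start, end, label) spans from a list of IOB tags."""
    chunks = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing chunk
        # An open chunk ends on "O", on any "B-" tag, or on an "I-" tag
        # whose label differs from the open chunk's label.
        if start is not None and (
            tag == "O" or tag.startswith("B-") or tag[2:] != label
        ):
            chunks.add((start, i, label))
            start, label = None, None
        # Any non-"O" tag opens a chunk when none is open.
        if tag != "O" and start is None:
            start, label = i, tag[2:]
    return chunks


def score(gold, pred):
    """Return (token-level accuracy, chunk precision, chunk recall)."""
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    gold_chunks, pred_chunks = extract_chunks(gold), extract_chunks(pred)
    correct = len(gold_chunks & pred_chunks)  # exact span and label match
    precision = correct / len(pred_chunks) if pred_chunks else 0.0
    recall = correct / len(gold_chunks) if gold_chunks else 0.0
    return accuracy, precision, recall


# Toy data (hypothetical): two gold chunks, three predicted chunks.
gold = ["B-PER", "I-PER", "O", "O", "B-LOC", "O", "O", "O", "O", "O"]
pred = ["B-PER", "O", "O", "O", "B-LOC", "O", "B-ORG", "O", "O", "O"]

accuracy, precision, recall = score(gold, pred)
print(f"IOB accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

On this toy data only two of ten tags are wrong, so tag accuracy is 0.8, yet just one of the three predicted chunks matches a gold chunk exactly, so precision is 1/3 and recall is 1/2. Because most tokens are "O" and easy to tag, accuracy can stay high while chunk-level precision drops.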

How to define person's names in text (Java)

Question: I have some input text that contains one or more person names. I do not have a dictionary of these names. Which Java library can help me identify names in my input text? I looked through OpenNLP but did not find any example, guide, or even a description of how to apply it in my code. (I saw the Javadoc, but it is poor documentation for such a project.) I want to find names in arbitrary text. If the input text is "My friend Joe Smith went to the store.", then I

NLTK: why does nltk not recognize the CLASSPATH variable for stanford-ner?

Question: This is my code:
    from nltk.tag import StanfordNERTagger
    st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
And I get:
    NLTK was unable to find stanford-ner.jar! Set the CLASSPATH environment variable.
This is what my .bashrc looks like in Ubuntu:
    export CLASSPATH=/home/wolfgang/Downloads/stanford-ner-2015-04-20/stanford-ner-3.5.2.jar
    export STANFORD_MODELS=/home/wolfgang/Downloads/stanford-ner-2015-04-20/classifiers
Also, I tried printing the environment variable in Python this way
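One way to narrow down a problem like this is to check what the Python process itself sees, since variables exported in .bashrc reach only processes started from a shell that has sourced it. The following is a diagnostic sketch using the standard library only. The helper name `find_on_classpath` is hypothetical, and its lookup rule (an entry is either the jar file itself or a directory containing it) is an assumption for illustration, not a description of how NLTK searches.

```python
import os


def find_on_classpath(jar_name, classpath=None):
    """Return the first CLASSPATH entry that resolves to jar_name, or None."""
    if classpath is None:
        # Read the variable as seen by this Python process.
        classpath = os.environ.get("CLASSPATH", "")
    for entry in classpath.split(os.pathsep):
        if not entry:
            continue
        # Case 1: the entry is the jar file itself.
        if os.path.basename(entry) == jar_name and os.path.isfile(entry):
            return entry
        # Case 2: the entry is a directory that contains the jar.
        candidate = os.path.join(entry, jar_name)
        if os.path.isfile(candidate):
            return candidate
    return None


print("CLASSPATH seen by Python:", os.environ.get("CLASSPATH"))
print("stanford-ner.jar found at:", find_on_classpath("stanford-ner.jar"))
```

If the variable is missing or the jar is not found, an alternative that avoids the environment entirely is to pass the jar location through the `path_to_jar` argument that `StanfordNERTagger` accepts.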

nlp - How to detect if a word in a sentence is pointing to a color/body part/vehicle

Question: As the title suggests, I would like to know whether a certain word in a sentence refers to:
1] A color: "The grass is green." Hence "green" is a color.
2] A body part: "Her hands are soft." Hence "hands" is a body part.
3] A vehicle: "I am driving my car on the causeway." Hence "car" is a vehicle.
In similar problems, parsers are one possible effective solution. The Stanford parser, for example, was suggested for a similar question, "How to find if a word in a sentence is pointing to a city". Now the problem

Which settings should be used for TokensregexNER

Question: When I try regexner, it works as expected with the following settings and data:
    props.setProperty("annotators", "tokenize, cleanxml, ssplit, pos, lemma, regexner");
    Bachelor of Laws DEGREE
    Bachelor of (Arts|Laws|Science|Engineering|Divinity) DEGREE
What I would like to do is the same thing using TokensRegex. For example:
    Bachelor of Laws DEGREE
    Bachelor of ([{tag:NNS}] [{tag:NNP}]) DEGREE
I read that to do this I should use TokensregexNERAnnotator. I tried to use it as follows, but it did not work.

NLTK named entity recognition in Dutch

Question: I am trying to extract named entities from Dutch text. I used nltk-trainer to train a tagger and a chunker on the conll2002 Dutch corpus. However, the chunker's parse method is not detecting any named entities. Here is my code:
    str = 'Christiane heeft een lam.'
    tagger = nltk.data.load('taggers/dutch.pickle')
    chunker = nltk.data.load('chunkers/dutch.pickle')
    str_tags = tagger.tag(nltk.word_tokenize(str))
    print str_tags
    str_chunks = chunker.parse(str_tags)
    print str_chunks
And the output

Training own model and adding new entities with spaCy

Question: I have been trying to train a model with the same method that #887 uses, just as a test case. My question is: what would be the best format for a training corpus to import into spaCy? I have a text file with a list of entities that requires new entities for tagging. Let me explain my case. I follow the update.training script like this:
    nlp = spacy.load('en_core_web_md', entity=False, parser=False)
    ner= EntityRecognizer(nlp.vocab, entity_types=['FINANCE'])
    for itn in range(5): random

Stanford NER Features

Question: I am currently trying to use the Stanford NER system, and I want to see which features can be extracted by setting flags in a properties file. The features documented at http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ie/NERFeatureFactory.html do not seem to be comprehensive. For example, the feature flags related to distributional similarity and clustering are not included (e.g. useDistSim, etc.). Is there a more complete list of all the features and corresponding