stanford-nlp

Sentence compression using NLP [closed]

天涯浪子 submitted on 2019-12-03 05:54:52
Question: Closed. This question is off-topic and is not currently accepting answers. Closed last year. Using machine translation, can I obtain a very compressed version of a sentence? E.g. "I would really like to have a delicious tasty cup of coffee" would be translated to "I want coffee". Do any of the NLP engines provide such functionality? I found a few research papers that do paraphrase generation and sentence compression…

How do I use the Python interface of Stanford NER (named entity recogniser)?

∥☆過路亽.° submitted on 2019-12-03 05:08:49
Question: I want to use Stanford NER in Python using the pyner library. Here is a basic code snippet:

    import ner
    tagger = ner.HttpNER(host='localhost', port=80)
    tagger.get_entities("University of California is located in California, United States")

When I run this on my local Python console (IDLE), it should give me output like {'LOCATION': ['California', 'United States'], 'ORGANIZATION': ['University of California']}, but when I execute it, it shows empty brackets. I am actually new to all…
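An empty result usually means nothing is actually answering on the port pyner talks to. A minimal sketch, assuming a Stanford NER server has been started locally (e.g. java -mx1000m -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz -port 8080 -outputFormat inlineXML) and that pyner's SocketNER class is used to reach it; the port and classifier path are assumptions, not values from the question:

    import ner

    # Connect to the locally running NERServer; port 8080 is an assumption and
    # must match the -port used when starting the Java server.
    tagger = ner.SocketNER(host='localhost', port=8080)
    print(tagger.get_entities("University of California is located in California, United States"))
    # Expected shape, per the question:
    # {'LOCATION': ['California', 'United States'], 'ORGANIZATION': ['University of California']}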

Python NLTK code snippet to train a classifier (naive bayes) using feature frequency

倾然丶 夕夏残阳落幕 submitted on 2019-12-03 05:04:01
I was wondering if anyone could help me with a code snippet that demonstrates how to train a Naive Bayes classifier using a feature frequency method as opposed to feature presence. I presume the code below, as shown in Chapter 6 (link text), refers to creating a feature set using Feature Presence (FP):

    def document_features(document):
        document_words = set(document)
        features = {}
        for word in word_features:
            features['contains(%s)' % word] = (word in document_words)
        return features

Please advise. In the link you sent, it says this function is a feature extractor that simply checks whether each of these words is…
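For contrast, a minimal frequency-based variant of that extractor, assuming the same word_features list from the NLTK book example is in scope; the 'count(...)' feature names are illustrative:

    from collections import Counter

    def document_features_freq(document):
        # Count how often each word occurs instead of only noting its presence.
        counts = Counter(document)
        features = {}
        for word in word_features:
            features['count(%s)' % word] = counts[word]
        return features

Note that nltk.NaiveBayesClassifier treats feature values as discrete labels, so the counts are handled as nominal values rather than true multinomial frequencies.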

Multi-term named entities in Stanford Named Entity Recognizer

天涯浪子 submitted on 2019-12-03 04:21:08
Question: I'm using the Stanford Named Entity Recognizer http://nlp.stanford.edu/software/CRF-NER.shtml and it's working fine. This is the code:

    List<List<CoreLabel>> out = classifier.classify(text);
    for (List<CoreLabel> sentence : out) {
        for (CoreLabel word : sentence) {
            if (!StringUtils.equals(word.get(AnswerAnnotation.class), "O")) {
                namedEntities.add(word.word().trim());
            }
        }
    }

However, the problem I'm finding is identifying names and surnames. If the recognizer encounters "Joe Smith", it is returning "Joe"…
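A common fix is to merge consecutive tokens that carry the same non-"O" answer tag into a single entity. A minimal sketch of that grouping idea in Python (the question's code is Java, so this only illustrates the technique, with made-up tokens):

    from itertools import groupby

    tagged = [("Joe", "PERSON"), ("Smith", "PERSON"), ("works", "O"),
              ("at", "O"), ("Stanford", "ORGANIZATION")]

    entities = []
    for tag, group in groupby(tagged, key=lambda pair: pair[1]):
        if tag != "O":
            # Join adjacent tokens sharing the same tag into one entity string.
            entities.append((" ".join(token for token, _ in group), tag))

    print(entities)  # [('Joe Smith', 'PERSON'), ('Stanford', 'ORGANIZATION')]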

How do I manipulate parse trees?

人走茶凉 submitted on 2019-12-03 03:54:47
Question: I've been playing around with natural language parse trees and manipulating them in various ways. I've been using Stanford's Tregex and Tsurgeon tools, but the code is a mess and doesn't fit in well with my mostly Python environment (those tools are Java and aren't ideal for tweaking). I'd like a toolset that allows for easy hacking when I need more functionality. Are there any other tools that are well suited for doing pattern matching on trees and then manipulation of those…
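One lightweight option on the Python side is nltk.tree.Tree, which supports both pattern-style searches and in-place edits. A minimal sketch, not a full Tregex/Tsurgeon replacement; the bracketed parse below is made up:

    from nltk.tree import Tree

    t = Tree.fromstring(
        "(S (NP (DT the) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))")

    # "Pattern matching": find every NP subtree.
    for np in t.subtrees(filter=lambda st: st.label() == "NP"):
        print(np)

    # "Surgery": relabel PP nodes in place.
    for pp in t.subtrees(filter=lambda st: st.label() == "PP"):
        pp.set_label("PREP-PHRASE")
    print(t)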

Determining whether a word is a noun or not

▼魔方 西西 submitted on 2019-12-03 03:21:29
Given an input word, I want to determine whether it is a noun or not (in case of ambiguity, for instance "cook" can be a noun or a verb, the word must be identified as a noun). Currently I use the POS tagger from the Stanford Parser (I give it a single word as input and extract only the POS tag from the result). The results are quite good, but it takes a very long time. Is there a way (in Python, please :) to perform this task quicker than what I do now? If you simply want to check whether or not a single word can be used as a noun, the quickest way might be to build a set of all nouns and…
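In the spirit of that answer, a minimal sketch that builds a set of all WordNet noun lemmas once and then answers membership queries instantly; relying on WordNet coverage is the assumption here:

    from nltk.corpus import wordnet as wn

    # Build the noun set once; subsequent checks are O(1).
    nouns = {lemma.lower()
             for synset in wn.all_synsets(pos=wn.NOUN)
             for lemma in synset.lemma_names()}

    print("cook" in nouns)     # True: "cook" has noun senses
    print("quickly" in nouns)  # False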

How to Train GloVe algorithm on my own corpus

浪子不回头ぞ submitted on 2019-12-03 02:54:49
I tried to follow this, but somehow I wasted a lot of time and ended up with nothing useful. I just want to train a GloVe model on my own corpus (a ~900 MB corpus.txt file). I downloaded the files provided in the link above and compiled them using Cygwin (after editing the demo.sh file and changing it to VOCAB_FILE=corpus.txt; should I leave CORPUS=text8 unchanged?). The output was: cooccurrence.bin, cooccurrence.shuf.bin, text8, corpus.txt, vectors.txt. How can I use those files to load a GloVe model in Python? You can do it using the GloVe library. Install it: pip install glove_python Then: from glove…
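A minimal sketch of the glove_python route the truncated answer starts to describe; the window size, dimensionality, epoch count, and file names are illustrative assumptions, not values from the answer:

    from glove import Corpus, Glove

    # Tokenized sentences from the training corpus (assumed one sentence per line).
    sentences = [line.split() for line in open("corpus.txt", encoding="utf-8")]

    corpus = Corpus()
    corpus.fit(sentences, window=10)            # build the co-occurrence matrix

    glove = Glove(no_components=100, learning_rate=0.05)
    glove.fit(corpus.matrix, epochs=30, no_threads=4, verbose=True)
    glove.add_dictionary(corpus.dictionary)

    glove.save("glove.model")                   # reload later with Glove.load("glove.model")
    print(glove.most_similar("coffee", number=5))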

How do I create my own training corpus for the Stanford tagger?

倾然丶 夕夏残阳落幕 submitted on 2019-12-03 02:00:58
I have to analyze informal English text with lots of shorthand and local lingo. Hence I was thinking of creating a model for the Stanford tagger. How do I create my own labelled corpus for the Stanford tagger to train on? What is the syntax of the corpus, and how long should it be in order to achieve desirable performance? To train the PoS tagger, see this mailing list post, which is also included in the JavaDocs for the MaxentTagger class. The JavaDocs for the edu.stanford.nlp.tagger.maxent.Train class specify the training format: The training file should be in the…
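As a rough illustration of producing such a file from hand-labelled data, a minimal sketch that writes one sentence per line with whitespace-separated word/tag pairs; the "/" separator is an assumption and must match the tagSeparator the tagger is configured with, and the labelled sentences are hypothetical:

    # Hypothetical hand-labelled sentences for informal text.
    labelled = [
        [("lol", "UH"), ("gonna", "VBG"), ("grab", "VB"), ("sum", "DT"), ("coffee", "NN")],
        [("brb", "UH"), ("2", "CD"), ("min", "NN")],
    ]

    with open("train.txt", "w", encoding="utf-8") as f:
        for sentence in labelled:
            f.write(" ".join("%s/%s" % (word, tag) for word, tag in sentence) + "\n")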

How to store NER results in JSON / a database

陌路散爱 submitted on 2019-12-02 20:04:36
Question:

    import nltk
    from itertools import groupby

    def get_continuous_chunks(tagged_sent):
        continuous_chunk = []
        current_chunk = []
        for token, tag in tagged_sent:
            if tag != "O":
                current_chunk.append((token, tag))
            else:
                if current_chunk:  # if the current chunk is not empty
                    continuous_chunk.append(current_chunk)
                    current_chunk = []
        # Flush the final current_chunk into the continuous_chunk, if any.
        if current_chunk:
            continuous_chunk.append(current_chunk)
        return continuous_chunk

    ne_tagged_sent = [('Rami', …
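To finish the step the question is actually asking about, a minimal sketch that turns the grouped chunks into JSON with the standard library; the field names "text" and "type" are illustrative, not a required schema, and the chunk values are made up in the shape get_continuous_chunks returns:

    import json

    chunks = [[('Rami', 'PERSON'), ('Eid', 'PERSON')],
              [('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'), ('University', 'ORGANIZATION')]]

    entities = [{"text": " ".join(token for token, _ in chunk), "type": chunk[0][1]}
                for chunk in chunks]
    print(json.dumps(entities, indent=2))

The resulting JSON string can be written to a file or stored in a database column directly.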

Sentence compression using NLP [closed]

寵の児 submitted on 2019-12-02 19:18:15
Using machine translation, can I obtain a very compressed version of a sentence? E.g. "I would really like to have a delicious tasty cup of coffee" would be translated to "I want coffee". Do any of the NLP engines provide such functionality? I found a few research papers that do paraphrase generation and sentence compression, but is there any library which has already implemented this? If your intention is to make your sentences brief without losing the important ideas, then you can do that by just extracting subject-predicate-object triplets. Talking about tools/engines, I recommend…
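A minimal sketch of the triplet idea using a dependency parser; spaCy is an assumption here, not necessarily the tool the truncated recommendation goes on to name, and the nsubj/ROOT/dobj heuristic is deliberately crude:

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("I would really like to have a delicious tasty cup of coffee")

    # Pick the first subject, the root verb, and the first (direct or prepositional) object.
    subject = next((t for t in doc if t.dep_ == "nsubj"), None)
    predicate = next((t for t in doc if t.dep_ == "ROOT"), None)
    obj = next((t for t in doc if t.dep_ in ("dobj", "pobj")), None)

    print(" ".join(t.text for t in (subject, predicate, obj) if t is not None))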