nlp

Stanford Parser questions

眉间皱痕 submitted on 2019-12-12 01:35:31
Question: I am writing a project that uses NLP (a natural language parser). I am using the Stanford Parser. I create a thread pool that takes sentences and runs the parser on them. With one thread everything works fine, but with more threads I get errors. The "test" procedure finds words that have certain connections. If I make the method synchronized it should behave like a single thread, yet I still get errors. The errors occur in this code: public synchronized String test(String s

How to split text into sentences when there is no space after full stop?

ぃ、小莉子 submitted on 2019-12-12 01:14:32
Question: I have a text like 'A gas well near Surabaya in East Java operated by Lapindo Brantas Inc. has spewed steaming mud since May last year, submerging villages, industries and fields.A gas well near Surabaya in East Java operated by PT Lapindo Brantas has spewed steaming mud since May last year, submerging villages, factories and fields.Last week, Indonesia's coordinating minister for social welfare, Aburizal Bakrie, whose family firm controls Lapindo Brantas, said the volcano was a "natural
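One way to handle a full stop glued to the next sentence is to first restore the missing space when a period is immediately followed by an uppercase letter, then split normally. A minimal regex sketch (the abbreviation lookbehinds are a crude heuristic — a real tokenizer keeps a much fuller abbreviation list):

```python
import re

def split_sentences(text):
    # Restore the missing space: a full stop glued directly to an
    # uppercase letter gets a space inserted, unless it follows one of a
    # few known abbreviations (toy list; extend as needed).
    text = re.sub(r'(?<!\bInc)(?<!\bMr)(?<!\bDr)\.(?=[A-Z])', '. ', text)
    # Split after sentence-final punctuation followed by whitespace,
    # again refusing to split right after "Inc.".
    parts = re.split(r'(?<=[.!?])(?<!Inc\.)\s+(?=[A-Z])', text)
    return [s.strip() for s in parts if s.strip()]
```

For robust results a trained sentence splitter (e.g. nltk's Punkt) retrained or patched for the no-space pattern is usually preferable to hand-rolled regexes.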

How to detect features of a product in an english sentence - nlp

自古美人都是妖i submitted on 2019-12-11 23:33:38
Question: I am trying to detect features (e.g. screen, processing speed) of a product (e.g. a mobile phone) in an English sentence. My approach is this: in a paragraph about the product containing multiple sentences, the words that appear most frequently (excluding pronouns and sentiment words like good or bad, which I store in a file) are the features of that product, so I rank them by frequency and by their distance from the sentiment words and take
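The frequency-ranking part of the approach can be sketched in a few lines. The stoplist below is a toy stand-in for the asker's file of pronouns and sentiment words, not their actual list:

```python
from collections import Counter

# Toy stoplist standing in for the question's file of pronouns and
# sentiment words (an assumption for illustration).
STOPWORDS = {"the", "a", "is", "are", "it", "this", "i", "and", "but",
             "its", "has", "good", "bad", "very"}

def candidate_features(paragraph, top_n=3):
    # Lowercase, strip trailing punctuation, drop stopwords, then rank
    # the remaining words by raw frequency.
    tokens = [w.strip(".,!?").lower() for w in paragraph.split()]
    counts = Counter(w for w in tokens if w and w not in STOPWORDS)
    return [w for w, _ in counts.most_common(top_n)]
```

In practice this is usually combined with a part-of-speech filter (keep nouns only), since frequency alone promotes many non-feature words.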

natural package for natural language facility not getting installed in meteor package

风格不统一 submitted on 2019-12-11 21:26:34
Question: I want to install the natural package for natural-language facilities in a Meteor project. Initially I tried to install it using this link. It installed fine, but when I ran my app I got the error: ReferenceError: require is not defined. For this I followed this link, and the solution didn't help much. Then I found through googling that I need to install the natural package in the Meteor app with mrt add natural. The link for that is this. But when I install using this command, I get this error: /usr/local/lib/node_modules

Return Sentence That A Clicked Word Appears In

好久不见. submitted on 2019-12-11 20:33:29
Question: This is a follow-up to a previous question: Use Javascript to get the Sentence of a Clicked Word. I have been fumbling around with this question for some time. However, I woke up this morning and started reading this: http://branch.com/b/nba-playoffs-round-1 Voila! Branch allows users to select a sentence and then share it, save it, etc. That's exactly what I want to do. It looks like they're wrapping each sentence in <span> tags. Previously, people have suggested finding each <p> tag and then
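The span-wrapping idea can be done server-side before the page is served: split each paragraph into sentences and wrap each one in a `<span>`, so a click handler only has to read the clicked span's text. A Python sketch of the string transformation (the question itself concerns client-side JavaScript, where the same split-and-wrap would run over each `<p>` element's text):

```python
import re

def wrap_sentences(paragraph_text):
    # Naive sentence split on ., ! or ? followed by whitespace, then wrap
    # each sentence in a <span class="sentence"> so a click handler can
    # later recover the whole sentence from the clicked span.
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', paragraph_text) if s]
    return " ".join(f'<span class="sentence">{s}</span>' for s in sentences)
```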

nltk: how to get bigrams containing a specific word

笑着哭i submitted on 2019-12-11 20:08:00
Question: I am new to nltk and would like to get the collocates of a specific word (e.g. "man") so that I can later filter them by frequency and sort them by PMI score. Here is my trial code to retrieve the bigrams containing "man", but it returns an empty list: >>> text = "hello, yesterday I have seen a man walking. On the other side there was another man yelling \"who are you, man?\"" >>> tokens = word_tokenize(text) >>> finder = BigramCollocationFinder.from_words(tokens, window_size=5) >>> filter
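What `BigramCollocationFinder.from_words(tokens, window_size=5)` counts is every ordered word pair that co-occurs within a 5-token window; the usual nltk way to keep only pairs containing "man" is `finder.apply_ngram_filter(lambda w1, w2: "man" not in (w1, w2))` before scoring. To make the windowing concrete, here is a pure-Python stand-in (not nltk's implementation) that collects the windowed pairs for one target word:

```python
from collections import Counter

def windowed_pairs(tokens, target, window_size=5):
    # For every occurrence of `target`, count each other token that falls
    # inside a window of `window_size` tokens around it -- a rough,
    # unordered analogue of what BigramCollocationFinder scores.
    pairs = Counter()
    for i, w in enumerate(tokens):
        if w != target:
            continue
        lo = max(0, i - window_size + 1)
        hi = min(len(tokens), i + window_size)
        for j in range(lo, hi):
            if j != i:
                pairs[(target, tokens[j])] += 1
    return pairs
```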

Ignore out-of-vocabulary words when averaging vectors in Spacy

人盡茶涼 submitted on 2019-12-11 19:09:59
Question: I would like to use a pre-trained word2vec model in spaCy to encode titles by (1) mapping words to their vector embeddings and (2) taking the mean of the word embeddings. To do this I use the following code: import spacy nlp = spacy.load('myspacy.bioword2vec.model') sentence = "I love Stack Overflow butitsalsodistractive" avg_vector = nlp(sentence).vector Where nlp(sentence).vector (1) tokenizes my sentence with white-space splitting and (2) vectorizes each word according to the dictionary provided
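In spaCy the usual fix is to average only tokens where `token.has_vector` is true (OOV tokens come back as zero vectors and drag the mean toward zero), e.g. averaging `[t.vector for t in doc if t.has_vector]` instead of taking `doc.vector`. The core idea, sketched in pure Python with nested lists standing in for the vectors:

```python
def mean_of_known_vectors(vectors):
    # Skip all-zero vectors (the stand-in for OOV tokens) and average the
    # rest; return None if no token had a vector at all.
    known = [v for v in vectors if any(x != 0.0 for x in v)]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]
```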

How to take the suffix in smoothing of Part of speech tagging

删除回忆录丶 submitted on 2019-12-11 18:52:47
Question: I am building a part-of-speech tagger and handling unknown words via their suffixes. The main issue is how to decide the suffix length: should it be fixed in advance (as in the Weischedel approach), or should I take the last few letters of the word (as in the Samuelsson approach)? Which approach would be better? Answer 1: Quick googling suggests that the Weischedel approach is sufficient for English, which has only rudimentary morphological inflection. The Samuelsson approach seems to
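The fixed-length (Weischedel-style) option is straightforward to implement: during training, count which tags each word-final suffix of a pre-decided length receives, and at tagging time give an unknown word the most frequent tag for its suffix. A minimal sketch (the Samuelsson-style alternative would back off over several suffix lengths rather than one):

```python
from collections import Counter, defaultdict

class SuffixTagger:
    def __init__(self, suffix_len=3):
        # Fixed, pre-decided suffix length (Weischedel-style).
        self.suffix_len = suffix_len
        self.suffix_tags = defaultdict(Counter)

    def train(self, tagged_words):
        # Record how often each suffix co-occurs with each tag.
        for word, tag in tagged_words:
            self.suffix_tags[word[-self.suffix_len:].lower()][tag] += 1

    def tag_unknown(self, word):
        # Most frequent tag for this word's suffix, or None if unseen.
        counts = self.suffix_tags.get(word[-self.suffix_len:].lower())
        return counts.most_common(1)[0][0] if counts else None
```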

How to interpret Python NLTK bigram likelihood ratios?

自古美人都是妖i submitted on 2019-12-11 17:32:30
Question: I'm trying to figure out how to properly interpret nltk's "likelihood ratio" given the code below (taken from this question). import nltk.collocations import nltk.corpus import collections bgm = nltk.collocations.BigramAssocMeasures() finder = nltk.collocations.BigramCollocationFinder.from_words(nltk.corpus.brown.words()) scored = finder.score_ngrams(bgm.likelihood_ratio) # Group bigrams by first word in bigram. prefix_keys = collections.defaultdict(list) for key, scores in scored: prefix
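nltk's bigram `likelihood_ratio` is, to my understanding, Dunning's log-likelihood ratio G²: it compares the observed co-occurrence counts of a word pair against the counts expected if the two words were independent, so larger scores mean stronger association (a score near 0 means the pair co-occurs about as often as chance predicts). A minimal reimplementation for intuition, not nltk's exact code:

```python
from math import log

def llr(n_ab, n_a, n_b, n_total):
    # Dunning's G^2 for a bigram (a, b), from a 2x2 contingency table:
    # n_ab = count of (a, b), n_a / n_b = marginal counts, n_total = N.
    def term(obs, exp):
        return obs * log(obs / exp) if obs > 0 else 0.0
    obs = [n_ab, n_a - n_ab, n_b - n_ab, n_total - n_a - n_b + n_ab]
    row = [n_a, n_total - n_a]
    col = [n_b, n_total - n_b]
    # Expected cell counts under independence: row_i * col_j / N.
    exp = [row[i] * col[j] / n_total for i in (0, 1) for j in (0, 1)]
    return 2.0 * sum(term(o, e) for o, e in zip(obs, exp))
```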

Tensorflow raw_rnn retrieve tensor of shape BATCH x DIM from embedding matrix

萝らか妹 submitted on 2019-12-11 17:24:32
Question: I am implementing an encoder-decoder LSTM where I have to do a custom computation at each step of the encoder, so I am using raw_rnn. However, I have a problem accessing an element of the embeddings, which are shaped Batch x Time steps x Embedding dimensionality, at time step time. Here is my setup: import tensorflow as tf import numpy as np batch_size, max_time, input_embedding_size = 5, 10, 16 vocab_size, num_units = 50, 64 encoder_inputs = tf.placeholder(shape=(None, None), dtype=tf
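The operation needed is a gather along the time axis: from a Batch x Time x Dim tensor, pick the Batch x Dim slice for one time step. In TensorFlow that is typically something like `tf.gather(embedded, time, axis=1)` (or `embedded[:, time, :]`). The indexing itself, sketched with plain nested lists standing in for the tensor:

```python
def slice_time_step(embedded, t):
    # embedded: Batch x Time x Dim as nested lists; returns the
    # Batch x Dim slice at time step t -- the same selection that
    # tf.gather(embedded, t, axis=1) performs on a tensor.
    return [batch_row[t] for batch_row in embedded]
```

Inside `raw_rnn`'s loop function the time step arrives as a scalar tensor, which is why the dynamic gather form is used rather than static Python indexing.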