nlp

R remove stopwords from a character vector using %in%

Submitted by 大憨熊 on 2019-12-19 03:44:23
Question: I have a data frame with strings that I'd like to remove stop words from. I'm trying to avoid the tm package because it's a large data set and tm seems to run a bit slowly, but I am using the tm stopword dictionary.

library(plyr)
library(tm)
stopWords <- stopwords("en")
class(stopWords)
df1 <- data.frame(id = seq(1,5,1), string1 = NA)
head(df1)
df1$string1[1] <- "This string is a string."
df1$string1[2] <- "This string is a slightly longer string."
df1$string1[3] <- "This string is an even
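The `%in%` idea, i.e. keeping only tokens whose lowercased form is not in a stopword set, translates directly to set membership in Python. A minimal sketch (the stopword set below is a tiny illustrative subset, not tm's actual list):

```python
# Remove stopwords by splitting on whitespace and keeping only tokens
# whose lowercased, punctuation-stripped form is not in the stopword set.
import string

stop_words = {"this", "is", "a", "an", "the"}

def remove_stopwords(text):
    kept = []
    for token in text.split():
        core = token.strip(string.punctuation).lower()
        if core not in stop_words:
            kept.append(token)
    return " ".join(kept)

print(remove_stopwords("This string is a string."))  # -> "string string."
```

Because membership tests against a set are O(1), this scales to large data sets far better than repeated linear scans of a stopword vector.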

Analysing meaning of sentences

Submitted by Deadly on 2019-12-18 17:54:13
Question: Are there any tools that analyze the meaning of given sentences? Recommendations are greatly appreciated. Thanks in advance!

Answer 1: I am also looking for similar tools. One thing I found recently is the sentiment analysis tool built by researchers at Stanford. It provides a model for analyzing the sentiment of a given sentence. It's interesting: even this seemingly simple idea is quite involved to model accurately. It also utilizes machine learning to achieve higher accuracy.
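To give a sense of what the crudest lexicon-based baseline looks like (far simpler than the Stanford model, which handles negation and compositional structure), here is a toy scorer; the word lists are invented for illustration:

```python
# Toy lexicon-based sentiment: score = (#positive - #negative) / #tokens.
# Real systems such as the Stanford sentiment model go far beyond this.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}

def sentiment(sentence):
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    # True/False count as 1/0, so this sums +1 per positive, -1 per negative
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    return score / max(len(tokens), 1)

print(sentiment("I love this great movie"))     # -> 0.4 (positive)
print(sentiment("I hate this terrible movie"))  # -> -0.4 (negative)
```

The gap between this baseline and a model that gets "not bad at all" right is exactly why the problem is "quite involved to model in an accurate way."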

How to NER and POS tag a pre-tokenized text with Stanford CoreNLP?

Submitted by 陌路散爱 on 2019-12-18 17:33:11
Question: I'm using Stanford's CoreNLP Named Entity Recognizer (NER) and Part-of-Speech (POS) tagger in my application. The problem is that my code tokenizes the text beforehand, and I then need to NER- and POS-tag each token. However, I was only able to find out how to do that using the command-line options, not programmatically. Can someone please tell me how I can programmatically NER- and POS-tag pre-tokenized text with Stanford's CoreNLP? Edit: I'm actually using the individual NER and POS
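One commonly used approach (worth verifying against the current CoreNLP documentation) is to configure the pipeline so the tokenizer simply splits on whitespace and treats each line as one sentence, then feed it the pre-tokenized text with spaces between tokens:

```
# Assumed CoreNLP pipeline properties for pre-tokenized input
annotators = tokenize, ssplit, pos, ner
tokenize.whitespace = true
ssplit.eolonly = true
```

Programmatically, the same keys go into the `Properties` object passed to the `StanfordCoreNLP` pipeline constructor, so the pipeline never re-tokenizes your tokens.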

How to find out whether a word exists in English using NLTK

Submitted by 自作多情 on 2019-12-18 17:04:58
Question: I am looking for a proper solution to this question. It has been asked many times before, and I didn't find a single answer that fit. I need to use a corpus in NLTK to detect whether a word is an English word. I have tried:

wordnet.synsets(word)

This doesn't work for many common words. Using a list of English words and performing a lookup in a file is not an option. Using enchant is not an option either. If there is another library that can do the same, please provide the
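Whatever corpus ends up supplying the vocabulary (the question rules out a flat word-list file; with NLTK one might build the set from WordNet lemma names), the check itself reduces to membership plus light morphology, since `wordnet.synsets` misses inflected forms and function words. A sketch with a placeholder vocabulary:

```python
# Membership check with a tiny fallback for regular plurals.
# VOCAB is a placeholder; in practice it would be built from a corpus
# such as WordNet's lemma names rather than hard-coded.
VOCAB = {"string", "cat", "run", "house"}

def is_english_word(word):
    w = word.lower()
    if w in VOCAB:
        return True
    # crude inflection handling: strip a plural "es"/"s" suffix
    for suffix in ("es", "s"):
        if w.endswith(suffix) and w[: -len(suffix)] in VOCAB:
            return True
    return False

print(is_english_word("Cats"))   # True (via "cat")
print(is_english_word("qwzrt"))  # False
```

NLTK's `wordnet.morphy` does a more principled version of the suffix-stripping step, mapping inflected forms back to base forms before lookup.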

Visualize Parse Tree Structure

Submitted by 萝らか妹 on 2019-12-18 16:46:08
Question: I would like to display the parse (POS tagging) from openNLP as a tree-structure visualization. Below I produce the parse tree with openNLP, but I cannot plot it as the kind of visual tree common to Python's parsing tools.

install.packages(
  "http://datacube.wu.ac.at/src/contrib/openNLPmodels.en_1.5-1.tar.gz",
  repos=NULL, type="source"
)
library(NLP)
library(openNLP)
x <- 'Scroll bar does not work the best either.'
s <- as.String(x)
## Annotators
sent_token_annotator <- Maxent_Sent_Token_Annotator()
word_token
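On the Python side, NLTK's `Tree.fromstring(...).pretty_print()` is the usual way to draw such bracketed parses. As a dependency-free illustration of the same idea, a Penn-style bracketed string can be rendered as an indented tree with a few lines (minimal sketch, assuming well-formed brackets):

```python
# Indent a Penn-style bracketed parse: each "(" opens a new depth
# level; labels and tokens print at their current depth.
def show_tree(parse):
    depth = 0
    out = []
    for tok in parse.replace("(", " ( ").replace(")", " ) ").split():
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        else:
            out.append("  " * (depth - 1) + tok)
    return "\n".join(out)

print(show_tree("(S (NP (DT The) (NN bar)) (VP (VBZ works)))"))
```

For the sentence above this prints S at the root with NP and VP indented beneath it, which is often enough to eyeball a parse without a plotting library.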

I have a list of country codes and a list of language codes. How do I map from country code to language code?

Submitted by 爷，独闯天下 on 2019-12-18 16:25:11
Question: When a user visits the site, I can get their country code. I want to use this to set the default language (which they can later change if necessary; it is just a general guess as to what language they might speak based on what country they are in). Is there a definitive mapping from country codes to language codes somewhere? I could not find one. I know that not everyone in a particular country speaks the same language, but I just need a general mapping; the user can select their
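There is no single official mapping, but Unicode CLDR's "likely subtags" data is the closest authoritative source. The lookup itself is just a dictionary; the entries below are a small illustrative subset, and a real deployment should generate the table from CLDR:

```python
# Map ISO 3166-1 country codes to a guessed ISO 639-1 language code.
# Illustrative subset only; CLDR likely-subtags data is the real source.
COUNTRY_TO_LANGUAGE = {
    "US": "en", "GB": "en", "FR": "fr", "DE": "de",
    "ES": "es", "MX": "es", "BR": "pt", "JP": "ja",
    "CN": "zh",
}

def default_language(country_code, fallback="en"):
    # Normalize case so "fr" and "FR" both resolve; unknown codes fall back
    return COUNTRY_TO_LANGUAGE.get(country_code.upper(), fallback)

print(default_language("fr"))  # -> "fr"
print(default_language("ZZ"))  # -> "en" (fallback)
```

A fallback default matters here: since the guess is only a starting point the user can change, an unknown country should degrade gracefully rather than error.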

How to generate a list of antonyms for adjectives in WordNet using Python

Submitted by 好久不见. on 2019-12-18 16:17:11
Question: I want to do the following in Python (I have the NLTK library, but I'm not great with Python, so I've written the following in a weird pseudocode):

from nltk.corpus import wordnet as wn  # Import the WordNet library
for each adjective as adj in wn        # Get all adjectives from the wordnet dictionary
    print adj & antonym                # List all antonyms for each adjective
once list is complete then export to txt file

This is so I can generate a complete dictionary of antonyms for adjectives. I think it should be
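The shape of that loop, run over a mock lexicon so it is self-contained, looks like the following. With NLTK one would iterate `wn.all_synsets(wn.ADJ)` and read each lemma's `antonyms()`; here the hard-coded `ADJECTIVES` dict stands in for that data:

```python
# Collect (adjective, antonym) pairs, deduplicated and sorted.
# ADJECTIVES is a stand-in for iterating WordNet's adjective synsets.
ADJECTIVES = {           # adjective -> list of antonym lemmas
    "hot": ["cold"],
    "cold": ["hot"],
    "happy": ["sad", "unhappy"],
}

def antonym_pairs():
    pairs = set()        # a set removes duplicate pairs automatically
    for adj, antonyms in ADJECTIVES.items():
        for ant in antonyms:
            pairs.add((adj, ant))
    return sorted(pairs)

for adj, ant in antonym_pairs():
    print(f"{adj}\t{ant}")
```

Exporting is then just writing those tab-separated lines with `open("antonyms.txt", "w")` (filename hypothetical).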

What programming language is the most English-like? [closed]

Submitted by 放肆的年华 on 2019-12-18 15:32:30
Question: Closed 6 years ago. I'm mainly a Python programmer, and it is often described as being "executable pseudo-code". I have used a little bit of AppleScript,

Why are these words considered stopwords?

Submitted by 半腔热情 on 2019-12-18 15:02:23
Question: I do not have a formal background in Natural Language Processing and was wondering if someone from the NLP side can shed some light on this. I am playing around with the NLTK library, and I was specifically looking into the stopwords function provided by this package:

In [80]: nltk.corpus.stopwords.words('english')
Out[80]: ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers', 'herself',

Probability tree for sentences in nltk employing both lookahead and lookback dependencies

Submitted by 亡梦爱人 on 2019-12-18 14:55:07
Question: Does nltk or any other NLP tool allow one to construct probability trees based on input sentences, thus storing the language model of the input text in a dictionary tree? The following example gives the rough idea, but I need the same functionality such that a word Wt is not just probabilistically modelled on past input words (history) Wt-n but also on lookahead words like Wt+m. The lookback and lookahead word counts should also be 2 or more, i.e. bigrams or more. Are there any other libraries
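The two-directional conditioning described above can be sketched with plain count tables: one keyed by the preceding n words (lookback) and one keyed by the following n words (lookahead). A minimal sketch, not any particular library's API:

```python
# Count-based conditional probabilities in both directions:
# P(word | previous n words) and P(word | next n words).
from collections import defaultdict

def build_model(sentences, n=2):
    back = defaultdict(lambda: defaultdict(int))   # history -> word counts
    ahead = defaultdict(lambda: defaultdict(int))  # future  -> word counts
    for sent in sentences:
        words = sent.split()
        for i, w in enumerate(words):
            hist = tuple(words[max(0, i - n):i])   # up to n words before w
            fut = tuple(words[i + 1:i + 1 + n])    # up to n words after w
            back[hist][w] += 1
            ahead[fut][w] += 1
    return back, ahead

def prob(table, context, word):
    counts = table[tuple(context)]
    total = sum(counts.values())
    return counts[word] / total if total else 0.0

back, ahead = build_model(["the cat sat", "the cat ran"])
print(prob(back, ("the", "cat"), "sat"))  # 0.5: "sat" or "ran" follow "the cat"
print(prob(ahead, ("sat",), "cat"))       # 1.0: only "cat" precedes "sat"
```

The nested defaultdicts are the "dictionary tree" the question describes; smoothing and backoff for unseen contexts are left out for brevity.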