nlp

R remove stopwords from a character vector using %in%

Submitted by 大憨熊 on 2019-12-19 03:44:23
Question: I have a data frame with strings that I'd like to remove stop words from. I'm trying to avoid the tm package because it's a large data set and tm seems to run a bit slowly, but I am using the tm stopword dictionary.

library(plyr)
library(tm)
stopWords <- stopwords("en")
class(stopWords)
df1 <- data.frame(id = seq(1,5,1), string1 = NA)
head(df1)
df1$string1[1] <- "This string is a string."
df1$string1[2] <- "This string is a slightly longer string."
df1$string1[3] <- "This string is an even
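The `%in%` idea, i.e. keeping only tokens whose lowercased form is not in a stopword set, translates directly to set membership in Python. A minimal sketch (the stopword set below is a tiny illustrative subset, not tm's actual list):

```python
# Remove stopwords by splitting on whitespace and keeping only tokens
# whose lowercased, punctuation-stripped form is not in the stopword set.
import string

stop_words = {"this", "is", "a", "an", "the"}

def remove_stopwords(text):
    kept = []
    for token in text.split():
        core = token.strip(string.punctuation).lower()
        if core not in stop_words:
            kept.append(token)
    return " ".join(kept)

print(remove_stopwords("This string is a string."))  # -> "string string."
```

Because membership tests against a set are O(1), this scales to large data sets far better than repeated linear scans of a stopword vector.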

Analysing meaning of sentences

Submitted by Deadly on 2019-12-18 17:54:13
Question: Are there any tools that analyze the meaning of given sentences? Recommendations are greatly appreciated. Thanks in advance!

Answer 1: I am also looking for similar tools. One thing I found recently is the sentiment analysis tool built by researchers at Stanford. It provides a model for analyzing the sentiment of a given sentence. It's interesting: even this seemingly simple idea is quite involved to model accurately. It also utilizes machine learning to achieve higher accuracy.
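To give a sense of what the crudest lexicon-based baseline looks like (far simpler than the Stanford model, which handles negation and compositional structure), here is a toy scorer; the word lists are invented for illustration:

```python
# Toy lexicon-based sentiment: score = (#positive - #negative) / #tokens.
# Real systems such as the Stanford sentiment model go far beyond this.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}

def sentiment(sentence):
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    # True/False count as 1/0, so this sums +1 per positive, -1 per negative
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    return score / max(len(tokens), 1)

print(sentiment("I love this great movie"))     # -> 0.4 (positive)
print(sentiment("I hate this terrible movie"))  # -> -0.4 (negative)
```

The gap between this baseline and a model that gets "not bad at all" right is exactly why the problem is "quite involved to model in an accurate way."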

How to NER and POS tag a pre-tokenized text with Stanford CoreNLP?

Submitted by 陌路散爱 on 2019-12-18 17:33:11
Question: I'm using Stanford's CoreNLP Named Entity Recognizer (NER) and Part-of-Speech (POS) tagger in my application. The problem is that my code tokenizes the text beforehand, and I then need to NER- and POS-tag each token. However, I was only able to find out how to do that using the command-line options, not programmatically. Can someone please tell me how I can programmatically NER- and POS-tag pre-tokenized text with Stanford's CoreNLP? Edit: I'm actually using the individual NER and POS
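One commonly used approach (worth verifying against the current CoreNLP documentation) is to configure the pipeline so the tokenizer simply splits on whitespace and treats each line as one sentence, then feed it the pre-tokenized text with spaces between tokens:

```
# Assumed CoreNLP pipeline properties for pre-tokenized input
annotators = tokenize, ssplit, pos, ner
tokenize.whitespace = true
ssplit.eolonly = true
```

Programmatically, the same keys go into the `Properties` object passed to the `StanfordCoreNLP` pipeline constructor, so the pipeline never re-tokenizes your tokens.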

How to find out whether a word exists in English using NLTK

Submitted by 自作多情 on 2019-12-18 17:04:58
Question: I am looking for a proper solution to this question. It has been asked many times before, and I didn't find a single answer that fit. I need to use a corpus in NLTK to detect whether a word is an English word. I have tried:

wordnet.synsets(word)

This doesn't work for many common words. Using a list of English words and performing a lookup in a file is not an option. Using enchant is not an option either. If there is another library that can do the same, please provide the
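Whatever corpus ends up supplying the vocabulary (the question rules out a flat word-list file; with NLTK one might build the set from WordNet lemma names), the check itself reduces to membership plus light morphology, since `wordnet.synsets` misses inflected forms and function words. A sketch with a placeholder vocabulary:

```python
# Membership check with a tiny fallback for regular plurals.
# VOCAB is a placeholder; in practice it would be built from a corpus
# such as WordNet's lemma names rather than hard-coded.
VOCAB = {"string", "cat", "run", "house"}

def is_english_word(word):
    w = word.lower()
    if w in VOCAB:
        return True
    # crude inflection handling: strip a plural "es"/"s" suffix
    for suffix in ("es", "s"):
        if w.endswith(suffix) and w[: -len(suffix)] in VOCAB:
            return True
    return False

print(is_english_word("Cats"))   # True (via "cat")
print(is_english_word("qwzrt"))  # False
```

NLTK's `wordnet.morphy` does a more principled version of the suffix-stripping step, mapping inflected forms back to base forms before lookup.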

Visualize Parse Tree Structure

Submitted by 萝らか妹 on 2019-12-18 16:46:08
Question: I would like to display the parse (POS tagging) from openNLP as a tree-structure visualization. Below I produce the parse tree with openNLP, but I cannot plot it as the kind of visual tree common to Python's parsing tools.

install.packages(
  "http://datacube.wu.ac.at/src/contrib/openNLPmodels.en_1.5-1.tar.gz",
  repos=NULL, type="source"
)
library(NLP)
library(openNLP)
x <- 'Scroll bar does not work the best either.'
s <- as.String(x)
## Annotators
sent_token_annotator <- Maxent_Sent_Token_Annotator()
word_token
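On the Python side, NLTK's `Tree.fromstring(...).pretty_print()` is the usual way to draw such bracketed parses. As a dependency-free illustration of the same idea, a Penn-style bracketed string can be rendered as an indented tree with a few lines (minimal sketch, assuming well-formed brackets):

```python
# Indent a Penn-style bracketed parse: each "(" opens a new depth
# level; labels and tokens print at their current depth.
def show_tree(parse):
    depth = 0
    out = []
    for tok in parse.replace("(", " ( ").replace(")", " ) ").split():
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        else:
            out.append("  " * (depth - 1) + tok)
    return "\n".join(out)

print(show_tree("(S (NP (DT The) (NN bar)) (VP (VBZ works)))"))
```

For the sentence above this prints S at the root with NP and VP indented beneath it, which is often enough to eyeball a parse without a plotting library.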

I have a list of country codes and a list of language codes. How do I map from country code to language code?

Submitted by 爷，独闯天下 on 2019-12-18 16:25:11
Question: When a user visits the site, I can get their country code. I want to use this to set the default language (which they can later change if necessary; it is just a general guess as to what language they might speak based on what country they are in). Is there a definitive mapping from country codes to language codes somewhere? I could not find one. I know that not everyone in a particular country speaks the same language, but I just need a general mapping; the user can select their
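There is no single official mapping, but Unicode CLDR's "likely subtags" data is the closest authoritative source. The lookup itself is just a dictionary; the entries below are a small illustrative subset, and a real deployment should generate the table from CLDR:

```python
# Map ISO 3166-1 country codes to a guessed ISO 639-1 language code.
# Illustrative subset only; CLDR likely-subtags data is the real source.
COUNTRY_TO_LANGUAGE = {
    "US": "en", "GB": "en", "FR": "fr", "DE": "de",
    "ES": "es", "MX": "es", "BR": "pt", "JP": "ja",
    "CN": "zh",
}

def default_language(country_code, fallback="en"):
    # Normalize case so "fr" and "FR" both resolve; unknown codes fall back
    return COUNTRY_TO_LANGUAGE.get(country_code.upper(), fallback)

print(default_language("fr"))  # -> "fr"
print(default_language("ZZ"))  # -> "en" (fallback)
```

A fallback default matters here: since the guess is only a starting point the user can change, an unknown country should degrade gracefully rather than error.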

How to generate a list of antonyms for adjectives in WordNet using Python

Submitted by 好久不见. on 2019-12-18 16:17:11
Question: I want to do the following in Python (I have the NLTK library, but I'm not great with Python, so I've written the following in a weird pseudocode):

from nltk.corpus import wordnet as wn  # Import the WordNet library
for each adjective as adj in wn        # Get all adjectives from the wordnet dictionary
    print adj & antonym                # List all antonyms for each adjective
once list is complete then export to txt file

This is so I can generate a complete dictionary of antonyms for adjectives. I think it should be
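The shape of that loop, run over a mock lexicon so it is self-contained, looks like the following. With NLTK one would iterate `wn.all_synsets(wn.ADJ)` and read each lemma's `antonyms()`; here the hard-coded `ADJECTIVES` dict stands in for that data:

```python
# Collect (adjective, antonym) pairs, deduplicated and sorted.
# ADJECTIVES is a stand-in for iterating WordNet's adjective synsets.
ADJECTIVES = {           # adjective -> list of antonym lemmas
    "hot": ["cold"],
    "cold": ["hot"],
    "happy": ["sad", "unhappy"],
}

def antonym_pairs():
    pairs = set()        # a set removes duplicate pairs automatically
    for adj, antonyms in ADJECTIVES.items():
        for ant in antonyms:
            pairs.add((adj, ant))
    return sorted(pairs)

for adj, ant in antonym_pairs():
    print(f"{adj}\t{ant}")
```

Exporting is then just writing those tab-separated lines with `open("antonyms.txt", "w")` (filename hypothetical).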

What programming language is the most English-like? [closed]

Submitted by 放肆的年华 on 2019-12-18 15:32:30
Question: Closed 6 years ago. I'm mainly a Python programmer, and it is often described as being "executable pseudo-code". I have used a little bit of AppleScript,

Why are these words considered stopwords?

Submitted by 半腔热情 on 2019-12-18 15:02:23
Question: I do not have a formal background in Natural Language Processing and was wondering if someone from the NLP side can shed some light on this. I am playing around with the NLTK library, and I was specifically looking into the stopwords function provided by this package:

In [80]: nltk.corpus.stopwords.words('english')
Out[80]: ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers', 'herself',

Probability tree for sentences in nltk employing both lookahead and lookback dependencies

Submitted by 亡梦爱人 on 2019-12-18 14:55:07
Question: Does nltk or any other NLP tool allow one to construct probability trees based on input sentences, thus storing the language model of the input text in a dictionary tree? The following example gives the rough idea, but I need the same functionality such that a word Wt is not just probabilistically modelled on past input words (history) Wt-n but also on lookahead words like Wt+m. The lookback and lookahead word counts should also be 2 or more, i.e. bigrams or more. Are there any other libraries
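The two-directional conditioning described above can be sketched with plain count tables: one keyed by the preceding n words (lookback) and one keyed by the following n words (lookahead). A minimal sketch, not any particular library's API:

```python
# Count-based conditional probabilities in both directions:
# P(word | previous n words) and P(word | next n words).
from collections import defaultdict

def build_model(sentences, n=2):
    back = defaultdict(lambda: defaultdict(int))   # history -> word counts
    ahead = defaultdict(lambda: defaultdict(int))  # future  -> word counts
    for sent in sentences:
        words = sent.split()
        for i, w in enumerate(words):
            hist = tuple(words[max(0, i - n):i])   # up to n words before w
            fut = tuple(words[i + 1:i + 1 + n])    # up to n words after w
            back[hist][w] += 1
            ahead[fut][w] += 1
    return back, ahead

def prob(table, context, word):
    counts = table[tuple(context)]
    total = sum(counts.values())
    return counts[word] / total if total else 0.0

back, ahead = build_model(["the cat sat", "the cat ran"])
print(prob(back, ("the", "cat"), "sat"))  # 0.5: "sat" or "ran" follow "the cat"
print(prob(ahead, ("sat",), "cat"))       # 1.0: only "cat" precedes "sat"
```

The nested defaultdicts are the "dictionary tree" the question describes; smoothing and backoff for unseen contexts are left out for brevity.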