wordnet | 易学教程

General synonym and part of speech processing using nltk

Question: I'm trying to create a general synonym identifier for the significant words in a sentence (i.e. not "a" or "the"), and I am using the Natural Language Toolkit (NLTK) in Python for it. The problem is that the synonym finder in NLTK requires a part-of-speech argument in order to link a word to its synonyms. My attempted fix was to use the simplified part-of-speech tagger present in NLTK, and then reduce each tag to its first letter in order to pass this argument into the synonym
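WordNet in NLTK identifies nouns, verbs, adjectives and adverbs by the codes 'n', 'v', 'a' and 'r', so lower-casing the first letter of a tag is not always enough (a Penn Treebank adjective tag such as JJ would give 'j'). The sketch below assumes Penn Treebank tags, as nltk.pos_tag produces by default, rather than the simplified tagset mentioned in the question. penn_to_wordnet and synonyms are hypothetical helper names, and synonyms needs NLTK with the WordNet corpus installed.

```python
from typing import Optional, Set

# WordNet part-of-speech codes; these are the same string values as
# nltk.corpus.wordnet.NOUN, VERB, ADJ and ADV.
WN_NOUN, WN_VERB, WN_ADJ, WN_ADV = "n", "v", "a", "r"


def penn_to_wordnet(tag: str) -> Optional[str]:
    """Map a Penn Treebank tag to a WordNet POS code.

    Returns None for tags with no WordNet category (determiners,
    pronouns, prepositions, ...), which also filters out words
    such as "a" or "the".
    """
    if tag.startswith("JJ"):   # JJ, JJR, JJS -> adjective
        return WN_ADJ
    if tag.startswith("VB"):   # VB, VBD, VBG, ... -> verb
        return WN_VERB
    if tag.startswith("NN"):   # NN, NNS, NNP, ... -> noun
        return WN_NOUN
    if tag.startswith("RB"):   # RB, RBR, RBS -> adverb
        return WN_ADV
    return None


def synonyms(word: str, tag: str) -> Set[str]:
    """Collect lemma names from every synset of `word` with the mapped POS."""
    # Imported lazily so the mapping above can be used without NLTK.
    from nltk.corpus import wordnet as wn

    pos = penn_to_wordnet(tag)
    if pos is None:
        return set()
    return {
        lemma.name()
        for synset in wn.synsets(word, pos=pos)
        for lemma in synset.lemmas()
    }
```

Returning None for closed-class tags keeps the "significant words only" filter and the POS mapping in one place.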

Compiling WordNet 3.0 on OSX 10.8.5

Question: I'm trying to get WordNet to work on my notebook, but it fails in the make step of the process as follows:

    WaldundWiesenComputer:WordNet-3.0 Gnaddel$ make
    make all-recursive
    Making all in doc
    Making all in html
    make[3]: Nothing to be done for `all'.
    Making all in man
    make[3]: Nothing to be done for `all'.
    Making all in pdf
    make[3]: Nothing to be done for `all'.
    Making all in ps
    make[3]: Nothing to be done for `all'.
    make[3]: Nothing to be done for `all-am'.
    Making all in dict
    make[2]: Nothing

Tense of a verb

Question: How do I find the tense of a verb, either in WordNet or with another tool? For example:

    input: happened -> output: past
    input: will happen -> output: future

Answer 1: You cannot do this with WordNet. Try NodeBox Linguistics:

    Input: print en.verb.tense("was")
    Output: >>> 1st singular past

Source: https://stackoverflow.com/questions/8575873/tense-of-a-verb
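Since WordNet stores no tense information, one rough alternative is to read tense off part-of-speech tags. The sketch below is neither NodeBox Linguistics nor part of WordNet: it assumes the input has already been tagged with Penn Treebank tags (for example by nltk.pos_tag), and guess_tense is a hypothetical helper that covers only simple cases.

```python
from typing import List, Tuple

# Modal forms treated as future markers in this sketch (an assumption,
# not an exhaustive list).
FUTURE_MODALS = {"will", "shall", "'ll"}


def guess_tense(tagged: List[Tuple[str, str]]) -> str:
    """Return 'future', 'past', 'present' or 'unknown' for a tagged phrase."""
    # A future modal takes priority, as in "will happen".
    for word, tag in tagged:
        if tag == "MD" and word.lower() in FUTURE_MODALS:
            return "future"
    # Otherwise use the first finite verb tag found.
    for _, tag in tagged:
        if tag == "VBD":            # simple past, e.g. "happened"
            return "past"
        if tag in ("VBP", "VBZ"):   # non-past finite forms, e.g. "happens"
            return "present"
    return "unknown"
```

Compound forms such as "has happened" or "was happening" are not handled; they would need the auxiliary and participle tags to be inspected together.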

How do I include pronouns and other types of words in Wordnet?

Question: I am using Princeton's WordNet for an application, but there is no support for pronouns, conjunctions, and several other types of words within the database. Does anybody know if there is a way to supplement the WordNet database with these types of words? Thanks, Ted

Answer 1: First, they're missing because WordNet only contains open-class words, as described on their page:

Q. Why is WordNet missing: of, an, the, and, about, above, because, etc.
A. WordNet only contains "open-class words": nouns,

could not find Wordnet dictionary error

Question: I'm having trouble running wordnet in R. I loaded it into the library initially, but it didn't work. The error looked like this:

    Warning message: In initDict() : cannot find WordNet 'dict' directory: please set the environment variable WNHOME to its parent

So, I added this line:

    Sys.setenv(WNHOME = "C:\\Program Files (x86)\\WordNet\\2.1")

and then was able to use the library function to load it. I don't understand this line or error message at all, but it seems to fix this problem. However, whenever I try to use the package, it won't work. For example, I entered:

    filter <- getTermFilter(

Multi Threading in NLTK WordNetLemmatizer?

Question: I am trying to use multithreading to speed up the process. I am using the WordNetLemmatizer to lemmatize the words, and those words can then be used by SentiWordNet to calculate the sentiment of the text. My sentiment analysis function, where I use the WordNetLemmatizer, is as follows:

    import nltk
    from nltk.corpus import sentiwordnet as swn

    def SentimentA(doc, file_path):
        sentences = nltk.sent_tokenize(doc)
        # print(sentences)
        stokens = [nltk.word_tokenize(sent) for sent in sentences]
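A minimal sketch of spreading lemmatization over a thread pool, using only the standard library. The lemmatizer is passed in as a plain callable so the pattern can be tried without NLTK; lemmatize_documents and its parameters are hypothetical names, not part of the question's code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def lemmatize_documents(
    documents: Sequence[Sequence[str]],
    lemmatize: Callable[[str], str],
    max_workers: int = 4,
) -> List[List[str]]:
    """Lemmatize each tokenized document in a worker thread.

    `documents` is a sequence of token lists; the result keeps the
    same order because Executor.map yields results in input order.
    """
    def process(tokens: Sequence[str]) -> List[str]:
        return [lemmatize(token) for token in tokens]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process, documents))
```

With NLTK, the callable could be WordNetLemmatizer().lemmatize. Two caveats, offered as general observations rather than a diagnosis of the question's code: NLTK loads corpora lazily, so it is commonly advised to load WordNet once in the main thread (for example with wordnet.ensure_loaded()) before starting workers; and because lemmatization is CPU-bound, threads in CPython may bring little speed-up, in which case a process pool is the usual alternative.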

Trying to find synonyms using wordnet java api

Question: I am trying to find synonyms of some words (String type) in Java using the WordNet Java API, but I am having difficulty figuring out how it works. I found this link, http://lyle.smu.edu/~tspell/jaws/doc/edu/smu/tspell/wordnet/impl/file/ReferenceSynset.html#getTagCount%28java.lang.String%29, which I thought was useful, but I still don't know how to start. Do I have to create a ReferenceSynset object and then find its synonyms, and how can this be done? Or is there another, easier way? Please help!

How to use the Spanish Wordnet in NLTK?

Question: I just downloaded a Spanish WordNet from the GRIAL project; the format is XML. How can I use it in Python NLTK? Besides that, on the same page you can download a tagged corpus in Spanish. How can I incorporate it as well?

Answer: Use XMLCorpusReader to load XML data as a corpus. Here's the code to do that:

    from nltk.corpus.reader import XMLCorpusReader
    reader = XMLCorpusReader(dir, file)

A fully working example which uses XMLCorpusReader is given here.

Source: https://stackoverflow.com/questions/25615741/how-to-use-the-spanish-wordnet-in-nltk

WordNetLemmatizer not returning the right lemma unless POS is explicit - Python NLTK

Question: I'm lemmatizing the TED dataset transcripts, and I noticed something strange: not all words are being lemmatized. For example, selected -> select, which is right. However, involved !-> involve and horsing !-> horse unless I explicitly pass the 'v' (verb) attribute. In the Python terminal I get the right output, but not in my code:

    >>> from nltk.stem import WordNetLemmatizer
    >>> from nltk.corpus import wordnet
    >>> lem = WordNetLemmatizer()
    >>> lem.lemmatize('involved','v')
    u'involve'
    >>> lem.lemmatize('horsing','v')
    u'horse'

The relevant section of the code is this:

    for l in LDA_Row[0].split('+')
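WordNetLemmatizer.lemmatize treats a word as a noun when no pos argument is given, which is consistent with 'involved' and 'horsing' coming back unchanged. A common approach is to tag the tokens first and pass a matching WordNet code. The sketch below assumes Penn Treebank tags (as from nltk.pos_tag); lemmatize_tagged is a hypothetical helper, and the lemmatizer is passed in as a callable so the mapping logic can be checked on its own.

```python
from typing import Callable, Iterable, List, Tuple

# First two letters of a Penn Treebank tag -> WordNet POS code.
_TAG_PREFIX_TO_WN = {"JJ": "a", "VB": "v", "NN": "n", "RB": "r"}


def lemmatize_tagged(
    tagged: Iterable[Tuple[str, str]],
    lemmatize: Callable[[str, str], str],
) -> List[str]:
    """Lemmatize (word, tag) pairs, choosing the POS from each tag.

    Tags with no WordNet category fall back to 'n', which matches
    the lemmatizer's own default.
    """
    lemmas = []
    for word, tag in tagged:
        pos = _TAG_PREFIX_TO_WN.get(tag[:2], "n")
        lemmas.append(lemmatize(word, pos))
    return lemmas


# With NLTK installed (plus its tagger and WordNet data), usage might be:
#   from nltk import pos_tag
#   from nltk.stem import WordNetLemmatizer
#   lemmatize_tagged(pos_tag(tokens), WordNetLemmatizer().lemmatize)
```

The result still depends on the tagger: a word tagged as an adjective or noun in context will not be reduced to a verb lemma.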

What is the use of Brown Corpus in measuring Semantic Similarity based on WordNet

Question: I came across several methods for measuring semantic similarity that use the structure and hierarchy of WordNet, e.g. the Jiang and Conrath measure (JCN), the Resnik measure (RES), the Lin measure (LIN), etc. The way they are measured using NLTK is:

    sim2=wn.jcn_similarity(entry1,entry2,brown_ic)
    sim3=entry1.res_similarity(entry2, brown_ic)
    sim4=entry1.lin_similarity(entry2,brown_ic)

If WordNet is the basis for calculating semantic similarity, what is the use of the Brown Corpus here?

Answer (arturomp): Take a look at the explanation in the NLTK howto for wordnet. Specifically, the *_ic notation is information content.
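To make the role of the corpus concrete: information content is usually defined as IC(c) = -log P(c), where P(c) is the probability of meeting concept c in a corpus. WordNet supplies the hierarchy, while a corpus such as Brown supplies the counts behind P(c). The sketch below writes out the standard textbook forms of the three measures from precomputed IC values; the function names are made up for illustration, and NLTK's own implementation may handle edge cases differently.

```python
import math


def information_content(count: float, total: float) -> float:
    """IC(c) = -log(P(c)), with P(c) estimated as count / total."""
    return -math.log(count / total)


def resnik(ic_lcs: float) -> float:
    """Resnik: the IC of the least common subsumer (LCS) of the two concepts."""
    return ic_lcs


def lin(ic1: float, ic2: float, ic_lcs: float) -> float:
    """Lin: 2 * IC(lcs) / (IC(c1) + IC(c2))."""
    return 2.0 * ic_lcs / (ic1 + ic2)


def jiang_conrath(ic1: float, ic2: float, ic_lcs: float) -> float:
    """Jiang-Conrath similarity: 1 / (IC(c1) + IC(c2) - 2 * IC(lcs)).

    The denominator is zero when both concepts coincide with their LCS;
    this sketch does not guard against that case.
    """
    return 1.0 / (ic1 + ic2 - 2.0 * ic_lcs)
```

All three measures take the hierarchy from WordNet (to find the least common subsumer) and the IC values from corpus counts, which is why an object such as brown_ic is passed alongside the synsets.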