wordnet

Wordnet sqlite Synonyms and Samples

Question: I am trying to get the list of synonyms and samples given the wordid. After a lot of trial and error I can get the samples for all the synsets, but not the actual synonyms. Here is my query, which gives me the following results:

SELECT senses.wordid, senses.synsetid, senses.sensekey, synsets.definition
FROM senses
LEFT OUTER JOIN synsets ON senses.synsetid = synsets.synsetid
WHERE senses.wordid = 79459

I know you can get the synonyms by submitting the synsetid back to the senses table, which ...
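The truncated last sentence points at the standard trick: join senses back onto itself on synsetid, then join words to recover the lemmas. A minimal sketch using Python's built-in sqlite3 module, assuming the common WordNet SQL schema (words, senses, synsets and samples tables, with samples keyed by synsetid); the database filename is a placeholder:

import sqlite3

# Placeholder filename; point this at your actual WordNet sqlite database.
conn = sqlite3.connect("wordnet30.sqlite")
wordid = 79459

# Synonyms: every other word that shares a synset with the given wordid.
synonyms = conn.execute("""
    SELECT s2.synsetid, w.lemma
    FROM senses s1
    JOIN senses s2 ON s1.synsetid = s2.synsetid  -- back into senses via synsetid
    JOIN words  w  ON s2.wordid   = w.wordid
    WHERE s1.wordid = ? AND s2.wordid != ?       -- drop the original word itself
""", (wordid, wordid)).fetchall()

# Samples for one of those synsets (assumes a samples(synsetid, sample) table).
samples = conn.execute(
    "SELECT sample FROM samples WHERE synsetid = ?",
    (synonyms[0][0],)).fetchall()

print(synonyms)
print(samples)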

Get List of Nouns & Adjectives from WordNet

Question: I am pretty new to WordNet, so I find it a little confusing, to be honest. I am just wondering how I can generate a full list of nouns and a full list of adjectives from the WordNet database. Thanks.

Answer 1: If you're using the MySQL version of WordNet 3.0 you can use the following query:

SELECT lemma
FROM words
LEFT JOIN senses ON words.wordid = senses.wordid
LEFT JOIN synsets ON senses.synsetid = synsets.synsetid
WHERE pos = 'n'

Replace 'n' with 'a' for the adjectives. Steve

Source: https
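For anyone on the NLTK side instead of MySQL, the same lists are one call away; this assumes the WordNet corpus has been downloaded with nltk.download('wordnet'):

from nltk.corpus import wordnet as wn

nouns = set(wn.all_lemma_names(pos='n'))       # every noun lemma in WordNet
adjectives = set(wn.all_lemma_names(pos='a'))  # every adjective lemma
print(len(nouns), len(adjectives))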

Does the lemmatization mechanism reduce the size of the corpus?

Question: Dear community members, during pre-processing of the data, after splitting the raw_data into tokens, I used the popular WordNet lemmatizer to generate the stems. I am performing experiments on a dataset that has 18953 tokens. My question is: does the lemmatization process reduce the size of the corpus? I am confused; kindly help in this regard. Any help is appreciated!

Answer 1: Lemmatization converts each token (a.k.a. form) in the sentence into its lemma form (a.k.a. type):

>>> from nltk import ...
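The answer's code is cut off above; a minimal sketch of the point it was presumably making, using NLTK's WordNetLemmatizer: lemmatization leaves the number of tokens (the corpus size) unchanged, and only the number of distinct types (the vocabulary) can shrink:

from nltk.stem import WordNetLemmatizer

wnl = WordNetLemmatizer()
tokens = ['cat', 'cats', 'goose', 'geese', 'runs']
lemmas = [wnl.lemmatize(t, pos='n') for t in tokens]

print(len(tokens), len(lemmas))            # 5 5 : token count is unchanged
print(len(set(tokens)), len(set(lemmas)))  # 5 3 : types collapse (cats->cat, geese->goose)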

Wordnet Lemmatizer for R

I would like to use the WordNet lemmatizer to lemmatize the words in a character vector a:

> a <- c("He saw a see-saw on a sea shore", "she is feeling cold")
> a
[1] "He saw a see-saw on a sea shore" "she is feeling cold"

I convert a into a corpus and do pre-processing steps (like stopword removal, lemmatization etc.):

> a <- Corpus(VectorSource(a))

I wanted to do the lemmatization in the way below:

> filter <- getTermFilter("ExactMatchFilter", a, TRUE)
> terms <- getIndexTerms("NOUN", 1, filter)
> sapply(terms, getLemma)

but I get this error:

> filter <- getTermFilter("ExactMatchFilter", a, TRUE)
Error in .jnew(paste ...
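The error is most likely because getTermFilter expects a single character string as its second argument, not a tm Corpus, so the underlying rJava call fails; the usual workaround is to loop over individual words. For comparison, a minimal sketch of the same lemmatization in Python with NLTK, which also shows why the part of speech matters for this example ('saw' only maps back to 'see' as a verb):

from nltk.stem import WordNetLemmatizer

wnl = WordNetLemmatizer()
print(wnl.lemmatize('saw', pos='v'))      # see
print(wnl.lemmatize('saw', pos='n'))      # saw  (the tool; the noun lemma is unchanged)
print(wnl.lemmatize('feeling', pos='v'))  # feel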

Word disambiguation algorithm (Lesk algorithm)

Hi. Can anybody help me find an algorithm, in Java code, to find synonyms of a search word based on its context? I want to implement the algorithm with the WordNet database. For example, take "I am running a Java program". From the context, I want to find the synonyms for the word "running", but the synonyms must be suitable for that context.

Let me illustrate a possible approach. Let your sentence be A B C, and let each word have synsets, i.e. {A: (a1, a2, a3), B: (b1), C: (c1, c2)}. Now form the possible synset combinations: (a1, b1, c1), (a1, b1, c2), (a2, b1, c1) ... (a3, b1, c2). Define a function F(a, b, c) ...
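The answer is cut off where it defines F; presumably F scores how well a candidate combination of senses fits together. A minimal sketch of that idea, taking F to be the sum of pairwise path similarities and using NLTK in place of a Java WordNet library (the word list is illustrative):

from itertools import combinations, product
from nltk.corpus import wordnet as wn

def score(synsets):
    # F: sum of pairwise similarities inside one candidate combination
    total = 0.0
    for s1, s2 in combinations(synsets, 2):
        total += s1.path_similarity(s2) or 0.0  # None (no path) counts as 0
    return total

words = ['run', 'java', 'program']
candidates = [wn.synsets(w) for w in words]
best = max(product(*candidates), key=score)  # the (a_i, b_j, c_k) maximising F
for w, s in zip(words, best):
    print(w, '->', s.name(), ':', s.definition())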

Calling WordNet from PHP (WordNet class or API for PHP)

I am trying to write a program to find the similarity between two documents, and since I'm using only English, I decided to use WordNet. But I cannot find a way to link WordNet with PHP; I cannot find any WordNet API for PHP. I saw in a forum that someone (Spudley) said he called WordNet from PHP using the shell_exec() function (Thesaurus class or API for PHP). I would really like to know the method used, or some example code, or perhaps a tutorial, to start using WordNet with PHP. Many thanks.

The PHP extension that is linked to from the WordNet site is very old and out of date; it claims ...
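The shell-out approach mentioned above is easy to reproduce; sketched here in Python for illustration, assuming the WordNet command-line tool wn is installed and on the PATH. From PHP the equivalent is a one-liner such as shell_exec('wn running -synsv'):

import subprocess

# -synsv asks the wn binary for verb synonyms; other flags cover other parts of speech.
out = subprocess.run(['wn', 'running', '-synsv'],
                     capture_output=True, text=True)
print(out.stdout)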

Using WordNet to determine semantic similarity between two texts?

How can you determine the semantic similarity between two texts in Python using WordNet? The obvious preprocessing would be removing stop words and stemming, but then what? The only way I can think of would be to calculate the WordNet path distance between each word in the two texts. This is standard for unigrams. But these are large (400-word) texts, natural language documents with words that are not in any particular order or structure (other than that imposed by English grammar). So, which words would you compare between texts? How would you do this in Python?

One thing that you ...
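One common heuristic (a sketch, not the only answer): align every word in one text with its best WordNet match in the other, average those best scores, and symmetrise by doing it in both directions. This assumes both texts are already tokenised, stopword-filtered lists of words:

from nltk.corpus import wordnet as wn

def best_match(word, other_words):
    # Best path similarity between any sense of word and any sense of any other word
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(word)
              for w in other_words
              for s2 in wn.synsets(w)]
    return max(scores, default=0.0)

def directional(a, b):
    return sum(best_match(w, b) for w in a) / len(a)

def text_similarity(a, b):
    return (directional(a, b) + directional(b, a)) / 2

print(text_similarity(['dog', 'bark'], ['puppy', 'howl']))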

How to implement category-based text tagging using WordNet or a related resource?

How do you tag text using WordNet by a word's category (with Java as the interface)? For example, consider the sentences:

1) Computers need a keyboard, monitor and CPU to work.
2) An automobile uses gears and a clutch.

Now my objective is that the example sentences get tagged as:

1st sentence: Computer/electronic, keyboard/electronic, CPU/electronic
2nd sentence: Automobile/mechanical, gears/mechanical, clutch/mechanical

Some extra examples:

"Clutch and gear is monitored using microchip" -> clutch/mechanical, gear/mechanical, microchip/electronic
"software used here to monitor hydrogen levels" -> ...
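A minimal sketch of one way to approach this, shown in Python/NLTK rather than Java: walk the hypernym closure of each word's noun synsets and test it against a small, hand-picked set of category anchor synsets. The category mapping below is an assumption for illustration only, and words whose hypernym chains miss these anchors simply fall through to 'unknown':

from nltk.corpus import wordnet as wn

# Illustrative anchors; a real system needs a richer mapping per category.
CATEGORIES = {
    'electronic': wn.synset('electronic_device.n.01'),
    'mechanical': wn.synset('machine.n.01'),
}

def tag(word):
    for syn in wn.synsets(word, pos='n'):
        ancestors = set(syn.closure(lambda s: s.hypernyms()))
        for label, anchor in CATEGORIES.items():
            if anchor == syn or anchor in ancestors:
                return label
    return 'unknown'

for w in ['keyboard', 'clutch', 'gear', 'microchip']:
    print(w, '->', tag(w))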

Sentence Similarity using WS4J

Question: I want to use WS4J to calculate the similarity between two sentences. I am using the online demo of WS4J (WS4J Online demo), with the default example sentences given by WS4J. After entering the sentences and hitting the calculate-similarity button, I get output showing the similarity between the individual tokens of the sentences. How do I proceed further from here? I want to get a single value (say 0.5 or 0.8) that denotes the similarity of these two sentences. Is ...
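WS4J itself only gives the token-by-token matrix; collapsing it into one number is up to you. One common reduction (a sketch, not the official WS4J answer): take the best match in each row and each column, average each set, then average the two directions:

def matrix_to_score(matrix):
    # matrix[i][j] = similarity of token i (sentence 1) vs token j (sentence 2)
    row_best = [max(row) for row in matrix]
    col_best = [max(col) for col in zip(*matrix)]
    return (sum(row_best) / len(row_best) + sum(col_best) / len(col_best)) / 2

# Illustrative 2x3 token-similarity matrix, as the demo might produce.
matrix = [[0.9, 0.1, 0.2],
          [0.1, 0.8, 0.3]]
print(matrix_to_score(matrix))  # single sentence-level value, here about 0.76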

How to automatically label a cluster of words using semantics?

The context: I already have clusters of words (phrases, actually) resulting from k-means applied to internet search queries, using common URLs in the search engine's results as a distance (co-occurrence of URLs rather than words, if I simplify a lot). I would like to automatically label the clusters using semantics; in other words, I'd like to extract the main concept surrounding a group of phrases considered together. For example (sorry for the subject of my example), if I have the following bunch of queries: ['my husband attacked me', 'he was arrested by the police', 'the trial is ...
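One concrete starting point (a sketch under strong assumptions): label a cluster by the lowest common hypernym of its words' first noun synsets, via NLTK. Multi-word queries would first need keyword extraction, which is omitted here, and taking the first synset is a crude disambiguation:

from functools import reduce
from nltk.corpus import wordnet as wn

def cluster_label(words):
    # Keep only words that have at least one noun synset.
    synsets = [wn.synsets(w, pos='n')[0] for w in words if wn.synsets(w, pos='n')]
    # Fold the list down to a shared ancestor (nouns always share one, e.g. entity.n.01).
    common = reduce(lambda a, b: a.lowest_common_hypernyms(b)[0], synsets)
    return common.name()

print(cluster_label(['police', 'trial', 'arrest']))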