latent-semantic-indexing

How is TF-IDF implemented in the gensim tool in Python?

Question: From the documents I found online, I worked out that the expression used to determine the term frequency and inverse document frequency weights of terms in a corpus is tf-idf(wt) = tf * log(|N|/d). I was going through the implementation of tf-idf in gensim. The example given in the documentation is
    >>> doc_bow = [(0, 1), (1, 1)]
    >>> print tfidf[doc_bow] # step 2 -- use the model to transform vectors
    [(0, 0.70710678), (1, 0.70710678)]
which apparently does not follow the
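A possible explanation for the 0.70710678 values, offered as a sketch rather than as gensim's actual code: as far as I know, gensim's TfidfModel by default uses a base-2 logarithm for the idf and then scales each transformed vector to unit length. Under that assumption, two terms with equal counts and equal document frequencies always come out at 1/sqrt(2) each. The function name `tfidf_unit` and the document frequencies below are hypothetical, chosen only for illustration.

```python
import math


def tfidf_unit(doc_bow, doc_freqs, num_docs):
    """Weight a bag-of-words vector by tf * log2(N / df), then scale it to unit length."""
    weighted = []
    for term_id, tf in doc_bow:
        idf = math.log2(num_docs / doc_freqs[term_id])
        weighted.append((term_id, tf * idf))

    # Euclidean (L2) norm of the weighted vector.
    norm = math.sqrt(sum(w * w for _, w in weighted))
    if norm == 0.0:
        # Every term occurs in every document, so nothing carries weight.
        return []
    return [(term_id, w / norm) for term_id, w in weighted if w != 0.0]


# Hypothetical corpus statistics: 9 documents, both terms occur in 2 of them.
doc_freqs = {0: 2, 1: 2}
result = tfidf_unit([(0, 1), (1, 1)], doc_freqs, num_docs=9)
print(result)  # both weights are approximately 0.7071
```

The raw tf * log(|N|/d) weights are equal for the two terms, so the normalization step alone determines the final numbers.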

Any Latent Semantic Indexing?

Question: Is there any open-source implementation of LSI in Java? I want to use such a library for my project. I have seen jLSI, but it implements a different model of LSI; I want the standard model. Answer 1: Have you considered LDA (Latent Dirichlet allocation)? I haven't really either, but I ran into the same problem with LSI recently (patents). From what I understand, LDA is a related and more powerful technique. http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation apparently has some links to open-source implementations. A Google search for "java LSI" leads to a similar question that recommends SemanticVectors. A

What NLP tools to use to match phrases having similar meaning or semantics

Question: I am working on a project that requires me to match a phrase or keyword against a set of similar keywords, which calls for semantic analysis. An example:
    Relevant QT: cheap health insurance; affordable health insurance; low cost medical insurance; health plan for less; inexpensive health coverage
    Common Meaning: low cost health insurance
Here the phrase under the Common Meaning column should match the phrases under the Relevant QT column. I looked at a bunch of tools and techniques for this. S-Match seemed very promising, but I have to work in Python, not in Java. Also Latent Semantic Analysis

Clustering using Latent Dirichlet Allocation algo in gensim

Question: Is it possible to do clustering in gensim for a given set of inputs using LDA? How can I go about it? Answer 1: LDA produces a lower-dimensional representation of the documents in a corpus. To this low-dimensional representation you could apply a clustering algorithm, e.g. k-means. Since each axis corresponds to a topic, a simpler approach would be to assign each document to the topic onto which its projection is largest. Answer 2: Yes, you can. Here is a tutorial: http://nlp.fi.muni.cz/projekty/gensim/wiki.html#latent-dirichlet-allocation First load your corpus, then call: lda = gensim.models.ldamodel.LdaModel(corpus=mm,
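The simpler approach from the first answer (assign each document to its dominant topic) can be sketched in a few lines. This assumes the per-document topic distributions are already available as lists of (topic_id, probability) pairs, which is the shape gensim returns when a trained LDA model is applied to a bag-of-words document. The function name `assign_to_dominant_topic` and the sample distributions are made up for illustration.

```python
def assign_to_dominant_topic(doc_topics):
    """Return, for each document, the id of its highest-probability topic."""
    labels = []
    for topics in doc_topics:
        if not topics:
            # No topic passed the model's probability cutoff for this document.
            labels.append(None)
            continue
        best_topic, _ = max(topics, key=lambda pair: pair[1])
        labels.append(best_topic)
    return labels


# Hypothetical topic distributions for three documents and two topics.
doc_topics = [
    [(0, 0.85), (1, 0.15)],
    [(0, 0.10), (1, 0.90)],
    [(0, 0.55), (1, 0.45)],
]
labels = assign_to_dominant_topic(doc_topics)
print(labels)  # [0, 1, 0]
```

This gives a hard clustering with one cluster per topic. Running k-means on the topic vectors instead would let the number of clusters differ from the number of topics.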

How do we decide the number of dimensions for latent semantic analysis?

Question: I have been working on latent semantic analysis lately. I have implemented it in Java using the Jama package. Here is the code:
    Matrix vtranspose;
    a = new Matrix(termdoc);
    termdoc = a.getArray();
    a = a.transpose();
    SingularValueDecomposition sv = new SingularValueDecomposition(a);
    u = sv.getU();
    v = sv.getV();
    s = sv.getS();
    vtranspose = v.transpose(); // we obtain this as a result of svd
    uarray = u.getArray();
    sarray = s.getArray();
    varray = vtranspose.getArray();
    if (semantics.maketerms.nodoc > 50) {
        sarray_mod = new double[50][50];
        uarray_mod = new double[uarray.length][50];
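The code above hard-codes 50 dimensions. One common heuristic, which is not taken from this thread and is only one of several options, is to keep the smallest number of singular values whose squared sum reaches a chosen fraction of the total. The sketch below is in Python for brevity; the function name `choose_rank` and the 0.9 threshold are assumptions, and the singular values are assumed to be sorted in descending order.

```python
def choose_rank(singular_values, energy=0.9):
    """Smallest k whose leading singular values hold `energy` of the squared total."""
    total = sum(s * s for s in singular_values)
    if total == 0:
        return 0

    running = 0.0
    for k, s in enumerate(singular_values, start=1):
        running += s * s
        if running / total >= energy:
            return k
    return len(singular_values)


# Toy singular values: squares are 9, 4, 1, so the first two hold 13/14 of the total.
k = choose_rank([3.0, 2.0, 1.0], energy=0.9)
print(k)  # 2
```

In practice the threshold itself is a tuning choice, so it is worth checking the resulting k against retrieval quality on held-out queries.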
