latent-semantic-indexing

Any Latent Semantic Indexing?

空扰寡人 submitted on 2019-12-09 12:44:48
Question: Is there an open-source implementation of LSI in Java? I want to use such a library in my project. I have seen jLSI, but it implements a different model of LSI; I want the standard model.
Answer 1: Have you considered LDA (Latent Dirichlet allocation)? I haven't looked at it closely either, but I ran into the same problem with LSI recently (patents). From what I understand, LDA is a related and more powerful technique. http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation apparently has some links to open-source

How is TF-IDF implemented in the gensim tool in Python?

一世执手 submitted on 2019-12-07 17:20:22
Question: From documents I found online, I worked out that the expression used to determine the term frequency and inverse document frequency weights of terms in a corpus is tf-idf(wt) = tf * log(|N|/d). I was going through the implementation of tf-idf in gensim. The example given in the documentation is
    >>> doc_bow = [(0, 1), (1, 1)]
    >>> print tfidf[doc_bow] # step 2 -- use the model to transform vectors
    [(0, 0.70710678), (1, 0.70710678)]
Which apparently does not follow the
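A possible explanation for the 0.70710678 values, offered as a sketch rather than a description of gensim's internals: with its default settings, gensim's TfidfModel scales each transformed vector to unit length and uses a base-2 logarithm for the idf term. If both terms in the document end up with the same raw tf * idf weight, each normalized weight is 1/sqrt(2), which is about 0.70710678. The function name, the document frequencies, and the corpus size of 9 below are assumptions made for illustration; the snippet is plain Python and does not call gensim.

```python
import math

def tfidf_unit_vector(doc_bow, doc_freqs, num_docs):
    """Weight a bag-of-words vector by tf * log2(N / df), then L2-normalize it.

    doc_bow   -- list of (term_id, term_frequency) pairs
    doc_freqs -- hypothetical mapping term_id -> number of documents containing it
    num_docs  -- hypothetical total number of documents in the corpus
    """
    weights = [
        (term_id, tf * math.log2(num_docs / doc_freqs[term_id]))
        for term_id, tf in doc_bow
    ]
    norm = math.sqrt(sum(w * w for _, w in weights))
    if norm == 0.0:
        return []  # every term occurs in every document, so nothing is left
    return [(term_id, w / norm) for term_id, w in weights]

# Assumed statistics: both terms occur in 2 of 9 documents.
doc_freqs = {0: 2, 1: 2}
result = tfidf_unit_vector([(0, 1), (1, 1)], doc_freqs, num_docs=9)
print(result)
```

Because the two raw weights are equal, the normalization step alone determines the output, so the same values appear for any corpus size and any shared document frequency.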

Any Latent Semantic Indexing?

天涯浪子 submitted on 2019-12-03 14:26:00
Is there an open-source implementation of LSI in Java? I want to use such a library in my project. I have seen jLSI, but it implements a different model of LSI; I want the standard model.
Have you considered LDA (Latent Dirichlet allocation)? I haven't looked at it closely either, but I ran into the same problem with LSI recently (patents). From what I understand, LDA is a related and more powerful technique. http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation apparently has some links to open-source implementations. A Google search for "java LSI" leads to a similar question that recommends SemanticVectors. A

What NLP tools to use to match phrases having similar meaning or semantics

二次信任 submitted on 2019-12-03 09:08:32
Question: I am working on a project that requires me to match a phrase or keyword with a set of similar keywords, so I need to perform semantic analysis. An example:
Relevant QT:
    cheap health insurance
    affordable health insurance
    low cost medical insurance
    health plan for less
    inexpensive health coverage
Common Meaning:
    low cost health insurance
Here the phrase under the Common Meaning column should match the phrases under the Relevant QT column. I looked at a bunch of tools and techniques to do this. S

Clustering using Latent Dirichlet Allocation algo in gensim

淺唱寂寞╮ submitted on 2019-12-03 06:18:55
Question: Is it possible to do clustering in gensim for a given set of inputs using LDA? How can I go about it?
Answer 1: LDA produces a lower-dimensional representation of the documents in a corpus. You could apply a clustering algorithm, e.g. k-means, to this low-dimensional representation. Since each axis corresponds to a topic, a simpler approach would be to assign each document to the topic onto which its projection is largest.
Answer 2: Yes, you can. Here is a tutorial: http://nlp.fi.muni.cz/projekty/gensim/wiki.html
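The simpler approach from the first answer (assign each document to its largest topic) can be sketched as follows. This is an illustration only: it assumes the per-document topic weights are already available as lists of (topic_id, weight) pairs, which is the shape gensim returns when a trained LDA model is applied to a bag-of-words document. The function name and the sample weights are made up.

```python
from collections import defaultdict

def assign_to_dominant_topic(doc_topics):
    """Group document indices by the topic with the largest weight.

    doc_topics -- one list of (topic_id, weight) pairs per document
    """
    clusters = defaultdict(list)
    for doc_id, topics in enumerate(doc_topics):
        if not topics:
            continue  # no topic information for this document
        best_topic, _ = max(topics, key=lambda pair: pair[1])
        clusters[best_topic].append(doc_id)
    return dict(clusters)

# Hypothetical topic weights for three documents and two topics.
doc_topics = [
    [(0, 0.9), (1, 0.1)],
    [(0, 0.2), (1, 0.8)],
    [(0, 0.6), (1, 0.4)],
]
clusters = assign_to_dominant_topic(doc_topics)
print(clusters)
```

With this scheme the number of clusters is at most the number of topics the model was trained with.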

What NLP tools to use to match phrases having similar meaning or semantics

浪尽此生 submitted on 2019-12-02 23:16:41
I am working on a project that requires me to match a phrase or keyword with a set of similar keywords, so I need to perform semantic analysis. An example:
Relevant QT:
    cheap health insurance
    affordable health insurance
    low cost medical insurance
    health plan for less
    inexpensive health coverage
Common Meaning:
    low cost health insurance
Here the phrase under the Common Meaning column should match the phrases under the Relevant QT column. I looked at a bunch of tools and techniques to do this. S-Match seemed very promising, but I have to work in Python, not in Java. Also Latent Semantic Analysis

Clustering using Latent Dirichlet Allocation algo in gensim

无人久伴 submitted on 2019-12-02 19:41:33
Is it possible to do clustering in gensim for a given set of inputs using LDA? How can I go about it?
LDA produces a lower-dimensional representation of the documents in a corpus. You could apply a clustering algorithm, e.g. k-means, to this low-dimensional representation. Since each axis corresponds to a topic, a simpler approach would be to assign each document to the topic onto which its projection is largest.
Yes, you can. Here is a tutorial: http://nlp.fi.muni.cz/projekty/gensim/wiki.html#latent-dirichlet-allocation First load your corpus, then call: lda = gensim.models.ldamodel.LdaModel(corpus=mm,
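The k-means option mentioned above can be sketched in plain Python, assuming each document has already been turned into a dense vector of topic weights. Everything here is illustrative: the function name, the sample vectors, and the naive initialization (the first k points become the starting centroids) are assumptions, and a library implementation would normally be preferable in practice.

```python
import math

def kmeans(points, k, iterations=20):
    """Very small k-means: returns (labels, centroids) for dense vectors."""
    # Naive deterministic initialization: use the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical topic-weight vectors for four documents and two topics.
doc_vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels, centroids = kmeans(doc_vectors, k=2)
print(labels)
```

Unlike dominant-topic assignment, this lets the number of clusters differ from the number of topics.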

How do we decide the number of dimensions for latent semantic analysis?

核能气质少年 submitted on 2019-11-28 21:56:41
I have been working on latent semantic analysis lately. I have implemented it in Java using the Jama package. Here is the code:
    Matrix vtranspose;
    a = new Matrix(termdoc);
    termdoc = a.getArray();
    a = a.transpose();
    SingularValueDecomposition sv = new SingularValueDecomposition(a);
    u = sv.getU();
    v = sv.getV();
    s = sv.getS();
    vtranspose = v.transpose(); // we obtain this as a result of svd
    uarray = u.getArray();
    sarray = s.getArray();
    varray = vtranspose.getArray();
    if (semantics.maketerms.nodoc > 50) {
        sarray_mod = new double[50][50];
        uarray_mod = new double[uarray.length][50];
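The code above truncates to a fixed 50 dimensions. One heuristic sometimes used instead, shown here only as a sketch and not taken from the original thread, is to keep the smallest number of singular values whose squared sum reaches a chosen share of the total. The function name and the 0.9 threshold are arbitrary assumptions; in practice the number of dimensions is often tuned empirically against the downstream task.

```python
def choose_rank(singular_values, energy=0.9):
    """Return the smallest k whose leading singular values hold `energy`
    of the total squared singular-value mass.

    singular_values -- non-negative values sorted in descending order,
                       e.g. the diagonal of S from an SVD
    energy          -- assumed threshold between 0 and 1 (illustrative only)
    """
    total = sum(s * s for s in singular_values)
    if total == 0:
        return 0
    running = 0.0
    for k, s in enumerate(singular_values, start=1):
        running += s * s
        if running / total >= energy:
            return k
    return len(singular_values)

# Hypothetical singular values of a small term-document matrix.
k = choose_rank([10.0, 5.0, 1.0, 0.5], energy=0.9)
print(k)
```

The returned k would then replace the hard-coded 50 when slicing U, S, and V.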

How do we decide the number of dimensions for latent semantic analysis?

牧云@^-^@ submitted on 2019-11-27 14:06:20
Question: I have been working on latent semantic analysis lately. I have implemented it in Java using the Jama package. Here is the code:
    Matrix vtranspose;
    a = new Matrix(termdoc);
    termdoc = a.getArray();
    a = a.transpose();
    SingularValueDecomposition sv = new SingularValueDecomposition(a);
    u = sv.getU();
    v = sv.getV();
    s = sv.getS();
    vtranspose = v.transpose(); // we obtain this as a result of svd
    uarray = u.getArray();
    sarray = s.getArray();
    varray = vtranspose.getArray();
    if(semantics