Simple Python implementation of collaborative topic modeling?

Anonymous (unverified), submitted 2019-12-03 02:45:02

Question:

I came across these two papers, which combine collaborative filtering (matrix factorization) and topic modeling (LDA) to recommend articles/posts to users based on the topic terms of the posts/articles they are interested in.

The papers (in PDF) are: "Collaborative Topic Modeling for Recommending Scientific Articles" and "Collaborative Topic Modeling for Recommending GitHub Repositories"

The new algorithm is called collaborative topic regression. I was hoping to find some Python code that implements it, but to no avail. This might be a long shot, but can someone show a simple Python example?
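For reference, here is a sketch of the objective as I read it from the first paper (Wang and Blei, 2011). Notation assumed here: u_i is user i's latent vector, v_j is item j's latent vector, θ_j is item j's LDA topic proportions, β holds the topic-word distributions, w_jn is the n-th word of item j, r_ij is the (implicit) rating, and c_ij is a confidence weight equal to a for observed ratings and b < a otherwise. The item vector is modeled as v_j = θ_j + ε_j, so λ_v controls how far v_j may drift from its topic proportions. Holding θ fixed and setting the gradients with respect to u_i and v_j to zero gives the closed-form alternating updates on the last two lines (V is the K x J matrix of item vectors, C_i = diag(c_i1, ..., c_iJ), R_i is user i's rating vector, and similarly for U, C_j, R_j).

```latex
\mathcal{L} = -\frac{\lambda_u}{2}\sum_i u_i^\top u_i
  - \frac{\lambda_v}{2}\sum_j (v_j - \theta_j)^\top (v_j - \theta_j)
  + \sum_j \sum_n \log\Big(\sum_k \theta_{jk}\,\beta_{k,w_{jn}}\Big)
  - \sum_{i,j} \frac{c_{ij}}{2}\big(r_{ij} - u_i^\top v_j\big)^2,
\qquad \hat{r}_{ij} = u_i^\top v_j

u_i \leftarrow \big(V C_i V^\top + \lambda_u I_K\big)^{-1} V C_i R_i

v_j \leftarrow \big(U C_j U^\top + \lambda_v I_K\big)^{-1} \big(U C_j R_j + \lambda_v \theta_j\big)
```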

Answer 1:

This should get you started (I'm not sure why nobody has posted it yet): https://github.com/arongdari/python-topic-model

More specifically: https://github.com/arongdari/python-topic-model/blob/master/ptm/collabotm.py

```python
class CollaborativeTopicModel:
    """
    Wang, Chong, and David M. Blei. "Collaborative topic
    modeling for recommending scientific articles."
    Proceedings of the 17th ACM SIGKDD international conference on Knowledge
    discovery and data mining. ACM, 2011.

    Attributes
    ----------
    n_item: int
        number of items
    n_user: int
        number of users
    R: ndarray, shape (n_user, n_item)
        user x item rating matrix
    """
    # Excerpt: only the class docstring is reproduced here; the full
    # implementation is in collabotm.py in the repository linked above.
```

Looks nice and straightforward. I would still suggest at least looking at gensim: Radim Řehůřek, its author, has done a fantastic job of optimizing it.
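If you want to see the moving parts without a library, below is a minimal NumPy sketch of the core of collaborative topic regression as I understand it from Wang and Blei (2011): alternating closed-form updates of the user vectors U and item vectors V, where each item vector is pulled toward its topic proportions θ_j by the λ_v penalty. It assumes θ has already been computed (for example with gensim's LDA) and keeps it fixed; the paper also refines θ during fitting, which is omitted here. The function name `ctr_als`, its default hyperparameters, and the toy data are illustrative assumptions, not the API of `collabotm.py`.

```python
import numpy as np


def ctr_als(R, theta, a=1.0, b=0.01, lambda_u=0.01, lambda_v=100.0,
            n_iter=10, seed=0):
    """Alternating least-squares fit of user/item vectors with item vectors
    regularised toward fixed topic proportions theta (simplified CTR sketch).

    R:     (n_user, n_item) binary implicit-feedback matrix
    theta: (n_item, K) per-item topic proportions, e.g. from LDA (held fixed)
    """
    n_user, n_item = R.shape
    K = theta.shape[1]
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_user, K))
    V = theta.astype(float).copy()  # item vectors start at their topic mix
    # Confidence weights: a for observed entries, b (< a) for unobserved ones.
    C = np.where(R > 0, a, b)
    I_K = np.eye(K)
    for _ in range(n_iter):
        # u_i = (sum_j c_ij v_j v_j^T + lambda_u I)^-1 sum_j c_ij r_ij v_j
        for i in range(n_user):
            Vc = V * C[i][:, None]  # each row v_j scaled by c_ij
            U[i] = np.linalg.solve(Vc.T @ V + lambda_u * I_K, Vc.T @ R[i])
        # v_j = (sum_i c_ij u_i u_i^T + lambda_v I)^-1
        #       (sum_i c_ij r_ij u_i + lambda_v theta_j)
        for j in range(n_item):
            Uc = U * C[:, j][:, None]  # each row u_i scaled by c_ij
            V[j] = np.linalg.solve(Uc.T @ U + lambda_v * I_K,
                                   Uc.T @ R[:, j] + lambda_v * theta[j])
    return U, V


# Hypothetical toy data: 3 users, 4 items, 2 topics.
R = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 1, 0, 0]], dtype=float)
theta = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.7, 0.3],
                  [0.1, 0.9]])

U, V = ctr_als(R, theta)
scores = U @ V.T  # predicted preference of user i for item j
print(np.round(scores, 2))

# A brand-new item with no ratings falls back to v_j = theta_j.
new_item_theta = np.array([0.8, 0.2])
print(np.round(U @ new_item_theta, 2))
```

The last two lines show why the topic model matters: for an item nobody has rated yet, the expected item vector is just its topic proportions, so users can still be scored by u_i · θ_j. That out-of-matrix case is what plain matrix factorization cannot handle.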



Answer 2:

A very simple LDA example using gensim. You can find more information here: https://radimrehurek.com/gensim/tutorial.html

I hope it helps.

```python
# Requires NLTK data: nltk.download('punkt') (newer NLTK releases use
# 'punkt_tab'), nltk.download('stopwords') and nltk.download('rslp').
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize  # was missing; used below
from nltk.stem import RSLPStemmer
from gensim import corpora
import gensim

# Note: RSLPStemmer is a Portuguese stemmer; for English text an English
# stemmer such as nltk.stem.PorterStemmer is the more usual choice.
st = RSLPStemmer()
stop_words = set(stopwords.words('english'))  # build once, not per token
texts = []

doc1 = "Veganism is both the practice of abstaining from the use of animal products, particularly in diet, and an associated philosophy that rejects the commodity status of animals"
doc2 = "A follower of either the diet or the philosophy is known as a vegan."
doc3 = "Distinctions are sometimes made between several categories of veganism."
doc4 = "Dietary vegans refrain from ingesting animal products. This means avoiding not only meat but also egg and dairy products and other animal-derived foodstuffs."
doc5 = "Some dietary vegans choose to wear clothing that includes animal products (for example, leather or wool)."

docs = [doc1, doc2, doc3, doc4, doc5]

for doc in docs:
    tokens = word_tokenize(doc.lower())
    stopped_tokens = [w for w in tokens if w not in stop_words]
    stemmed_tokens = [st.stem(w) for w in stopped_tokens]
    texts.append(stemmed_tokens)

# Map each token to an integer id, then turn each document into (id, count) pairs
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# generate LDA model using gensim
ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=20)
print(ldamodel.print_topics(num_topics=2, num_words=4))
```

[(0, u'0.066*animal + 0.065*, + 0.047*product + 0.028*philosophy'), (1, u'0.085*. + 0.047*product + 0.028*dietary + 0.028*veg')]
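The topic words in the output above include ',' and '.', because `word_tokenize` emits punctuation as separate tokens and the stopword filter does not remove them. Keeping only word characters avoids that; NLTK's `RegexpTokenizer(r'\w+')` is one way to do it. For illustration, here is a stdlib-only sketch of the same preprocessing step that produces per-document lists of (token_id, count) pairs, the same shape gensim's `doc2bow` returns. The tiny `STOPWORDS` set and the helper names `tokenize` and `build_bow` are hypothetical stand-ins, not part of NLTK or gensim.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only;
# in practice use a full list such as NLTK's stopwords corpus.
STOPWORDS = {"a", "an", "and", "as", "both", "from", "in", "is",
             "of", "or", "that", "the", "this", "to"}


def tokenize(text):
    """Lowercase and keep only alphabetic runs, so ',' and '.' never become tokens."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_bow(docs):
    """Assign each new token the next integer id and return
    (token2id, corpus), where corpus[d] is a sorted list of (id, count)."""
    token2id = {}
    corpus = []
    for doc in docs:
        counts = Counter()
        for tok in tokenize(doc):
            tid = token2id.setdefault(tok, len(token2id))
            counts[tid] += 1
        corpus.append(sorted(counts.items()))
    return token2id, corpus


docs = ["A follower of either the diet or the philosophy is known as a vegan.",
        "Dietary vegans refrain from ingesting animal products."]
token2id, corpus = build_bow(docs)
print(token2id)
print(corpus)
```

Stemming is left out here to keep the sketch dependency-free; with punctuation gone, the LDA topics are built from content words only.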



Answer 3:

Since you tagged the question machine-learning and python, have you looked at the pandas and scikit-learn (sklearn) modules? With them you can quickly set up linear regression models.

There is also a scikit-learn code example on topic extraction ("Topic extraction with Non-negative Matrix Factorization and Latent Dirichlet Allocation") that may fit your needs and would also help you get to know the sklearn module.
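scikit-learn's topic-extraction example fits `sklearn.decomposition.NMF` and `LatentDirichletAllocation` to a document-term matrix and prints each topic's highest-weighted words. To show what the NMF half does under the hood, here is a small NumPy sketch using the classic Lee and Seung multiplicative updates. Note this is not scikit-learn's default NMF solver (coordinate descent), and the toy vocabulary and counts are made up for illustration.

```python
import numpy as np


def nmf(X, k, n_iter=200, eps=1e-10, seed=0):
    """Approximate a non-negative document-term matrix X (n_docs x n_terms)
    as W @ H with W, H >= 0, using Lee-Seung multiplicative updates that
    minimise the squared Frobenius error ||X - W H||^2."""
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    W = rng.random((n_docs, k)) + eps   # document-topic weights
    H = rng.random((k, n_terms)) + eps  # topic-term weights
    for _ in range(n_iter):
        # Multiplicative updates keep entries non-negative; eps avoids 0/0.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical toy term counts for four short documents.
vocab = ["vegan", "diet", "dairy", "leather", "wool", "clothing"]
X = np.array([[2, 3, 1, 0, 0, 0],
              [1, 2, 2, 0, 0, 0],
              [0, 0, 0, 2, 1, 2],
              [1, 0, 0, 1, 2, 2]], dtype=float)

W, H = nmf(X, k=2)
for t, row in enumerate(H):
    top = [vocab[i] for i in np.argsort(row)[::-1][:3]]
    print(f"topic {t}: {top}")
```

Each row of `H` plays the role of a topic (weights over terms) and each row of `W` gives a document's mix of topics; the sklearn example reads off topics the same way, by sorting each component's term weights.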

Regards


