recommendation-engine

Is it better or not to combine a Search Engine and a Recommender System?

一世执手 submitted on 2020-01-05 10:52:37
Question: In our project we use a search engine, but the results need to be ranked based on each user's interests, similar to a recommendation driven by the user's keywords. If we keep the two systems separate, it costs a lot of time. Is there a better way to combine a Search Engine and a Recommender System? Or is there a simple way to customize my ranking strategy to achieve this? Answer 1: This is what we were trying to do in our project as well. There are two things to balance while solving this problem - Relevancy vs
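
One common pattern here is to let the search engine handle relevance and then re-rank its top hits with a per-user interest score. Below is a minimal sketch of that idea, assuming a simple topic-affinity user profile; the field names, weights, and the blending factor alpha are illustrative assumptions, not taken from the question or the answer:

```python
def rerank(search_hits, user_interest, alpha=0.7):
    """Blend the engine's relevance score with a per-user interest score.

    search_hits   : list of (doc_id, relevance_score, doc_topics)
    user_interest : dict mapping topic -> affinity in [0, 1]
    alpha         : weight on relevance vs. personalization (assumed value)
    """
    def personal_score(doc_topics):
        # Average of the user's affinity for the document's topics.
        if not doc_topics:
            return 0.0
        return sum(user_interest.get(t, 0.0) for t in doc_topics) / len(doc_topics)

    scored = [
        (doc_id, alpha * rel + (1 - alpha) * personal_score(topics))
        for doc_id, rel, topics in search_hits
    ]
    return sorted(scored, key=lambda x: x[1], reverse=True)

# Example: two hits with equal relevance; the user prefers "sports", so d2 wins.
hits = [("d1", 1.0, ["politics"]), ("d2", 1.0, ["sports"])]
print(rerank(hits, {"sports": 0.9}))
```

The same blending can often be pushed into the search engine itself (e.g. as a custom scoring function), which avoids round-tripping results through a separate recommender service.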

How much data is needed for User Based CF or Item Based CF to give recommendations?

蓝咒 submitted on 2020-01-05 02:48:10
Question: How much data is needed for user-based or item-based CF to give recommendations? I manually created a small dataset so that I could understand how the algorithm works. I found that on the small dataset I created, Slope-One can give a recommendation, but user-based and item-based CF cannot. What is the reason behind this? What is the threshold for the amount of data? Answer 1: In user- and item-based CF, the size of the data set can be really small. The important part is the frequency of the mapping
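
A small illustration of why neighborhood CF can fail on tiny data: user/user (or item/item) similarities need at least two co-rated items, with some variance, to be defined at all. A sketch using a made-up 3-user matrix where 0 means "unrated" (an assumed encoding, not from the question):

```python
import numpy as np

# Toy ratings: 0 means "not rated".
ratings = np.array([
    [5, 0, 0, 4],   # u1
    [0, 3, 4, 0],   # u2  -- shares no rated item with u1
    [4, 0, 5, 0],   # u3  -- shares only one rated item with u1
], dtype=float)

def pearson(u, v):
    """Pearson similarity over co-rated items only; None if it is undefined."""
    mask = (u > 0) & (v > 0)
    if mask.sum() < 2:
        return None            # not enough overlap -> no similarity, no recommendation
    uu, vv = u[mask], v[mask]
    if uu.std() == 0 or vv.std() == 0:
        return None            # constant ratings -> correlation undefined
    return np.corrcoef(uu, vv)[0, 1]

print(pearson(ratings[0], ratings[1]))  # None: u1 and u2 never rated the same item
print(pearson(ratings[0], ratings[2]))  # None: only one co-rated item
```

With no usable neighbors, a neighborhood CF recommender has nothing to recommend from, whereas Slope-One can still fall back on average item-to-item rating differences.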

Getting wrong recommendations with ALS.recommendation

放肆的年华 submitted on 2020-01-03 11:56:06
Question: I wrote a Spark program for making recommendations and used the ALS.recommendation library. I ran a small test with the following dataset, called trainData: (u1, m1, 1) (u1, m4, 1) (u2, m2, 1) (u2, m3, 1) (u3, m1, 1) (u3, m3, 1) (u3, m4, 1) (u4, m3, 1) (u4, m4, 1) (u5, m2, 1) (u5, m4, 1). The first column contains the user, the second the items rated by that user, and the third the rating. In my Scala code I trained the model using: myModel = ALS.trainImplicit
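
For reference, here is a minimal PySpark sketch of the same setup (the original code is Scala; the string IDs are mapped to integers because MLlib requires them, and the rank/iterations/alpha values are illustrative assumptions):

```python
from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS, Rating

sc = SparkContext(appName="als-implicit-sketch")

triples = [("u1", "m1", 1), ("u1", "m4", 1), ("u2", "m2", 1), ("u2", "m3", 1),
           ("u3", "m1", 1), ("u3", "m3", 1), ("u3", "m4", 1), ("u4", "m3", 1),
           ("u4", "m4", 1), ("u5", "m2", 1), ("u5", "m4", 1)]

# MLlib needs integer user/item IDs, so strip the leading letter.
ratings = sc.parallelize(
    [Rating(int(u[1:]), int(m[1:]), float(r)) for u, m, r in triples])

# Illustrative hyperparameters; tune for real data.
model = ALS.trainImplicit(ratings, rank=10, iterations=10, lambda_=0.01, alpha=1.0)

print(model.recommendProducts(1, 2))  # top-2 items for user u1
```

Note that trainImplicit interprets the third column as a confidence that the user interacted with the item, not as an explicit rating; on tiny binary datasets like this one, that distinction is a frequent source of recommendations that look "wrong".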

Mahout precomputed Item-item similarity - slow recommendation

筅森魡賤 submitted on 2020-01-03 03:45:23
Question: I am having performance issues with precomputed item-item similarities in Mahout. I have 4 million users and roughly the same number of items, with around 100M user-item preferences. I want to do content-based recommendation based on the cosine similarity of the TF-IDF vectors of the documents. Since computing this on the fly is slow, I precomputed the pairwise similarity of the top 50 most similar documents as follows: I used seq2sparse to produce the TF-IDF vectors. I used mahout rowId to
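
This is not the Mahout seq2sparse/rowId pipeline itself, but a small Python sketch of the same precomputation step (TF-IDF vectors, cosine similarity, keep the top-k neighbours per document); the toy corpus and k are placeholders:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs are pets"]          # placeholder corpus

tfidf = TfidfVectorizer().fit_transform(docs)   # sparse TF-IDF matrix, one row per doc
sims = cosine_similarity(tfidf)                 # dense doc-doc cosine similarity
np.fill_diagonal(sims, -1)                      # exclude self-similarity

k = 2  # the question keeps the top 50; 2 is enough for this toy corpus
top_k = np.argsort(-sims, axis=1)[:, :k]
for i, neighbors in enumerate(top_k):
    print(i, [(int(j), round(float(sims[i, j]), 3)) for j in neighbors])
```

At the scale in the question (millions of documents) the dense similarity matrix is not feasible, which is why only the top 50 neighbours per item are precomputed and looked up at recommendation time.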

Make numpy matrix more sparse

断了今生、忘了曾经 submitted on 2019-12-31 03:10:49
Question: Suppose I have a numpy array np.array([ [3, 0, 5, 3, 0, 1], [0, 1, 2, 1, 5, 2], [4, 3, 5, 3, 1, 4], [2, 5, 2, 5, 3, 1], [0, 1, 2, 1, 5, 2], ]). Now I want to randomly replace some elements with 0, so that I get output like this: np.array([ [3, 0, 0, 3, 0, 1], [0, 1, 2, 0, 5, 2], [0, 3, 0, 3, 1, 0], [2, 0, 2, 5, 0, 1], [0, 0, 2, 0, 5, 0], ]). Answer 1: We can use np.random.choice(..., replace=False) to randomly select a number of unique non-zero flattened indices and then simply index and reset
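
A sketch of the approach the answer describes: pick a set of unique non-zero flattened indices with np.random.choice(..., replace=False) and set them to 0. The number of elements to zero out (n_zero) is an arbitrary choice here:

```python
import numpy as np

a = np.array([[3, 0, 5, 3, 0, 1],
              [0, 1, 2, 1, 5, 2],
              [4, 3, 5, 3, 1, 4],
              [2, 5, 2, 5, 3, 1],
              [0, 1, 2, 1, 5, 2]])

rng = np.random.default_rng(0)
nonzero_flat = np.flatnonzero(a)                   # flat indices of non-zero entries
n_zero = 8                                         # how many entries to blank out
drop = rng.choice(nonzero_flat, size=n_zero, replace=False)

out = a.copy()
out.ravel()[drop] = 0                              # index the flat view and reset to 0
print(out)
```

Sampling only from the non-zero positions guarantees that every draw actually increases sparsity, instead of occasionally landing on an element that is already 0.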

Understanding the Pearson Correlation Coefficient

给你一囗甜甜゛ submitted on 2019-12-22 17:15:20
Question: As part of the calculations to generate a Pearson Correlation Coefficient, the following computation is performed (the formula is not reproduced in this excerpt). In the second formula, p_{a,i} is the predicted rating user a would give item i, n is the number of similar users being compared to, and r_{u,i} is the rating of item i by user u. What value will be used if user u has not rated this item? Did I misunderstand anything here? Answer 1: According to the link, earlier calculations in step 1 of the algorithm are over a set of items, indexed 1
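
The formula the question refers to is an image that did not survive the excerpt. Based on the variable names (p_{a,i}, n, r_{u,i}), it appears to be the standard user-based CF prediction, reproduced here as an assumption rather than a quote:

$$
p_{a,i} = \bar{r}_a + \frac{\sum_{u=1}^{n} \left(r_{u,i} - \bar{r}_u\right)\, w_{a,u}}{\sum_{u=1}^{n} \lvert w_{a,u} \rvert}
$$

where $\bar{r}_a$ is user $a$'s mean rating and $w_{a,u}$ is the Pearson similarity between users $a$ and $u$. In common implementations, neighbours $u$ who have not rated item $i$ are simply excluded from both sums rather than given a substitute rating.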

Datasets for Apache Mahout

旧城冷巷雨未停 submitted on 2019-12-22 08:56:20
Question: I am looking for datasets that can be used to implement the recommendation-system use case of Apache Mahout. I only know of the MovieLens data sets from the GroupLens Research group. Does anyone know of other datasets that can be used for a recommendation-system implementation? I am particularly interested in item-based data sets, though other datasets are most welcome. Answer 1: This is Sebastian from Mahout. There is a dataset from a Czech dating website available that might be of interest to you: http://www

How is NaN handled in Pearson correlation user-user similarity matrix in a recommender system?

╄→гoц情女王★ submitted on 2019-12-22 05:29:50
Question: I am generating a user-user similarity matrix from user-rating data (specifically the MovieLens100K data). Computing the correlation leads to some NaN values. I have tested this on a smaller dataset:

User-Item rating matrix
     I1  I2  I3  I4
U1    4   0   5   5
U2    4   2   1   0
U3    3   0   2   4
U4    4   4   0   0

User-User Pearson Correlation similarity matrix
          U1         U2         U3     U4         U5
U1         1         -1          0   -nan   0.755929
U2        -1          1          1   -nan  -0.327327
U3         0          1          1   -nan   0.654654
U4      -nan       -nan       -nan   -nan       -nan
U5  0.755929  -0.327327   0.654654   -nan          1

For computing the Pearson
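
A short reproduction of the NaN behaviour, treating 0 as "unrated" and computing Pearson over co-rated items only. U4's row stays NaN because its overlaps with other users either have fewer than two items or are all equal (4, 4), so the Pearson denominator is zero. The NaN-to-zero replacement at the end is one common handling, not necessarily the one used in the question's code:

```python
import numpy as np

R = np.array([[4, 0, 5, 5],
              [4, 2, 1, 0],
              [3, 0, 2, 4],
              [4, 4, 0, 0]], dtype=float)       # rows U1..U4 from the question

n = R.shape[0]
sims = np.full((n, n), np.nan)
for a in range(n):
    for b in range(n):
        mask = (R[a] > 0) & (R[b] > 0)          # co-rated items only
        if mask.sum() >= 2 and R[a, mask].std() > 0 and R[b, mask].std() > 0:
            sims[a, b] = np.corrcoef(R[a, mask], R[b, mask])[0, 1]

print(sims)                 # U4's row and column remain NaN, as in the question
print(np.nan_to_num(sims))  # one option: treat "no evidence" as similarity 0
```

Replacing NaN with 0 means such users contribute nothing to anyone's neighborhood, which is usually the intended behaviour when there is no evidence of similarity.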

Solr MoreLikeThis boosting query fields

时光毁灭记忆、已成空白 submitted on 2019-12-22 05:14:58
Question: I am experimenting with Solr's MoreLikeThis feature. My schema deals with articles, and I'm looking for similarities between articles within three fields: articletitle, articletext and topic. The following query works well: q=id:(2e2ec74c-7c26-49c9-b359-31a11ea50453)&rows=100000000&mlt=true&mlt.fl=articletext,articletitle,topic&mlt.boost=true&mlt.mindf=1&mlt.mintf=1 But I would like to experiment with boosting different query fields - i.e. putting more weight on similarities in the
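
MoreLikeThis accepts per-field boosts through mlt.qf, using the same field^boost syntax as DisMax's qf; the boosted fields must also be listed in mlt.fl. A sketch of such a request from Python (the core name "articles" and the boost values are assumptions, not from the question):

```python
import requests

params = {
    "q": "id:(2e2ec74c-7c26-49c9-b359-31a11ea50453)",
    "rows": 100,
    "mlt": "true",
    "mlt.fl": "articletext,articletitle,topic",
    "mlt.qf": "articletitle^3.0 topic^2.0 articletext^1.0",  # per-field weights
    "mlt.boost": "true",
    "mlt.mindf": 1,
    "mlt.mintf": 1,
    "wt": "json",
}
resp = requests.get("http://localhost:8983/solr/articles/select", params=params)
print(resp.json().get("moreLikeThis", {}))
```

Raising the boost on articletitle relative to articletext, for example, makes title-term overlap count more heavily in the similar-document scores.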

Python recommendation engine [closed]

非 Y 不嫁゛ submitted on 2019-12-20 09:47:50
Question (closed as off-topic, no longer accepting answers): Is there a recommendation engine for Python similar to Java's Taste? Answer 1: I haven't found much that runs natively in Python, but someone created Python wrappers for SUGGEST, which looks like a solid program (see the SUGGEST overview and the Python wrappers). Since Python is still fairly slow when compared to C or Java, using