recommendation-engine

SOLR and Natural Language Parsing - Can I use it?

大兔子大兔子 submitted on 2021-02-15 08:18:53

Question: Requirements: Word frequency algorithm for natural language processing, using Solr. While the answer for that question is excellent, I was wondering if I could make use of all the time I spent getting to know SOLR for my NLP. I thought of SOLR because: it's got a bunch of tokenizers and performs a lot of NLP; it's pretty easy to use out of the box; it's a RESTful distributed app, so it's easy to hook up; I've spent some time with it, so using it could save me time. Can I use Solr? Although the above
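For the word-frequency use case specifically, Solr's TermsComponent can return per-field term document frequencies directly over HTTP. A minimal sketch of building such a request with the standard library (the host, core name `articles`, and field name `text` are illustrative assumptions):

```python
from urllib.parse import urlencode

def solr_terms_url(base_url, core, field, limit=20):
    """Build a Solr TermsComponent URL returning the top terms
    (by document frequency) for a given field."""
    params = {
        "terms.fl": field,     # field to read indexed terms from
        "terms.limit": limit,  # how many top terms to return
        "wt": "json",          # response format
    }
    return f"{base_url}/{core}/terms?{urlencode(params)}"

url = solr_terms_url("http://localhost:8983/solr", "articles", "text")
print(url)
# → http://localhost:8983/solr/articles/terms?terms.fl=text&terms.limit=20&wt=json
```

Fetching that URL (e.g. with `urllib.request`) returns the term/frequency pairs as JSON, which is essentially the word-frequency output the linked question asks for.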

Why does ALS.trainImplicit give better predictions for explicit ratings?

筅森魡賤 submitted on 2021-02-08 08:44:18

Question: Edit: I tried a standalone Spark application (instead of PredictionIO) and my observations are the same, so this is not a PredictionIO issue, but it is still confusing. I am using PredictionIO 0.9.6 and the Recommendation template for collaborative filtering. The ratings in my data set are numbers between 1 and 10. When I first trained a model with defaults from the template (using ALS.train), the predictions were horrible, at least subjectively. Scores ranged up to 60.0 or so but the
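One reason `trainImplicit` can behave better even on explicit data: the implicit-feedback ALS formulation (Hu, Koren & Volinsky 2008) fits a binary preference weighted by a confidence derived from the rating, rather than reproducing the raw score, which bounds predictions near [0, 1] instead of letting them run up to 60. A NumPy sketch of that standard confidence transform (the `alpha` value and the toy matrix are illustrative assumptions):

```python
import numpy as np

# Explicit 1-10 ratings; 0 marks "no interaction observed".
ratings = np.array([[5.0, 0.0, 9.0],
                    [0.0, 2.0, 0.0]])

# Implicit ALS splits each entry into a binary preference p and a
# confidence weight c = 1 + alpha * r; the model then fits p with
# per-entry weights c instead of fitting the raw rating values.
alpha = 40.0  # common default in the paper; tune per data set
preference = (ratings > 0).astype(float)
confidence = 1.0 + alpha * ratings

print(preference)
print(confidence)
```

Higher ratings thus make the model more certain the user liked the item, but the target being predicted stays a preference score, not the rating itself.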

Converting Pandas DataFrame to sparse matrix

吃可爱长大的小学妹 submitted on 2021-02-08 02:15:43

Question: Here is my code: data = pd.get_dummies(data['movie_id']).groupby(data['user_id']).apply(max); df = pd.DataFrame(data); replace = df.replace(0, np.NaN); t = replace.fillna(-1); sparse = sp.csr_matrix(t.values). My data consists of two columns, movie_id and user_id: (user_id 5, movie_id 1000), (user_id 6, movie_id 1007). I want to convert the data to a sparse matrix. I first created an interaction matrix where rows indicate user_id and columns indicate movie_id, with positive interaction as +1 and negative interaction as -1.
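Going through `get_dummies` materializes a dense matrix before converting, which defeats the purpose of a sparse representation. A sketch building the CSR matrix directly from the two columns (column names follow the question; the -1 fill for unobserved pairs is dropped here, since unobserved entries are simply absent in a sparse matrix):

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix

df = pd.DataFrame({"user_id": [5, 6], "movie_id": [1000, 1007]})

# Map the raw IDs to contiguous 0-based row/column indices.
user_codes, user_index = pd.factorize(df["user_id"])
movie_codes, movie_index = pd.factorize(df["movie_id"])

# One +1 entry per observed (user, movie) interaction; everything
# else is implicitly zero ("no interaction") in the sparse matrix.
values = np.ones(len(df))
interactions = coo_matrix(
    (values, (user_codes, movie_codes)),
    shape=(len(user_index), len(movie_index)),
).tocsr()

print(interactions.shape)     # (2, 2)
print(interactions.toarray())
```

This never allocates the dense user-by-movie grid, so it scales to ID ranges where `get_dummies` would run out of memory.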

Mean Percentile Ranking (MPR) explanation

99封情书 submitted on 2020-01-23 03:01:06

Question: I am trying to use MPR as a metric to evaluate my recommendation system based on implicit feedback. Can somebody please explain MPR? I have gone through this paper; however, I can't seem to get an intuitive understanding. Any help would be appreciated. EDIT: I went through Microsoft's research on metrics for recommendation engines. It recommends MPR when we're looking for one 'positive' result. Can somebody also explain why that is the case? EDIT 2:
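Mean Percentile Ranking, as defined in Hu, Koren & Volinsky's implicit-feedback paper, averages the percentile position of each held-out positive item in the model's ranked list for that user: 0.0 means the positive item was ranked first, 1.0 last, and about 0.5 is what random ordering achieves. Because it scores the position of a single positive per user, it fits the "one positive result" setting the Microsoft page describes. A small self-contained sketch (toy rankings are illustrative):

```python
def mean_percentile_ranking(ranked_lists, held_out_items):
    """MPR over (ranked item list, held-out positive item) pairs.

    rank / (n - 1) is the percentile of the held-out item: 0.0 if
    ranked first, 1.0 if ranked last. Lower is better; ~0.5 means
    the model is no better than random ranking.
    """
    percentiles = []
    for ranking, positive in zip(ranked_lists, held_out_items):
        rank = ranking.index(positive)  # 0-based position in the ranking
        percentiles.append(rank / (len(ranking) - 1))
    return sum(percentiles) / len(percentiles)

# Two users, five candidate items each; the held-out positive item
# is ranked first for user 0 and third (middle) for user 1.
rankings = [["a", "b", "c", "d", "e"], ["c", "d", "a", "b", "e"]]
positives = ["a", "a"]
print(mean_percentile_ranking(rankings, positives))  # (0.0 + 0.5) / 2 = 0.25
```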

Multiple models in Myrrix

痴心易碎 submitted on 2020-01-15 12:21:46

Question: I have a CSV file like this: typeA,typeB; typeA,typeC; typeA,typeC; typeA,typeB. Here, typeA, typeB and typeC are three different types of entities. Consider types B and C to be two different types of items, and consider type A to be the users. I can build a model by feeding this CSV file into Myrrix. Now, suppose I have another CSV file like this: typeB,typeD; typeB,typeD; typeB,typeD; typeB,typeD. This file has two types only, B (the "B" items from the former CSV file appear here as users) and D. Here,

How to run multi-threaded jobs in Apache Spark using Scala or Python?

让人想犯罪 __ submitted on 2020-01-14 12:34:57

Question: I am facing a problem related to concurrency in Spark which is stopping me from using it in production, but I know there is a way out of it. I am trying to run Spark ALS on 7 million users for a billion products using order history. First I take a list of distinct users and then run a loop over these users to get recommendations, which is a pretty slow process and will take days to produce recommendations for all users. I tried taking the cartesian product of users and products to get recommendations for
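Looping over users (or materializing a full user x product cartesian join) is rarely necessary: once the ALS factor matrices are trained, top-k recommendations for every user reduce to one matrix multiplication, which is what Spark's built-in batch methods (`recommendProductsForUsers` in MLlib, `recommendForAllUsers` in ML) do for you. A NumPy sketch of the batch-scoring idea (the toy factor matrices and k=2 are illustrative assumptions):

```python
import numpy as np

# Toy trained ALS factors: 3 users and 4 items, 2 latent dimensions.
user_factors = np.array([[1.0, 0.0],
                         [0.0, 1.0],
                         [1.0, 1.0]])
item_factors = np.array([[0.9, 0.1],
                         [0.1, 0.9],
                         [0.5, 0.5],
                         [0.0, 0.2]])

# Score every (user, item) pair in one matrix multiplication
# instead of a per-user loop, then take the top-k items per user.
scores = user_factors @ item_factors.T      # shape (3, 4)
k = 2
top_k = np.argsort(-scores, axis=1)[:, :k]  # item indices, best first

print(top_k)
```

At 7M users the same idea applies block-wise: Spark partitions the user factors and multiplies each block against the (broadcast or blocked) item factors, so no driver-side loop is involved.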

Appending pandas DataFrame with MultiIndex with data containing new labels, but preserving the integer positions of the old MultiIndex

狂风中的少年 submitted on 2020-01-14 07:27:29

Question: Base scenario: for a recommendation service I am training a matrix factorization model (LightFM) on a set of user-item interactions. For the matrix factorization model to yield the best results, I need to map my user and item IDs to a continuous range of integer IDs starting at 0. I'm using a pandas DataFrame in the process, and I have found a MultiIndex to be extremely convenient to create this mapping, like so: ratings = [{'user_id': 1, 'item_id': 1, 'rating': 1.0}, {'user_id': 1, 'item_id':
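For the mapping itself, a pinned `CategoricalDtype` gives the contiguous 0-based integer IDs directly, and extending its category list (rather than re-deriving it) keeps the old integer positions stable when data with new labels is appended, which is the preservation property the question is after. A sketch (column names follow the question; the appended row is illustrative):

```python
import pandas as pd

ratings = pd.DataFrame([
    {"user_id": 1, "item_id": 1, "rating": 1.0},
    {"user_id": 1, "item_id": 2, "rating": 4.0},
    {"user_id": 7, "item_id": 2, "rating": 3.0},
])

# Fix the known labels as categories; their integer codes (0, 1, ...)
# are then stable regardless of what is appended later.
user_dtype = pd.CategoricalDtype(categories=[1, 7])
ratings["user_idx"] = ratings["user_id"].astype(user_dtype).cat.codes

# New data with a previously unseen user: extend the category list
# instead of re-factorizing, so existing positions are preserved.
new_rows = pd.DataFrame([{"user_id": 9, "item_id": 1, "rating": 5.0}])
all_ratings = pd.concat([ratings.drop(columns="user_idx"), new_rows],
                        ignore_index=True)
user_dtype = pd.CategoricalDtype(categories=[1, 7, 9])
all_ratings["user_idx"] = all_ratings["user_id"].astype(user_dtype).cat.codes

print(all_ratings["user_idx"].tolist())  # [0, 0, 1, 2]
```

The same pattern applies to `item_id`, and the two code columns can replace the MultiIndex lookup entirely.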

Design of the recommendation engine database?

五迷三道 submitted on 2020-01-12 06:20:44

Question: I am currently working on recommendation systems, especially for audio files, but I am a beginner at this subject. I am trying to design the database first with MySQL, but I can't decide how to do it. It is basically a system where users create a profile, then search for music, and the system recommends music similar to what they liked. Which database should I use? (MySQL comes to mind as a first guess.) It is a web project, and later with a mobile side as well. Which technologies should I use? (PHP, the Android platform...
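Whatever engine is chosen, the core of such a design is usually three tables: users, tracks, and an interactions/ratings table keyed by both, which is what the recommender later trains on. A minimal sketch of that schema using Python's built-in sqlite3 (table and column names are illustrative assumptions; the same DDL translates almost directly to MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE tracks (
        track_id INTEGER PRIMARY KEY,
        title    TEXT NOT NULL,
        artist   TEXT
    );
    -- One row per user-track interaction; this table is the
    -- recommender's training data (ratings, plays, likes, ...).
    CREATE TABLE ratings (
        user_id  INTEGER NOT NULL REFERENCES users(user_id),
        track_id INTEGER NOT NULL REFERENCES tracks(track_id),
        rating   REAL NOT NULL,
        PRIMARY KEY (user_id, track_id)
    );
""")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("INSERT INTO tracks VALUES (10, 'Song A', 'Artist X')")
conn.execute("INSERT INTO ratings VALUES (1, 10, 4.5)")

row = conn.execute("SELECT rating FROM ratings WHERE user_id = 1").fetchone()
print(row[0])  # 4.5
```

The composite primary key on (user_id, track_id) keeps one interaction per pair; audio-specific metadata (genre, tempo, tags) would hang off `tracks` in additional columns or tables.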