recommendation-engine

How can I match all the key-value pairs in Python? My code is running too long

只愿长相守 submitted on 2019-11-29 17:36:26
User-item affinity and recommendations: I am building a table for a "customers who bought this item also bought" algorithm.

Input dataset:

    productId  userId
    Prod1      a
    Prod1      b
    Prod1      c
    Prod1      d
    prod2      b
    prod2      c
    prod2      a
    prod2      b
    prod3      c
    prod3      a
    prod3      d
    prod3      c
    prod4      a
    prod4      b
    prod4      d
    prod4      a
    prod5      d
    prod5      a

Output required:

    Product1  Product2  score
    Prod1     prod3
    Prod1     prod4
    Prod1     prod5
    prod2     Prod1
    prod2     prod3
    prod2     prod4
    prod2     prod5
    prod3     Prod1
    prod3     prod2

Using code:

    #Get list of unique items
    itemList=list(set(main["productId"].tolist()))
    #Get count of users
    userCount=len(set(main["productId"]
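The excerpt cuts off before the score is defined, so the following is only a minimal sketch under one assumption: the score for a pair is the number of distinct users who bought both products. The function name `co_purchase_scores` is hypothetical. Grouping purchases by user first means only pairs that actually share a buyer are ever visited, which may help when the catalogue is large.

```python
from collections import defaultdict
from itertools import permutations

def co_purchase_scores(rows):
    """rows: iterable of (product_id, user_id); returns {(p1, p2): shared_buyers}.

    Assumption: score = number of distinct users who bought both products.
    """
    # Group products by user; sets drop repeated purchases of the same product.
    products_by_user = defaultdict(set)
    for product_id, user_id in rows:
        products_by_user[user_id].add(product_id)

    scores = defaultdict(int)
    for products in products_by_user.values():
        # Each ordered pair in one user's basket gains one shared buyer.
        for p1, p2 in permutations(sorted(products), 2):
            scores[(p1, p2)] += 1
    return dict(scores)

# The input dataset from the question.
rows = [
    ("Prod1", "a"), ("Prod1", "b"), ("Prod1", "c"), ("Prod1", "d"),
    ("prod2", "b"), ("prod2", "c"), ("prod2", "a"), ("prod2", "b"),
    ("prod3", "c"), ("prod3", "a"), ("prod3", "d"), ("prod3", "c"),
    ("prod4", "a"), ("prod4", "b"), ("prod4", "d"), ("prod4", "a"),
    ("prod5", "d"), ("prod5", "a"),
]
scores = co_purchase_scores(rows)
```

Both orderings of a pair are stored, matching the required output, which lists (Prod1, prod3) as well as (prod3, Prod1).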

Spark - How to create a sparse matrix from item ratings

我与影子孤独终老i submitted on 2019-11-29 10:10:16
Question: My question is equivalent to the R-related post "Create Sparse Matrix from a data frame", except that I would like to do the same thing in Spark (preferably in Scala). Sample of the data in the data.txt file from which the sparse matrix is created:

    UserID  MovieID  Rating
    2       1        1
    3       2        1
    4       2        1
    6       2        1
    7       2        1

So in the end the columns are the movie IDs and the rows are the user IDs:

       1 2 3 4 5 6 7
    1  0 0 0 0 0 0 0
    2  1 0 0 0 0 0 0
    3  0 1 0 0 0 0 0
    4  0 1 0 0 0 0 0
    5  0 0 0 0 0 0 0
    6  0 1 0 0 0 0 0
    7  0 1 0 0
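The question asks for Spark and Scala; the sketch below is not that. It is a plain-Python illustration of the target layout only, storing the non-zero ratings in a dictionary keyed by (user ID, movie ID) and expanding to the dense form on demand. The helper names `to_sparse` and `to_dense` are hypothetical, and the 7-by-7 size is taken from the sample matrix.

```python
def to_sparse(triples):
    """Keep only non-zero ratings, keyed by (user_id, movie_id)."""
    return {(user, movie): rating
            for user, movie, rating in triples if rating != 0}

def to_dense(sparse, n_users, n_movies):
    """Expand to a list of rows; IDs are 1-based, as in the sample data."""
    return [[sparse.get((user, movie), 0) for movie in range(1, n_movies + 1)]
            for user in range(1, n_users + 1)]

# (UserID, MovieID, Rating) rows from the sample data.txt.
triples = [(2, 1, 1), (3, 2, 1), (4, 2, 1), (6, 2, 1), (7, 2, 1)]
sparse = to_sparse(triples)
dense = to_dense(sparse, n_users=7, n_movies=7)
```

Here `dense[1]` is the row for user 2 and `dense[2]` the row for user 3, since list positions are zero-based while the IDs start at 1.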

How to use mllib.recommendation if the user ids are string instead of contiguous integers?

ⅰ亾dé卋堺 submitted on 2019-11-29 01:56:42
I want to use Spark's mllib.recommendation library to build a prototype recommender system. However, the user data I have looks like this: AB123XY45678 CD234WZ12345 EF345OOO1234 GH456XY98765 .... According to the API of the Rating class, the user ids passed to mllib.recommendation have to be integers (do they also have to be contiguous?). It looks like some kind of conversion between the real user ids and the numeric ones used by Spark must be done. But how should I do this? Spark doesn't really require a numeric id, it just needs to be some unique
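The conversion the question describes can be sketched as a pair of lookup tables. This is a minimal plain-Python illustration, not Spark code, and `build_id_index` is a hypothetical helper; it assumes the distinct IDs fit in memory. On an RDD, `zipWithIndex` could plausibly play the same role.

```python
def build_id_index(string_ids):
    """Assign each distinct string ID a contiguous integer, in first-seen order."""
    index = {}
    for sid in string_ids:
        if sid not in index:
            index[sid] = len(index)
    return index

# Sample IDs from the question, with one repeat to show deduplication.
raw_user_ids = ["AB123XY45678", "CD234WZ12345", "EF345OOO1234",
                "GH456XY98765", "AB123XY45678"]
user_to_int = build_id_index(raw_user_ids)

# Reverse table, for translating numeric results back to the real IDs.
int_to_user = {i: sid for sid, i in user_to_int.items()}
```

Keeping the reverse table alongside the forward one lets recommendations computed on the integers be reported with the original string IDs.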

Collaborative Filtering: Non-Personalized item-to-item similarity

大兔子大兔子 submitted on 2019-11-28 20:41:50
Question: I'm trying to compute item-to-item similarity along the lines of Amazon's "Customers who viewed/purchased X have also viewed/purchased Y and Z". All of the examples and references I've seen are for computing item similarity for ranked items, for finding user-user similarity, or for finding recommended items based on the current user's history. I'd like to start off with a non-targeted approach before factoring in the current user's preferences. Looking at the Amazon.com recommendations

How to create my own recommendation engine? [closed]

回眸只為那壹抹淺笑 submitted on 2019-11-28 15:00:49
I have become interested in recommendation engines and want to improve my skills in this area. I am currently reading "Programming Collective Intelligence" from O'Reilly, which I think is the best book on the subject. But I have no idea how to implement an engine; by "no idea" I mean that I don't know how to start. I have a project like Last.fm in mind. Where do I start building a recommendation engine (should it be implemented on the database side or the backend side)? What level of database knowledge is needed? Are there any open-source engines or other resources that could help?

How can I match all the key-value pairs in Python? My code is running too long

假如想象 submitted on 2019-11-28 10:27:42
Question: User-item affinity and recommendations: I am building a table for a "customers who bought this item also bought" algorithm.

Input dataset:

    productId  userId
    Prod1      a
    Prod1      b
    Prod1      c
    Prod1      d
    prod2      b
    prod2      c
    prod2      a
    prod2      b
    prod3      c
    prod3      a
    prod3      d
    prod3      c
    prod4      a
    prod4      b
    prod4      d
    prod4      a
    prod5      d
    prod5      a

Output required:

    Product1  Product2  score
    Prod1     prod3
    Prod1     prod4
    Prod1     prod5
    prod2     Prod1
    prod2     prod3
    prod2     prod4
    prod2     prod5
    prod3     Prod1
    prod3     prod2

Using code:

    #Get list of unique items

How to build a 'related questions' engine?

有些话、适合烂在心里 submitted on 2019-11-28 10:21:14
One of our bigger sites has a section where users can send questions to the website owner, which are then evaluated personally by his staff. When the same question comes up very often, they can add it to the FAQ. To keep them from receiving dozens of similar questions a day, we would like to provide a feature similar to the 'Related questions' on this site (Stack Overflow). What ways are there to build this kind of feature? I know that I should somehow evaluate the question and compare it to the questions in the FAQ, but how does this comparison work? Are keywords
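One simple way the comparison could work is token overlap. The sketch below is only an illustration under that assumption: it scores each FAQ entry by the Jaccard similarity between its word set and the incoming question's word set. The function names and the sample FAQ entries are hypothetical, and a production system would likely add stop-word removal, stemming, or TF-IDF weighting.

```python
import re

def tokens(text):
    """Lowercased set of alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Overlap of two token sets: 0.0 when disjoint, 1.0 when identical."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def related(question, faq, top_n=3):
    """Rank FAQ entries by token overlap with the incoming question."""
    q = tokens(question)
    ranked = sorted(faq, key=lambda entry: jaccard(q, tokens(entry)),
                    reverse=True)
    return ranked[:top_n]

# Hypothetical FAQ entries, for illustration only.
faq = [
    "How do I reset my password?",
    "Where can I download my invoice?",
    "How do I change my email address?",
]
best = related("I forgot my password, how can I reset it?", faq, top_n=1)
```

With these inputs the password entry ranks first, because it shares five words with the incoming question while the other two share three each.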

Get Google Analytics “Visitors Flow” data from API

别等时光非礼了梦想. submitted on 2019-11-28 07:38:34
I'm trying to gather information from Google Analytics to build a recommendation engine for my site. The site consists of many pages, so I'm tracking the number of times a user clicks, for example, from page A to page B. Currently I can measure the A -> B transitions on Google Analytics with previousPagePath = '/A' and nextPagePath = '/B' , but the question I really want to answer is, "Of all the visits to the site that included viewing page A, how many times were pages B, C, ... viewed in the same visit?" For example, if the flow was A -> homepage -> B , then that would not be captured by my
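The "same visit" count described in the question can be sketched as follows. The sketch assumes the visits have already been exported as ordered lists of page paths; how to obtain them from the Google Analytics API is not covered here. The name `co_viewed_counts` and the sample visits are hypothetical.

```python
from collections import Counter

def co_viewed_counts(visits, page):
    """For visits that include `page`, count how many also include each other page."""
    counts = Counter()
    for path in visits:
        pages = set(path)  # a page counts once per visit, in any position
        if page in pages:
            counts.update(pages - {page})
    return counts

# Hypothetical visits, each an ordered list of page paths.
visits = [
    ["/A", "/", "/B"],   # A -> homepage -> B, missed by a direct A -> B measure
    ["/A", "/B", "/C"],
    ["/B", "/C"],        # does not include /A, so it is ignored
    ["/A", "/A", "/C"],
]
counts = co_viewed_counts(visits, "/A")
```

Because each visit is reduced to a set, the first visit contributes to the count for /B even though the homepage sits between the two pages.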

Which is the proper way of filtering numeric values for a text field?

天涯浪子 submitted on 2019-11-28 04:49:57
Question: I'm working on a text field whose validation won't let you enter anything other than numeric values. My initial code was quite simple, similar to this: $(textField).on('keypress', function(e) { if (e.which < 48 || e.which > 57) e.preventDefault(); }); This is fairly straightforward, but it turns out that (in the latest version of all browsers) Firefox will make this also prevent movement with the arrow keys and the delete/backspace keys, whereas the other browsers do not.

How to use mllib.recommendation if the user ids are string instead of contiguous integers?

筅森魡賤 submitted on 2019-11-27 21:46:05
Question: I want to use Spark's mllib.recommendation library to build a prototype recommender system. However, the user data I have looks like this: AB123XY45678 CD234WZ12345 EF345OOO1234 GH456XY98765 .... According to the API of the Rating class, the user ids passed to mllib.recommendation have to be integers (do they also have to be contiguous?). It looks like some kind of conversion between the real user ids and the numeric ones used by Spark must be