recommendation-engine | 易学教程

how can I match all the key value pair in python which running too long

User-item affinity and recommendations: I am creating a table for a "customers who bought this item also bought" algorithm.

Input dataset:

    productId  userId
    Prod1      a
    Prod1      b
    Prod1      c
    Prod1      d
    prod2      b
    prod2      c
    prod2      a
    prod2      b
    prod3      c
    prod3      a
    prod3      d
    prod3      c
    prod4      a
    prod4      b
    prod4      d
    prod4      a
    prod5      d
    prod5      a

Output required:

    Product1  Product2  score
    Prod1     prod3
    Prod1     prod4
    Prod1     prod5
    prod2     Prod1
    prod2     prod3
    prod2     prod4
    prod2     prod5
    prod3     Prod1
    prod3     prod2

Using code:

    #Get list of unique items
    itemList=list(set(main["productId"].tolist()))
    #Get count of users
    userCount=len(set(main["productId"]
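
The excerpt cuts off before the slow part of the code, so the following is only a sketch of one common way to avoid comparing every product pair against every user: group products by user once, then count pairs per user. The `score` definition used here (number of distinct users who bought both products) is an assumption, since the excerpt leaves the score column empty, and `co_purchase_scores` is a hypothetical helper name.

```python
from collections import defaultdict
from itertools import permutations

def co_purchase_scores(rows):
    """Count, for each ordered product pair, the distinct users who bought both.

    rows is an iterable of (product_id, user_id) tuples. Duplicate rows are
    collapsed because each user's products are stored in a set.
    """
    products_by_user = defaultdict(set)
    for product_id, user_id in rows:
        products_by_user[user_id].add(product_id)

    scores = defaultdict(int)
    for products in products_by_user.values():
        # Only pairs that actually co-occur for some user are ever visited.
        for first, second in permutations(sorted(products), 2):
            scores[(first, second)] += 1
    return dict(scores)

# The input dataset from the question, with its original spelling of the ids.
rows = [
    ("Prod1", "a"), ("Prod1", "b"), ("Prod1", "c"), ("Prod1", "d"),
    ("prod2", "b"), ("prod2", "c"), ("prod2", "a"), ("prod2", "b"),
    ("prod3", "c"), ("prod3", "a"), ("prod3", "d"), ("prod3", "c"),
    ("prod4", "a"), ("prod4", "b"), ("prod4", "d"), ("prod4", "a"),
    ("prod5", "d"), ("prod5", "a"),
]
scores = co_purchase_scores(rows)
```

The work grows with the number of product pairs per user rather than with every product pair multiplied by every user, which is usually much smaller when most users buy only a few items.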

Spark - How to create a sparse matrix from item ratings

My question is equivalent to the R-related post "Create Sparse Matrix from a data frame", except that I would like to perform the same thing on Spark (preferably in Scala). Sample of data in the data.txt file from which the sparse matrix is being created:

    UserID  MovieID  Rating
    2       1        1
    3       2        1
    4       2        1
    6       2        1
    7       2        1

So in the end the columns are the movie IDs and the rows are the user IDs:

       1  2  3  4  5  6  7
    1  0  0  0  0  0  0  0
    2  1  0  0  0  0  0  0
    3  0  1  0  0  0  0  0
    4  0  1  0  0  0  0  0
    5  0  0  0  0  0  0  0
    6  0  1  0  0  0  0  0
    7  0  1  0  0
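
To make the target layout concrete, here is a small sketch in plain Python rather than Spark or Scala. It stores only the nonzero cells keyed by (user, movie) and expands them to a dense grid for inspection. The names `to_sparse` and `to_dense` are hypothetical, and the 7 x 7 size is taken from the sample matrix above.

```python
def to_sparse(triples):
    """Keep only nonzero ratings, keyed by (user_id, movie_id)."""
    return {(user, movie): rating for user, movie, rating in triples if rating != 0}

def to_dense(cells, n_rows, n_cols):
    """Expand the sparse cells into a list of rows, using 1-based ids."""
    return [
        [cells.get((row, col), 0) for col in range(1, n_cols + 1)]
        for row in range(1, n_rows + 1)
    ]

# (UserID, MovieID, Rating) rows from the sample data.
triples = [(2, 1, 1), (3, 2, 1), (4, 2, 1), (6, 2, 1), (7, 2, 1)]
cells = to_sparse(triples)
dense = to_dense(cells, 7, 7)
```

In Spark itself, the same (row, column, value) triples are the shape that a coordinate-style distributed matrix expects, so the parsing step carries over even though the container differs.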

How to use mllib.recommendation if the user ids are string instead of contiguous integers?

I want to use Spark's mllib.recommendation library to build a prototype recommender system. However, the user data I have is in the following format: AB123XY45678 CD234WZ12345 EF345OOO1234 GH456XY98765 .... If I want to use the mllib.recommendation library, according to the API of the Rating class, the user ids have to be integers (do they also have to be contiguous?). It looks like some kind of conversion between the real user ids and the numeric ones used by Spark must be done. But how should I do this? Spark doesn't really require a numeric id; it just needs to be some unique
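
The conversion the question asks about can be sketched as a lookup table from each distinct string id to an integer, plus the reverse table for translating recommendations back. This is plain Python for illustration, not Spark code, and `build_id_index` is a hypothetical helper. On a real cluster the table would have to be built in a distributed way, which is not shown here.

```python
def build_id_index(raw_ids):
    """Assign 0, 1, 2, ... to each distinct id in first-seen order."""
    index = {}
    for raw in raw_ids:
        if raw not in index:
            index[raw] = len(index)
    return index

# Sample ids from the question; the repeated first id is added for illustration.
raw_users = [
    "AB123XY45678", "CD234WZ12345", "EF345OOO1234",
    "GH456XY98765", "AB123XY45678",
]
user_index = build_id_index(raw_users)

# Reverse lookup, used to map model output back to the original ids.
reverse_index = {number: raw for raw, number in user_index.items()}
```

Keeping an explicit table avoids the collisions that hashing the strings into integers could produce, at the cost of having to store and join against the table.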

Collaborative Filtering: Non-Personalized item-to-item similarity

I'm trying to compute item-to-item similarity along the lines of Amazon's "Customers who viewed/purchased X have also viewed/purchased Y and Z". All of the examples and references I've seen are for either computing item similarity for ranked items, for finding user-user similarity, or for finding recommended items based on the current users' history. I'd like to start off with a non-targeted approach before factoring in the current users' preferences. Looking at the Amazon.com recommendations
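
One non-personalized way to get "viewed X, also viewed Y" from unrated events is to compare the sets of users who touched each item. The sketch below uses Jaccard similarity, which is a choice made here for illustration and not something the excerpt prescribes. The event data and the names `jaccard_item_similarity` and `top_related` are hypothetical.

```python
from collections import defaultdict

def jaccard_item_similarity(events):
    """Return {(item_a, item_b): |shared users| / |all users of either|}.

    events is an iterable of (user_id, item_id). Pairs with no shared
    users are omitted.
    """
    users_by_item = defaultdict(set)
    for user, item in events:
        users_by_item[item].add(user)

    items = sorted(users_by_item)
    similarities = {}
    for position, item_a in enumerate(items):
        for item_b in items[position + 1:]:
            shared = len(users_by_item[item_a] & users_by_item[item_b])
            if shared:
                union = len(users_by_item[item_a] | users_by_item[item_b])
                similarities[(item_a, item_b)] = shared / union
    return similarities

def top_related(similarities, item, k=2):
    """List up to k (other_item, score) pairs for item, best first."""
    related = []
    for (item_a, item_b), score in similarities.items():
        if item_a == item:
            related.append((item_b, score))
        elif item_b == item:
            related.append((item_a, score))
    related.sort(key=lambda pair: (-pair[1], pair[0]))
    return related[:k]

# Hypothetical view events: (user, item).
events = [
    ("u1", "X"), ("u1", "Y"),
    ("u2", "X"), ("u2", "Y"), ("u2", "Z"),
    ("u3", "X"), ("u3", "Z"),
    ("u4", "W"),
]
similarities = jaccard_item_similarity(events)
related_to_y = top_related(similarities, "Y")
```

Dividing by the union keeps very popular items from dominating every list, which raw co-occurrence counts tend to do.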

How to create my own recommendation engine? [closed]

I am interested in recommendation engines these days and I want to improve myself in this area. I am currently reading "Programming Collective Intelligence" from O'Reilly, which I think is the best book on this subject. But I don't have any idea how to implement an engine; what I mean by "no idea" is that I don't know how to start. I have a project like Last.fm in mind. Where do I start creating a recommendation engine (should it be implemented on the database side or the backend side)? What level of database knowledge will be needed? Are there any open source engines that can be used for help, or any other resources?

How to build a 'related questions' engine?

One of our bigger sites has a section where users can send questions to the website owner, which get evaluated personally by his staff. When the same question pops up very often, they can add this particular question to the FAQ. In order to prevent them from receiving dozens of similar questions a day, we would like to provide a feature similar to the 'Related questions' on this site (Stack Overflow). What ways are there to build this kind of feature? I know that I should somehow evaluate the question and compare it to the questions in the FAQ, but how does this comparison work? Are keywords
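
One way the comparison could work is to turn each question into a weighted bag of words and rank FAQ entries by cosine similarity to the new question. The sketch below uses a simple smoothed TF-IDF weighting. It is one possible approach, not the method used by any particular site, and the FAQ texts and function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(documents):
    """Build one {term: weight} dict per document with smoothed IDF."""
    tokenized = [tokenize(doc) for doc in documents]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    total = len(documents)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + total) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_related(new_question, faq_questions):
    """Return (faq_index, score) pairs, most similar first."""
    vectors = tfidf_vectors(list(faq_questions) + [new_question])
    query = vectors[-1]
    scored = [(index, cosine(query, vec)) for index, vec in enumerate(vectors[:-1])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical FAQ entries and incoming question.
faq = [
    "How do I reset my password?",
    "What payment methods do you accept?",
    "How can I change my shipping address?",
]
ranked = rank_related("I forgot my password, how do I reset it?", faq)
```

Words that appear in almost every question get a low weight, so the ranking is driven by the rarer terms the two questions share. Showing the top few matches before the user submits is one way to use the result.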

Get Google Analytics “Visitors Flow” data from API

I'm trying to gather information from Google Analytics to build a recommendation engine for my site. The site consists of many pages, so I'm tracking the number of times a user clicks, for example, from page A to page B. Currently I can measure the A -> B transitions on Google Analytics with previousPagePath = '/A' and nextPagePath = '/B' , but the question I really want to answer is, "Of all the visits to the site that included viewing page A, how many times were pages B, C, ... viewed in the same visit?" For example, if the flow was A -> homepage -> B , then that would not be captured by my
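
The counting the question describes can be sketched independently of how the data is obtained. The example below assumes the page paths of each visit are already available as a list per visit, which is an assumption and not something the excerpt confirms the API provides. `pages_seen_with` and the sample sessions are hypothetical.

```python
from collections import Counter

def pages_seen_with(sessions, target):
    """Count, over visits that include target, how many visits saw each other page.

    sessions is an iterable of page-path lists, one list per visit. Order
    within a visit is ignored, so A -> homepage -> B still counts B.
    """
    counts = Counter()
    for pages in sessions:
        unique_pages = set(pages)
        if target in unique_pages:
            counts.update(unique_pages - {target})
    return counts

# Hypothetical visits, each an ordered list of page paths.
sessions = [
    ["/A", "/home", "/B"],
    ["/A", "/B", "/C"],
    ["/home", "/B"],
    ["/A", "/A", "/C"],
]
with_a = pages_seen_with(sessions, "/A")
```

Because each visit is reduced to a set, repeated views of the same page within one visit count once, and intermediate pages between A and B do not hide the relationship.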

Which is the proper way of filtering numeric values for a text field?

I'm working on a text field with the kind of validation that won't let you enter anything other than numeric values. My initial code looked quite simple, similar to this: $(textField).onKeyPress(function(e) { if (e.which < 48 || e.which > 57) e.preventDefault(); }); This is fairly straightforward, but it turns out that (in the latest version of all browsers) Firefox will make this also prevent movement with the arrow keys and the delete/backspace keys, whereas the other browsers would not.
