mahout-recommender | 易学教程

Scala - Create IndexedDatasetSpark object

Question: I want to run the Spark RowSimilarity recommender on data obtained from MongoDB. For this purpose, I have written the code below, which takes input from Mongo and converts it to an RDD of objects. This needs to be passed to IndexedDatasetSpark, which is then passed to SimilarityAnalysis.rowSimilarityIDS. import org.apache.hadoop.conf.Configuration import org.apache.mahout.math.cf.SimilarityAnalysis import org.apache.mahout.sparkbindings.indexeddataset.IndexedDatasetSpark import org.apache.spark.rdd.

Multiple models in Myrrix

Question: I have a CSV file like this: typeA,typeB typeA,typeC typeA,typeC typeA,typeB Here, typeA, typeB and typeC are three different types of entities. Consider types B and C to be two different types of items, and consider type A to be the users. I can build a model by feeding this CSV file into Myrrix. This file has two types only, B (the "B" items from the former CSV file are in here as users) and D. Now, suppose I have another CSV file like this: typeB,typeD typeB,typeD typeB,typeD typeB,typeD Here,

Mahout precomputed Item-item similarity - slow recommendation

Question: I am having performance issues with precomputed item-item similarities in Mahout. I have 4 million users and roughly the same number of items, with around 100M user-item preferences. I want to do content-based recommendation based on the cosine similarity of the TF-IDF vectors of the documents. Since computing this on the fly is slow, I precomputed the pairwise similarity of the top 50 most similar documents as follows: I used seq2sparse to produce TF-IDF vectors. I used mahout rowId to
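The similarity measure named in the question above, cosine similarity between TF-IDF vectors, can be sketched in plain Java. This is only an illustration under stated assumptions, not Mahout code: `CosineSketch` and its methods are hypothetical names, and each document vector is modelled as a sparse map from term index to TF-IDF weight.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; it is not part of Mahout.
class CosineSketch {

    // Euclidean length of a sparse vector (term index -> weight).
    static double norm(Map<Integer, Double> v) {
        double sumOfSquares = 0.0;
        for (double w : v.values()) {
            sumOfSquares += w * w;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    // Returns 0.0 when either vector has zero length.
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        // Iterate over the smaller map so the loop does less work.
        Map<Integer, Double> small = a.size() <= b.size() ? a : b;
        Map<Integer, Double> large = (small == a) ? b : a;

        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    public static void main(String[] args) {
        Map<Integer, Double> doc1 = new HashMap<>();
        doc1.put(0, 1.0);
        doc1.put(1, 2.0);
        Map<Integer, Double> doc2 = new HashMap<>();
        doc2.put(0, 2.0);
        doc2.put(1, 4.0);
        // doc2 is a scaled copy of doc1, so the two point in the same direction.
        System.out.println(cosine(doc1, doc2));
    }
}
```

Only terms present in both documents contribute to the dot product, which is why a sparse representation keeps each pairwise comparison cheap.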

Run Mahout RowSimilarity recommender on MongoDB data

Question: I have managed to run Mahout RowSimilarity on flat files in the following format: item-id tag1 tag-2 tag3 This has to be run via the CLI, and the output is again flat files. I want to change this so that it reads data from MongoDB (I am open to using other databases too) and then writes the output to a database, from which our system can pick it up. I have researched this for the past few days and found the following: I will have to write Scala code implementing RowSimilarity; pass it an IndexedDataSet object to process the data; Convert

How can I compile/use Mahout for Hadoop 2.0?

Question: The latest release, Mahout 0.9, is only built against Hadoop 1.x (mvn clean install). How can I compile Mahout for Hadoop 2.0.x? When I ran the command: hadoop jar mahout-examples-0.9-SNAPSHOT-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -s SIMILARITY_COOCCURENCE -i test -o result I always got the error message IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected. Thanks! Answer 1: To compile Mahout to work with 2.x

Dropwizard Application crashed by AbstractJAXBProvider

Question: I have a server application implemented using Dropwizard, with Gradle as the build system. Now I want to integrate Apache Mahout for some recommender-system functionality. After adding the Mahout dependency and trying to run the application, I get exceptions. My initial dependencies look like this: dependencies { compile 'io.dropwizard:dropwizard-core:0.9.1' compile 'io.dropwizard:dropwizard-jdbi:0.9.1' compile 'mysql:mysql-connector-java:5.1.37' compile 'redis.clients:jedis:2.8.0' compile 'com.google.guava:guava:18.0' compile

Model creation for user-user collaborative filtering

Question: I want to do a form of user-user collaborative filtering in which the users in the user-item matrix are a selected subset of all users in the database. These selected users are refreshed regularly with the preferences of newly selected users. New users should not be added to the matrix. For a new user, we need to recommend items from the user-item matrix (which contains only the selected subset of users) based on that user's preferences. I do not want to add the new anonymous users to the matrix. Explored in

How to integrate a recommender system (Python file) into a Django project

Question: This is a Python script written by my friend. How do I integrate this file into my Django project, which contains the full list of movies taken from the movierulz data set? Where should I integrate this code? import numpy as np import pandas as pd # set some print options np.set_printoptions(precision=4) np.set_printoptions(threshold=5) np.set_printoptions(suppress=True) pd.set_option('precision', 3, 'notebook_repr_html', True, ) # init random gen np.random.seed(2) #users_file = "/media/sourabhkondapaka

Extend Mahout for new dataset

Question: I want to build a recommendation model based on Mahout. My dataset format has extra columns besides userID, itemID, rating and timestamp, so I think I need to extend FileDataModel. I looked at JesterDataModel as an example. However, I have a problem with the logic flow. In its buildModel() method, an empty map "data" is first constructed. It is then passed to processFile. I assume that "data" is modified in this method, since later it is used to construct the GenericDataModel

Apache Mahout not giving any recommendation

Question: I am trying to use Mahout for recommendations but am getting none. My dataset: 0,102,5.0 1,101,5.0 1,102,5.0 Code: DataModel datamodel = new FileDataModel(new File("dataset.csv")); // Create the UserSimilarity object. UserSimilarity usersimilarity = new PearsonCorrelationSimilarity(datamodel); // Create the UserNeighborhood object. UserNeighborhood userneighborhood = new ThresholdUserNeighborhood(0.1, usersimilarity, datamodel); // Create the UserBasedRecommender. UserBasedRecommender recommender = new
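The answer to the question above is not shown here, but one plausible explanation can be checked by hand. Users 0 and 1 share only a single rated item (102), and Pearson correlation is undefined when there is no variance among the co-rated items. The sketch below recomputes that in plain Java under stated assumptions: `PearsonSketch` is a hypothetical class, not Mahout's `PearsonCorrelationSimilarity`, and it returns NaN for the undefined case.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is not Mahout's implementation.
class PearsonSketch {

    // Pearson correlation over the items rated by both users.
    // Returns NaN when there is no overlap or either side has zero variance.
    static double pearson(Map<Long, Double> x, Map<Long, Double> y) {
        int n = 0;
        double sumX = 0.0, sumY = 0.0, sumXX = 0.0, sumYY = 0.0, sumXY = 0.0;
        for (Map.Entry<Long, Double> e : x.entrySet()) {
            Double yv = y.get(e.getKey());
            if (yv == null) {
                continue; // item not rated by both users
            }
            double xv = e.getValue();
            n++;
            sumX += xv;
            sumY += yv;
            sumXX += xv * xv;
            sumYY += yv * yv;
            sumXY += xv * yv;
        }
        if (n == 0) {
            return Double.NaN;
        }
        double cov = sumXY - sumX * sumY / n;
        double varX = sumXX - sumX * sumX / n;
        double varY = sumYY - sumY * sumY / n;
        double denom = Math.sqrt(varX) * Math.sqrt(varY);
        return denom == 0.0 ? Double.NaN : cov / denom;
    }

    public static void main(String[] args) {
        // Ratings from the dataset in the question: userID,itemID,rating.
        Map<Long, Double> user0 = new HashMap<>();
        user0.put(102L, 5.0);
        Map<Long, Double> user1 = new HashMap<>();
        user1.put(101L, 5.0);
        user1.put(102L, 5.0);
        System.out.println(pearson(user0, user1)); // prints NaN
    }
}
```

If the similarity between the only two users is undefined, a neighborhood with a 0.1 threshold would presumably stay empty, which would leave the recommender with nothing to suggest. Whether this is the actual cause in the asker's setup is an assumption.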