mahout-recommender

Scala - Create IndexedDatasetSpark object

这一生的挚爱 submitted on 2020-02-05 02:03:49
Question: I want to run the Spark RowSimilarity recommender on data obtained from MongoDB. For this purpose, I've written the code below, which takes input from Mongo and converts it to an RDD of objects. This needs to be passed to IndexedDatasetSpark, which is then passed to SimilarityAnalysis.rowSimilarityIDS.
import org.apache.hadoop.conf.Configuration
import org.apache.mahout.math.cf.SimilarityAnalysis
import org.apache.mahout.sparkbindings.indexeddataset.IndexedDatasetSpark
import org.apache.spark.rdd.

Multiple models in Myrrix

痴心易碎 submitted on 2020-01-15 12:21:46
Question: I have a CSV file like this:
typeA,typeB
typeA,typeC
typeA,typeC
typeA,typeB
Here, typeA, typeB and typeC are three different types of entities. Consider types B and C to be two different types of items, and type A to be the users. I can build a model by feeding this CSV file into Myrrix. Now, suppose I have another CSV file like this:
typeB,typeD
typeB,typeD
typeB,typeD
typeB,typeD
This file has only two types, B (the "B" items from the former CSV file appear here as users) and D. Here,

Mahout precomputed Item-item similarity - slow recommendation

筅森魡賤 submitted on 2020-01-03 03:45:23
Question: I am having performance issues with precomputed item-item similarities in Mahout. I have 4 million users and roughly the same number of items, with around 100M user-item preferences. I want to do content-based recommendation based on the cosine similarity of the documents' TF-IDF vectors. Since computing this on the fly is slow, I precomputed the pairwise similarity of the top 50 most similar documents as follows:
I used seq2sparse to produce TF-IDF vectors.
I used mahout rowId to
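The precomputation described in the question (cosine similarity over TF-IDF vectors, keeping only the top-k neighbours per document) can be sketched in plain Python. This is a minimal illustration, not Mahout's implementation: the function names `cosine` and `top_k_similar` and the toy vectors are hypothetical, and the all-pairs loop is quadratic, so it only shows the idea and would not scale to millions of documents.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector for the dot product
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_similar(vectors, k):
    """Map each document id to its k most similar other documents."""
    result = {}
    for doc_id, vec in vectors.items():
        scored = ((cosine(vec, other), other_id)
                  for other_id, other in vectors.items() if other_id != doc_id)
        result[doc_id] = [(other_id, score)
                          for score, other_id in heapq.nlargest(k, scored)]
    return result


# Hypothetical toy TF-IDF vectors, for illustration only.
docs = {
    "d1": {"mahout": 1.0, "spark": 2.0},
    "d2": {"mahout": 1.0, "spark": 2.0, "hadoop": 0.5},
    "d3": {"django": 3.0},
}
neighbours = top_k_similar(docs, k=1)
```

Storing only the k best neighbours per document keeps the precomputed table linear in the number of documents, which is the point of the top-50 cut-off mentioned in the question.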

Run Mahout RowSimilarity recommender on MongoDB data

痴心易碎 submitted on 2019-12-25 04:56:30
Question: I have managed to run Mahout rowsimilarity on flat files in the following format:
item-id tag1 tag-2 tag3
This has to be run via the CLI, and the output is again flat files. I want to change this so that it reads data from MongoDB (I am open to using other DBs too) and then writes the output to a DB from which our system can pick it up. I've researched this for the past few days and found the following:
I will have to write Scala code implementing RowSimilarity
Pass it an IndexedDataset object to process the data
Convert
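Conceptually, an indexed dataset pairs a sparse matrix with two dictionaries that map external string IDs (items, tags) to integer row and column indices. The sketch below shows that mapping step in plain Python for records in the "item-id tag1 tag-2 tag3" format. It is an illustration only, not the Mahout or Spark API: `build_indexed_rows` is a hypothetical helper, and the hard-coded `records` list stands in for whatever a MongoDB query would return.

```python
def build_indexed_rows(lines):
    """Parse 'item-id tag1 tag2 ...' lines into integer-indexed sparse rows."""
    row_ids = {}  # item-id string -> row index
    col_ids = {}  # tag string -> column index
    rows = {}     # row index -> set of column indices (tags present)
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        item, tags = parts[0], parts[1:]
        # Assign the next free index the first time an ID is seen.
        r = row_ids.setdefault(item, len(row_ids))
        cols = rows.setdefault(r, set())
        for tag in tags:
            cols.add(col_ids.setdefault(tag, len(col_ids)))
    return row_ids, col_ids, rows


# Hypothetical records standing in for documents read from a database.
records = ["item-1 tag1 tag2", "item-2 tag2 tag3"]
row_ids, col_ids, rows = build_indexed_rows(records)
```

Keeping the two dictionaries alongside the matrix is what lets the similarity output be translated back from integer indices to the original item IDs before it is written to the database.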

How can I compile/use Mahout for Hadoop 2.0?

痞子三分冷 submitted on 2019-12-22 08:51:38
Question: The latest release, Mahout 0.9, is built only against Hadoop 1.x (mvn clean install). How can I compile Mahout for Hadoop 2.0.x? When I run the command
hadoop jar mahout-examples-0.9-SNAPSHOT-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -s SIMILARITY_COOCCURENCE -i test -o result
I always get the error message "IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected". Thanks!
Answer 1: To compile Mahout to work with 2.x

Dropwizard Application crashed by AbstractJAXBProvider

戏子无情 submitted on 2019-12-14 03:55:17
Question: I have a server application implemented using Dropwizard, with Gradle as the build system. Now I want to integrate Apache Mahout for some recommender system functionality. After adding the Mahout dependency and trying to run the application, I get exceptions. My initial dependencies look like this:
dependencies {
    compile 'io.dropwizard:dropwizard-core:0.9.1'
    compile 'io.dropwizard:dropwizard-jdbi:0.9.1'
    compile 'mysql:mysql-connector-java:5.1.37'
    compile 'redis.clients:jedis:2.8.0'
    compile 'com.google.guava:guava:18.0'
    compile

Model creation for user-user collaborative filtering

陌路散爱 submitted on 2019-12-14 02:55:41
Question: I want to do a form of user-user collaborative filtering in which the users in the user-item matrix are a selected subset of all users in the database. These selected users are refreshed regularly with the preferences of newly selected users. New users shouldn't be added to the matrix. For a new user, we need to recommend items from the user-item matrix (which contains only the selected users) based on his preferences. I do not want to add the new anonymous users to the matrix. Explored in
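The requirement above (score items for a new user against a fixed user-item matrix without inserting that user) can be sketched as follows. This is a minimal plain-Python illustration under assumptions: cosine similarity and a similarity-weighted average are swapped in as one common choice, and `recommend_for_anonymous`, `matrix`, and the toy ratings are hypothetical names and data, not anything from Mahout.

```python
import math


def cosine(a, b):
    """Cosine similarity of two {item: rating} dicts."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_for_anonymous(matrix, prefs, n):
    """Rank unseen items for a user who is NOT stored in `matrix`."""
    scores = {}   # item -> sum of similarity * rating
    weights = {}  # item -> sum of similarities
    for user_prefs in matrix.values():
        sim = cosine(prefs, user_prefs)
        if sim <= 0.0:
            continue  # ignore users with no overlap
        for item, rating in user_prefs.items():
            if item in prefs:
                continue  # do not recommend what the user already rated
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [item for _, item in ranked[:n]]


# Hypothetical selected users and an anonymous user's preferences.
matrix = {"u1": {"a": 5, "b": 4, "c": 5}, "u2": {"a": 1, "d": 2}}
anonymous = {"a": 5, "b": 5}
recommended = recommend_for_anonymous(matrix, anonymous, n=2)
```

The anonymous user's preferences are only read during scoring; `matrix` is never modified, so the set of selected users stays exactly as it was.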

How to integrate a recommender system (Python file) into a Django project

一笑奈何 submitted on 2019-12-13 06:59:40
Question: This is a Python script written by my friend. How do I integrate this file into my Django project, which contains the full list of movies taken from the movierulz data set? Where should I integrate this code?
import numpy as np
import pandas as pd
# set some print options
np.set_printoptions(precision=4)
np.set_printoptions(threshold=5)
np.set_printoptions(suppress=True)
pd.set_option('display.precision', 3, 'display.notebook_repr_html', True)
# init random gen
np.random.seed(2)
#users_file = "/media/sourabhkondapaka

Extend Mahout for new dataset

隐身守侯 submitted on 2019-12-11 11:45:14
Question: I want to build a recommendation model based on Mahout. My dataset format has extra columns in addition to userID, itemID, rating and timestamp, so I think I need to extend FileDataModel. I looked into JesterDataModel as an example. However, I have a problem following the logic flow. In its buildModel() method, an empty map "data" is first constructed. It is then passed to processFile. I assume that "data" is modified in this method, since it is later used to construct the GenericDataModel
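The flow the question describes, where an empty map is created, handed to a processing method, and then used as if it were populated, works because the method receives a reference to the same map object and mutates it in place. The sketch below mirrors that pattern in plain Python. It assumes, as the question does, that the processing step fills the map it is given; `process_line`, `build_model`, and the sample lines with extra columns are hypothetical and are not Mahout code.

```python
def process_line(line, data):
    """Parse one 'userID,itemID,rating,...' line and record it in `data` in place."""
    user_id, item_id, rating = line.split(",")[:3]  # extra columns are ignored here
    data.setdefault(int(user_id), {})[int(item_id)] = float(rating)


def build_model(lines):
    data = {}  # starts empty, like the map created in buildModel()
    for line in lines:
        process_line(line, data)  # mutates the same dict object, returns nothing
    return data  # populated by the calls above


# Hypothetical input with extra columns beyond userID, itemID, rating.
lines = [
    "1,10,4.5,1576000000,extra",
    "1,11,3.0,1576000001,extra",
    "2,10,5.0,1576000002,extra",
]
model = build_model(lines)
```

A subclass that needs the extra columns would change only the parsing step; the surrounding flow (create the map, fill it, build the model from it) can stay the same.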

Apache Mahout not giving any recommendation

谁说我不能喝 submitted on 2019-12-11 05:25:20
Question: I am trying to use Mahout for recommendations but am getting none. My dataset:
0,102,5.0
1,101,5.0
1,102,5.0
Code:
DataModel datamodel = new FileDataModel(new File("dataset.csv"));
// Create the UserSimilarity object.
UserSimilarity usersimilarity = new PearsonCorrelationSimilarity(datamodel);
// Create the UserNeighborhood object.
UserNeighborhood userneighborhood = new ThresholdUserNeighborhood(0.1, usersimilarity, datamodel);
// Create the UserBasedRecommender.
UserBasedRecommender recommender = new
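One plausible reason for the empty result can be checked by hand: users 0 and 1 share only item 102, and a Pearson correlation over a single co-rated item has zero variance, so it is undefined. The sketch below computes Pearson over co-rated items in plain Python for the three-row dataset from the question. It is not Mahout's code; the `pearson` helper and the threshold check are an illustration under the assumption that an undefined similarity never passes a 0.1 neighbourhood threshold.

```python
import math


def pearson(a, b):
    """Pearson correlation over items rated by both users; NaN when undefined."""
    common = sorted(set(a) & set(b))
    n = len(common)
    if n == 0:
        return math.nan
    mean_a = sum(a[i] for i in common) / n
    mean_b = sum(b[i] for i in common) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    denom = math.sqrt(var_a) * math.sqrt(var_b)
    if denom == 0.0:
        return math.nan  # zero variance: the correlation is undefined
    return cov / denom


# The dataset from the question: userID -> {itemID: rating}.
ratings = {0: {102: 5.0}, 1: {101: 5.0, 102: 5.0}}
similarity = pearson(ratings[0], ratings[1])

# NaN compares false against any threshold, so user 1 is not a neighbour of user 0.
has_neighbour = (not math.isnan(similarity)) and similarity >= 0.1
```

With no neighbour above the threshold there is nobody to borrow item 101 from, which would be consistent with the recommender returning nothing. A larger dataset with several co-rated items and varied ratings avoids the degenerate case.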