mahout | 易学教程

Is it worth purchasing Mahout in Action to get up to speed with Mahout, or are there other better sources?

Question: I'm currently a very casual user of Apache Mahout, and I'm considering purchasing the book Mahout in Action. Unfortunately, I'm having a really hard time getting an idea of how worthwhile this book is, and since it's a Manning Early Access Program book (and therefore currently available only as a beta-version e-book), I can't take a look myself in a bookstore. Can anyone recommend this as a good (or less good) guide to getting up to speed with Mahout, and/or other sources that can

Is it possible to use Apache Mahout without a Hadoop dependency?

Question: Is it possible to use Apache Mahout without any dependency on Hadoop? I would like to use the Mahout algorithms on a single computer by including only the Mahout library in my Java project, but I don't want to use Hadoop at all since I will be running on a single node anyway. Is that possible?
Answer 1: Yes. Not all of Mahout depends on Hadoop, though much does. If you use a piece that depends on Hadoop, of course, you need Hadoop. But for example there is a substantial recommender engine code

How to get k similar products using Mahout?

Question: I have one product, let's say a book. Now I want to retrieve k products that are similar to this product. How can I do this with Mahout? The products are stored in a MySQL database, so I'd use the JDBCDataModel. For computing the similarities I'd prefer the LogLikelihoodTest. But which recommender should I choose? It seems that all recommenders are designed
Answer 1: I'm going to guess at the question here. You have user-item data, where users are real people and items are books. You are using
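To make the "k most similar items by log-likelihood" idea concrete, here is a minimal sketch in plain Java that does not use Mahout at all. It scores item pairs with the standard log-likelihood ratio (G-squared) over a 2x2 co-occurrence table and returns the top k. The class name SimilarItemsSketch, the usersByItem map, and the choice to skip items that share no users with the target are all assumptions made for illustration; a real setup would read the data from the database instead of an in-memory map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; names and data layout are hypothetical.
final class SimilarItemsSketch {

    // Pairs an item with its score so the candidates can be sorted.
    private static final class Scored {
        final String item;
        final double score;
        Scored(String item, double score) { this.item = item; this.score = score; }
    }

    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized Shannon entropy of a set of counts (in nats).
    static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long c : counts) {
            sum += c;
            parts += xLogX(c);
        }
        return xLogX(sum) - parts;
    }

    // Log-likelihood ratio for a 2x2 table:
    // k11 = users with both items, k12 = target only, k21 = other only, k22 = neither.
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double colEntropy = entropy(k11 + k21, k12 + k22);
        double matEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point rounding.
        return Math.max(0.0, 2.0 * (rowEntropy + colEntropy - matEntropy));
    }

    static List<String> mostSimilar(String target, Map<String, Set<String>> usersByItem, int k) {
        Set<String> targetUsers = usersByItem.get(target);
        if (targetUsers == null) {
            throw new IllegalArgumentException("unknown item: " + target);
        }
        Set<String> allUsers = new HashSet<>();
        for (Set<String> users : usersByItem.values()) {
            allUsers.addAll(users);
        }
        List<Scored> scored = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : usersByItem.entrySet()) {
            if (e.getKey().equals(target)) {
                continue;
            }
            long both = 0;
            for (String user : e.getValue()) {
                if (targetUsers.contains(user)) {
                    both++;
                }
            }
            if (both == 0) {
                continue; // assumption: items with no shared users are not candidates
            }
            long onlyTarget = targetUsers.size() - both;
            long onlyOther = e.getValue().size() - both;
            long neither = allUsers.size() - both - onlyTarget - onlyOther;
            scored.add(new Scored(e.getKey(), logLikelihoodRatio(both, onlyTarget, onlyOther, neither)));
        }
        // Highest score first; ties broken by item name for a stable order.
        scored.sort((a, b) -> {
            int byScore = Double.compare(b.score, a.score);
            return byScore != 0 ? byScore : a.item.compareTo(b.item);
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, scored.size()); i++) {
            result.add(scored.get(i).item);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> usersByItem = new HashMap<>();
        usersByItem.put("bookA", new HashSet<>(Arrays.asList("u1", "u2", "u3")));
        usersByItem.put("bookB", new HashSet<>(Arrays.asList("u1", "u2", "u3")));
        usersByItem.put("bookC", new HashSet<>(Arrays.asList("u3", "u4")));
        usersByItem.put("bookD", new HashSet<>(Arrays.asList("u4", "u5", "u6")));
        System.out.println(mostSimilar("bookA", usersByItem, 2)); // prints [bookB, bookC]
    }
}
```

Skipping pairs with no shared users matters because the log-likelihood ratio is large for strong negative association as well as for strong positive association; without that filter, bookD would score as high as bookB in this toy data.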

Mahout won't start up. Anything to do with version compatibility between Hadoop and Mahout?

Question: I am new to Hadoop, not to mention Mahout. I hope someone can help me get through this; I have been trying for 2 days. I already have a Hadoop cluster running. I am using hadoop-2.0.0-alpha. I installed Mahout (mahout-distribution-0.7) and maven-2.2.1 (the latest, maven-3.0.4, doesn't work). Now I would just like to run Mahout to get an idea of what it is. I learnt that typing "mahout" prints out a list of the options (algorithms) available in Mahout, but when I typed mahout, it just

Similarity function for Mahout boolean user-based recommender

Question: I am using Mahout to build a user-based recommendation system which operates with boolean data. I use GenericBooleanPrefUserBasedRecommender and NearestNUserNeighborhood, and am now trying to decide on the most suitable user similarity function. It was suggested to use either LogLikelihoodSimilarity or TanimotoCoefficientSimilarity. I tried both and am getting [subjectively evaluated] meaningful results in both cases. However, the RMSE rating for the same data set is better with LogLikelihood. The
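For readers unfamiliar with the Tanimoto coefficient mentioned in the question, here is a minimal sketch of it for boolean data in plain Java: the number of items both users have, divided by the number of items either user has. The class name TanimotoSketch and the use of Set<Long> for item IDs are assumptions for illustration; this is not Mahout's implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; not Mahout's TanimotoCoefficientSimilarity.
final class TanimotoSketch {

    // Returns |A ∩ B| / |A ∪ B|, or NaN when both sets are empty.
    static double tanimoto(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return Double.NaN; // no information about either user
        }
        // Iterate over the smaller set to count the intersection.
        Set<Long> smaller = a.size() <= b.size() ? a : b;
        Set<Long> larger = (smaller == a) ? b : a;
        int intersection = 0;
        for (Long id : smaller) {
            if (larger.contains(id)) {
                intersection++;
            }
        }
        int union = a.size() + b.size() - intersection;
        return (double) intersection / union;
    }

    public static void main(String[] args) {
        Set<Long> user1 = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> user2 = new HashSet<>(Arrays.asList(2L, 3L, 4L));
        System.out.println(tanimoto(user1, user2)); // prints 0.5
    }
}
```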

Introduction to Similarity Computation Methods in Mahout

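Recommender systems widely used in practice are generally based on collaborative filtering, which usually requires computing the similarity between users or between items. Data sources that differ in volume and data type call for different similarity measures to improve recommendation performance. Mahout provides a large number of similarity components, each implementing a different measure. [Figure 1: item similarity components. Figure 2: user similarity components.] Several of the main similarity measures are described below.

Pearson correlation
Class name: PearsonCorrelationSimilarity
Principle: a statistic that reflects the degree of linear correlation between two variables.
Range: [-1, 1]. The larger the absolute value, the stronger the correlation; negative correlation is of little use for recommendation.
Notes: 1. The number of overlapping items is not taken into account. 2. If only one item overlaps, the similarity cannot be computed (the calculation divides by n-1). 3. If the overlapping values are all equal, the similarity cannot be computed either (the standard deviation, used as a divisor, is 0). This measure is neither the best choice nor the worst; it is mentioned often in early research simply because it is easy to understand. Using the Pearson linear correlation coefficient requires assuming that the data are drawn in pairs from a normal distribution and that the data are at least interval-scaled. In Mahout, an extension to the Pearson computation adds an enum parameter (Weighting) so that the overlap count also becomes a factor in the similarity.

Euclidean distance similarity
Class name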
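The Pearson notes above (undefined with a single overlapping item, undefined when all overlapping values are equal) can be illustrated with a small plain-Java sketch. It is not Mahout's PearsonCorrelationSimilarity and does not cover the Weighting extension; the class name PearsonSketch and the array-based input are assumptions for illustration, with index i holding both users' ratings for the i-th overlapping item.

```java
// Illustrative sketch only; not Mahout's implementation.
final class PearsonSketch {

    // Pearson correlation of two equal-length rating arrays.
    // Returns NaN when the correlation is undefined.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        int n = x.length;
        if (n < 2) {
            return Double.NaN; // one overlapping item: the sample variance divides by n - 1 = 0
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sumXY = 0.0;
        double sumXX = 0.0;
        double sumYY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sumXY += dx * dy;
            sumXX += dx * dx;
            sumYY += dy * dy;
        }
        if (sumXX == 0.0 || sumYY == 0.0) {
            return Double.NaN; // all values equal on one side: standard deviation is 0
        }
        return sumXY / Math.sqrt(sumXX * sumYY);
    }

    public static void main(String[] args) {
        System.out.println(pearson(new double[] {1, 2, 3}, new double[] {2, 4, 6})); // prints 1.0
        System.out.println(pearson(new double[] {1, 2, 3}, new double[] {3, 2, 1})); // prints -1.0
        System.out.println(pearson(new double[] {4}, new double[] {5}));             // prints NaN
        System.out.println(pearson(new double[] {3, 3, 3}, new double[] {1, 2, 3})); // prints NaN
    }
}
```

Note that the result depends only on how the ratings co-vary, not on how many items overlap: two users with two overlapping items can score 1.0 just as easily as two users with two hundred, which is the gap the weighting extension is meant to address.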

How to get k similar products using Mahout?

Question: I have one product, let's say a book. Now I want to retrieve k products that are similar to this product. How can I do this with Mahout? The products are stored in a MySQL database, so I'd use the JDBCDataModel. For computing the similarities I'd prefer the LogLikelihoodTest. But which recommender should I choose? It seems that all recommenders are designed
Answer 1: I'm going to guess at the question here. You have user-item data, where users are real people and items are books. You are using LogLikelihoodSimilarity as the basis for some recommender, either user-based or item-based. You don't need a

How to run examples in the Mahout in Action book

Question: I am trying to run the hello world example in chapter 7. I created the following in Eclipse and then packed it into a jar:
package com.mycode.mahout;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;

Running Mahout from the command line (CLASSPATH)

Question: I compiled Mahout successfully under Windows using Maven. I'm trying to run one of the examples from the command line and I don't get what I am doing wrong. It seems like a CLASSPATH problem. Let's say I want to run the GroupLensRecommenderEvaluatorRunner example. I go to the folder with the GroupLensRecommenderEvaluatorRunner.class file in it and execute: java -cp C:/mahout/core/target/classes;. org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommenderEvaluatorRunner It gives me the

Evaluating recommenders - unable to recommend in x cases

Question: I'm exploring some of the code examples in Mahout in Action in more detail. I have built a small test that computes the RMS of various algorithms applied to my data. Of course, multiple parameters impact the RMS, but I don't understand the "unable to recommend in ... cases" message that is generated while running an evaluation. Looking at StatsCallable.java, this is generated when an evaluator encounters a NaN response; perhaps not enough data in the training set or the user's prefs to
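The behaviour described in the question, where a NaN estimate is counted as a case that could not be recommended, can be sketched in plain Java as follows. This is an assumed, simplified model of an RMS evaluation, not Mahout's evaluator code; the class name RmseSketch, the Result holder, and the array inputs are hypothetical.

```java
// Illustrative sketch only; a simplified stand-in for an RMS evaluation loop.
final class RmseSketch {

    // Holds the RMSE over usable estimates and the number of NaN estimates skipped.
    static final class Result {
        final double rmse;
        final int unableToEstimate;

        Result(double rmse, int unableToEstimate) {
            this.rmse = rmse;
            this.unableToEstimate = unableToEstimate;
        }
    }

    static Result evaluate(double[] actual, double[] estimated) {
        if (actual.length != estimated.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        double sumSquares = 0.0;
        int counted = 0;
        int unable = 0;
        for (int i = 0; i < actual.length; i++) {
            if (Double.isNaN(estimated[i])) {
                unable++; // no estimate was possible for this held-out preference
                continue;
            }
            double diff = actual[i] - estimated[i];
            sumSquares += diff * diff;
            counted++;
        }
        double rmse = counted == 0 ? Double.NaN : Math.sqrt(sumSquares / counted);
        return new Result(rmse, unable);
    }

    public static void main(String[] args) {
        Result r = evaluate(new double[] {4, 3, 5}, new double[] {3, Double.NaN, 5});
        System.out.println("RMSE over usable estimates: " + r.rmse);
        System.out.println("Unable to estimate in " + r.unableToEstimate + " cases");
    }
}
```

In this model the skipped cases do not affect the error figure at all, so a low RMSE alongside a high skip count would mean the score reflects only the part of the test set that could be estimated.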