mahout | 易学教程

How to read Mahout clustering output

Question: I have run the k-means clustering algorithm on the synthetic control data from the Mahout tutorial, and was wondering if someone could explain how to interpret the output. I ran clusterdump and received output that looks something like this (truncated to save space):

CL-592{n=57 c=[30.726, 29.813...] r=[3.528, 3.597...]}
    Weight : [props - optional]:  Point:
    1.0 : [distance=27.453962995925863]: [24.672, 35.261, 30.486...]
    1.0 : [distance=27.675053294846002]: [25.592, 29.951, 34.188...]
    1.0 :
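As a rough aid for reading output of this shape, the following Python sketch splits a cluster header line and a point line into named fields. It assumes the usual reading of clusterdump text output: `n` is the number of points assigned to the cluster, `c` is the centroid, `r` is the per-dimension radius (standard deviation), and each point line carries a weight, the distance to the centroid, and the point vector. The regular expressions and the function names `parse_header` and `parse_point` are illustrative and not part of Mahout. Only the dense bracketed vector form shown in the question is handled.

```python
import re

# Patterns written for the text format shown in the question (an assumption,
# not an official grammar for clusterdump output).
HEADER = re.compile(
    r"(?P<id>[A-Z]+-\d+)\{n=(?P<n>\d+) c=\[(?P<c>[^\]]*)\] r=\[(?P<r>[^\]]*)\]\}"
)
POINT = re.compile(
    r"(?P<w>[\d.]+)\s*:\s*\[distance=(?P<d>[\d.eE+-]+)\]:\s*\[(?P<v>[^\]]*)\]"
)


def _floats(text):
    # Drop the "..." that marks a truncated vector in the pasted output.
    return [float(x) for x in text.replace("...", "").split(",") if x.strip()]


def parse_header(line):
    """Return the cluster id, point count, centroid and radius from a header line."""
    m = HEADER.search(line)
    if m is None:
        raise ValueError("not a cluster header line")
    return {
        "id": m["id"],               # e.g. "CL-592"
        "n": int(m["n"]),            # number of points in the cluster
        "center": _floats(m["c"]),   # centroid coordinates
        "radius": _floats(m["r"]),   # per-dimension spread
    }


def parse_point(line):
    """Return weight, distance and vector for a point line, or None if incomplete."""
    m = POINT.search(line)
    if m is None:
        return None
    return {
        "weight": float(m["w"]),
        "distance": float(m["d"]),   # distance from the point to the centroid
        "vector": _floats(m["v"]),
    }
```

Calling `parse_header` on the first line of the pasted output yields the cluster id, the count 57, and the (truncated) centroid and radius vectors; `parse_point` returns `None` for incomplete lines such as the trailing `1.0 :`.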

Building an intelligent recommendation system on Hadoop, part 1: analyzing user behavior data and exporting it to HDFS

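In this category I mainly want to share the process, program design, and system architecture lessons I picked up while building an intelligent recommendation system on Hadoop. I will not cover installing or using Hadoop, Sqoop, or HBase, since plenty of articles on that already exist online. Let's get straight to the point.

1. What matters most when building an intelligent recommendation system? It is not the algorithm, nor the system itself. The key is to analyze user behavior data accurately and end up with a user preference table. With that table there is a great deal we can do, such as computing user similarity, computing item similarity, or clustering users by behavior. All of this presupposes a "user preference table" (Table 1-1).

Table 1-1 User preference table

Uid (user id)   Itemid (item id)   Preference (preference value)   Timestamp
1001            1005               4.5                             123278545
1002            1008               3.5                             123577865
1001            1008               5.0                             123478588

2. (Below I use the recommendation system of a video site as the example.) To obtain such a table, we first have to analyze user behavior. On a video site, typical behaviors are viewing records, rating records, up/down votes, and comment records. Using preset weights (Table 2-1), we simply add up the weighted behavior data to get a fairly rough preference score.

Table 2-1 Behavior weights (the score of a later behavior overrides the score of an earlier one)

Behavior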
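The weighted-sum step described above can be sketched in Python as follows. Table 2-1 is cut off in this excerpt, so the weights in `BEHAVIOR_WEIGHTS`, the behavior names, and the function `build_preference_table` are hypothetical placeholders rather than the author's values. The sketch only sums weights per (user, item) pair and keeps the most recent timestamp; it does not model the rule that a later behavior's score overrides an earlier one, because the excerpt does not show how that rule applies.

```python
from collections import defaultdict

# Hypothetical weights: the source's Table 2-1 is truncated, so these
# numbers are placeholders chosen only for illustration.
BEHAVIOR_WEIGHTS = {
    "view": 1.0,
    "comment": 1.5,
    "like": 2.0,
    "dislike": -2.0,
}


def build_preference_table(events, weights=BEHAVIOR_WEIGHTS):
    """Turn raw behavior events into (uid, itemid, preference, timestamp) rows.

    events: iterable of (uid, itemid, behavior, timestamp) tuples.
    The preference is the plain sum of the behavior weights for each
    (uid, itemid) pair; the timestamp is the latest one seen for that pair.
    """
    scores = defaultdict(float)
    latest = {}
    for uid, itemid, behavior, ts in events:
        key = (uid, itemid)
        scores[key] += weights[behavior]
        latest[key] = max(ts, latest.get(key, ts))
    # Sort by (uid, itemid) so the output order is deterministic.
    return [(uid, itemid, scores[(uid, itemid)], latest[(uid, itemid)])
            for uid, itemid in sorted(scores)]
```

A table built this way has the same four columns as Table 1-1, so it could be written out as delimited text for whatever downstream step consumes the preference data.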

Mahout history (part 2)

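Mahout history

Apache Mahout started in 2008 and, after two years of development, became an Apache top-level project in April 2010. The project was launched by members of the Apache Lucene (open-source search) community who were interested in machine learning. They wanted to build a reliable, well-documented, scalable project implementing common machine learning algorithms for clustering and classification. The community initially based its work on the paper "Map-Reduce for Machine Learning on Multicore" by Ng et al., but has since incorporated a much broader range of machine learning approaches.

Mahout is one of the open-source projects of the Apache Software Foundation. When it started in 2008 it was a subproject of Apache Lucene. Because it builds on Hadoop, its algorithms can be scaled out across a Hadoop cluster, which improves their computational efficiency. In April 2010 Apache Mahout became an Apache top-level project. The aim of the project is to build a scalable algorithm library for cloud platforms. Mahout already implements many classic data mining algorithms and is a fairly complete library. It is still being extended: developers around the world who are interested in the project develop, test, and add algorithms together, and any interested individual or organization can join the project's community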

Scala - Create IndexedDatasetSpark object

Question: I want to run the Spark RowSimilarity recommender on data obtained from MongoDB. For this purpose, I've written the code below, which takes input from Mongo and converts it to an RDD of objects. This needs to be passed to IndexedDatasetSpark, which is then passed to SimilarityAnalysis.rowSimilarityIDS.

import org.apache.hadoop.conf.Configuration
import org.apache.mahout.math.cf.SimilarityAnalysis
import org.apache.mahout.sparkbindings.indexeddataset.IndexedDatasetSpark
import org.apache.spark.rdd.

How to build a Mahout project with Maven

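(1) Download Maven and extract it to drive D:.
(2) Configure the environment variables.
(3) Verify the installation.
(4) Install and configure the Eclipse plugin.
    Download: http://download.eclipse.org/technology/m2e/releases/1.5/1.5.1.20150109-1820
    Install: in Eclipse, Help > Install New Software (make a note of this so you don't forget it).
(5) Create the Mahout project. Maven has to generate the new project from the Windows command line. First create an empty directory in the Eclipse workspace to hold the Mahout project:
    E:\eclipse\workspace\mahout
    Then open cmd and run:
    cd E:\eclipse\workspace\mahout
    mvn archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=org.chennan.mymahout -DartifactId=myMahout -DpackageName=org.chennan.mymahout -Dversion=1.0-SNAPSHOT -DinteractiveMode=false
    cd myMahout
    mvn clean install
    The project has now been created.
(6) Open Eclipse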

Mahout error with Hadoop2.2

Question: I'm trying to execute a MapReduce job for XML parsing using the Mahout 0.9 library on Hadoop 2.2, but I'm getting the following error:

14/02/24 16:03:02 INFO mapreduce.Job: Task Id : attempt_1393235568433_0004_m_000000_0, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected
14/02/24 16:03:12 INFO mapreduce.Job: Task Id : attempt_1393235568433_0004_m_000000_1, Status : FAILED
Error: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext

Using Mahout in Java code, not the CLI

Question: I want to be able to build a model using Java. I am able to do so with the CLI as follows:

./mahout trainlogistic --input Candy-Crush.twtr.csv \
  --output ./model \
  --target hd_click --categories 2 \
  --predictors click_frequency country_code ctr device_price_range hd_conversion time_of_day num_clicks phone_type twitter is_weekend app_entertainment app_wallpaper app_widgets arcade books_and_reference brain business cards casual comics communication education entertainment finance game_wallpaper

Error in running movie recommendations by using Apache Mahout with HDInsight

Question: I ran the following code but am receiving an error...

# The HDInsight cluster name.
$clusterName = "my-cluster-name"
Use-AzureHDInsightCluster $clusterName

# NOTE: The version number portion of the file path
# may change in future versions of HDInsight.
# So dynamically grab it using Hive.
$mahoutPath = Invoke-Hive -Query '!${env:COMSPEC} /c dir /b /s ${env:MAHOUT_HOME}\examples\target\*-job.jar' | where {$_.startswith("C:\apps\dist")}
$mahoutPath = $mahoutPath -replace "\\", "/"
$jarFile =