mahout

Web page recommender system

做~自己de王妃 submitted on 2019-12-02 19:49:27
I am trying to build a recommender system that recommends web pages to a user based on his actions (Google searches, clicks; he can also explicitly rate web pages). To get an idea of what I mean, consider the way Google News displays news articles from around the web on a particular topic. In technical terms that is clustering, but my aim is similar: content-based recommendation driven by the user's actions. So my questions are: how can I crawl the internet to find related web pages? And what algorithm should I use to extract data from a web page? Are textual analysis and word frequency the only way
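Textual analysis and word frequency are indeed the usual starting point for this kind of content-based matching: represent each page as a bag-of-words term-frequency vector (tf-idf in practice) and compare vectors with cosine similarity. A minimal, self-contained sketch of that idea, with a toy tokenizer, no stemming or stop-word removal, and illustrative names throughout:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class PageSimilaritySketch {

    // Bag-of-words term frequencies for one document.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Cosine similarity between two term-frequency vectors:
    // dot(a, b) / (|a| * |b|), over the union of both vocabularies.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        Set<String> vocab = new HashSet<>(a.keySet());
        vocab.addAll(b.keySet());
        double dot = 0, normA = 0, normB = 0;
        for (String term : vocab) {
            int x = a.getOrDefault(term, 0);
            int y = b.getOrDefault(term, 0);
            dot += (double) x * y;
            normA += (double) x * x;
            normB += (double) y * y;
        }
        return dot == 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double related = cosine(
            termFrequencies("mahout builds recommender systems"),
            termFrequencies("recommender systems with mahout"));
        double unrelated = cosine(
            termFrequencies("mahout builds recommender systems"),
            termFrequencies("cooking pasta tonight"));
        System.out.println(related > unrelated); // prints true
    }
}
```

In a real system you would weight terms by tf-idf rather than raw counts, since common words otherwise dominate the dot product; the comparison loop stays the same.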

Recommendation Engines for Java applications [closed]

痞子三分冷 submitted on 2019-12-02 18:29:51
I was wondering if there is any open-source recommendation engine available. It should make suggestions the way Amazon and Netflix do. I have heard of a framework called Apache Mahout - Taste; I am trying it next week. It would be great if you could share your valuable thoughts. Sean Owen: I'm the developer of Mahout / Taste, and I hope it will do what you need, but in the interest of balanced coverage, let me also point you at: Duine, CoFE, Cofi. Apache Mahout is the only one I have found for this area (I have been looking recently too), though Weka may also be an option. I had to work with open source

What's the difference between item-based and content-based collaborative filtering?

谁说胖子不能爱 submitted on 2019-12-02 15:40:21
I am puzzled about what item-based recommendation is, as described in the book "Mahout in Action". The book gives this algorithm:

    for every item i that u has no preference for yet
      for every item j that u has a preference for
        compute a similarity s between i and j
        add u's preference for j, weighted by s, to a running average
    return the top items, ranked by weighted average

How can I calculate the similarity between items? If it uses the content, isn't it content-based recommendation? Item-Based Collaborative Filtering: the original item-based recommendation is totally based on user

How to solve “log4j:WARN No appenders could be found for logger” error on Twenty Newsgroups Classification Example

不问归期 submitted on 2019-12-02 13:31:30
I am trying to run the 20 newsgroups classification example in Mahout. I have set MAHOUT_LOCAL=true; the classifier doesn't display the confusion matrix and gives the following warnings:

    ok. You chose 1 and we'll use cnaivebayes
    creating work directory at /tmp/mahout-work-cloudera
    + echo 'Preparing 20newsgroups data'
    Preparing 20newsgroups data
    + rm -rf /tmp/mahout-work-cloudera/20news-all
    + mkdir /tmp/mahout-work-cloudera/20news-all
    + cp -R /tmp/mahout-work-cloudera/20news-bydate/20news-bydate-test/alt.atheism /tmp/mahout-work-cloudera/20news-bydate/20news-bydate-test/comp.graphics /tmp/mahout

Dissecting the Mahout recommendation engine from its source code

风格不统一 submitted on 2019-12-02 08:46:09
Dissecting the Mahout recommendation engine from its source code. This is part of the Hadoop family series of articles, which mainly introduces the Hadoop family of products. Frequently covered projects include Hadoop, Hive, Pig, HBase, Sqoop, Mahout, Zookeeper, Avro, Ambari, and Chukwa; newer additions include YARN, HCatalog, Oozie, Cassandra, Hama, Whirr, Flume, Bigtop, Crunch, Hue, and others. Since 2011, China has entered a surging big-data era, and the family of software represented by Hadoop has claimed a broad swath of big-data processing. In the open-source world and among vendors, virtually all data software has gravitated toward Hadoop, which has grown from a niche, elite field into the standard for big-data development. On top of Hadoop's original technology, the Hadoop family of products has emerged, innovating continuously around the concept of "big data" and driving technical progress. As developers in the IT industry, we should keep pace, seize the opportunity, and rise together with Hadoop! About the author: Zhang Dan (Conan), programmer (Java, R, PHP, JavaScript). Weibo: @Conan_Z; blog: http://blog.fens.me; email: bsspirit@gmail.com. Please credit the source when reposting: http://blog.fens.me/mahout-recommend-engine/. Preface: In the Mahout framework, cf

Hands-on with Mahout's Taste webapp

ⅰ亾dé卋堺 submitted on 2019-12-02 08:45:56
Apache Mahout is an open-source project under the Apache Software Foundation (ASF) that provides scalable implementations of classic machine-learning algorithms, aiming to help developers create intelligent applications more easily and quickly. The classic algorithms include clustering, classification, collaborative filtering, evolutionary programming, and more; Mahout can also run on a Hadoop cluster, which lets these algorithms run more efficiently in a cloud-computing environment. The latest released version of Mahout at the time of writing is 0.9. Download paths for Mahout can be found at https://cwiki.apache.org/confluence/display/MAHOUT/BuildingMahout; you can download the 0.9 source archive (mahout-distribution-0.9-src.tar.gz) or check out the trunk from svn. The rest of this article is based on the 0.9 source package. Prerequisite: install Maven (http://my.oschina.net/MrMichael/blog/283125). 1. After downloading the code, unpack it: tar -xvf mahout-distribution-0.9-src.tar.gz (This paragraph is unused for now: http://seanhe.iteye.com/blog/1124682) Then, on the command line, enter the mahout-distribution-0.9 directory and run mvn -DskipTests

Candidate Strategy for GenericUserBasedRecommender in Mahout

帅比萌擦擦* submitted on 2019-12-02 07:21:40
In Mahout you can define a CandidateItemsStrategy for GenericItemBasedRecommender so that specific items, e.g. those of a certain category, are excluded. When using a GenericUserBasedRecommender this is not possible. How can I accomplish this with GenericUserBasedRecommender? Is the only way to do this using an IDRescorer? If possible I'd like to avoid using an IDRescorer. Thank you for your help! [Edit] For the item-based recommender I do it like this:

    private final class OnlySpecificlItemsStrategy implements CandidateItemsStrategy {
        private final JpaDataModel dataModel;
        public

Mahout rowSimilarity

半腔热情 submitted on 2019-12-02 02:43:34
I am trying to compute row similarity between Wikipedia documents. I have the tf-idf vectors in this format: Key class: org.apache.hadoop.io.Text; Value class: org.apache.mahout.math.VectorWritable. I am following the quick tour of text analysis from here: https://cwiki.apache.org/confluence/display/MAHOUT/Quick+tour+of+text+analysis+using+the+Mahout+command+line. I created a Mahout matrix as follows:

    mahout rowid \
      -i wikipedia-vectors/tfidf-vectors/part-r-00000 \
      -o wikipedia-matrix

I got the number of generated rows and columns: vectors.RowIdJob: Wrote out matrix with 4587604 rows and

How can I use Mahout's sequencefile API code?

喜欢而已 submitted on 2019-12-01 08:06:45
There exists in Mahout a command to create a sequence file: bin/mahout seqdirectory -c UTF-8 -i <input address> -o <output address>. I want to use this command as a code API. Julian Ortega: You can do something like this:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path outputPath = new Path("c:\\temp");
    Text key = new Text(); // Example, this can be another type of