mahout

Entity Extraction/Recognition with free tools while feeding Lucene Index

 ̄綄美尐妖づ submitted on 2019-11-28 13:23:05
Question: I'm currently investigating options for extracting person names, locations, technical terms and categories from text (a large number of articles from the web), which will then be fed into a Lucene/ElasticSearch index. The additional information is added as metadata and should increase the precision of the search. For example, when someone queries 'wicket', they should be able to decide whether they mean cricket (the sport) or the Apache project. I tried to implement this on my own, with only minor success so far. Now I found a

Why vector normalization can improve the accuracy of clustering and classification?

南笙酒味 submitted on 2019-11-28 03:24:47
It is described in Mahout in Action that normalization can slightly improve the accuracy. Can anyone explain the reason? Thanks!
Franck Dernoncourt
Normalization is not always required, but it rarely hurts. Some examples: K-means: K-means clustering is "isotropic" in all directions of space and therefore tends to produce more or less round (rather than elongated) clusters. In this situation leaving variances unequal is equivalent to putting more weight on variables with smaller variance. Example in Matlab: X = [randn(100,2)+ones(100,2);... randn(100,2)-ones(100,2)]; % Introduce
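The scaling argument in the answer above can be illustrated with a minimal Java sketch. It is not taken from the original answer or from Mahout: the class name `ZScoreSketch`, its helper methods and the three data rows are hypothetical, and z-score standardization (zero mean, unit variance per column) is only one of several possible normalizations. The sketch shows that, before rescaling, the feature measured on the larger numeric scale accounts for almost all of the squared Euclidean distance that K-means relies on, while after rescaling both features are on a common scale.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of Mahout or of the original answer.
final class ZScoreSketch {

    /** Returns a copy of data in which every column has mean 0 and (population) variance 1. */
    static double[][] standardize(double[][] data) {
        int n = data.length;
        int d = data[0].length;
        double[][] out = new double[n][d];
        for (int j = 0; j < d; j++) {
            double mean = 0.0;
            for (double[] row : data) {
                mean += row[j];
            }
            mean /= n;
            double var = 0.0;
            for (double[] row : data) {
                var += (row[j] - mean) * (row[j] - mean);
            }
            double std = Math.sqrt(var / n);
            for (int i = 0; i < n; i++) {
                // A constant column has zero spread; map it to 0 instead of dividing by 0.
                out[i][j] = (std == 0.0) ? 0.0 : (data[i][j] - mean) / std;
            }
        }
        return out;
    }

    /** Squared Euclidean distance, the quantity K-means minimizes within clusters. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            sum += (a[j] - b[j]) * (a[j] - b[j]);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Assumed toy data: column 1 is measured on a scale about 1000x larger than column 0.
        double[][] raw = {{1.0, 1000.0}, {2.0, 3000.0}, {3.0, 2000.0}};
        double[][] z = standardize(raw);

        System.out.println("raw distance(row0, row1)          = " + squaredDistance(raw[0], raw[1]));
        System.out.println("standardized distance(row0, row1) = " + squaredDistance(z[0], z[1]));
        System.out.println("standardized rows: " + Arrays.deepToString(z));
    }
}
```

In the raw data the squared distance between the first two rows is dominated by the second column; after standardization both columns contribute on comparable scales. Whether that is desirable depends on the data, which matches the answer's point that normalization is not always required.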

Mahout, from getting started to giving up: installation (1)

試著忘記壹切 submitted on 2019-11-27 16:32:14
1. A muddled download. My cluster runs Hadoop 2.7.3. I wanted to find the matching Mahout version but could not. To be safe, since the latest Mahout version is 0.14.0, I went back one version and used 0.13.0. Mahout download address 2. After installation. After a round of steps (extracting to D:\Zoo and setting the environment variables), running it reported an error!!! D:\Zoo\apache-mahout-distribution-0.13.0\bin>mahout "===============DEPRECATION WARNING===============" "This script is no longer supported for new drivers as of Mahout 0.10.0" "Mahout's bash script is supported and if someone wants to contribute a fix for this" "it would be appreciated." "Mahout home set D:\Zoo\mahout-0.14.0" "ERROR: Could not find mahout-examples-*.job in D:\Zoo\mahout-0.14.0 or D:\Zoo\mahout-0.14.0/examples/target, please run 'mvn install

How do I build/run this simple Mahout program without getting exceptions?

旧时模样 submitted on 2019-11-27 04:01:25
I would like to run this code which I found in Mahout In Action: package org.help; import java.io.IOException; import java.util.ArrayList; import java.util.List; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.SequenceFile; import org.apache.hadoop.io.Text; import org.apache.mahout.math.DenseVector; import org.apache.mahout.math.NamedVector; import org.apache.mahout.math.VectorWritable; public class SeqPrep { public static void main(String args[]) throws IOException{ List<NamedVector> apples =

Why vector normalization can improve the accuracy of clustering and classification?

对着背影说爱祢 submitted on 2019-11-27 00:02:56
Question: It is described in Mahout in Action that normalization can slightly improve the accuracy. Can anyone explain the reason? Thanks! Answer 1: Normalization is not always required, but it rarely hurts. Some examples: K-means: K-means clustering is "isotropic" in all directions of space and therefore tends to produce more or less round (rather than elongated) clusters. In this situation leaving variances unequal is equivalent to putting more weight on variables with smaller variance. Example in Matlab:

Big data analysis and mining training course: key points and outline

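让人想犯罪 __ submitted on 2019-11-26 17:27:11
Key points of the big data analysis and mining training course: big data mining based on Hadoop/Mahout/MLlib. At present, the preferred tool for big data analysis is the Hadoop/Yarn platform. Hadoop/Yarn has irreplaceable advantages in scalability, robustness, computing performance and cost, and has in practice become the mainstream big data analysis platform for today's internet companies.
I. Target audience
1. System architects, system analysts, senior programmers and senior developers.
2. People responsible for the operation, planning and design of data centers that handle big data processing.
3. Managers at organizations that are sources of big data, such as government agencies, finance and insurance, mobile and internet companies.
4. Leaders of projects at universities and research institutes that involve big data and distributed data processing.
II. Prerequisites
1. Some theoretical and practical experience with IT system design.
2. Basic knowledge of data warehousing and data mining.
3. Some familiarity with Hadoop/Yarn/Spark big data technology.
III. Course highlights
This course takes a hands-on view of big data mining and analysis. Combining theory and practice, it gives a comprehensive introduction to development techniques for big data mining tools such as Mahout and MLlib. Topics covered include: big data mining and its background, the Mahout and MLlib big data mining tools, recommender systems and a movie recommendation case study, classification techniques and cluster analysis, integration with stream mining and Docker technology, and an analysis of the outlook for big data mining. The course also provides case studies to help students understand how to use the Mahout and MLlib mining tools to solve concrete problems, and introduces the keys to mining valuable information from big data. This course is not a general theoretical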

How do I build/run this simple Mahout program without getting exceptions?

大兔子大兔子 submitted on 2019-11-26 10:59:39
Question: I would like to run this code which I found in Mahout In Action: package org.help; import java.io.IOException; import java.util.ArrayList; import java.util.List; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.SequenceFile; import org.apache.hadoop.io.Text; import org.apache.mahout.math.DenseVector; import org.apache.mahout.math.NamedVector; import org.apache.mahout.math.VectorWritable; public