apache-spark-mllib

How to improve my recommendation result? I am using spark ALS implicit

Submitted by 半世苍凉 on 2019-12-03 07:50:42
Question: First, I have some usage history of users' apps, for example: user1, app1, 3 (launch times); user2, app2, 2 (launch times); user3, app1, 1 (launch times). I have basically two demands: recommend some apps for every user, and recommend similar apps for every app. So I use the implicit ALS from MLlib on Spark to implement it. At first, I just used the original data to train the model. The result was terrible. I think it may be caused by the range of launch times, which runs from 1 into the thousands. So I …
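A minimal sketch of that setup, assuming a SparkContext `sc` is in scope; the file path, rank, iteration count, lambda and alpha below are placeholder values rather than tuned ones, and the log transform is just one common way to damp the wide launch-count range before implicit ALS:

```scala
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Hypothetical input file with (userId, appId, launchCount) lines.
val usage = sc.textFile("data/app_usage.csv")
  .map(_.split(","))
  .map { case Array(user, app, count) =>
    // Damp the 1-to-thousands launch-count range with a log transform
    // before handing it to implicit ALS as a confidence signal.
    Rating(user.toInt, app.toInt, math.log1p(count.toDouble))
  }

// Implicit-feedback ALS: rank, iterations, lambda and alpha are illustrative.
val model = ALS.trainImplicit(usage, 10, 10, 0.01, 40.0)

// Demand 1: top-5 app recommendations for a given user.
val recsForUser1 = model.recommendProducts(1, 5)

// Demand 2 (similar apps) can be served from model.productFeatures;
// see the sketch under the second copy of this question further down.
```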

DBSCAN on spark : which implementation

Submitted by 老子叫甜甜 on 2019-12-03 07:45:42
Question: I would like to do some DBSCAN on Spark. I have currently found two implementations: https://github.com/irvingc/dbscan-on-spark and https://github.com/alitouka/spark_dbscan. I have tested the first one with the sbt configuration given in its GitHub repo, but the functions in the jar are not the same as those in the doc or in the source on GitHub. For example, I cannot find the train function in the jar. I managed to run a test with the fit function (found in the jar), but a bad configuration of epsilon (a …
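For reference, the usage documented in the dbscan-on-spark README looks roughly like the sketch below; whether the entry point is train or fit depends on which published jar ends up on the classpath, so treat the exact method name and parameters as assumptions to verify against the source actually built:

```scala
import org.apache.spark.mllib.clustering.dbscan.DBSCAN
import org.apache.spark.mllib.linalg.Vectors

// Points as MLlib dense vectors; file path and parameter values are placeholders.
val points = sc.textFile("data/points.csv")
  .map(_.split(","))
  .map(fields => Vectors.dense(fields.map(_.toDouble)))

// Entry point as shown in the project README (may differ in older published jars).
val model = DBSCAN.train(
  points,
  eps = 0.3F,
  minPoints = 10,
  maxPointsPerPartition = 250)
```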

Error ExecutorLostFailure when running a task in Spark

Submitted by 北城余情 on 2019-12-03 07:38:23
Question: Hi, I am a beginner in Spark. When I try to run my job on this folder it throws ExecutorLostFailure every time. I am running the job on Spark 1.4.1 with 8 slave nodes, each with 11.7 GB of memory and 3.2 GB of disk. I am running the Spark task from one of the 8 slave nodes (so with a 0.7 storage fraction, only about 4.8 GB is available on each node) and using Mesos as the cluster manager. I am using this configuration: spark.master mesos://uc1f-bioinfocloud-vamp-m-1:5050 spark …

Mllib dependency error

Submitted by 徘徊边缘 on 2019-12-03 03:35:20
I'm trying to build a very simple standalone Scala app using MLlib, but I get the following error when trying to build the program: object Mllib is not a member of package org.apache.spark. Then I realized that I have to add MLlib as a dependency, as follows: version := "1" scalaVersion := "2.10.4" libraryDependencies ++= Seq( "org.apache.spark" %% "spark-core" % "1.1.0", "org.apache.spark" %% "spark-mllib" % "1.1.0" ). But here I got an error that says: unresolved dependency spark-core_2.10.4;1.1.1: not found, so I had to modify it to "org.apache.spark" % "spark-core_2.10" % "1.1.1". But …
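A build.sbt along these lines is the usual shape of the fix; the Spark version 1.1.1 and Scala 2.10.4 are taken from the question, but treat them as placeholders to align with the cluster actually used. The published artifacts carry the binary suffix _2.10, so a resolver looking for spark-core_2.10.4 will never find them:

```scala
// build.sbt (sketch): keep spark-core and spark-mllib on the same version and
// let %% append the Scala *binary* suffix (_2.10) automatically.
name := "simple-mllib-app"
version := "1.0"
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.1.1",
  "org.apache.spark" %% "spark-mllib" % "1.1.1"
)
```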

What is the right way to save\load models in Spark\PySpark

Submitted by 孤人 on 2019-12-03 02:15:09
I'm working with Spark 1.3.0 using PySpark and MLlib, and I need to save and load my models. I use code like this (taken from the official documentation): from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating data = sc.textFile("data/mllib/als/test.data") ratings = data.map(lambda l: l.split(',')).map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2]))) rank = 10 numIterations = 20 model = ALS.train(ratings, rank, numIterations) testdata = ratings.map(lambda p: (p[0], p[1])) predictions = model.predictAll(testdata).map(lambda r: ((r[0], r[1]), r[2])) predictions.collect …
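On the Scala side of the same MLlib API, MatrixFactorizationModel exposes save and load (from roughly Spark 1.3 onward); a minimal sketch, with the output path as a placeholder and `ratings` prepared as in the question:

```scala
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel}

// Train as in the question, then persist the user/product factor matrices.
val model = ALS.train(ratings, 10, 20)
model.save(sc, "target/tmp/alsModel")

// Later, possibly in a different job, reload the model from the same path.
val restored = MatrixFactorizationModel.load(sc, "target/tmp/alsModel")
val score = restored.predict(1, 2)   // predict a single (user, product) pair
```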

Error ExecutorLostFailure when running a task in Spark

Submitted by 狂风中的少年 on 2019-12-03 00:12:33
Hi, I am a beginner in Spark. When I try to run my job on this folder it throws ExecutorLostFailure every time. I am running the job on Spark 1.4.1 with 8 slave nodes, each with 11.7 GB of memory and 3.2 GB of disk. I am running the Spark task from one of the 8 slave nodes (so with a 0.7 storage fraction, only about 4.8 GB is available on each node) and using Mesos as the cluster manager. I am using this configuration: spark.master mesos://uc1f-bioinfocloud-vamp-m-1:5050, spark.eventLog.enabled true, spark.driver.memory 6g, spark.storage.memoryFraction 0.7, spark.core.connection.ack …
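The same settings expressed programmatically, purely to make the excerpt concrete: the values are the asker's, the truncated spark.core.connection.ack.* entry is omitted, and driver memory is shown only for symmetry (it really has to be set before the driver JVM starts, e.g. via --driver-memory or spark-defaults.conf):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("executor-lost-repro")                    // placeholder app name
  .setMaster("mesos://uc1f-bioinfocloud-vamp-m-1:5050")
  .set("spark.eventLog.enabled", "true")
  .set("spark.driver.memory", "6g")                     // effective only if set before JVM start
  // Reserving 70% of the heap for cached blocks leaves little room for task
  // execution; lowering this fraction is a common first step when executors
  // die with ExecutorLostFailure on memory-heavy jobs.
  .set("spark.storage.memoryFraction", "0.7")

val sc = new SparkContext(conf)
```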

What is rank in ALS machine Learning Algorithm in Apache Spark Mllib

Submitted by 橙三吉。 on 2019-12-02 23:12:37
I wanted to try an example of the ALS machine learning algorithm, and my code works fine; however, I do not understand the rank parameter used in the algorithm. I have the following code in Java: // Build the recommendation model using ALS int rank = 10; int numIterations = 10; MatrixFactorizationModel model = ALS.train(JavaRDD.toRDD(ratings), rank, numIterations, 0.01); I have read somewhere that it is the number of latent factors in the model. Suppose I have a dataset of (user, product, rating) that has 100 rows. What value should rank (the number of latent factors) be? As you said, the rank refers to the presumed latent or …
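There is no closed-form answer for rank; in practice it is picked empirically by training with a few candidate values and measuring held-out error. A minimal sketch of that kind of search, where the split, candidate ranks, lambda and metric are placeholder choices and `ratings` is an RDD[Rating] prepared beforehand:

```scala
import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.rdd.RDD

def evaluateRanks(ratings: RDD[Rating]): Unit = {
  // Hold out 20% of the ratings to measure generalisation error.
  val Array(train, test) = ratings.randomSplit(Array(0.8, 0.2), seed = 42L)

  for (rank <- Seq(5, 10, 20, 40)) {
    val model = ALS.train(train, rank, 10, 0.01)

    // Predict the held-out (user, product) pairs and compare with the true ratings.
    val predictions = model
      .predict(test.map(r => (r.user, r.product)))
      .map(p => ((p.user, p.product), p.rating))
    val mse = test
      .map(r => ((r.user, r.product), r.rating))
      .join(predictions)
      .map { case (_, (actual, predicted)) => math.pow(actual - predicted, 2) }
      .mean()

    println(s"rank=$rank  MSE=$mse")
  }
}
```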

The value of “spark.yarn.executor.memoryOverhead” setting?

Submitted by 妖精的绣舞 on 2019-12-02 22:33:31
Should the value of spark.yarn.executor.memoryOverhead in a Spark job with YARN be allocated to the app, or is it just the max value? spark.yarn.executor.memoryOverhead is just the max value. The goal is to calculate OVERHEAD as a percentage of the real executor memory, as used by RDDs and DataFrames. --executor-memory / spark.executor.memory controls the executor heap size, but JVMs can also use some memory off heap, for example for interned strings and direct byte buffers. The value of the spark.yarn.executor.memoryOverhead property is added to the executor memory to determine the full memory request to YARN …
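A rough illustration of that request, assuming the commonly cited defaults of a 10% overhead factor with a 384 MB floor (older 1.x releases used a slightly smaller factor, so check the docs for the exact version in use):

```scala
// Illustrative arithmetic only: how the per-executor YARN container is sized.
val executorMemoryMb = 6 * 1024                                        // --executor-memory 6g
val overheadMb       = math.max(384, (0.10 * executorMemoryMb).toInt)  // default memoryOverhead ≈ 614 MB
val containerMb      = executorMemoryMb + overheadMb                   // ≈ 6758 MB requested from YARN
println(s"YARN container request per executor: $containerMb MB")
```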

How to improve my recommendation result? I am using spark ALS implicit

Submitted by 青春壹個敷衍的年華 on 2019-12-02 21:18:29
First, I have some usage history of users' apps, for example: user1, app1, 3 (launch times); user2, app2, 2 (launch times); user3, app1, 1 (launch times). I have basically two demands: recommend some apps for every user, and recommend similar apps for every app. So I use the implicit ALS from MLlib on Spark to implement it. At first, I just used the original data to train the model. The result was terrible. I think it may be caused by the range of launch times, which runs from 1 into the thousands. So I processed the original data; I think this score can reflect the true situation and adds more regularization: score = lt …
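For the second demand (similar apps), one option is to compare the item factor vectors that implicit ALS learns; a sketch under the assumption that a trained MatrixFactorizationModel is available, with appId and topN as placeholders:

```scala
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

// "Similar apps" via cosine similarity between item factor vectors.
def similarApps(model: MatrixFactorizationModel, appId: Int, topN: Int = 5): Array[(Int, Double)] = {
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    val dot   = a.zip(b).map { case (x, y) => x * y }.sum
    val norms = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    if (norms == 0) 0.0 else dot / norms
  }

  // Factor vector of the query app (throws if the id was never seen in training).
  val target = model.productFeatures.lookup(appId).head

  model.productFeatures
    .filter { case (id, _) => id != appId }
    .map { case (id, vec) => (id, cosine(target, vec)) }
    .top(topN)(Ordering.by[(Int, Double), Double](_._2))
}
```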

Using CategoricalFeaturesInfo with DecisionTreeClassifier method in Spark

Submitted by 与世无争的帅哥 on 2019-12-02 14:18:22
Question: I have to use this code: val dt = new DecisionTreeClassifier().setLabelCol("indexedLabel").setFeaturesCol("indexedFeatures").setImpurity(impurity).setMaxBins(maxBins).setMaxDepth(maxDepth). I need to add categorical feature information so that the decision tree doesn't treat the indexed categorical features as numerical. I have this map: val categoricalFeaturesInfo = Map(143 -> 126, 144 -> 5, 145 -> 216, 146 -> 100, 147 -> 14, 148 -> 8, 149 -> 19, 150 -> 7). However, it only works with …
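In the spark.ml API the categorical information travels as column metadata rather than as an explicit map, and VectorIndexer is one common way to produce that metadata. A minimal sketch, with the caveat that VectorIndexer infers categorical features by distinct-value count instead of from the exact indices in the map above; the value 216 for maxCategories and maxBins is an assumption chosen to cover the largest arity (feature 145):

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.feature.VectorIndexer

// VectorIndexer marks every feature with <= maxCategories distinct values as
// categorical and records that in the column metadata, which the tree honours.
val indexer = new VectorIndexer()
  .setInputCol("features")
  .setOutputCol("indexedFeatures")
  .setMaxCategories(216)

val dt = new DecisionTreeClassifier()
  .setLabelCol("indexedLabel")
  .setFeaturesCol("indexedFeatures")
  .setMaxBins(216)   // maxBins must be at least the largest number of categories

val pipeline = new Pipeline().setStages(Array(indexer, dt))
// val model = pipeline.fit(trainingData)   // trainingData is a placeholder DataFrame
```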