apache-spark-mllib

How to make VectorAssembler not compress the data?

Submitted on 2021-01-28 05:32:51
Question: I want to transform multiple columns into one column using VectorAssembler, but the data is compressed (stored as sparse vectors) by default, with no option to do otherwise.

    val arr2 = Array((1,2,0,0,0), (1,2,3,0,0), (1,2,4,5,0), (1,2,2,5,6))
    val df = sc.parallelize(arr2).toDF("a", "b", "c", "e", "f")
    val colNames = Array("a", "b", "c", "e", "f")
    val assembler = new VectorAssembler()
      .setInputCols(colNames)
      .setOutputCol("newCol")
    val transDF = assembler.transform(df).select(col("newCol"))
    transDF.show(false)

The input is: +---+---+---+---+---+ |
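VectorAssembler chooses a sparse or dense representation per row, whichever is smaller, and exposes no option to disable this. A minimal sketch of the usual workaround, converting the assembled column to dense vectors with a UDF (the column and DataFrame names follow the question's code):

    import org.apache.spark.ml.linalg.Vector
    import org.apache.spark.sql.functions.{col, udf}

    // VectorAssembler has no density switch, so convert after the fact:
    // a UDF that calls toDense on each assembled vector.
    val toDense = udf((v: Vector) => v.toDense)
    val denseDF = transDF.withColumn("newCol", toDense(col("newCol")))
    denseDF.show(false)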

How to run Spark locally on Windows using Eclipse in Java

Submitted by 无人久伴 on 2021-01-28 03:21:34
Question: I'm trying to test MLlib's implementation of SVM. I want to run their Java example locally on Windows, using Eclipse. I've downloaded Spark 1.3.1 pre-built for Hadoop 2.6. When I try to run the example code, I get:

    15/06/11 16:17:09 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
    java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

What should I change in order to be able to run the example code in this setup?

Answer 1: Create
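Answer 1 is cut off above; the usual fix, sketched here with an assumed install path of C:\hadoop, is to download a winutils.exe matching the Hadoop build (2.6 in this case) into C:\hadoop\bin and point hadoop.home.dir at its parent folder before creating the context:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class LocalSparkOnWindows {
        public static void main(String[] args) {
            // Assumes winutils.exe was placed in C:\hadoop\bin (path is
            // illustrative). Set this before creating the context so
            // Hadoop's Shell class can locate the binary.
            System.setProperty("hadoop.home.dir", "C:\\hadoop");

            SparkConf conf = new SparkConf()
                    .setAppName("MLlibSVMExample")
                    .setMaster("local[*]");
            JavaSparkContext sc = new JavaSparkContext(conf);
            // ... run the MLlib SVM example here ...
            sc.stop();
        }
    }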

Not serializable exception while running linear regression on Scala 2.12

Submitted by 大憨熊 on 2020-12-23 18:33:30
Question: While running the following Spark MLlib code in local mode with Scala 2.12.3, I encountered the error "lambda not serializable". Any input would be much appreciated. (Moving to Scala 2.11 is not an option for me.) Can you please let me know what I can do to avoid this issue? Thank you.

    import java.io.FileWriter
    import org.apache.spark.SparkConf
    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.evaluation.RegressionEvaluator
    import org.apache.spark.ml.feature.StringIndexer
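The root cause is an assumption on my part, since the question is truncated, but a classic trigger for this failure is compiling application code with Scala 2.12 against Spark artifacts built for Scala 2.11; Spark only ships _2.12 builds from the 2.4 line onward. A build.sbt sketch with illustrative versions:

    // Versions are illustrative; the key point is that the Spark artifacts'
    // Scala suffix (resolved via %%) must match scalaVersion. Spark 2.4.x
    // was the first line with published _2.12 builds.
    scalaVersion := "2.12.3"
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"  % "2.4.8" % "provided",
      "org.apache.spark" %% "spark-mllib" % "2.4.8" % "provided"
    )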

sc is not defined when using SparkContext

Submitted by 假如想象 on 2020-12-13 03:17:31
Question: My Spark package is spark-2.2.0-bin-hadoop2.7. I exported the Spark variables as:

    export SPARK_HOME=/home/harry/spark-2.2.0-bin-hadoop2.7
    export PATH=$SPARK_HOME/bin:$PATH

I opened a Spark notebook with pyspark, and I am able to load packages from Spark:

    from pyspark import SparkContext, SQLContext
    from pyspark.ml.regression import LinearRegression
    print(SQLContext)

The output is <class 'pyspark.sql.context.SQLContext'>. But print(sc) fails with "sc is undefined". Please, can anyone help me out?

Answer 1: In
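Answer 1 is truncated above; for what it's worth, sc is only predefined inside the pyspark shell (or a notebook wired to it), so in any other Python session it has to be constructed explicitly. A minimal sketch, with an illustrative master and app name:

    from pyspark import SparkContext, SQLContext

    # sc is only created automatically by the pyspark shell; in a plain
    # Python session it must be constructed explicitly.
    sc = SparkContext(master="local[*]", appName="example")
    sqlContext = SQLContext(sc)
    print(sc)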