I'm a new user of SparkR. I'm trying to load a CSV file into R using SparkR.
Sys.setenv(SPARK_HOME="/usr/local/bin/spark-1.5.1-bin-hadoop2.6")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
Pre-built Spark distributions are still built with Scala 2.10, not 2.11. So, if you use such a distribution (which I think you do), you also need a spark-csv build for Scala 2.10, not the Scala 2.11 build you use in your code. The following code should then work fine:
library(rJava)
library(SparkR)
library(nycflights13)    # used only for some sample data
df <- flights[1:4, 1:4]  # a small sample to round-trip through CSV
df
  year month day dep_time
1 2013     1   1      517
2 2013     1   1      533
3 2013     1   1      542
4 2013     1   1      544
write.csv(df, file="~/scripts/temp.csv", quote=FALSE, row.names=FALSE)  # write the sample out as a plain CSV
sc <- sparkR.init(sparkHome="/usr/local/bin/spark-1.5.1-bin-hadoop2.6/",
                  master="local",
                  sparkPackages="com.databricks:spark-csv_2.10:1.2.0")  # 2.10 here
sqlContext <- sparkRSQL.init(sc)
df_spark <- read.df(sqlContext, "/home/vagrant/scripts/temp.csv",
                    source="com.databricks.spark.csv", header="true")
head(df_spark)
  year month day dep_time
1 2013     1   1      517
2 2013     1   1      533
3 2013     1   1      542
4 2013     1   1      544
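One caveat worth adding: spark-csv reads every column as a string unless told otherwise. If you want typed columns, you can either pass an explicit schema to read.df or set the package's inferSchema option; a minimal sketch, reusing the sqlContext and file from above:

df_typed <- read.df(sqlContext, "/home/vagrant/scripts/temp.csv",
                    source="com.databricks.spark.csv",
                    header="true",
                    inferSchema="true")  # ask spark-csv to guess column types
printSchema(df_typed)                    # year, month etc. should now be int, not string

Note that inferring the schema costs an extra pass over the data, so for large files providing an explicit schema is cheaper.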