Empty output when reading a CSV file into RStudio using SparkR

Asked by 花落未央 on 2020-12-11 09:05

I'm a new user of SparkR. I'm trying to load a CSV file into R using SparkR.

Sys.setenv(SPARK_HOME="/usr/local/bin/spark-1.5.1-bin-hadoop2.6")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
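
A minimal sketch of how such a session typically continues, assuming (as the answer below indicates) that the spark-csv build for Scala 2.11 was the one being loaded; the package version and file path here are illustrative, not from the original post:

library(SparkR)

# Initialize SparkR with the Scala 2.11 build of spark-csv (the mismatch
# the answer identifies); version 1.2.0 is assumed for illustration.
sc <- sparkR.init(master="local",
                  sparkPackages="com.databricks:spark-csv_2.11:1.2.0")
sqlContext <- sparkRSQL.init(sc)

# With mismatched Scala versions, reading the CSV yields empty output.
df_spark <- read.df(sqlContext, "path/to/file.csv",
                    "com.databricks.spark.csv", header="true")
head(df_spark)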
1 Answer

  • Answered 2020-12-11 09:36

    Pre-built Spark distributions are still built with Scala 2.10, not 2.11. So, if you use such a distribution (which I think you do), you also need a spark-csv build for Scala 2.10, not the Scala 2.11 build used in your code. The following code should then work fine:

     library(rJava)
     library(SparkR)
     library(nycflights13)  # provides the flights dataset used to build a test CSV
    
     df <- flights[1:4, 1:4]
     df
       year month day dep_time
     1 2013     1   1      517
     2 2013     1   1      533
     3 2013     1   1      542
     4 2013     1   1      544
    
     write.csv(df, file="~/scripts/temp.csv", quote=FALSE, row.names=FALSE)
    
     sc <- sparkR.init(sparkHome= "/usr/local/bin/spark-1.5.1-bin-hadoop2.6/", 
                       master="local",
                       sparkPackages="com.databricks:spark-csv_2.10:1.2.0")  # 2.10 here
     sqlContext <- sparkRSQL.init(sc)
     df_spark <- read.df(sqlContext, "/home/vagrant/scripts/temp.csv", "com.databricks.spark.csv", header="true")
     head(df_spark)
       year month day dep_time
     1 2013     1   1      517
     2 2013     1   1      533
     3 2013     1   1      542
     4 2013     1   1      544
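
    As a quick sanity check that the versions now match and the file was actually read, a couple of standard SparkR calls can be run against df_spark (a sketch; an empty result here would point back to the Scala-version mismatch):

     printSchema(df_spark)  # column names/types taken from the CSV header
     count(df_spark)        # row count; should be 4 for this sample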
    