How to load csv file into SparkR on RStudio?

梦谈多话 2020-12-17 03:22

How do you load a CSV file into SparkR in RStudio? Below are the steps I had to perform to run SparkR in RStudio. I have used read.df to read the .csv; not sure how else to write t

3 Answers
  • 2020-12-17 03:32

    I appreciate everyone's input and solutions! I figured out another way to load a .csv file into SparkR in RStudio. Here it is:

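    # Start a local Spark context and SQL context (SparkR 1.x API)
    sc <- sparkR.init(master = "local")
    sqlContext <- sparkRSQL.init(sc)
    
    # Read the .csv into a local R data.frame (base R, not Spark)
    patients <- read.csv("C:/...") # Insert your .csv file path
    
    # Convert the local data.frame into a SparkR DataFrame
    df <- createDataFrame(sqlContext, patients)
    df
    head(df)
    str(df)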
    
  • 2020-12-17 03:36

    I solved this issue by supplying commons-csv-1.2.jar together with the spark-csv package.

    Apparently, spark-csv depends on commons-csv but is not packaged with it.

    Setting SPARKR_SUBMIT_ARGS as follows solved the issue (I use --jars rather than --packages).

    Sys.setenv('SPARKR_SUBMIT_ARGS'='"--jars" "/usr/lib/spark-1.5.1-bin-hadoop2.6/lib/spark-csv_2.11-1.2.0.jar,/usr/lib/spark-1.5.1-bin-hadoop2.6/lib/commons-csv-1.2.jar" "sparkr-shell"')
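
    With more than one jar the quoting gets easy to break, so here is a minimal sketch that assembles the same value from a vector of paths. build_submit_args is a hypothetical helper, not part of SparkR; the jar paths are the ones from the line above and will differ on your machine.

    ```r
    # Hypothetical helper (not a SparkR function): builds the SPARKR_SUBMIT_ARGS
    # value from a vector of jar paths, in the same quoted form as above.
    build_submit_args <- function(jars) {
      # Warn early about missing jars, since a missing jar otherwise only
      # shows up later as a class-loading error.
      missing <- jars[!file.exists(jars)]
      if (length(missing) > 0) {
        warning("jar not found: ", paste(missing, collapse = ", "))
      }
      paste0('"--jars" "', paste(jars, collapse = ","), '" "sparkr-shell"')
    }

    # Paths taken from the answer above; adjust to your installation.
    jars <- c(
      "/usr/lib/spark-1.5.1-bin-hadoop2.6/lib/spark-csv_2.11-1.2.0.jar",
      "/usr/lib/spark-1.5.1-bin-hadoop2.6/lib/commons-csv-1.2.jar"
    )

    # Set this before calling sparkR.init(), because the value is read
    # when the Spark JVM backend is launched.
    Sys.setenv(SPARKR_SUBMIT_ARGS = build_submit_args(jars))
    ```

    If the variable is set after the Spark context already exists, it has no effect until the context is stopped and started again.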
    

    In fact, the rather obscure error

    Error in writeJobj(con, object) : invalid jobj 1
    

    is clearer when you run the code from the R shell directly rather than from RStudio, where it clearly states:

    java.lang.NoClassDefFoundError: org/apache/commons/csv/CSVFormat
    

    The required commons-csv jar can be downloaded here: https://commons.apache.org/proper/commons-csv/download_csv.cgi

  • 2020-12-17 03:49

    Spark 2.0.0+:

    You can use the built-in csv data source:

    loadDF(sqlContext, path="some_path", source="csv", header="true")
    

    without loading spark-csv.
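
    For a complete Spark 2.x session, a minimal sketch might look like the following. It assumes the SparkR package is on your library path (in RStudio you may need to pass lib.loc pointing at the R/lib directory under SPARK_HOME), and "some_path" is still the placeholder from above. It uses sparkR.session() and read.df() without a sqlContext argument, which is the Spark 2.x calling style.

    ```r
    library(SparkR)

    # Spark 2.x entry point; takes the place of sparkR.init() and sparkRSQL.init().
    sparkR.session(master = "local[*]")

    # "some_path" is a placeholder; replace it with the path to your .csv file.
    # header and inferSchema are options of the csv data source.
    df <- read.df("some_path", source = "csv", header = "true", inferSchema = "true")

    printSchema(df)
    head(df)

    # Stop the session when finished.
    sparkR.session.stop()
    ```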

    Original answer:

    As far as I can tell, you're using the wrong version of spark-csv. Pre-built versions of Spark (1.x) use Scala 2.10, but you're using the spark-csv build for Scala 2.11. Try this instead:

    sc <- sparkR.init(sparkPackages="com.databricks:spark-csv_2.10:1.2.0")
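
    Putting it together for Spark 1.x, a sketch of the full read could look like this. It assumes SparkR 1.4 or later with network access so the package can be resolved, and "some_path" is a placeholder for your file. The source name com.databricks.spark.csv is the data source identifier registered by the spark-csv package.

    ```r
    library(SparkR)

    # Start Spark with the Scala 2.10 build of spark-csv, as suggested above.
    sc <- sparkR.init(
      master = "local",
      sparkPackages = "com.databricks:spark-csv_2.10:1.2.0"
    )
    sqlContext <- sparkRSQL.init(sc)

    # "some_path" is a placeholder; replace it with the path to your .csv file.
    df <- read.df(
      sqlContext, "some_path",
      source = "com.databricks.spark.csv",
      header = "true"
    )

    head(df)

    # Stop the Spark context when finished.
    sparkR.stop()
    ```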
    