How do you load a CSV file into SparkR in RStudio? Below are the steps I had to perform to run SparkR in RStudio. I used read.df to read the .csv; not sure how else to write t
I solved this issue by providing commons-csv-1.2.jar together with the spark-csv package.
Apparently, spark-csv uses commons-csv but is not packaged with it.
Setting the following SPARKR_SUBMIT_ARGS solved the issue (I used --jars rather than --packages).
Sys.setenv('SPARKR_SUBMIT_ARGS'='"--jars" "/usr/lib/spark-1.5.1-bin-hadoop2.6/lib/spark-csv_2.11-1.2.0.jar,/usr/lib/spark-1.5.1-bin-hadoop2.6/lib/commons-csv-1.2.jar" "sparkr-shell"')
In fact, the rather obscure error
Error in writeJobj(con, object) : invalid jobj 1
is clearer when you use the R shell directly instead of RStudio, where it clearly states
java.lang.NoClassDefFoundError: org/apache/commons/csv/CSVFormat
The required commons-csv jar can be downloaded here: https://commons.apache.org/proper/commons-csv/download_csv.cgi
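Putting it together, here is a minimal sketch of how the CSV read might look once both jars are on the classpath. It assumes the Spark 1.5.x SparkR API (sparkR.init, sparkRSQL.init, read.df), that the SparkR package is on your R library path, and that the file has a header row. The load_csv helper and its master argument are hypothetical names used only for illustration; they are not part of the original answer.

```r
# Jar paths copied from the answer above; adjust them to your installation.
jars <- c(
  "/usr/lib/spark-1.5.1-bin-hadoop2.6/lib/spark-csv_2.11-1.2.0.jar",
  "/usr/lib/spark-1.5.1-bin-hadoop2.6/lib/commons-csv-1.2.jar"
)

# Build the same value as the Sys.setenv call above:
#   "--jars" "<jar1>,<jar2>" "sparkr-shell"
# This must be set before the Spark context is created.
submit_args <- sprintf('"--jars" "%s" "sparkr-shell"', paste(jars, collapse = ","))
Sys.setenv(SPARKR_SUBMIT_ARGS = submit_args)

# Hypothetical helper: starts a local Spark context and reads a CSV
# through the spark-csv data source. Assumes the file has a header row.
load_csv <- function(path, master = "local[*]") {
  sc <- SparkR::sparkR.init(master = master)
  sqlContext <- SparkR::sparkRSQL.init(sc)
  SparkR::read.df(sqlContext, path,
                  source = "com.databricks.spark.csv",
                  header = "true")
}
```

You would then call something like df <- load_csv("/path/to/file.csv") followed by SparkR::head(df), where the file path is a placeholder. If the NoClassDefFoundError still appears, check that both jar paths in SPARKR_SUBMIT_ARGS actually exist on disk.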