How to construct a DataFrame from an Excel (xls, xlsx) file in Scala Spark?

自闭症患者 2020-11-28 08:13

I have a large Excel (xlsx and xls) file with multiple sheets, and I need to convert it to an RDD or DataFrame so that it can be joined to othe

5 Answers
  •  无人及你
    2020-11-28 09:05

    I used version 0.11 of the com.crealytics.spark.excel jar. I wrote this in Java Spark, but it would be the same in Scala; you just need to change javaSparkContext to SparkContext.

    tempTable = new SQLContext(javaSparkContext).read()
        .format("com.crealytics.spark.excel")
        .option("sheetName", "sheet1")
        .option("useHeader", "false") // Required
        .option("treatEmptyValuesAsNulls", "false") // Optional, default: true
        .option("inferSchema", "false") // Optional, default: false
        .option("addColorColumns", "false") // Required
        .option("timestampFormat", "MM-dd-yyyy HH:mm:ss") // Optional, default: yyyy-mm-dd hh:mm:ss[.fffffffff]
        .schema(schema) // schema is a StructType defined elsewhere
        .load("hdfs://localhost:8020/user/tester/my.xlsx");
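
Since the question asks about Scala, the Java snippet above might translate roughly as follows. This is only a sketch under some assumptions: it uses `SparkSession` (the Spark 2.x entry point) instead of `SQLContext`, it assumes the spark-excel package is already on the classpath, and the `readSheet` helper, the object name, and the two-column `schema` are hypothetical, since the answer does not show its own schema. The option names are the ones used in the answer for version 0.11; other releases of the library may name them differently, so check the README for the version you use.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object ExcelToDataFrame {

  // Hypothetical helper: reads a single sheet of an Excel workbook
  // into a DataFrame, using the same options as the Java answer.
  def readSheet(spark: SparkSession,
                path: String,
                sheet: String,
                schema: StructType): DataFrame =
    spark.read
      .format("com.crealytics.spark.excel")
      .option("sheetName", sheet)
      .option("useHeader", "false")
      .option("treatEmptyValuesAsNulls", "false")
      .option("inferSchema", "false")
      .option("addColorColumns", "false")
      .option("timestampFormat", "MM-dd-yyyy HH:mm:ss")
      .schema(schema)
      .load(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("excel-to-dataframe")
      .getOrCreate()

    // Assumed schema for illustration only; replace it with the
    // real column names and types of your sheet.
    val schema = StructType(Seq(
      StructField("col1", StringType, nullable = true),
      StructField("col2", StringType, nullable = true)
    ))

    val df = readSheet(
      spark,
      "hdfs://localhost:8020/user/tester/my.xlsx",
      "sheet1",
      schema
    )

    df.printSchema()
    df.show(5)

    spark.stop()
  }
}
```

For a workbook with several sheets, one option is to call `readSheet` once per sheet name; each result is an ordinary DataFrame and can be joined with other DataFrames in the usual way.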
    
