How to construct a DataFrame from an Excel (xls, xlsx) file in Scala Spark?

自闭症患者 2020-11-28 08:13

I have a large Excel (xlsx and xls) file with multiple sheets, and I need to convert it to an RDD or DataFrame so that it can be joined to othe

5 answers
  •  暗喜 (original poster)
    2020-11-28 08:44

    The solution to your problem is to add the Spark Excel dependency (the com.crealytics spark-excel package) to your project.
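As a rough sketch of what adding the dependency could look like in an sbt build: the group and artifact names below match the `com.crealytics.spark.excel` format string used in this answer, but the version is deliberately left as a placeholder, and `sparkExcelVersion` is just a hypothetical helper name. Pick a release that matches your Spark and Scala versions.

```scala
// build.sbt (sketch, not a tested build definition)
// Placeholder only: substitute a real spark-excel release number here.
val sparkExcelVersion = "<version>"

// %% appends the Scala binary version (for example _2.11 or _2.12) to the artifact name.
libraryDependencies += "com.crealytics" %% "spark-excel" % sparkExcelVersion
```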

    Spark Excel has flexible options to play with.

    I have tested the following code, which reads an Excel file into a DataFrame, and it works well:

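    import org.apache.spark.sql.DataFrame

    // Assumes a SQLContext named sqlContext is already in scope (for example, in spark-shell).
    // Option names can differ between spark-excel releases; check the README for your version.
    def readExcel(file: String): DataFrame = sqlContext.read
        .format("com.crealytics.spark.excel")
        .option("location", file)                  // path to the .xls/.xlsx file
        .option("useHeader", "true")               // first row holds the column names
        .option("treatEmptyValuesAsNulls", "true") // empty cells become null
        .option("inferSchema", "true")             // infer column types from the data
        .option("addColorColumns", "false")        // skip the extra cell-color columns
        .load()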
    
    val data = readExcel("path to your excel file")
    
    data.show(false)
    

    If your workbook has multiple sheets, you can pass the sheet name as an option:

    .option("sheetName", "Sheet2")
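Since the question is about a workbook with several sheets that need to be joined, here is a minimal sketch of how the reader above might be extended to load one sheet at a time and join two of them. It reuses the option names from this answer as-is. The `ExcelSheets` object, the `sheetOptions`, `readSheet` and `joinSheets` helpers, the sheet name `Sheet1`, and the key column `id` are all hypothetical names chosen for illustration, and the code assumes Spark and spark-excel are on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

object ExcelSheets {
  // Options shared by every sheet, copied from the reader in this answer.
  val baseOptions: Map[String, String] = Map(
    "useHeader" -> "true",
    "treatEmptyValuesAsNulls" -> "true",
    "inferSchema" -> "true",
    "addColorColumns" -> "false"
  )

  // Build the full option set for one sheet of one workbook.
  def sheetOptions(file: String, sheet: String): Map[String, String] =
    baseOptions ++ Map("location" -> file, "sheetName" -> sheet)

  // Read a single named sheet into a DataFrame.
  def readSheet(sqlContext: SQLContext, file: String, sheet: String): DataFrame =
    sqlContext.read
      .format("com.crealytics.spark.excel")
      .options(sheetOptions(file, sheet))
      .load()

  // Read two sheets and inner-join them on a shared key column.
  // "Sheet1", "Sheet2" and "id" are assumptions; use your own names.
  def joinSheets(sqlContext: SQLContext, file: String): DataFrame = {
    val left  = readSheet(sqlContext, file, "Sheet1")
    val right = readSheet(sqlContext, file, "Sheet2")
    left.join(right, Seq("id"))
  }
}
```

Once each sheet is a DataFrame, it can be joined to any other DataFrame in the same way, for example `ExcelSheets.readSheet(sqlContext, file, "Sheet2").join(otherDf, Seq("id"))`, where `otherDf` is whatever DataFrame you already have.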
    

    I hope it's helpful.
