Defining DataFrame Schema for a table with 1500 columns in Spark

Asked by 轻奢々 on 2021-01-23 02:29

I have a table with around 1,500 columns in SQL Server. I need to read the data from this table, convert it to the proper datatype format, and then insert the records into Oracle.

3 Answers
  •  刺人心 (OP)
     2021-01-23 02:55

    For this type of requirement, I'd suggest the case class approach to prepare a DataFrame.

    Yes, there are some limitations, such as product arity (the 22-field case class limit), but we can overcome them. For Scala versions below 2.11 you can do it like the example below:

    Prepare a class which extends Product and overrides these methods:

    • productArity(): Int: returns the number of attributes. In the Student example it's 33 (around 1,500 for the real table).

    • productElement(n: Int): Any: given an index, returns the corresponding attribute. As protection, we also add a default case which throws an IndexOutOfBoundsException.

    • canEqual(that: Any): Boolean: the last of the three methods; it serves as a boundary condition when an equality check is done against the class.


    • For an example implementation you can refer to the Student case class, which has 33 fields in it; a trimmed-down sketch follows below.
    • An example student dataset description is available there as well.
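
    A minimal sketch of this approach, assuming Spark 2.x+ with Scala. The `Record` class and its three fields are hypothetical stand-ins for the 33-field Student class (or the ~1,500-column table); in practice you would list every column the same way.

    ```scala
    import org.apache.spark.sql.SparkSession

    // Hypothetical trimmed-down Record with only 3 fields, standing in for the
    // 33-field Student / ~1,500-column table. Extending Product directly avoids
    // the 22-field case class limit of Scala < 2.11.
    class Record(val id: Int, val name: String, val marks: Double)
        extends Product with Serializable {

      // productArity: the number of attributes (3 here, 33 for Student).
      override def productArity: Int = 3

      // productElement: return the attribute at the given index; the default
      // case throws IndexOutOfBoundsException as protection.
      override def productElement(n: Int): Any = n match {
        case 0 => id
        case 1 => name
        case 2 => marks
        case _ => throw new IndexOutOfBoundsException(n.toString)
      }

      // canEqual: boundary condition for equality checks against this class.
      override def canEqual(that: Any): Boolean = that.isInstanceOf[Record]
    }

    object ProductSchemaExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ProductSchemaExample")
          .master("local[*]")
          .getOrCreate()

        // createDataFrame accepts a Seq of any Product type, so the plain
        // class above is treated much like a case class.
        val df = spark.createDataFrame(Seq(
          new Record(1, "alice", 91.5),
          new Record(2, "bob", 85.0)
        ))

        df.printSchema()
        df.show()
        spark.stop()
      }
    }
    ```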

    Another option:

    Use StructType to define the schema and create the DataFrame (if you don't want to use the spark-csv API).
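
    A minimal sketch of the StructType route. Only three illustrative fields are shown; for ~1,500 columns you would typically generate the `StructField` list in a loop or from the source table's metadata rather than writing it by hand.

    ```scala
    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.types._

    object StructTypeExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("StructTypeExample")
          .master("local[*]")
          .getOrCreate()

        // Programmatic schema: in practice this Seq could be built from the
        // SQL Server table's column metadata instead of being hard-coded.
        val schema = StructType(Seq(
          StructField("id", IntegerType, nullable = false),
          StructField("name", StringType, nullable = true),
          StructField("marks", DoubleType, nullable = true)
        ))

        val rows = spark.sparkContext.parallelize(Seq(
          Row(1, "alice", 91.5),
          Row(2, "bob", 85.0)
        ))

        // Pair the RDD[Row] with the schema to build the DataFrame.
        val df = spark.createDataFrame(rows, schema)
        df.printSchema()
        df.show()
        spark.stop()
      }
    }
    ```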
