Defining DataFrame Schema for a table with 1500 columns in Spark

Asked by 轻奢々 on 2021-01-23 02:29

I have a table with around 1,500 columns in SQL Server. I need to read the data from this table, convert it to the proper datatype format, and then insert the records into Oracle.

3 Answers
  •  刺人心 (OP)
     2021-01-23 02:55

    For this type of requirement, I'd suggest the case class approach to prepare a DataFrame.

    Yes, there are some limitations, such as product arity (the 22-field case class limit), but we can overcome them. For Scala versions below 2.11 you can do it like the example below:

    Prepare a class which extends Product and overrides these methods:

    • productArity(): Int: returns the number of attributes. In the Student example it's 33 (around 1,500 for the real table).

    • productElement(n: Int): Any: given an index, returns the corresponding attribute. As protection, we also add a default case which throws an IndexOutOfBoundsException.

    • canEqual(that: Any): Boolean: the last of the three methods; it serves as a boundary condition when an equality check is done against the class.


    • For an example implementation you can refer to the Student case class, which has 33 fields in it; a trimmed-down sketch follows below.
    • An example student dataset description is available there as well.
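
    A minimal sketch of this approach, assuming Spark 2.x+ with Scala. The `Record` class and its three fields are hypothetical stand-ins for the 33-field Student class (or the ~1,500-column table); in practice you would list every column the same way.

    ```scala
    import org.apache.spark.sql.SparkSession

    // Hypothetical trimmed-down Record with only 3 fields, standing in for the
    // 33-field Student / ~1,500-column table. Extending Product directly avoids
    // the 22-field case class limit of Scala < 2.11.
    class Record(val id: Int, val name: String, val marks: Double)
        extends Product with Serializable {

      // productArity: the number of attributes (3 here, 33 for Student).
      override def productArity: Int = 3

      // productElement: return the attribute at the given index; the default
      // case throws IndexOutOfBoundsException as protection.
      override def productElement(n: Int): Any = n match {
        case 0 => id
        case 1 => name
        case 2 => marks
        case _ => throw new IndexOutOfBoundsException(n.toString)
      }

      // canEqual: boundary condition for equality checks against this class.
      override def canEqual(that: Any): Boolean = that.isInstanceOf[Record]
    }

    object ProductSchemaExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ProductSchemaExample")
          .master("local[*]")
          .getOrCreate()

        // createDataFrame accepts a Seq of any Product type, so the plain
        // class above is treated much like a case class.
        val df = spark.createDataFrame(Seq(
          new Record(1, "alice", 91.5),
          new Record(2, "bob", 85.0)
        ))

        df.printSchema()
        df.show()
        spark.stop()
      }
    }
    ```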

    Another option:

    Use StructType to define the schema and create the DataFrame (if you don't want to use the spark-csv API).
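
    A minimal sketch of the StructType route. Only three illustrative fields are shown; for ~1,500 columns you would typically generate the `StructField` list in a loop or from the source table's metadata rather than writing it by hand.

    ```scala
    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.types._

    object StructTypeExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("StructTypeExample")
          .master("local[*]")
          .getOrCreate()

        // Programmatic schema: in practice this Seq could be built from the
        // SQL Server table's column metadata instead of being hard-coded.
        val schema = StructType(Seq(
          StructField("id", IntegerType, nullable = false),
          StructField("name", StringType, nullable = true),
          StructField("marks", DoubleType, nullable = true)
        ))

        val rows = spark.sparkContext.parallelize(Seq(
          Row(1, "alice", 91.5),
          Row(2, "bob", 85.0)
        ))

        // Pair the RDD[Row] with the schema to build the DataFrame.
        val df = spark.createDataFrame(rows, schema)
        df.printSchema()
        df.show()
        spark.stop()
      }
    }
    ```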
