Is it possible to force schema definition when loading tables from AWS RDS (MySQL)?

最后都变了 - submitted 2021-01-27 16:45:37

Question


I'm using Apache Spark to read data from a MySQL database on AWS RDS.

Spark also infers the schema from the database. Unfortunately, one of the table's columns, active, is of type TINYINT(1), and it holds values representing the following states:

  • non active
  • active
  • pending
  • etc.

Spark recognizes TINYINT(1) as BooleanType, so it converts every value in active to true or false. As a result, I can't tell the original values apart.

Is it possible to force a schema definition when loading tables into Spark?


Answer 1:


It's not Spark that converts the TINYINT type into a boolean, but the JDBC connector (MySQL Connector/J) used under the hood.

So you don't actually need to specify a schema to fix this. The cause is the JDBC driver, which by default treats TINYINT(1) as the BIT type (because the server silently converts BIT to TINYINT(1) when creating tables), and Spark then maps that BIT column to BooleanType.

You can find all the options and gotchas of the JDBC connector in the official MySQL Connector/J Configuration Properties guide.

You just need to pass the right parameter to the JDBC driver by appending it to your connection URL:

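// Stop Connector/J from reporting TINYINT(1) columns as BIT.
// The '&' assumes oldUrl already contains a query string; use '?' instead if it has none.
val newUrl = s"$oldUrl&tinyInt1isBit=false"

val data = spark.read.format("jdbc")
  .option("url", newUrl)
  // your other JDBC options (dbtable, user, password, ...)
  .load()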
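The '&' in the snippet above only works when oldUrl already has query parameters. A minimal sketch of a helper that starts the query string with '?' when there is none and continues it with '&' otherwise. The names JdbcUrls and withProperty are hypothetical, and it assumes a plain jdbc:mysql://host:port/db URL with no fragment and no characters that need escaping:

```scala
object JdbcUrls {
  // Hypothetical helper: appends key=value to a JDBC URL.
  // Uses '?' if the URL has no query string yet, '&' if it already has
  // parameters, and nothing if it already ends in '?' or '&'.
  // Assumes no '#' fragment and that key/value need no URL escaping.
  def withProperty(url: String, key: String, value: String): String = {
    val separator =
      if (!url.contains("?")) "?"
      else if (url.endsWith("?") || url.endsWith("&")) ""
      else "&"
    s"$url$separator$key=$value"
  }
}
```

It would be used as .option("url", JdbcUrls.withProperty(oldUrl, "tinyInt1isBit", "false")), so the same code works whether or not the original URL carries other driver properties.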



Answer 2:


You can define a schema and pass it to the reader (see the Spark docs):

spark.read.schema(schema)

Here is an example of how to define a schema:

import org.apache.spark.sql.types.{StringType, StructField, StructType}

// The schema is encoded in a string of space-separated column names
val schemaString = "name age"

// Generate the schema from that string (every column as a nullable string)
val fields = schemaString.split(" ")
  .map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)

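Now you can read the data using the predefined schema. This works for file-based sources such as CSV or JSON; the JDBC source, however, rejects a user-specified schema (it fails with an error stating that jdbc does not allow user-specified schemas), so it does not apply directly to the MySQL table in the question:

spark.read.schema(schema).csv("/path/to/data.csv")  // hypothetical path; any file-based reader works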
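For the JDBC source itself, Spark 2.3 and later accept a customSchema read option: a DDL-style string of column type overrides, where any column you don't list keeps its default type mapping. A minimal sketch of building that string from (column, Spark SQL type) pairs, in the same spirit as the schema-from-string example above; the names JdbcCustomSchema and ddl are hypothetical:

```scala
object JdbcCustomSchema {
  // Hypothetical helper: turns (column, Spark SQL type) pairs into the
  // comma-separated DDL string expected by the JDBC "customSchema" option,
  // e.g. Seq("active" -> "INT") becomes "active INT".
  // Column names must match the JDBC table's column names exactly.
  def ddl(overrides: Seq[(String, String)]): String =
    overrides.map { case (column, sqlType) => s"$column $sqlType" }.mkString(", ")
}
```

The result would be passed as .option("customSchema", JdbcCustomSchema.ddl(Seq("active" -> "INT"))) next to the other JDBC options. Whether the original status codes survive this way depends on how Connector/J decodes a TINYINT(1) column it reports as BIT, so pairing it with tinyInt1isBit=false is the safer choice.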


Source: https://stackoverflow.com/questions/42480888/is-it-possible-to-force-schema-definition-when-loading-tables-from-aws-rds-mysq
