How to create AWS Glue table where partitions have different columns? ('HIVE_PARTITION_SCHEMA_MISMATCH')

Posted by 天大地大妈咪最大 on 2019-11-29 01:42:24

Question


As per this AWS Forum Thread, does anyone know how to use AWS Glue to create an AWS Athena table whose partitions contain different schemas (in this case different subsets of columns from the table schema)?

At the moment, when I run the crawler over this data and then run a query in Athena, I get the error 'HIVE_PARTITION_SCHEMA_MISMATCH'.

My use case is:

  • Partitions represent days
  • Files represent events
  • Each event is a JSON blob in a single S3 file
  • An event contains a subset of columns (dependent on the type of event)
  • The 'schema' of the entire table is the full set of columns for all the event types (this is correctly put together by Glue crawler)
  • The 'schema' of each partition is the subset of columns for the event types that occurred on that day (hence in Glue each partition potentially has a different subset of columns from the table schema)
  • I think this inconsistency is what causes the error in Athena

If I were to manually write a schema I could do this fine as there would just be one table schema, and keys which are missing in the JSON file would be treated as Nulls.
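To make the desired behaviour concrete, here is a minimal Python sketch of "one table schema, missing keys become nulls". It is not Athena or Glue code; the column names and the two sample events are hypothetical and only stand in for the real event types.

```python
import json

# Hypothetical full table schema: the union of columns across all event types.
TABLE_COLUMNS = ["event_id", "event_type", "user_id", "amount"]


def project_onto_schema(raw_event: str, columns=TABLE_COLUMNS) -> dict:
    """Parse one JSON event and return a row containing every table column.

    Keys that the event does not contain come back as None (i.e. null).
    """
    event = json.loads(raw_event)
    return {column: event.get(column) for column in columns}


# Two hypothetical events of different types, each carrying a subset of columns.
login_event = '{"event_id": "1", "event_type": "login", "user_id": "u42"}'
purchase_event = '{"event_id": "2", "event_type": "purchase", "amount": 9.99}'

rows = [project_onto_schema(e) for e in (login_event, purchase_event)]
for row in rows:
    print(row)
```

Every row ends up with the same set of columns regardless of which event type produced it, which is what I would like each partition to look like to Athena.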

Thanks in advance!


Answer 1:


I had the same issue and solved it by configuring the crawler to update the metadata of preexisting partitions from the table, so that every partition inherits the table schema.
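For anyone who prefers to script this rather than use the console, the following is a rough sketch using boto3. It assumes that the crawler option in question corresponds to the `InheritFromTable` partition behaviour in the crawler's JSON configuration, and `apply_to_crawler` and the crawler name passed to it are hypothetical; treat it as a starting point rather than a verified recipe.

```python
import json


def build_crawler_configuration() -> str:
    """Return the crawler configuration as a JSON string.

    "InheritFromTable" asks the crawler to give partitions the table's
    schema instead of a per-partition schema.
    """
    configuration = {
        "Version": 1.0,
        "CrawlerOutput": {
            "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}
        },
    }
    return json.dumps(configuration)


def apply_to_crawler(crawler_name: str) -> None:
    """Apply the configuration to an existing crawler (name is an assumption)."""
    import boto3  # third-party; imported here so the builder works without it

    glue = boto3.client("glue")
    glue.update_crawler(
        Name=crawler_name,
        Configuration=build_crawler_configuration(),
    )


if __name__ == "__main__":
    # Print the configuration only; call apply_to_crawler("...") to apply it.
    print(build_crawler_configuration())
```

After changing the configuration, the crawler most likely has to run again before the existing partitions pick up the table schema.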




Answer 2:


This helped me. Posting the image for others in case the link is lost.




Answer 3:


It also fixed my issue! If anybody needs to provision this crawler configuration with Terraform, here is how I did it:

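resource "aws_glue_crawler" "crawler-s3-rawdata" {
  database_name = "my_glue_database"
  name          = "my_crawler"
  # Placeholder: replace with the ARN (or name) of the IAM role the crawler assumes.
  role          = "my_iam_role.arn"

  # Make partitions inherit their schema from the table instead of keeping their own.
  configuration = <<EOF
{
   "Version": 1.0,
   "CrawlerOutput": {
      "Partitions": { "AddOrUpdateBehavior": "InheritFromTable" }
   }
}
EOF

  s3_target {
    path = "s3://mybucket"
  }
}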


Source: https://stackoverflow.com/questions/46241088/how-to-create-aws-glue-table-where-partitions-have-different-columns-hive-par
