AWS Glue Crawler Classifies json file as UNKNOWN

允我心安 submitted on 2019-11-30 21:21:33

I have two JSON files, 42 MB and 16 MB, partitioned on S3 under these paths:

  • s3://bucket/stg/year/month/_0.json

  • s3://bucket/stg/year/month/_1.json

I had the same problem as you: the crawler classified the files as UNKNOWN.

I was able to solve it:

  • You must create a custom classifier with the JSON path "$[*]", then create a new crawler that uses this classifier.
  • Run the new crawler on the data in S3, and the proper schema will be created.
  • DO NOT update your existing crawler with the classifier, as it won't apply the change. I don't know why; it may be because of the classifier versioning that AWS mentions in its documentation. Creating a new crawler made it work.
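For anyone who prefers to script these steps rather than click through the console, here is a minimal sketch using boto3's Glue client (`create_classifier`, `create_crawler`, `start_crawler`). The classifier name, crawler name, IAM role, database name and S3 prefix are all hypothetical placeholders, not values from the question, and the sketch assumes AWS credentials and a region are already configured.

```python
def classifier_request(name, json_path="$[*]"):
    """Build the arguments for glue.create_classifier with a custom JSON path."""
    return {"JsonClassifier": {"Name": name, "JsonPath": json_path}}


def crawler_request(name, role, database, s3_path, classifier_name):
    """Build the arguments for glue.create_crawler, attaching the custom classifier."""
    return {
        "Name": name,
        "Role": role,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "Classifiers": [classifier_name],
    }


def create_and_run(glue, classifier_req, crawler_req):
    """Create the classifier, create a NEW crawler that uses it, then start it."""
    glue.create_classifier(**classifier_req)
    glue.create_crawler(**crawler_req)
    glue.start_crawler(Name=crawler_req["Name"])


# Example call (all names below are placeholders, adjust to your account):
#
#   import boto3
#   glue = boto3.client("glue")
#   create_and_run(
#       glue,
#       classifier_request("json-array-classifier"),
#       crawler_request(
#           name="stg-json-crawler",
#           role="MyGlueCrawlerRole",
#           database="stg_db",
#           s3_path="s3://bucket/stg/",
#           classifier_name="json-array-classifier",
#       ),
#   )
```

The request builders are kept separate from the calls so the arguments can be inspected before anything is sent to AWS. In line with the last point above, the sketch creates a new crawler instead of updating an existing one.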

As mentioned in

https://docs.aws.amazon.com/glue/latest/dg/custom-classifier.html#custom-classifier-json

When you run a crawler using the built-in JSON classifier, the entire file is used to define the schema. Because you don’t specify a JSON path, the crawler treats the data as one object, that is, just an array.

Dung also pointed this out in his answer.
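To make the "one object, that is, just an array" point concrete, here is a small illustration in plain Python. It does not reproduce Glue's classifier logic. It only shows, on made-up sample data, the difference between looking at the file as a whole (a single array) and looking at each element the way a "$[*]" path selects them.

```python
import json

# Made-up sample: a file whose top level is a JSON array of records.
raw = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b", "tag": "x"}]'
doc = json.loads(raw)


def top_level_shape(doc):
    """What you see when the whole file is treated as one object."""
    return type(doc).__name__


def record_columns(doc):
    """Union of keys across the array elements, i.e. the per-record view."""
    columns = []
    for record in doc:
        for key in record:
            if key not in columns:
                columns.append(key)
    return columns


print(top_level_shape(doc))   # list
print(record_columns(doc))    # ['id', 'name', 'tag']
```

Seen as a whole, the file is just a list with no column names. The field names only appear once each element is treated as its own record, which is roughly what the custom JSON path asks the crawler to do.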
