AWS crawler could not classify the file type stored in S3 if its size >1MB

谁都会走 posted on 2019-12-24 09:12:54

Question


When I try to detect the file type of an input JSON file of size >= 1 MB using a Crawler, it creates a table in Glue whose classification type is "Unknown". But when the size is < 1 MB, it successfully classifies the file type as JSON.

I cross-checked the file to ensure it's a valid JSON file.

Is this a limitation of the AWS crawler?

If so, is there any alternative for this issue?


Answer 1:


Yes, that is by design of the crawler. The crawler reads only the first 1 MB of a file that is larger than 1 MB, or the entire file if it is smaller than 1 MB. If the metadata (which the crawler creates internally) exceeds 1 MB, you'll get the above error: when the metadata itself doesn't fit in 1 MB, the file ends up with the "Unknown" type.
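
As for an alternative, the answer above does not give one. One thing that may be worth trying, assuming the file holds a single top-level JSON array, is to rewrite it as newline-delimited JSON so that each record is small and self-contained within the first 1 MB. The sketch below uses only the Python standard library; the function names and the 1 MiB constant are illustrative assumptions, not part of any AWS API, and whether the crawler then classifies the file as JSON still has to be verified against your own data.

```python
import json

# Assumption: the "1 MB" sample mentioned in the answer is treated as 1 MiB here.
SAMPLE_LIMIT = 1024 * 1024


def to_json_lines(records):
    """Serialize each record as one compact JSON document per line."""
    return [json.dumps(record, separators=(",", ":")) for record in records]


def oversized_records(lines, limit=SAMPLE_LIMIT):
    """Return the indexes of lines whose UTF-8 size alone exceeds the limit."""
    return [i for i, line in enumerate(lines) if len(line.encode("utf-8")) > limit]


def convert_file(src_path, dst_path):
    """Rewrite a file holding one top-level JSON array as JSON Lines.

    Returns the indexes of records that are still larger than SAMPLE_LIMIT,
    since those would not fit in a 1 MB sample even after conversion.
    """
    with open(src_path, encoding="utf-8") as src:
        records = json.load(src)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    lines = to_json_lines(records)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(line + "\n" for line in lines)
    return oversized_records(lines)
```

For example, `convert_file("input.json", "input.jsonl")` (hypothetical file names) writes the converted file and returns the indexes of any records that are still too large on their own; an empty list means every record fits within the assumed sample size. The converted file would then need to be uploaded to S3 and crawled again.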



Source: https://stackoverflow.com/questions/50954330/aws-crawler-could-not-classify-the-file-type-stores-in-s3-if-its-size-1mb
