Question
Part One:
I ran a Glue crawler on a dummy CSV loaded into S3. It created a table, but when I view the table in Athena and query it, it shows "Zero records returned."
The ELB demo data in Athena works fine, though.
Part Two (scenario):
Suppose I have an Excel file and a data dictionary describing how and in what format the data is stored in that file, and I want that data loaded into AWS Redshift. What would be the best way to achieve this?
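For Part Two, one common pipeline is Excel → CSV → S3 → Redshift `COPY`. The sketch below shows the convert-and-load steps; all table, bucket, and IAM role names are hypothetical placeholders, and the data dictionary would drive both the CSV conversion and the `CREATE TABLE` definition in Redshift:

```python
# Sketch: Excel -> CSV -> S3 -> Redshift COPY (names are placeholders).

def excel_to_csv(excel_path: str, csv_path: str) -> None:
    """Convert the Excel sheet to CSV, keeping the column order/format
    described in the data dictionary."""
    import pandas as pd  # imported lazily; requires pandas + openpyxl
    df = pd.read_excel(excel_path)
    df.to_csv(csv_path, index=False)

def build_copy_sql(table: str, s3_uri: str, iam_role: str) -> str:
    """Build the Redshift COPY statement that loads the CSV from S3.
    IGNOREHEADER 1 skips the header row written by to_csv()."""
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV IGNOREHEADER 1;"
    )
```

After uploading the CSV to S3 (e.g. with boto3's `upload_file`), create the target table to match the data dictionary and run the generated `COPY` statement; AWS Glue jobs can automate the same flow if this needs to run on a schedule.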
Answer 1:
I experienced the same issue. You need to give the crawler the folder path instead of the actual file name. I pointed the crawler at the folder and it worked. Hope this helps.
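The fix above amounts to passing a folder-style S3 path (with a trailing slash) as the crawler target. A small sketch, assuming boto3 and an existing Glue service role (all names here are hypothetical):

```python
# Sketch: point the Glue crawler at the S3 *folder*, not the file.

def s3_folder_target(bucket: str, prefix: str) -> dict:
    """Build a Glue Targets dict with a folder-style path; the trailing
    slash makes the crawler scan every object under the prefix."""
    path = f"s3://{bucket}/{prefix.rstrip('/')}/"
    return {"S3Targets": [{"Path": path}]}

# Usage (assumes boto3 credentials and a Glue service role):
# import boto3
# glue = boto3.client("glue")
# glue.create_crawler(
#     Name="csv-crawler",
#     Role="AWSGlueServiceRole-demo",
#     DatabaseName="demo_db",
#     Targets=s3_folder_target("my-bucket", "sales-data"),
# )
# glue.start_crawler(Name="csv-crawler")
```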
Answer 2:
I experienced the same issue. Try creating a separate folder for each table in the S3 bucket, then rerun the Glue crawler. You will get a new table in the Glue Data Catalog with the same name as the S3 folder.
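Since the crawler names catalog tables after the folder, a one-folder-per-table layout means deriving each object's key from its file name. A minimal sketch of that convention (the bucket layout is an assumption, not something Glue mandates):

```python
# Sketch: place each CSV in its own folder named after the table,
# e.g. sales.csv -> sales/sales.csv, so the crawler emits one table
# per folder with a predictable name.

def per_table_key(filename: str) -> str:
    """Derive an S3 key that puts the file in a folder named after it."""
    table = filename.rsplit(".", 1)[0]
    return f"{table}/{filename}"

# Usage (assumes boto3):
# import boto3
# s3 = boto3.client("s3")
# s3.upload_file("sales.csv", "my-bucket", per_table_key("sales.csv"))
```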
Answer 3:
Delete the crawler and create it again, making sure only one CSV file is present in S3, then run the crawler. Important note: with a single CSV file in the path, the records become viewable in Athena after the crawler runs.
Answer 4:
I was indeed providing the S3 folder path instead of the filename and still couldn't get Athena to return any records ("Zero records returned", "Data scanned: 0KB").
It turns out the problem was that the input files (my rotated log files automatically uploaded to S3 from Elastic Beanstalk) start with an underscore (`_`), e.g. `_var_log_nginx_rotated_access.log1534237261.gz`. Apparently that's not allowed: following Hive conventions, Athena treats files whose names begin with an underscore or a dot as hidden and silently skips them.
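A workaround is to copy each object to a key without the leading underscore (and delete the original), since S3 has no in-place rename. A sketch, with the boto3 calls left as commented usage (bucket names are hypothetical):

```python
# Sketch: strip the leading '_' / '.' that makes Hive/Athena treat the
# object as a hidden file.

def visible_key(key: str) -> str:
    """Return the key with hidden-file prefix characters removed from
    the final path component only."""
    prefix, _, name = key.rpartition("/")
    name = name.lstrip("_.")
    return f"{prefix}/{name}" if prefix else name

# Usage (assumes boto3; S3 "rename" = copy + delete):
# import boto3
# s3 = boto3.client("s3")
# s3.copy_object(Bucket="my-bucket", Key=visible_key(key),
#                CopySource={"Bucket": "my-bucket", "Key": key})
# s3.delete_object(Bucket="my-bucket", Key=key)
```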
Source: https://stackoverflow.com/questions/47266924/aws-athena-returning-zero-records-from-tables-created-from-glue-crawler-input-cs