Question
Recently we started storing our backups in AWS S3. They are all CSV files that we need to query through AWS Athena. We tried to insert the tables one by one, but it is taking too long because there is a fair amount of data. Is there an API we can use, or something that is already set up for this? We were about to build something with Spark, but maybe there is a simpler way, or something that has already been done. Thanks.
Answer 1:
You can simply create an external table on top of CSV files with the required properties.
Reference: Create External Table on AWS Athena
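As a rough illustration of the external-table approach, the Python sketch below builds a `CREATE EXTERNAL TABLE` statement and submits it through boto3. The database, table, column names, bucket paths, and the helper names `build_create_table_ddl` and `run_ddl` are all hypothetical placeholders. It assumes plain comma-separated files with one header row, stored under a single S3 prefix per table.

```python
def build_create_table_ddl(database, table, columns, s3_location):
    """Build a CREATE EXTERNAL TABLE statement for comma-separated files.

    columns is a list of (name, athena_type) pairs. s3_location should be
    the S3 prefix (folder) holding the CSV files, not a single file.
    """
    column_defs = ",\n  ".join(f"`{name}` {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_defs}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{s3_location}'\n"
        # Assumption: every file starts with exactly one header line.
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


def run_ddl(ddl, output_location, region_name=None):
    """Submit the statement to Athena and return the query execution ID."""
    import boto3  # imported here so the DDL builder works without boto3 installed

    athena = boto3.client("athena", region_name=region_name)
    response = athena.start_query_execution(
        QueryString=ddl,
        # Athena writes query results and metadata to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # Hypothetical names and paths, shown only for illustration.
    ddl = build_create_table_ddl(
        "backups",
        "orders",
        [("id", "bigint"), ("customer", "string"), ("total", "double")],
        "s3://example-backup-bucket/orders/",
    )
    print(ddl)
```

Calling `run_ddl(ddl, "s3://example-athena-results/")` would submit the statement, given valid AWS credentials and an existing database. If the files contain quoted fields with embedded commas, the simple delimited format is likely not enough and a CSV SerDe such as `OpenCSVSerde` may be a better fit.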
You can also use a Glue crawler and configure it to populate the tables for you automatically.
Reference: Cataloging tables with a crawler
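A crawler can also be created and started from code. The sketch below uses the boto3 Glue client; the crawler name, IAM role ARN, database name, S3 path, and the helper names `crawler_config` and `create_and_start_crawler` are hypothetical. It assumes the role already has permission to read the bucket and write to the Glue Data Catalog.

```python
def crawler_config(name, role_arn, database, s3_paths):
    """Return keyword arguments for the Glue create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        # Tables discovered by the crawler are registered in this database.
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": path} for path in s3_paths]},
    }


def create_and_start_crawler(config, region_name=None):
    """Create the crawler described by config and start one run."""
    import boto3  # imported here so the config helper works without boto3 installed

    glue = boto3.client("glue", region_name=region_name)
    glue.create_crawler(**config)
    glue.start_crawler(Name=config["Name"])


if __name__ == "__main__":
    # Hypothetical names, shown only for illustration.
    config = crawler_config(
        "csv-backups-crawler",
        "arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
        "backups",
        ["s3://example-backup-bucket/"],
    )
    print(config)
```

Once the crawler run finishes, the tables it inferred should be queryable from Athena under the chosen database. It is worth checking the inferred column types, since schema detection on CSV files is a best guess.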
AWS SDKs are available for several languages and can automate tasks such as uploading files to S3, creating Athena tables, or cataloging tables through a Glue crawler.
Source: https://stackoverflow.com/questions/52041500/query-csv-tables-stored-s3-through-athena
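When automating with an SDK, note that Athena runs queries asynchronously, so a script usually has to poll until a submitted statement finishes. Below is a minimal polling sketch in Python. The helper name `wait_for_query` and its defaults are hypothetical, and `athena` is assumed to be a boto3 Athena client (or any object exposing the same `get_query_execution` method).

```python
import time

# States after which Athena will not change the query status again.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_query(athena, query_execution_id, poll_seconds=2.0, max_polls=150):
    """Poll Athena until the query reaches a terminal state and return that state."""
    for _ in range(max_polls):
        response = athena.get_query_execution(QueryExecutionId=query_execution_id)
        state = response["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"Query {query_execution_id} did not finish in time")
```

A typical use would be to pass the ID returned by `start_query_execution`, for example `wait_for_query(boto3.client("athena"), query_id)`, and to treat any result other than `"SUCCEEDED"` as an error.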