AWS Athena: does `msck repair table` incur costs?

Posted by 懵懂的女人 on 2019-12-10 21:48:02

Question


I have ORC data in S3 that looks like this:

s3://bucket/orc/clientId=client-1/year=2017/month=3/day=16/hour=20/
s3://bucket/orc/clientId=client-2/year=2017/month=3/day=16/hour=21/
s3://bucket/orc/clientId=client-3/year=2017/month=3/day=16/hour=22/

Every hour I run an EMR job that converts raw JSON in S3 to ORC and writes it out using the partitioned path convention shown above, so that Athena can read it. After the EMR job completes, I run msck repair table so Athena can pick up the new partitions.

I have 3 related questions:

  1. Does running msck repair table in this scenario cost me money in AWS?
  2. The AWS docs say msck repair table can time out. Is there a way to make a Data Pipeline step keep running this command until it completes successfully?
  3. I would prefer to add the partitions manually to Athena (since I know the year, month, day and hour I'm working on). However, I do not know the clientId values, because there could be 1 to X of them and I don't know which ones exist when the EMR job runs. Is there a best-practice way to solve this problem (using Hive or something else)? I could make an S3 API call to list s3://bucket/orc/ and write code to iterate over the list and add each partition manually. I'm hoping there is an easier way...

Note: when I say "add partitions manually" I mean doing something like this:

ALTER TABLE <athena table> 
ADD PARTITION (clientId='client-1',year=2017,month=3,day=16,hour=20) 
location 's3://bucket/orc/clientId=client-1/year=2017/month=3/day=16/hour=20/';
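
For reference, the S3-listing approach mentioned in question 3 could look roughly like the Python sketch below. It is only an illustration under assumptions: `s3` is expected to be a boto3 S3 client (`boto3.client("s3")`), the helper names and the table name `my_table` are hypothetical, and the bucket layout is the one shown above. It lists the `clientId=` folders, keeps only the clients that have objects for the given hour, and builds one ALTER TABLE statement for all of them.

```python
def partition_path(prefix, client_id, year, month, day, hour):
    """Key prefix of one hourly partition, following the layout in the question."""
    return f"{prefix}clientId={client_id}/year={year}/month={month}/day={day}/hour={hour}/"


def list_client_ids(s3, bucket, prefix="orc/"):
    """Return the clientId values found as first-level 'folders' under prefix."""
    client_ids = []
    paginator = s3.get_paginator("list_objects_v2")
    # Delimiter="/" makes S3 return folder-like CommonPrefixes instead of every key.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        for common in page.get("CommonPrefixes", []):
            folder = common["Prefix"][len(prefix):].rstrip("/")
            if folder.startswith("clientId="):
                client_ids.append(folder[len("clientId="):])
    return client_ids


def clients_with_data(s3, bucket, client_ids, year, month, day, hour, prefix="orc/"):
    """Keep only the clients that have at least one object for the given hour."""
    found = []
    for cid in client_ids:
        resp = s3.list_objects_v2(
            Bucket=bucket,
            Prefix=partition_path(prefix, cid, year, month, day, hour),
            MaxKeys=1,
        )
        if resp.get("KeyCount", 0) > 0:
            found.append(cid)
    return found


def add_partitions_sql(table, bucket, client_ids, year, month, day, hour, prefix="orc/"):
    """Build a single ALTER TABLE statement that adds one partition per client."""
    if not client_ids:
        raise ValueError("no partitions to add")
    clauses = []
    for cid in client_ids:
        location = f"s3://{bucket}/{partition_path(prefix, cid, year, month, day, hour)}"
        clauses.append(
            f"PARTITION (clientId='{cid}',year={year},month={month},day={day},hour={hour}) "
            f"LOCATION '{location}'"
        )
    return f"ALTER TABLE {table} ADD IF NOT EXISTS\n" + "\n".join(clauses)


# Hypothetical usage (needs boto3 and AWS credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   ids = clients_with_data(s3, "bucket", list_client_ids(s3, "bucket"), 2017, 3, 16, 20)
#   sql = add_partitions_sql("my_table", "bucket", ids, 2017, 3, 16, 20)
```

The IF NOT EXISTS clause is there so that re-running the step for the same hour does not fail on partitions that were already added.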

Answer 1:


AWS says:

There's no charge for DDL queries or for partition detection.

AWS says:

S3 GET charges do apply.

I do not yet know how to automate msck repair table to make sure it completes.




Answer 2:


Unfortunately I do not have enough reputation to comment on @rynop's response, but I wanted to add that the Athena API provides a GetQueryExecution request, which can be polled to determine the state of any query execution. The QueryExecutionId it requires is returned in the response from StartQueryExecution.
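
A minimal Python sketch of that polling loop follows, assuming `athena` is a boto3 Athena client (`boto3.client("athena")`). The function name `run_and_wait`, the retry count and the results location are hypothetical, and whether simply re-submitting the statement is enough to get past a timeout is an assumption rather than something AWS documents here.

```python
import time


def run_and_wait(athena, sql, output_location, poll_seconds=2.0, max_attempts=3):
    """Start `sql`, poll GetQueryExecution until it finishes, and retry on failure.

    Returns the QueryExecutionId of the successful run, or raises RuntimeError
    if every attempt ends in FAILED or CANCELLED.
    """
    state = None
    for _ in range(max_attempts):
        started = athena.start_query_execution(
            QueryString=sql,
            ResultConfiguration={"OutputLocation": output_location},
        )
        query_id = started["QueryExecutionId"]

        # QUEUED and RUNNING are the non-terminal states; keep polling through them.
        while True:
            execution = athena.get_query_execution(QueryExecutionId=query_id)
            state = execution["QueryExecution"]["Status"]["State"]
            if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
                break
            time.sleep(poll_seconds)

        if state == "SUCCEEDED":
            return query_id

    raise RuntimeError(f"query did not succeed after {max_attempts} attempts (last state: {state})")


# Hypothetical usage (needs boto3 and AWS credentials):
#   import boto3
#   athena = boto3.client("athena")
#   run_and_wait(athena, "MSCK REPAIR TABLE my_table", "s3://bucket/athena-results/")
```

Passing the client in as an argument keeps the loop easy to exercise with a stub, and the same function works for ALTER TABLE ADD PARTITION statements as well as msck repair table.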



Source: https://stackoverflow.com/questions/42845002/aws-athena-does-msck-repair-table-incur-costs
