How to execute scheduled SQL script on Amazon Redshift?

混江龙づ霸主 submitted on 2019-11-27 07:06:45

Question


I have a series of ~10 queries that need to be executed automatically every hour in Redshift (ideally reporting success/failure).

Most of the queries are aggregations over my tables.

I have tried using AWS Lambda with CloudWatch Events, but Lambda functions can run for 5 minutes at most and my queries can take up to 25 minutes.


Answer 1:


It's kind of strange that AWS doesn't provide a simple distributed cron-style service. It would be useful for so many things. There is SWF, but the timing/scheduling aspect is left up to the user. You could use Lambda/CloudWatch to trigger SWF events, but that's a lot of overhead just to get reasonable cron-like behavior.

As the comment says, the easiest way would be to run a small instance and host the cron jobs there. Use an Auto Scaling group of 1 for some reliability. A similar but more complicated approach is to use Elastic Beanstalk.

If you really want redundancy, reliability, visibility, etc., it might be worth looking at a third-party solution like Airflow. There are many others, depending on your language of preference.
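As a rough illustration of what such a cron-hosted job could look like, here is a minimal Python sketch that runs a list of queries in order and records success or failure for each one. The function name `run_queries` and the shape of the result dicts are made up for this example. It only assumes a DB-API 2.0 connection; for Redshift that would typically come from a PostgreSQL-compatible driver, which is an assumption and not something the answer specifies.

```python
import time


def run_queries(conn, queries):
    """Run (name, sql) pairs in order on a DB-API 2.0 connection.

    Returns one dict per query recording whether it succeeded, the error
    text if it failed, and how long it took. A failing query is rolled
    back and the remaining queries still run.
    """
    results = []
    for name, sql in queries:
        started = time.monotonic()
        cur = conn.cursor()
        try:
            cur.execute(sql)
            conn.commit()
            outcome = {"name": name, "ok": True, "error": None}
        except Exception as exc:  # DB-API errors are Exception subclasses
            conn.rollback()
            outcome = {"name": name, "ok": False, "error": str(exc)}
        finally:
            cur.close()
        outcome["seconds"] = time.monotonic() - started
        results.append(outcome)
    return results
```

A wrapper script could call `run_queries` with the ~10 hourly statements, print the results, and be scheduled with a crontab entry such as `0 * * * *` on the instance. Because the script runs on your own machine, there is no 5-minute execution limit to work around.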

Here's a similar question with more info.




Answer 2:


I had the same problem in the past.

You can use R or Python for that.

I used R. You can install the RPostgreSQL package and connect to your Redshift cluster with it, for example:

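library(RPostgreSQL)  # load the package that provides the "PostgreSQL" DBI driver
drv <- dbDriver("PostgreSQL")
# Replace host, dbname, user, and password with the values for your own cluster
conn <- dbConnect(drv, host='mm-stats-1.ctea4hmr4vlw.us-east-1.redshift.amazonaws.com', port=5439, dbname='stats', user='xxx', password='yyy')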

You can then build a report with Markdown and schedule it as a crontab task.

I also used the mailR package to send the report to other users.
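For the Python route, the emailing step might look roughly like the sketch below. It is not a port of the mailR setup above. The function names, the `results` structure (a list of dicts with `name`, `ok`, and `error` keys), and the SMTP host are all assumptions made for illustration.

```python
import smtplib
from email.message import EmailMessage


def build_report_email(results, sender, recipients):
    """Build a plain-text status email from per-query results.

    `results` is a hypothetical list of dicts with keys
    "name", "ok" (bool), and "error" (str or None).
    """
    failed = [r for r in results if not r["ok"]]
    status = "FAILED" if failed else "OK"
    msg = EmailMessage()
    msg["Subject"] = (
        f"Redshift hourly queries: {status} "
        f"({len(failed)} of {len(results)} failed)"
    )
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    lines = []
    for r in results:
        if r["ok"]:
            lines.append(f"[ok]     {r['name']}")
        else:
            lines.append(f"[failed] {r['name']}: {r['error']}")
    msg.set_content("\n".join(lines))
    return msg


def send_report(msg, smtp_host="localhost"):
    """Send the message through an SMTP relay (host is an assumption)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

Keeping message construction separate from sending makes it easy to check the report text locally before pointing `send_report` at a real mail relay.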




Answer 3:


Use AWS Lambda to run your script; you can schedule it. See https://docs.aws.amazon.com/lambda/latest/dg/with-scheduled-events.html

This uses CloudWatch Events behind the scenes. If you set it up from the console, it will configure things for you.
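One way a scheduled Lambda could cope with long-running queries is to submit them asynchronously and return right away. The sketch below assumes the Redshift Data API (the boto3 `redshift-data` client), which this answer does not mention. The environment variable names, the statement name, and the SQL are hypothetical placeholders.

```python
import os

# Hypothetical hourly statements; replace with your own aggregations.
HOURLY_SQLS = [
    "DELETE FROM hourly_totals WHERE hour = DATE_TRUNC('hour', GETDATE())",
    "INSERT INTO hourly_totals "
    "SELECT DATE_TRUNC('hour', GETDATE()), user_id, SUM(amount) "
    "FROM events GROUP BY user_id",
]


def submit_queries(client, cluster_id, database, db_user, sqls):
    """Submit the statements as one asynchronous batch and return its id.

    The call returns once the batch is accepted, not when it finishes,
    so the caller is not tied to the query run time.
    """
    response = client.batch_execute_statement(
        ClusterIdentifier=cluster_id,
        Database=database,
        DbUser=db_user,
        Sqls=sqls,
        StatementName="hourly-aggregations",  # hypothetical label
    )
    return response["Id"]


def lambda_handler(event, context):
    """Entry point for the scheduled event; only submits the batch."""
    import boto3  # imported here so the module loads without boto3 installed

    client = boto3.client("redshift-data")
    statement_id = submit_queries(
        client,
        os.environ["REDSHIFT_CLUSTER_ID"],  # hypothetical variable names
        os.environ["REDSHIFT_DATABASE"],
        os.environ["REDSHIFT_DB_USER"],
        HOURLY_SQLS,
    )
    return {"statement_id": statement_id}
```

The returned id can later be passed to the client's `describe_statement` call to check whether the batch finished or failed, which covers the success/failure reporting the question asks about.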




Answer 4:


You can use Data Pipeline to do that, although I think it's on an end-of-life path since they haven't released any new features to the service in a while and the GUI is pretty archaic and difficult to work with. The main benefit of using Data Pipeline over Lambda is that Lambda functions can only run for a maximum of 15 minutes, whereas Data Pipeline can track the status of the query until it's complete.



Source: https://stackoverflow.com/questions/42564910/how-to-execute-scheduled-sql-script-on-amazon-redshift
