Use CloudWatch to determine if a Linux service is running

試著忘記壹切 posted on 2019-12-04 07:59:47

You can publish a custom metric to CloudWatch in the form of a "heartbeat".

  • Run a small script from cron on your server that checks the process list for my_service and, if it is running, makes a put-metric-data call to CloudWatch.
  • The metric can be as simple as pushing the value 1 to your custom metric in CloudWatch.
  • Set up a CloudWatch alarm that triggers if the average for the metric falls below 1.
  • Make the alarm period at least as long as the cron interval. For example, if cron runs every 5 minutes, have the alarm fire when the average is below 1 for two consecutive 5-minute periods.
  • Make sure you also handle the case in which the metric is not published at all (e.g. cron fails to run or the whole machine dies). You will want an alert for when the metric is missing (see: AWS Cloudwatch Heartbeat Alarm).
  • Be aware that a custom metric adds to your AWS bill (50 cents per metric at the time of writing). That is not a big deal for one metric, but the equation changes drastically if you want to push hundreds or thousands of metrics, so it is good to know that it is not free, as one might expect.
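
The heartbeat script described in the bullets above could be sketched in Python roughly as follows. This is only an illustration, not the answer's own code: it assumes pgrep is installed and that boto3 is available with AWS credentials and a region configured, and the namespace, metric name, and "Service" dimension are made-up placeholders.

```python
import subprocess

NAMESPACE = "Custom/Services"          # hypothetical namespace
METRIC_NAME = "my_service_heartbeat"   # hypothetical metric name


def is_running(process_name, run=subprocess.run):
    """Return True if pgrep finds a process with exactly this name."""
    # pgrep -x exits with 0 on a match and non-zero when nothing matches.
    result = run(["pgrep", "-x", process_name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def heartbeat_request(process_name):
    """Build the keyword arguments for a CloudWatch PutMetricData call."""
    return {
        "Namespace": NAMESPACE,
        "MetricData": [{
            "MetricName": METRIC_NAME,
            "Dimensions": [{"Name": "Service", "Value": process_name}],
            "Value": 1.0,      # the heartbeat is simply the number 1
            "Unit": "Count",
        }],
    }


def publish_heartbeat(process_name, client=None, run=subprocess.run):
    """Publish a 1 if the process is running; publish nothing otherwise."""
    if not is_running(process_name, run=run):
        return False
    if client is None:
        import boto3  # imported lazily so the helpers above work without it
        client = boto3.client("cloudwatch")
    client.put_metric_data(**heartbeat_request(process_name))
    return True
```

To use it from cron, add a final line that calls publish_heartbeat("my_service") and schedule the file every 5 minutes, for example with a crontab entry such as "*/5 * * * * python3 /path/to/heartbeat.py" (the path is a placeholder). Because nothing is published when the service is down, the alarm has to treat missing data as a failure.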

See here for how to publish a custom metric: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/publishingMetrics.html

I am not sure CloudWatch is the right route for checking whether the service is running; a Nagios-style solution would be easier.

Nevertheless, you can try the CloudWatch custom metrics approach. Add a few lines of code that publish, say, the integer 1 to a CloudWatch custom metric every 5 minutes. You can then configure CloudWatch alarms to send an SNS or email notification when a statistic such as Sample Count or Sum deviates from the value you expect.

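script
    # Sketch only: the namespace and metric name below are placeholders
    my_exec &    # run in the background; a plain exec would replace the shell
    while kill -0 $! 2>/dev/null; do    # loop for as long as my_exec is alive
        aws cloudwatch put-metric-data --namespace Custom/Services --metric-name my_service_heartbeat --value 1 || true
        sleep 300    # publish every 5 minutes
    done
end script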
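
The alarm side could look something like the sketch below, which builds a put_metric_alarm request with boto3. It is an illustration under assumptions rather than a definitive setup: the alarm name, namespace, metric name, dimension, and SNS topic ARN are all placeholders, and treating missing data as breaching is one possible way to catch the case where no heartbeat arrives at all.

```python
def heartbeat_alarm_request(sns_topic_arn, cron_minutes=5):
    """Build keyword arguments for a CloudWatch PutMetricAlarm call."""
    return {
        "AlarmName": "my_service-heartbeat",           # hypothetical name
        "Namespace": "Custom/Services",                # hypothetical namespace
        "MetricName": "my_service_heartbeat",          # hypothetical metric
        "Dimensions": [{"Name": "Service", "Value": "my_service"}],
        "Statistic": "SampleCount",                    # heartbeats per period
        "Period": cron_minutes * 60,                   # match the publish interval
        "EvaluationPeriods": 2,                        # two bad periods in a row
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",     # fewer than 1 heartbeat
        "TreatMissingData": "breaching",               # no data counts as failure
        "AlarmActions": [sns_topic_arn],               # notify via SNS
    }


def create_heartbeat_alarm(sns_topic_arn, client=None):
    """Create or update the alarm; needs AWS credentials when client is None."""
    if client is None:
        import boto3  # imported lazily so the request builder works without it
        client = boto3.client("cloudwatch")
    client.put_metric_alarm(**heartbeat_alarm_request(sns_topic_arn))
```

With a 5-minute publish interval, this alarm fires after two consecutive 5-minute periods without a heartbeat, and the SNS topic can then forward the notification by email.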

More Info

Publish Custom Metrics - http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/publishingMetrics.html
