How to monitor the size of a directory via Telegraf

Submitted by 心不动则不痛 on 2020-12-28 07:53:58

Question


We need to monitor the size of a directory (for example the data directory of InfluxDB) to set up alerts in Grafana. As mentioned in "How to configure telegraf to send a folder-size to influxDB", there is no built-in plugin for this.

We don't mind using the inputs.exec section of Telegraf. The directories are not huge (low filecount + dircount), so deep scanning (like the use of du) is fine by us.

One of the directories we need to monitor is /var/lib/influxdb/data.

What would be a simple script to execute, and what are the caveats?


Answer 1:


You could create a simple bash script, metrics-exec_du.sh, with the following content, and make it executable (chmod 755):

#!/usr/bin/env bash
du -bs "${1}" | awk '{print "[ { \"bytes\": "$1", \"dudir\": \""$2"\" } ]";}'
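
One thing to watch for with the one-liner above: awk prints only its second whitespace-separated field, so a path containing spaces would be cut short. Here is a minimal sketch of a variant that keeps the whole path, assuming GNU du (for -b). The function name du_json is made up for illustration, and paths containing double quotes or backslashes would still need JSON escaping:

```shell
#!/usr/bin/env bash
# du_json is a hypothetical helper name, not part of Telegraf.
# Prints the apparent size of one directory as a single-element JSON array.
du_json() {
  local dir
  local bytes
  dir="$1"
  # du separates size and path with a tab; keep only the size field
  bytes="$(du -sb -- "$dir" 2>/dev/null | cut -f1)"
  # No size means du could not read the directory at all: fail without output
  [ -n "$bytes" ] || return 1
  # The path comes from the argument, so spaces in it survive intact
  printf '[ { "bytes": %s, "dudir": "%s" } ]\n' "$bytes" "$dir"
}

# When called as a script with an argument, report on that directory
if [ "$#" -gt 0 ]; then
  du_json "$1"
fi
```

The output keeps the same JSON shape as the one-liner (the keys bytes and dudir), so it should be usable as a drop-in replacement.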

And activate it by putting the following in the Telegraf config file:

[[inputs.exec]]
  commands = [ "YOUR_PATH/metrics-exec_du.sh /var/lib/influxdb/data" ]
  timeout = "5s"
  name_override = "du"
  name_suffix = ""
  data_format = "json"
  tag_keys = [ "dudir" ]

Caveats:

  1. The du command can stress your server, so use with care
  2. The user telegraf must be able to scan the dirs. There are several options, but since InfluxDB's directory mask is a bit unspecified (see: https://github.com/influxdata/influxdb/issues/5171#issuecomment-306419800), we applied a rather crude workaround (examples are for Ubuntu 16.04.2 LTS):
    • Add the user telegraf to the influxdb group: sudo usermod --groups influxdb --append telegraf
    • Put the following in the crontab, to run for example every 10 minutes: */10 * * * * chmod -R g+rX /var/lib/influxdb/data > /var/log/influxdb/chmodfix.log 2>&1
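
For the first caveat, one option is to run du at the lowest CPU priority and with its own time limit, so that a slow scan is abandoned before Telegraf's own timeout (5s in the config above) kills it. This is a rough sketch, assuming GNU coreutils (nice, timeout, du -b). The name du_gentle is hypothetical:

```shell
#!/usr/bin/env bash
# du_gentle is a hypothetical wrapper name.
# Usage: du_gentle LIMIT DIR...   e.g. du_gentle 4s /var/lib/influxdb/data
du_gentle() {
  local limit
  limit="$1"
  shift
  # nice -n 19: lowest CPU priority; timeout: stop du once LIMIT has passed
  # (timeout exits with status 124 when the limit is hit)
  nice -n 19 timeout "$limit" du -sb -- "$@"
}

# When called as a script with a limit and at least one directory, run the scan
if [ "$#" -gt 1 ]; then
  du_gentle "$@"
fi
```

Note that nice only lowers CPU priority; it does not throttle disk I/O, so this softens the load rather than removing it.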

Result, configured in Grafana (data source: InfluxDB): [image not preserved]

Cheers, TW




Answer 2:


If you need to monitor multiple directories: I updated the answer by Tw Bert and extended it so that you can pass them all on one command line. This saves you from having to add multiple [[inputs.exec]] entries to your telegraf.conf file.

Create the file /etc/telegraf/scripts/disk-usage.sh containing:

#!/bin/bash

echo "["
du -ks "$@" | awk '{if (NR!=1) {printf ",\n"};printf "  { \"directory_size_kilobytes\": "$1", \"path\": \""$2"\" }";}'
echo
echo "]"

I want to monitor two directories: /mnt/user/appdata/influxdb and /mnt/user/appdata/grafana. I can do something like this:

# Get disk usage for multiple directories
[[inputs.exec]]
  commands = [ "/etc/telegraf/scripts/disk-usage.sh /mnt/user/appdata/influxdb /mnt/user/appdata/grafana" ]
  timeout = "5s"
  name_override = "du"
  name_suffix = ""
  data_format = "json"
  tag_keys = [ "path" ]

Once you've updated your config, you can test this with:

telegraf --debug --config /etc/telegraf/telegraf.conf --input-filter exec --test

This should show you what Telegraf will push to InfluxDB:

bash-4.3# telegraf --debug --config /etc/telegraf/telegraf.conf --input-filter exec --test
> du,host=SomeHost,path=/mnt/user/appdata/influxdb directory_size_kilobytes=80928 1536297559000000000
> du,host=SomeHost,path=/mnt/user/appdata/grafana directory_size_kilobytes=596 1536297559000000000
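
As a variation, the script could print InfluxDB line protocol instead of JSON, which avoids assembling brackets and commas by hand and also copes with spaces in paths. This is only a sketch: it assumes the [[inputs.exec]] section is switched to data_format = "influx" (as far as I know, tag_keys is then not needed, since the tag is part of each line), and du_lines is a made-up name:

```shell
#!/usr/bin/env bash
# du_lines is a hypothetical helper name.
# Prints one line-protocol point per directory:
#   du,path=<escaped path> directory_size_kilobytes=<size>
du_lines() {
  local tab kb path esc
  tab="$(printf '\t')"
  # du -k prints "<kilobytes><TAB><path>"; split on the tab only
  du -ks -- "$@" | while IFS="$tab" read -r kb path; do
    # Line protocol needs spaces, commas and equals signs in tag values backslash-escaped
    esc="$(printf '%s' "$path" | sed 's/[ ,=]/\\&/g')"
    # No "i" suffix: the field stays a float, matching the test output shown above
    printf 'du,path=%s directory_size_kilobytes=%s\n' "$esc" "$kb"
  done
}

# When called as a script with directory arguments, report on all of them
if [ "$#" -gt 0 ]; then
  du_lines "$@"
fi
```

The field is deliberately written without the integer suffix, so that it has the same type as the values the JSON version has already stored in the du measurement.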



Answer 3:


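The solutions already provided look good to me, and highlighting caveats such as read permissions is great. An alternative worth mentioning is to let Telegraf collect the data with its built-in disk input, as proposed in "monitor diskspace on influxdb with telegraf". Note that inputs.disk reports usage per mount point rather than per directory, so this helps mainly when the directory sits on its own mount point.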

[[outputs.influxdb]]
  urls = ["udp://your_host:8089"]
  database = "telegraf_metrics"

  ## Retention policy to write to. Empty string writes to the default rp.
  retention_policy = ""
  ## Write consistency (clusters only), can be: "any", "one", "quorum", "all"
  write_consistency = "any"

  ## Write timeout (for the InfluxDB client), formatted as a string.
  ## If not provided, will default to 5s. 0s means no timeout (not recommended).
  timeout = "5s" 

# Read metrics about disk usage by mount point
[[inputs.disk]]
  ## By default, telegraf gather stats for all mountpoints.
  ## Setting mountpoints will restrict the stats to the specified mountpoints.
  # mount_points = ["/"]

  ## Ignore some mountpoints by filesystem type. For example (dev)tmpfs (usually
  ## present on /run, /var/run, /dev/shm or /dev).
  ignore_fs = ["tmpfs", "devtmpfs"]

Note: the timeout should be considered carefully. Hourly readings may well be sufficient, and would avoid exhausting resources through excessive logging.
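
On the question of how often to measure: as far as I know, Telegraf accepts an interval setting on an individual input, which overrides the agent-wide interval for that plugin only. Here is a sketch of an hourly du reading. The script path is a placeholder and the remaining values are copied from the exec examples in the other answers:

```toml
# Run the du script once an hour instead of at the agent's default interval
[[inputs.exec]]
  commands = [ "YOUR_PATH/metrics-exec_du.sh /var/lib/influxdb/data" ]
  interval = "1h"
  timeout = "5s"
  name_override = "du"
  data_format = "json"
  tag_keys = [ "dudir" ]
```

Directory sizes usually change slowly, so a long interval costs little in alerting accuracy while keeping the number of du runs low.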



Source: https://stackoverflow.com/questions/44386205/how-to-monitor-the-size-of-a-directory-via-telegraf
