I have read that Spark does not have Prometheus as one of its pre-packaged sinks, so I found this post on how to monitor Apache Spark with Prometheus, but I found it unclear.
There are a few ways to monitor Apache Spark with Prometheus.
One of them is JmxSink + jmx-exporter.
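First, Spark's metrics have to be exposed over JMX. A minimal sketch of the standard Spark metrics configuration, enabling the built-in JmxSink for all instances (driver, executors, etc.):

```
# conf/metrics.properties
# Expose metrics from all Spark instances through JMX
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
```

The `*.` prefix applies the sink to every instance; you can scope it to e.g. `driver.` or `executor.` instead.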
In the following command, the jmx_prometheus_javaagent-0.3.1.jar
file and the spark.yml config
were downloaded in previous steps. Adjust the paths and version to match your setup.
bin/spark-shell --conf "spark.driver.extraJavaOptions=-javaagent:jmx_prometheus_javaagent-0.3.1.jar=8080:spark.yml"
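The spark.yml passed to the agent is the jmx-exporter configuration. A minimal sketch that simply exposes every MBean (the catch-all pattern is an assumption; real setups usually add rewrite rules to shorten metric names):

```yaml
# spark.yml -- minimal jmx-exporter config sketch
lowercaseOutputName: true
rules:
  # Assumption: export all MBean attributes as-is
  - pattern: ".*"
```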
Once the shell is running, the metrics are exposed at localhost:8080/metrics.
Prometheus can then be configured to scrape the metrics from jmx-exporter.
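A minimal scrape configuration sketch for Prometheus; the job name and scrape interval are assumptions, and the target matches the agent port used above:

```yaml
# prometheus.yml -- scrape the jmx-exporter endpoint on the driver
scrape_configs:
  - job_name: 'spark'          # assumed job name
    scrape_interval: 15s       # assumed interval
    static_configs:
      - targets: ['localhost:8080']
```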
NOTE: We have to handle the discovery part properly if it's running in a cluster environment.
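In a cluster, executor hosts come and go, so static targets won't work. One option is Prometheus file-based service discovery, where an external process keeps a targets file up to date; the file path here is a hypothetical example:

```yaml
# prometheus.yml -- sketch of file-based discovery for cluster nodes
scrape_configs:
  - job_name: 'spark-cluster'
    file_sd_configs:
      - files:
          # Assumption: some tooling writes the current executor hosts here
          - '/etc/prometheus/spark-targets.json'
```

On Kubernetes or a cloud provider, the corresponding built-in service-discovery mechanisms (e.g. `kubernetes_sd_configs`) would be the more idiomatic choice.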