Installing Prometheus with Alertmanager & Using Exporters in Istio

Setting up metrics with alerts, then installing a metrics sidecar exporter that works in Istio's service mesh.


Introduction

This is a full tutorial on setting up Prometheus with Alertmanager on Kubernetes along with the alerts I use. Alerts will be hooked up with Slack (as I never look at emails, lol). The focus will be on getting useful alerts, as opposed to using metrics for determining resource usage. Finally, I found setting up exporters as sidecars in pods using Istio's service mesh a bit of a challenge, so I'll include an explanation of how to get that going at the end.

What This Tutorial Covers
  1. Installing Prometheus with Alertmanager
  2. Explanation of the Alerts
  3. Setting Up Nginx Exporter as a Sidecar in Istio's Service Mesh

What You Need For This Tutorial

A Kubernetes Cluster


Install Prometheus with Alertmanager

The easiest way to install Prometheus is with Helm. First, create the following values.yml file. You'll notice the alerts I use are configured in this file.
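
Here's a rough sketch of what that file can look like, assuming the prometheus-community/prometheus chart (the key layout varies a bit between chart versions); the rule shown, the thresholds, and the Slack webhook are placeholders for the alerts discussed below:

    # values.yml (sketch) -- assuming the prometheus-community/prometheus chart.
    # The rule below is just one example; the Slack webhook and channel are placeholders.
    serverFiles:
      alerting_rules.yml:
        groups:
          - name: Instances
            rules:
              - alert: InstanceDown
                expr: up == 0
                for: 5m
                labels:
                  severity: critical
                annotations:
                  summary: Target has been unreachable for more than 5 minutes

    alertmanager:
      config:
        route:
          receiver: slack
        receivers:
          - name: slack
            slack_configs:
              - api_url: https://hooks.slack.com/services/...   # your Slack webhook
                channel: '#alerts'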

Now you can install with the following command:
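
Assuming you're using the prometheus-community chart repo, that looks something like this (the release and namespace names are up to you):

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update
    helm install prometheus prometheus-community/prometheus \
      -f values.yml --namespace monitoring --create-namespace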

Dead simple, right?

Alerts

Okay, same deal as in my other tutorial: to keep things simple, let's just access Prometheus using port forwarding:
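
Assuming the chart's default service name and the monitoring namespace from above, that's something like:

    kubectl port-forward svc/prometheus-server 9090:80 -n monitoring
    # then open http://localhost:9090 in your browser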

Click on the "Alerts" tab. Alerts come in 3 states: "Inactive", "Pending", "Firing". The first and third I feel are self-explanatory. "Pending", however, means your alerting rule's condition is currently true, but it hasn't been true for the full duration specified in the rule. Once it reaches that threshold, for example 5 minutes, the alert moves to "Firing" and you'll receive a notification. Once you fix the problem, you should receive another notification that the issue has been resolved.

The alerts I use are separated into 3 groups. The first is "Instances", which are just there to tell me when my pods are crashing or down. The second is "Resources", which tell me when my cluster needs more CPU, memory, or disk space. The third is "Network", which tell me when there's a suspicious amount of network traffic going on. Those 3 kinds of metrics/alerts have been sufficient for me to respond to issues quickly.
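
To give a rough idea of what the Resources and Network groups can look like, here's a sketch using standard Node-Exporter metrics; the thresholds are placeholders you'd tune for your own cluster:

    # Sketch of Resources/Network style rules (Node-Exporter metrics, placeholder thresholds).
    - alert: LowMemory
      expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.10
      for: 5m
    - alert: LowDiskSpace
      expr: node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} < 0.10
      for: 5m
    - alert: HighNetworkTraffic
      expr: rate(node_network_receive_bytes_total[5m]) > 50000000   # ~50 MB/s received
      for: 5m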

My other big use case for metrics is determining how many resource requests/limits to give a pod, as well as testing it under load. I use the metrics provided by my Istio setup for that (see my other tutorial), as well as specific exporters where needed (like the exporter for MongoDB). Basically, I load test and sometimes throw in some chaos engineering (where you deliberately send slow/failing traffic at your pods) to see what sort of resources they require.

Nginx Exporter in Istio

When we installed Prometheus, it came by default with Node-Exporter (which gathers metrics on your nodes, like CPU/memory usage) and Kube-State-Metrics (which gathers metrics on Kubernetes resources). There are tons of other exporters we can use, though, to gather metrics on specific types of apps. A simple example is Nginx-Exporter, which, as the name suggests, gathers Nginx-specific metrics.

This is where the concept of a pod is useful. Our pod will have 3 containers which should always be deployed together. The 1st is an Nginx server for a website, the 2nd is the Istio proxy for handling traffic, and the 3rd is Nginx-Exporter, which grabs metrics from the 1st container and exposes them to Prometheus.
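
As a sketch, the containers section of the Deployment's pod spec might look like this; the istio-proxy is injected automatically by Istio, so only the other two are declared (image tags are placeholders):

    # Sketch: nginx + nginx-exporter containers (istio-proxy gets injected by Istio).
    containers:
      - name: nginx
        image: nginx:1.21          # placeholder tag
        ports:
          - containerPort: 80      # regular website traffic
          - containerPort: 8080    # stub_status for the exporter
      - name: nginx-exporter
        image: nginx/nginx-prometheus-exporter:0.10.0   # placeholder tag
        args:
          - -nginx.scrape-uri=http://localhost:8080/stub_status
        ports:
          - containerPort: 9113    # exporter's metrics endpoint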

The Istio proxy is great for handling complicated traffic like canary deployments to your server, but it isn't necessary for our Nginx metrics. The problem is that the proxy intercepts all traffic to/from the pod, so if there isn't a VirtualService defined for the exporter, Prometheus will be unable to communicate with it. To bypass this, we need to use some annotations. They were difficult to find (they're not in the docs and I had to dig around the source code looking for them).
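
For reference, here's a sketch of the pod template annotations I mean, assuming the exporter listens on its default port 9113. The exclude annotation tells Istio's sidecar injector to leave that port alone, and the prometheus.io annotations tell Prometheus where to scrape:

    # Sketch of pod template annotations (assuming the exporter's default port 9113).
    metadata:
      annotations:
        traffic.sidecar.istio.io/excludeInboundPorts: "9113"
        prometheus.io/scrape: "true"
        prometheus.io/port: "9113"
        prometheus.io/path: "/metrics"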

By adding those annotations, the Istio proxy should no longer try to intercept traffic to/from the exporter, and thus Prometheus should be able to communicate with it just fine within the cluster. With the Prometheus scrape annotations but without the traffic annotations, Prometheus would have been unable to communicate with the exporter and the InstanceDown alert would be firing for it.

One last note: you should notice that our server exposes 2 ports. 80 is for regular traffic, and 8080 is used to provide Nginx's /stub_status, which is what Nginx-Exporter uses for metrics. In other words, your Nginx server provides status info at 8080/stub_status, Nginx-Exporter gathers that and formats it into metrics for Prometheus, and finally Prometheus scrapes those metrics periodically from Nginx-Exporter. The file below shows a basic Nginx configuration file serving regular traffic at 80 and stub_status at 8080.
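
Something along these lines (a sketch, e.g. dropped into /etc/nginx/conf.d/default.conf; the paths are placeholders):

    # Port 80 serves the website, port 8080 serves stub_status for the exporter.
    server {
        listen 80;
        location / {
            root /usr/share/nginx/html;
            index index.html;
        }
    }

    server {
        listen 8080;
        location /stub_status {
            stub_status;
        }
    }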

Completos!

So that's Kubernetes metrics using Prometheus with Alertmanager. If you have alerting rules that you think are useful, please share them! I find it surprising that there are countless metrics out there, yet I rarely see people sharing the alerts they actually use. Thanks for reading!