parent
648843bca2
commit
68761158b6
BIN
content-org/images/upgrade-your-monitoring-setup-with-prometheus/01-prometheus-overview.png (Stored with Git LFS)
BIN
content-org/images/upgrade-your-monitoring-setup-with-prometheus/01-prometheus-overview.png (Stored with Git LFS)
Binary file not shown.
BIN
content-org/images/upgrade-your-monitoring-setup-with-prometheus/02-prometheus-status-drop-down-menu.png (Stored with Git LFS)
BIN
content-org/images/upgrade-your-monitoring-setup-with-prometheus/02-prometheus-status-drop-down-menu.png (Stored with Git LFS)
Binary file not shown.
@ -0,0 +1,264 @@ |
||||
+++ |
||||
title = "Upgrade your monitoring setup with Prometheus" |
||||
author = ["Elia el Lazkani"] |
||||
date = 2021-09-17 |
||||
lastmod = 2021-09-17 |
||||
tags = ["prometheus", "metrics", "container"] |
||||
categories = ["monitoring"] |
||||
draft = true |
||||
+++ |
||||
|
||||
After running simple monitoring for quite a while, I decided to upgrade my |
||||
setup. It is about time to get some real metric gathering to see what's going |
||||
on. It's also time to get some proper monitoring setup. |
||||
|
||||
There are a lot of options in this field and I should, probably, write a blog |
||||
post on my views on the topic. For this experiment, on the other hand, the |
||||
solution is already pre-chosen. We'll be running Prometheus. |
||||
|
||||
<!--more--> |
||||
|
||||
|
||||
## Prometheus {#prometheus} |
||||
|
||||
To answer the question, _what is Prometheus?_, we'll rip a page out of the |
||||
Prometheus [docs](https://prometheus.io/docs/introduction/overview/). |
||||
|
||||
> Prometheus is an open-source systems monitoring and alerting toolkit originally |
||||
> built at SoundCloud. Since its inception in 2012, many companies and |
||||
> organizations have adopted Prometheus, and the project has a very active |
||||
> developer and user community. It is now a standalone open source project and |
||||
> maintained independently of any company. To emphasize this, and to clarify the |
||||
> project's governance structure, Prometheus joined the Cloud Native Computing |
||||
> Foundation in 2016 as the second hosted project, after Kubernetes. |
||||
> |
||||
> Prometheus collects and stores its metrics as time series data, i.e. metrics |
||||
> information is stored with the timestamp at which it was recorded, alongside |
||||
> optional key-value pairs called labels. |
||||
|
||||
let's decypher all this jargon down to plain English. In simple terms, |
||||
Prometheus is a system that scrape metrics, from your services and applications, |
||||
and stores those metrics, in a time series database, ready to serve back again |
||||
when queried. |
||||
|
||||
Prometheus also offers a way to create rules on those metrics to alert you when |
||||
something goes wrong. Combined with [_Alertmanager_](https://prometheus.io/docs/alerting/latest/alertmanager/), you got yourself a full |
||||
monitoring system. |
||||
|
||||
|
||||
## Configuration {#configuration} |
||||
|
||||
Now that we briefly touched on a _few_ features of **Prometheus** and before we |
||||
can deploy, we need to write our configuration. |
||||
|
||||
This is an example of a bare configuration. |
||||
|
||||
<a id="code-snippet--prometheus-scraping-config"></a> |
||||
```yaml |
||||
scrape_configs: |
||||
- job_name: prometheus |
||||
scrape_interval: 30s |
||||
static_configs: |
||||
- targets: |
||||
- prometheus:9090 |
||||
``` |
||||
|
||||
This will make Prometheus scrape itself every 30 seconds for metrics. At least |
||||
you get _some_ metrics to query later. If you want the full experience, I would |
||||
suggest you enable _Prometheus metrics_ for your services. Consult the docs of |
||||
the project to see if and how it can expose metrics for _Prometheus_ to scrape, |
||||
then add the scrape endpoint to your configuration as shown above. |
||||
|
||||
Here's a an example of a couple more, _well known_, projects; [_Alertmanager_](https://prometheus.io/docs/alerting/latest/alertmanager/) and |
||||
[_node exporter_](https://github.com/prometheus/node%5Fexporter). |
||||
|
||||
<a id="code-snippet--prometheus-example-scraping-config"></a> |
||||
```yaml |
||||
- job_name: alertmanager |
||||
scrape_interval: 30s |
||||
static_configs: |
||||
- targets: |
||||
- alertmanager:9093 |
||||
|
||||
- job_name: node-exporter |
||||
scrape_interval: 30s |
||||
static_configs: |
||||
- targets: |
||||
- node-exporter:9100 |
||||
``` |
||||
|
||||
A wider [list of exporters](https://prometheus.io/docs/instrumenting/exporters/) can be found on the Prometheus docs. |
||||
|
||||
|
||||
## Deployment {#deployment} |
||||
|
||||
Now that we got ourselves a cofniguration, let's deploy **Prometheus**. |
||||
|
||||
Luckily for us, Prometheus comes containerized and ready to deploy. We'll be |
||||
using `docker-compose` in this example to make it easier to translate later to |
||||
other types of deployments. |
||||
|
||||
<div class="admonition note"> |
||||
<p class="admonition-title">Note</p> |
||||
|
||||
I'm still running on `2.x` API version. I know I need to upgrade to a newer |
||||
version but that's a bit of networking work. It's an ongoing work. |
||||
|
||||
</div> |
||||
|
||||
The `docker-compose` file should look like the following. |
||||
|
||||
```yaml |
||||
--- |
||||
version: '2.3' |
||||
|
||||
services: |
||||
prometheus: |
||||
image: quay.io/prometheus/prometheus:v2.27.0 |
||||
container_name: prometheus |
||||
mem_limit: 400m |
||||
mem_reservation: 300m |
||||
restart: unless-stopped |
||||
command: |
||||
- --config.file=/etc/prometheus/prometheus.yml |
||||
- --web.external-url=http://prometheus.localhost/ |
||||
volumes: |
||||
- "./prometheus/:/etc/prometheus/:ro" |
||||
ports: |
||||
- "80:9090" |
||||
``` |
||||
|
||||
A few things to **note**, especially for the new container crowd. The container |
||||
image **version** is explicitly specified, do **not** use `latest` in production. |
||||
|
||||
To make sure I don't overload my host, I set memory limits. I don't mind if it |
||||
goes down, this is a PoC (Proof of Concept) for the time being. In your case, |
||||
you might want to choose higher limits to give it more room to breath. When the |
||||
memory limit is reached, the container will be killed with _Out Of Memory_ |
||||
error. |
||||
|
||||
In the **command** section, I specify the _external url_ for Prometheus to |
||||
redirect me correctly. This is what Prometheus thinks its own hostname is. I |
||||
also specify the configuration file, previously written, which I mount as |
||||
_read-only_ in the **volumes** section. |
||||
|
||||
Finally, we need to port-forward `9090` to our hosts' `80` if possible to access |
||||
**Prometheus**. Otherwise, figure out a way to route it properly. This is a local |
||||
installation, which is suggested by the Prometheus _hostname_. |
||||
|
||||
If you made it so far, you should be able to run this with no issues. |
||||
|
||||
```bash |
||||
docker-compose up -d |
||||
``` |
||||
|
||||
|
||||
## Prometheus Rules {#prometheus-rules} |
||||
|
||||
**Prometheus** supports **two** types of rules; recording and alerting. Let's expand |
||||
a little bit on those two concepts. |
||||
|
||||
|
||||
### Recording Rules {#recording-rules} |
||||
|
||||
First, let's start off with [recording rules](https://prometheus.io/docs/prometheus/latest/configuration/recording%5Frules/). I don't think I can explain it |
||||
better than the **Prometheus** documentation which says. |
||||
|
||||
> Recording rules allow you to precompute frequently needed or computationally |
||||
> expensive expressions and save their result as a new set of time series. |
||||
> Querying the precomputed result will then often be much faster than executing |
||||
> the original expression every time it is needed. This is especially useful for |
||||
> dashboards, which need to query the same expression repeatedly every time they |
||||
> refresh. |
||||
|
||||
Sounds pretty simple right ? Well it is. Unfortunately, I haven't needed to |
||||
create recording rules yet for my setup so I'll forgo this step. |
||||
|
||||
|
||||
### Alerting Rules {#alerting-rules} |
||||
|
||||
As the name suggests, [alerting rules](https://prometheus.io/docs/prometheus/latest/configuration/alerting%5Frules/#alerting-rules) allow you to define conditional expressions |
||||
based on metrics which will trigger notifications to alert you. |
||||
|
||||
This is a very simple example of an _alert rule_ that monitors all the endpoints |
||||
scraped by _Prometheus_ to see if any of them is down. If this expression return |
||||
a result, an alert will fire from _Prometheus_. |
||||
|
||||
```yaml |
||||
groups: |
||||
- name: Instance down |
||||
rules: |
||||
- alert: InstanceDown |
||||
expr: up == 0 |
||||
for: 5m |
||||
labels: |
||||
severity: page |
||||
annotations: |
||||
summary: "Instance {{ $labels.instance }} down" |
||||
description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes." |
||||
``` |
||||
|
||||
To be able to add this alert to **Prometheus**, we need to save it in a |
||||
`rules.yml` file and then include it in the **Prometheus** configuration as follows. |
||||
|
||||
<a id="code-snippet--prometheus-rule-files-config"></a> |
||||
```yaml |
||||
rule_files: |
||||
- "rules.yml" |
||||
``` |
||||
|
||||
Making the configuration intiretly as follows. |
||||
|
||||
```yaml |
||||
rule_files: |
||||
- "rules.yml" |
||||
|
||||
scrape_configs: |
||||
- job_name: prometheus |
||||
scrape_interval: 30s |
||||
static_configs: |
||||
- targets: |
||||
- prometheus:9090 |
||||
|
||||
- job_name: alertmanager |
||||
scrape_interval: 30s |
||||
static_configs: |
||||
- targets: |
||||
- alertmanager:9093 |
||||
|
||||
- job_name: node-exporter |
||||
scrape_interval: 30s |
||||
static_configs: |
||||
- targets: |
||||
- node-exporter:9100 |
||||
``` |
||||
|
||||
At this point, make sure everything is mounted into the container properly and |
||||
rerun your **Prometheus**. |
||||
|
||||
|
||||
## Prometheus UI {#prometheus-ui} |
||||
|
||||
Congratulations if you've made it so far. If you visit <http://localhost/> at |
||||
stage you should get to Prometheus where you can query your metrics. |
||||
|
||||
{{< figure src="/ox-hugo/01-prometheus-overview.png" caption="Figure 1: Prometheus overview" target="_blank" link="/ox-hugo/01-prometheus-overview.png" >}} |
||||
|
||||
You can get all sorts of information under the _status_ drop-down menu. |
||||
|
||||
{{< figure src="/ox-hugo/02-prometheus-status-drop-down-menu.png" caption="Figure 2: Prometheus Status drop-down menu" target="_blank" link="/ox-hugo/02-prometheus-status-drop-down-menu.png" >}} |
||||
|
||||
|
||||
## Conclusion {#conclusion} |
||||
|
||||
As you can see, deploying **Prometheus** is not too hard. If you're running |
||||
_Kubernetes_, make sure you use the operator. It will make your life a lot |
||||
easier in all sorts of things. |
||||
|
||||
Take your time to familiarise yourself with **Prometheus** and consult the |
||||
documentation as much as possible. It is well written and in most cases your |
||||
best friend. Figure out different ways to create rules for recording and |
||||
alerting. Most people at this stage deploy **Grafana** to start visualizing their |
||||
metrics. Well... Not in this blog post we ain't ! |
||||
|
||||
I hope you enjoy playing around with **Prometheus** and until the next post. |
Binary file not shown.
Binary file not shown.
Loading…
Reference in new issue