Just a small post about setting up Prometheus and Grafana with Traefik and Watchtower. There were some complications using IPv6 which can be easily circumvented.

I’m setting these services up under Debian Stable, which at the time of writing is Bullseye (Debian 11). It doesn’t ship the latest versions, but it’s stable.

Unlike my previous post Monitoring with grafana and prometheus, this time I’m using Docker and Docker Compose to set up the services, except for Prometheus.

Overview

Prometheus collects the metrics for the current server. It consists of a server component and multiple agents (called exporters) that provide the actual metrics; the server scrapes them regularly. Grafana later queries the Prometheus server and draws nice diagrams. Prometheus runs natively on the server. It is available in the default Debian repository, along with a couple of agents, e.g. node-exporter.

Grafana provides an easy-to-use UI for the Prometheus metrics. It will run as a Docker container.

Traefik is a reverse proxy that sits in front of Grafana and handles TLS. It supports Let’s Encrypt certificates, so the Grafana instance is properly secured behind HTTPS with a valid certificate. Traefik also runs as a Docker container.

Watchtower is one more Docker container. It regularly (once a day) checks whether new image versions are available for the running containers. If so, it updates the containers and restarts them, fully automatically.

Prometheus

Since I’m also running the BIND DNS server and Postfix on the machine, I’m installing these agents in addition to the node exporter:

$ sudo apt install prometheus prometheus-node-exporter prometheus-postfix-exporter prometheus-bind-exporter

After these packages are installed, you can check that the services are running with sudo systemctl status 'prometheus*'. You can also look at the now open sockets:

$ sudo netstat -tuanp|grep prometheus
tcp6       0      0 :::9100                 :::*                    LISTEN      16084/prometheus-no 
tcp6       0      0 :::9119                 :::*                    LISTEN      15974/prometheus-bi 
tcp6       0      0 :::9090                 :::*                    LISTEN      16538/prometheus    
tcp6       0      0 :::9154                 :::*                    LISTEN      16172/prometheus-po 

All Prometheus agents provide an HTTP endpoint. You can check whether metrics are available with curl:

$ sudo apt install curl
$ curl http://localhost:9119/metrics # bind exporter
$ curl http://localhost:9154/metrics # Postfix exporter
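
To check all agents in one go, a small loop like the following could be used. It is only a sketch: the ports are the ones from the netstat output above (9100 node exporter, 9119 BIND, 9154 Postfix), and the function name check_exporters is made up for this example.

```shell
#!/bin/bash
# Report for each given port whether an exporter answers on /metrics.
# check_exporters is a hypothetical helper name, not part of any package.
check_exporters() {
  for port in "$@"; do
    # -s: silent, -f: fail on HTTP errors, -o /dev/null: discard the body
    if curl -sf --max-time 5 -o /dev/null "http://localhost:$port/metrics"; then
      echo "port $port: metrics available"
    else
      echo "port $port: no response"
    fi
  done
}

# Ports as shown by netstat above: node, BIND and Postfix exporters
check_exporters 9100 9119 9154
```

A “no response” line usually means that the corresponding service is not running or listens on a different port.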

By default, Debian only configures the node exporter automatically. The metrics from the other agents won’t appear in the main Prometheus instance on port 9090.

Therefore you need to configure prometheus and add additional scrape configs:

$ sudo nano /etc/prometheus/prometheus.yml
...
  - job_name: bind
    static_configs:
      - targets: ['localhost:9119']

  - job_name: postfix
    static_configs:
      - targets: ['localhost:9154']

And restart prometheus:

$ sudo service prometheus stop
$ sudo service prometheus start

For BIND, there is one more configuration needed: The BIND server needs to provide a statistics interface that the agent can query to get the data. Therefore modify the BIND configuration:

$ sudo nano /etc/bind/named.conf.local
...
# https://bind9.readthedocs.io/en/latest/reference.html#statistics-channels-block-definition-and-usage
statistics-channels {
  inet 127.0.0.1 port 8053 allow { 127.0.0.1; };
};

The statement “statistics-channels” is a top-level statement and must not be inside any other declaration (e.g. not within options).

After restarting BIND, you should see that it listens on port 8053 now (and the prometheus-bind-exporter is connected to it):

$ sudo service named restart
$ sudo netstat -tuanp|grep 8053
tcp        0      0 127.0.0.1:8053          0.0.0.0:*               LISTEN      32079/named         
tcp        0      0 127.0.0.1:8053          127.0.0.1:38984         ESTABLISHED 32079/named         
tcp        0      0 127.0.0.1:38984         127.0.0.1:8053          ESTABLISHED 15974/prometheus-bi 

That’s it for Prometheus. Note that the Prometheus services bind to all interfaces, so these endpoints are reachable from anywhere unless a firewall blocks them. Make sure you have a firewall set up.
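
To see which sockets are affected, the netstat output can be filtered for TCP listeners bound to all interfaces (“:::” or “0.0.0.0”). The following is a sketch; the function name list_wildcard_listeners is made up, and the sample lines are copied from the outputs above.

```shell
#!/bin/bash
# Print port and process of every TCP socket that listens on all interfaces.
# Expects "netstat -tuanp" style lines on stdin (hypothetical helper).
list_wildcard_listeners() {
  awk '$6 == "LISTEN" && ($4 ~ /^:::/ || $4 ~ /^0\.0\.0\.0:/) {
    n = split($4, parts, ":")   # the port is the last ":"-separated field
    print parts[n], $7
  }'
}

# Sample input taken from the netstat outputs in this post
list_wildcard_listeners <<'EOF'
tcp6       0      0 :::9100                 :::*                    LISTEN      16084/prometheus-no
tcp6       0      0 :::9090                 :::*                    LISTEN      16538/prometheus
tcp        0      0 127.0.0.1:8053          0.0.0.0:*               LISTEN      32079/named
EOF
```

On the real machine the input would come from sudo netstat -tuanp instead of the sample lines. The BIND statistics channel on 127.0.0.1:8053 is not listed, because it only listens on the loopback interface.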

Docker and Docker Compose

First thing is to install docker. It is also available in the standard Debian package repository.

$ sudo apt install docker.io docker-compose

So that our user can run Docker containers, the user needs to be in the new group docker. My user is called admin:

$ sudo addgroup admin docker

Make sure to log out and back in, so that you are in the group docker. Verify it with id; it should look like this:

$ id
uid=1000(admin) gid=1000(admin) groups=1000(admin),117(docker)

Verify that the docker network has been created. We’ll need that later on:

$ ip addr show dev docker0
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:10:b7:86:b0 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever

Docker Compose in Debian Bullseye is the old version, which is a standalone script. So all the commands are written with a dash, e.g. docker-compose up. Let’s check which version we have:

$ docker-compose --version
docker-compose version 1.25.0, build unknown

If you have 1.27+, you can skip the next paragraphs. If not, and your service is available via both IPv6 and IPv4, you’ll need to create an IPv6 Docker network manually. In theory, it should be as easy as setting the ipv6 option in the docker-compose file, but unfortunately there is a bug, which has been fixed with 25d773c9. The bug seems to be just that the ipv6 option was not allowed yet. Anyway.
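
If you want to script this decision, a comparison along these lines could work. It is a sketch: the helper names version_ge and compose_version are made up, the sample line is the output shown above, and sort -V needs GNU coreutils (which Debian has).

```shell
#!/bin/bash
# Succeed if version $1 is greater than or equal to version $2.
# Uses the version sort of GNU "sort -V": the smaller version comes first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Extract the version number from a line like
# "docker-compose version 1.25.0, build unknown" read from stdin.
compose_version() {
  sed -n 's/^docker-compose version \([0-9][0-9.]*\).*/\1/p'
}

# Sample line from above; on a real system pipe "docker-compose --version" in
installed=$(echo "docker-compose version 1.25.0, build unknown" | compose_version)

if version_ge "$installed" "1.27.0"; then
  echo "$installed: the ipv6 option should work in the compose file"
else
  echo "$installed: create the IPv6 network manually"
fi
```

With the Bullseye version 1.25.0 this takes the second branch.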

When creating a Docker network, you want to use private IP addresses. IPv6 also has such a concept, and you can choose freely within the block “fd00::/8”. See Unique local address for details. Using a private address space that is not routed makes sure that the Docker container traffic doesn’t accidentally leave the machine.
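
Any prefix inside this block works on a single machine. RFC 4193 suggests choosing the 40 bits after “fd” randomly, so that the prefix doesn’t clash if private networks are ever connected. A sketch of how such a prefix could be generated; the subnet number 1 and the gateway address ending in ::1 are just my choices:

```shell
#!/bin/bash
# Read 5 random bytes (40 bits) as two-digit hex values into $1..$5.
set -- $(od -An -N5 -tx1 /dev/urandom)

# "fd" plus the 40 random bits form the /48 prefix.
prefix="fd$1:$2$3:$4$5"

echo "ULA prefix:     ${prefix}::/48"
echo "Docker subnet:  ${prefix}:1::/64"
echo "Docker gateway: ${prefix}:1::1"
```

The printed subnet and gateway could then be used for the --subnet and --gateway options of docker network create.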

I simply chose “fd00:0:0:1::/64” as the network. Now create a network with ipv6 support:

$ docker network create --subnet="fd00:0:0:1::/64" --gateway="fd00:0:0:1::1" --ipv6 grafana_ipv6

You could maybe leave out specifying subnet and gateway, but then possible network ranges need to be configured on the docker host. And for me, this was not the case.

Docker Compose with grafana+traefik+watchtower

There are lots of tutorials on the internet about this. So I’m sharing my whole docker-compose.yml file now and explaining my specific changes.

Make sure to create the docker-compose.yml file inside a folder called “grafana”. This will serve as the namespace for all resources docker-compose is creating. E.g. the full path of my yaml file is: /home/admin/grafana/docker-compose.yml.

version: "3"

networks:
  traefik:
    external:
      name: grafana_ipv6

services:
  grafana:
    restart: always
    labels:
      # Explicitly tell Traefik to expose this container
      - "traefik.enable=true"
      # SSL endpoint
      - "traefik.http.routers.grafana.entryPoints=port443"
      - "traefik.http.routers.grafana.rule=host(`grafana.example.org`)"
      - "traefik.http.routers.grafana.service=grafana"
      - "traefik.http.routers.grafana.tls=true"
      - "traefik.http.routers.grafana.tls.certResolver=le-ssl"
      - "traefik.http.services.grafana.loadBalancer.server.port=3000"
    image: grafana/grafana:latest # https://hub.docker.com/r/grafana/grafana
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SERVER_ROOT_URL=https://grafana.example.org
      - GF_SERVER_DOMAIN=grafana.example.org
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=CHANGE_ME
      - GF_SMTP_ENABLED=true
      - GF_SMTP_HOST=host.docker.internal:25
      - GF_SMTP_FROM_ADDRESS=grafana@grafana.example.org
    networks:
      - traefik

  traefik:
    restart: always
    image: traefik:v2.9 # https://hub.docker.com/_/traefik
    ports:
      - "443:443"
      # expose port below only if you need access to the Traefik API
      # needs - "--api.insecure=true"
      #- "8080:8080"
    command:
      #- "--log.level=DEBUG"
      #- "--api=true"
      # Enabling docker provider
      - "--providers.docker=true"
      # Do not expose containers unless explicitly told so
      - "--providers.docker.exposedbydefault=false"

      - "--entryPoints.port443.address=:443"

      - "--certificatesResolvers.le-ssl.acme.tlsChallenge=true" # use TLS-ALPN-01
      - "--certificatesResolvers.le-ssl.acme.email=admin@grafana.example.org"
      - "--certificatesResolvers.le-ssl.acme.storage=/letsencrypt/acme.json"
    volumes:
      - traefik-data:/letsencrypt/
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - traefik

  watchtower:
    restart: always
    image: containrrr/watchtower:latest # https://hub.docker.com/r/containrrr/watchtower
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      WATCHTOWER_MONITOR_ONLY: 'false'
      WATCHTOWER_NOTIFICATIONS: email
      WATCHTOWER_NOTIFICATION_EMAIL_FROM: admin@grafana.example.org
      WATCHTOWER_NOTIFICATION_EMAIL_TO: admin@grafana.example.org
      WATCHTOWER_NOTIFICATION_EMAIL_SERVER: host.docker.internal
      WATCHTOWER_NOTIFICATION_EMAIL_SERVER_PORT: 25
      WATCHTOWER_NOTIFICATION_EMAIL_SERVER_TLS_SKIP_VERIFY: 'true'
    networks:
      - traefik

volumes:
  traefik-data:
  grafana-data:

Let’s start with the network. Inside the docker-compose file it is called traefik. It is marked as “external”, so docker-compose doesn’t try to create it but reuses the already existing network with the given name. If this network doesn’t exist, docker-compose will fail. Using the network created earlier gives us IPv6 support.

The first service is “grafana”. The option “restart: always” makes sure that Docker starts this container when the system boots. Then a couple of labels are defined. These labels configure Traefik: requests on the entry point port443 for the host “grafana.example.org” are forwarded to the Traefik service “grafana”, which points to port 3000 of this container. The router uses TLS with a certificate from Let’s Encrypt (le-ssl).

I’m adding an extra_hosts entry. It lets Grafana access Prometheus, which runs on the Docker host. I’m also using it to reach the Postfix mail server, which runs directly on the Docker host as well. Note that you need to allow relaying emails from the internal Docker network if you want outgoing mail to actually work. Grafana itself is configured with environment variables, including the outgoing mail server. Note that this service doesn’t publish any ports; access to Grafana is managed by Traefik.

The next service is “traefik”. It only exposes port 443. It’s important to use port 443, the standard HTTPS port, because Let’s Encrypt will connect to exactly this port to verify ownership of the domain. A non-standard port won’t work with “TLS-ALPN-01”.

The last service is “watchtower”. I also added the Docker host as an extra_hosts entry, so that Watchtower can send email notifications about updated containers. You could set the monitor-only parameter to true if you first want to watch whether Watchtower works. I’ve set it to false, so that the containers are updated and restarted when newer versions are available. Keeping this variable in the file makes it easier to temporarily disable the automatic updates: you simply change the parameter and apply the change with docker-compose.

All services share the same network.

Now, before starting the containers, we should allow access to port 443 from the outside world. I use ufw as the firewall, since it’s easy to manage.

$ sudo ufw allow https

Now it’s time to start everything. This is as simple as:

$ docker-compose up -d

The “-d” flag means detach, so the containers are launched in the background. Use docker-compose logs to have a look at the logs. If everything goes well, you should be able to access “https://grafana.example.org” in your browser and log in with the admin password provided in the docker-compose file. Traefik should already have requested a certificate for the domain, so everything should be set up.

Whenever you change something in the docker-compose.yml file, you can simply run docker-compose up -d again, and docker-compose will figure out which services need to be recreated.

Grafana

Now we need to connect Prometheus and Grafana. Prometheus runs on the Docker host, Grafana runs as a Docker container. If you are using a firewall, you need to explicitly allow traffic from Grafana to Prometheus. First figure out the IP address of Grafana:

$ docker network inspect grafana_ipv6
...
            "4d6fef59889dfbb28ea4c169649a092a36e5b05f15406177a5998e92b7f240b0": {
                "Name": "grafana_grafana_1",
                "EndpointID": "42425a280560ecc9525fe10d80d6ad687ef25542ff5c32070bfb030aafc3513f",
                "MacAddress": "02:42:ac:16:00:02",
                "IPv4Address": "172.22.0.2/16",
                "IPv6Address": "fd00:0:0:1::2/64"
            }

For me, grafana uses 172.22.0.2/16. This will be the source address.

Now we need the IP address of the docker host, seen from inside. This is the ip address of the docker0 interface:

$ ip addr show dev docker0
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:10:b7:86:b0 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever

So, it is 172.17.0.1/16. Now we have all information to add a new firewall rule:

$ sudo ufw allow to 172.17.0.1 port 9090 from 172.22.0.0/16

Note that I used the whole network 172.22.0.0/16 as the source, not just the Grafana IP. That way, it won’t cause a problem if the IP addresses change after a restart.
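
The network address can be derived from the container address by clearing the host bits. A sketch of this calculation in shell; cidr_network is a made-up helper name, and it handles IPv4 only:

```shell
#!/bin/bash
# Print the network an IPv4 address in CIDR notation belongs to.
# cidr_network is a hypothetical helper; its body runs in a subshell "( )"
# so that changing IFS does not leak into the calling shell.
cidr_network() (
  ip=${1%/*}      # part before the slash, e.g. 172.22.0.2
  bits=${1#*/}    # prefix length, e.g. 16
  IFS=.
  set -- $ip      # split the address into its four octets
  addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
  mask=$(( (0xffffffff << (32 - bits)) & 0xffffffff ))
  net=$(( addr & mask ))
  printf '%d.%d.%d.%d/%d\n' \
    "$(( (net >> 24) & 255 ))" "$(( (net >> 16) & 255 ))" \
    "$(( (net >> 8) & 255 ))" "$(( net & 255 ))" "$bits"
)

cidr_network "172.22.0.2/16"   # prints 172.22.0.0/16
```

The result is the value that goes after “from” in the ufw rule.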

Now, log in to Grafana at https://grafana.example.org as admin. Go to “Configuration > Add Data Source > Prometheus” and enter as Name “grafana.example.org” and as URL http://host.docker.internal:9090. Click “Save & Test”. That’s all.

Then go to “Dashboards > Import dashboard”. Import the dashboards from

and choose your prometheus data source.

Then you should see beautiful diagrams.

If you’ve set up email, you can configure alerting in Grafana as well: go to “Alerting > Contact points”, select “grafana-default-email”, edit it and then test it. This will send a sample alert email. Of course, you still need to configure rules for when you want to be alerted, e.g. when disk space is running out. With the node-exporter, you have plenty of metrics to react to.

Backup

Last thing: Grafana stores the local dashboards and some other stuff in the Docker named volume “grafana-data”. Traefik stores the credentials for Let’s Encrypt and probably the certificates and private keys as well in “traefik-data”. If you need to move the containers (that is, recreate them) to some other host, you can use the following simple backup script to create the backup:

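#!/bin/bash
# Date stamp for the archive name
DATE=$(date +%Y-%m-%d)
# Stop the containers so that the volume contents are consistent
docker-compose stop
# Mount both named volumes plus the current folder and archive /vol
docker run --rm \
  --volume grafana_grafana-data:/vol/grafana \
  --volume grafana_traefik-data:/vol/traefik \
  --volume "$(pwd)":/backup \
  ubuntu \
  tar cvf "/backup/backup-grafana-$DATE.tar" /vol
docker-compose start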

The idea is that you start a new container that mounts the volumes you want to back up. I first shut down the containers, so that everything is persisted and the backup is in a consistent state.

When restoring, the same technique can be used: Start a new ubuntu container, mount the volumes and the folder with the backup tar archives and extract the backup.
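
A restore along these lines could look like the following sketch. It assumes the volume names and the /vol layout from the backup script above, that the archive lies in the current folder, and that the grafana containers are stopped while it runs. The function name restore_backup and the script name restore.sh are made up.

```shell
#!/bin/bash
# Unpack a backup archive into the two named volumes.
# tar stored the files as vol/grafana/... and vol/traefik/..., so
# extracting relative to / puts them back into the mounted volumes.
restore_backup() {
  docker run --rm \
    --volume grafana_grafana-data:/vol/grafana \
    --volume grafana_traefik-data:/vol/traefik \
    --volume "$(pwd)":/backup \
    ubuntu \
    tar xvf "/backup/$1" -C /
}

# Only run when an archive name is given, e.g.
#   ./restore.sh backup-grafana-<date>.tar
if [ "$#" -gt 0 ]; then
  restore_backup "$1"
fi
```

Docker creates the named volumes if they don’t exist yet, so this can also be run on a fresh host before the first docker-compose up -d.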

References