---
author: mikeconrad
categories:
- Ansible
- Automation
- Docker
- Software Engineering
- Traefik
date: "2024-05-11T09:44:01Z"
tags:
- Blog Post
title: Traefik 3.0 service discovery in Docker Swarm mode
---

I recently decided to set up a Docker Swarm cluster for a project I was working on. If you aren’t familiar with Swarm mode, it is similar in some ways to k8s but with much less complexity, and it is built into Docker. If you are looking for a fairly straightforward way to deploy containers across a number of nodes without all the overhead of k8s, it can be a good choice. However, it isn’t a very popular or widespread solution these days.

Anyway, I set up a VM scale set in Azure with 10 Ubuntu 22.04 VMs and wrote some Ansible scripts to automate the process of installing Docker on each machine, as well as setting 3 up as swarm managers and the other 7 as worker nodes. I ssh’d into the primary manager node and created a docker compose file for launching an observability stack.

Here is what that `docker-compose.yml` looks like:

```
---
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.88.0
    volumes:
      - /home/user/repo/common/devops/observability/otel-config.yaml:/etc/otel/config.yaml
      - /home/user/repo/log:/log/otel
    command: --config /etc/otel/config.yaml
    environment:
      JAEGER_ENDPOINT: 'tempo:4317'
      LOKI_ENDPOINT: 'http://loki:3100/loki/api/v1/push'
    ports:
      - '8889:8889' # Prometheus metrics exporter (scrape endpoint)
      - '13133:13133' # health_check extension
      - '55679:55679' # ZPages extension
    deploy:
      placement:
        constraints:
          - node.hostname==dockerswa2V8BY4
    networks:
      - traefik

  prometheus:
    container_name: prometheus
    image: prom/prometheus:v2.42.0
    volumes:
      - /home/user/repo/common/devops/observability/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - '9090:9090'
    deploy:
      placement:
        constraints:
          - node.hostname==dockerswa2V8BY4
    networks:
      - traefik

  loki:
    container_name: loki
    image: grafana/loki:2.7.4
    ports:
      - '3100:3100'
    networks:
      - traefik

  grafana:
    container_name: grafana
    image: grafana/grafana:9.4.3
    volumes:
      - /home/user/repo/common/devops/observability/grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
    environment:
      GF_AUTH_ANONYMOUS_ENABLED: 'false'
      GF_AUTH_ANONYMOUS_ORG_ROLE: 'Admin'
    expose:
      - '3000'
    labels:
      - traefik.constraint-label=traefik
      - traefik.http.middlewares.https-redirect.redirectscheme.scheme=https
      - traefik.http.middlewares.https-redirect.redirectscheme.permanent=true
      - traefik.http.routers.grafana-http.rule=Host(`swarm-grafana.mydomain.com`)
      - traefik.http.routers.grafana-http.entrypoints=http
      - traefik.http.routers.grafana-http.middlewares=https-redirect
      # grafana-https: the actual router using HTTPS
      - traefik.http.routers.grafana-https.rule=Host(`swarm-grafana.mydomain.com`)
      - traefik.http.routers.grafana-https.entrypoints=https
      - traefik.http.routers.grafana-https.tls=true
      # Route this router to the grafana service defined by the port label below
      - traefik.http.routers.grafana-https.service=grafana
      # Use the "le" (Let's Encrypt) certificate resolver
      - traefik.http.routers.grafana-https.tls.certresolver=le
      # Port Traefik should use to reach Grafana inside the container
      - traefik.http.services.grafana.loadbalancer.server.port=3000
    deploy:
      placement:
        constraints:
          - node.hostname==dockerswa2V8BY4
    networks:
      - traefik

  # Tempo runs as user 10001, and docker compose creates the volume as root.
  # As such, we need to chown the volume in order for Tempo to start correctly.
  init:
    image: &tempoImage grafana/tempo:latest
    user: root
    entrypoint:
      - 'chown'
      - '10001:10001'
      - '/var/tempo'
    volumes:
      - /home/user/repo/tempo-data:/var/tempo
    deploy:
      placement:
        constraints:
          - node.hostname==dockerswa2V8BY4

  tempo:
    image: *tempoImage
    container_name: tempo
    command: ['-config.file=/etc/tempo.yaml']
    volumes:
      - /home/user/repo/common/devops/observability/tempo.yaml:/etc/tempo.yaml
      - /home/user/repo/tempo-data:/var/tempo
    deploy:
      placement:
        constraints:
          - node.hostname==dockerswa2V8BY4
    ports:
      - '14268' # jaeger ingest
      - '3200' # tempo
      - '4317' # otlp grpc
      - '4318' # otlp http
      - '9411' # zipkin
    depends_on:
      - init
    networks:
      - traefik

networks:
  traefik:
    external: true
```

Pretty straightforward, so I proceeded to deploy it into the swarm:

```
docker stack deploy -c docker-compose.yml observability
```

Everything deployed properly, but when I viewed the Traefik logs there was an issue with all the services except for the grafana service. I got errors like this:

```
traefik_traefik.1.tm5iqb9x59on@dockerswa2V8BY4 | 2024-05-11T13:14:16Z ERR error="service \"observability-prometheus\" error: port is missing" container=observability-prometheus-37i852h4o36c23lzwuu9pvee9 providerName=swarm
```

It drove me crazy for about half a day or so. I couldn’t find any reason why the grafana service worked as expected but none of the others did. Part of my love/hate relationship with Traefik stems from the fact that configuration issues like this can be hard to track down and debug. Ultimately, after lots of searching and banging my head against a wall, I found the answer in the Traefik docs and thought I would share it here for anyone else who might run into this issue. Again, this solution is specific to Docker Swarm mode.

Expand that first section and you will see the solution:

![](https://hackanooga.com/wp-content/uploads/2024/05/image.png)

It turns out I just needed to update my `docker-compose.yml` so that the labels are nested under the `deploy` section of each service. After redeploying, everything worked as expected.
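
For reference, here is roughly what that change looks like for the grafana service. This is a sketch based on the compose file earlier in the post, with only the position of `labels` changed and the unrelated keys left out for brevity. As far as I understand the Traefik docs, in Swarm mode Traefik reads labels from the service itself (that is, `deploy.labels`) rather than from the container, and since it cannot detect the port on its own there, the `loadbalancer.server.port` label has to be set explicitly.

```yaml
services:
  grafana:
    image: grafana/grafana:9.4.3
    # volumes and environment stay the same as in the original file
    networks:
      - traefik
    deploy:
      placement:
        constraints:
          - node.hostname==dockerswa2V8BY4
      # In Swarm mode the Traefik labels go here (service labels),
      # not at the top level of the service (container labels).
      labels:
        - traefik.constraint-label=traefik
        - traefik.http.middlewares.https-redirect.redirectscheme.scheme=https
        - traefik.http.middlewares.https-redirect.redirectscheme.permanent=true
        - traefik.http.routers.grafana-http.rule=Host(`swarm-grafana.mydomain.com`)
        - traefik.http.routers.grafana-http.entrypoints=http
        - traefik.http.routers.grafana-http.middlewares=https-redirect
        - traefik.http.routers.grafana-https.rule=Host(`swarm-grafana.mydomain.com`)
        - traefik.http.routers.grafana-https.entrypoints=https
        - traefik.http.routers.grafana-https.tls=true
        - traefik.http.routers.grafana-https.service=grafana
        - traefik.http.routers.grafana-https.tls.certresolver=le
        # Required in Swarm mode: the port Traefik uses to reach the service
        - traefik.http.services.grafana.loadbalancer.server.port=3000

networks:
  traefik:
    external: true
```

Running the same `docker stack deploy -c docker-compose.yml observability` command again updates the existing stack in place with the new service labels.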