This tool implements a deadman switch for software systems. It expects regular pings from each configured service and calls one or more webhooks when a service has not pinged within a configurable amount of time.
- alert you when your services are down
- alert you when your services are up again
- notifications can be sent to any webhook or to Slack
- use custom URL, headers, body for webhooks
- use custom key/value pairs in the Slack message
- configurable message debouncing
- dynamic configuration of services and notifications via HTTP API
- secured with basic auth
- scalable in both directions
  - from a small container with <32MB RAM
  - to a cluster that can handle thousands of pings and notifications per second
    - leader election in the cluster, so only one node checks deadlines and triggers notifications
    - notifications are queued, so they can be executed by the whole cluster
- optionally supply a secret token when configuring your services, so the ping messages can't be spoofed easily
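The core idea behind the timeout can be sketched in a few lines of shell. This is only an illustration of the concept, not the tool's actual implementation: `is_overdue` is a hypothetical helper, and it takes the timeout as plain seconds rather than a duration string such as "30s".

```shell
# Hypothetical helper, not part of deadman-switch.
# is_overdue LAST_PING NOW TIMEOUT
#   LAST_PING and NOW are epoch seconds, TIMEOUT is in seconds.
#   Succeeds (exit status 0) when the last ping is older than the timeout.
is_overdue() {
  last_ping=$1
  now=$2
  timeout=$3
  [ $(( now - last_ping )) -gt "$timeout" ]
}

# The last ping was 31 seconds ago and the timeout is 30 seconds,
# so this prints "service-1: alert".
if is_overdue 1000 1031 30; then
  echo "service-1: alert"
else
  echo "service-1: ok"
fi
```

A ping simply resets the stored timestamp, which is why a healthy service that pings more often than its timeout never triggers an alert.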
Up and running in less than 1 minute:
# start deadman-switch
docker run --name deadman-switch -d --rm -p 8080:8080 trusch/deadman-switch:latest
# configure service
curl -u admin:admin -XPOST --data-binary @- localhost:8080/config <<EOF
{
  "id": "service-1",
  "timeout": "30s",
  "debounce": "1m",
  "alertNotifications": [
    {
      "type": "webhook",
      "config": {
        "method": "GET",
        "url": "http://localhost:8080/log?service-1-alert"
      }
    }
  ]
}
EOF
# call the ping endpoint
curl http://localhost:8080/ping/service-1
# look at the logs
docker logs -f deadman-switch
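In practice the ping usually comes from the monitored job itself. The following is a minimal sketch of one way to do that, assuming you want to ping only after the job has succeeded. `DEADMAN_URL`, `send_ping` and `run_with_heartbeat` are hypothetical names chosen for this example; only the /ping/<id> endpoint comes from the quick start above.

```shell
# Base URL of the deadman-switch instance (assumed default from the quick start).
DEADMAN_URL=${DEADMAN_URL:-http://localhost:8080}

# send_ping SERVICE_ID
#   Calls the ping endpoint; -f makes curl fail on HTTP error responses.
send_ping() {
  curl -fsS --max-time 10 "${DEADMAN_URL}/ping/$1" >/dev/null
}

# run_with_heartbeat SERVICE_ID COMMAND [ARGS...]
#   Runs COMMAND and pings only if it exited successfully, so a failing
#   job eventually triggers the alert notifications.
run_with_heartbeat() {
  service_id=$1
  shift
  if "$@"; then
    send_ping "$service_id"
  else
    status=$?
    echo "job failed with status ${status}, skipping ping for ${service_id}" >&2
    return "$status"
  fi
}
```

A call such as `run_with_heartbeat service-1 ./backup.sh` (with `./backup.sh` standing in for your own job) could then be scheduled more often than the configured timeout of the service.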
This repo requires podman and buildah as its development toolset.
Ubuntu install commands:
. /etc/os-release
echo "deb https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_${VERSION_ID}/ /" | sudo tee /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list
curl -L https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/xUbuntu_${VERSION_ID}/Release.key | sudo apt-key add -
sudo apt-get update
sudo apt-get -y upgrade
sudo apt-get -y install podman buildah skopeo
echo "${USER}:100000:65535" | sudo tee -a /etc/subuid
echo "${USER}:100000:65535" | sudo tee -a /etc/subgid
make image
make run
This will bring up a pod with etcd as storage backend, caddy as ingress router, and two instances of deadman-switch. The pod will expose port 8080 to serve our HTTP API.
You can now, for example, list all configured services like this:
curl -u admin:admin http://localhost:8080/config | jq .
You can also POST or DELETE service config objects using this endpoint:
curl -XPOST -u admin:admin -d '{"id":"new_service", "timeout":"10s", "notifications":[{"webhook": {"url": "https://google.com", "method": "GET"}}]}' http://localhost:8080/config
curl -XDELETE -u admin:admin http://localhost:8080/config/new_service
To send a ping to the deadman switch, do something like this (the URL is quoted so the shell does not interpret the ?):
curl "http://localhost:8080/ping/svc1?token=secret1"
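When the ping is sent from a script, it can help to build the URL in one place. The sketch below uses a hypothetical `ping_url` helper that appends the token only when one is given. It assumes the token contains only URL-safe characters, since no encoding is applied.

```shell
# Hypothetical helper, not part of deadman-switch.
# ping_url BASE_URL SERVICE_ID [TOKEN]
#   Prints the ping URL, adding ?token=... only when a token is supplied.
ping_url() {
  if [ -n "${3:-}" ]; then
    printf '%s/ping/%s?token=%s\n' "$1" "$2" "$3"
  else
    printf '%s/ping/%s\n' "$1" "$2"
  fi
}

# Prints: http://localhost:8080/ping/svc1?token=secret1
ping_url http://localhost:8080 svc1 secret1
```

The result can be passed to curl in quotes, for example `curl -fsS "$(ping_url http://localhost:8080 svc1 secret1)"`.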
If you don't do anything, the application will start calling its configured webhooks after 30 seconds. You can see that in the logs:
podman logs -fn deadman-switch-1 deadman-switch-2
Please note that only one of the two nodes checks the deadlines, but both nodes are used to send out the actual notification webhooks.