nodeos-monitor provides failover for EOS nodes and block producers using Etcd. It can be used to build highly redundant block producer architectures, even across data centers.
nodeos-monitor switches a nodeos process between two states, "active" and "standby". Usually, an active node is an EOS block producer and a standby node is a validator node. nodeos-monitor runs nodeos as a subprocess and moves it between states by killing it and restarting it with a different configuration.
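The kill-and-restart step can be sketched in Go as below. This is a minimal illustration, not nodeos-monitor's code: the `Config` type, `nodeosArgs`, and `switchTo` are hypothetical names, and the sketch assumes the chosen config directory is passed to nodeos through its `--config-dir` option. It sends SIGTERM rather than SIGKILL so nodeos gets a chance to shut down cleanly; how nodeos-monitor actually stops the process is not stated here.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// State is the mode the nodeos subprocess runs in.
type State int

const (
	Standby State = iota // validator node configuration
	Active               // block producer configuration
)

// Config holds the paths used for a restart. Field names are
// hypothetical; the defaults in main come from the help output.
type Config struct {
	NodeosPath string
	ActiveDir  string
	StandbyDir string
	ExtraArgs  []string // analogous to --nodeos-args
}

// nodeosArgs builds the nodeos command line for state s, ASSUMING the
// config directory is handed to nodeos via its --config-dir option.
func (c Config) nodeosArgs(s State) []string {
	dir := c.StandbyDir
	if s == Active {
		dir = c.ActiveDir
	}
	return append([]string{"--config-dir", dir}, c.ExtraArgs...)
}

// switchTo stops the current subprocess (if any) and starts a new one
// with the configuration for state s. A production version would add a
// timeout and fall back to Process.Kill if nodeos does not exit.
func switchTo(cur *exec.Cmd, c Config, s State) (*exec.Cmd, error) {
	if cur != nil && cur.Process != nil {
		_ = cur.Process.Signal(syscall.SIGTERM) // ask nodeos to exit
		_ = cur.Wait()                          // reap it before restarting
	}
	next := exec.Command(c.NodeosPath, c.nodeosArgs(s)...)
	next.Stdout, next.Stderr = os.Stdout, os.Stderr
	if err := next.Start(); err != nil {
		return nil, err
	}
	return next, nil
}

func main() {
	c := Config{
		NodeosPath: "/opt/eosio/bin/nodeos",
		ActiveDir:  "/etc/nodeos-active-configs/",
		StandbyDir: "/etc/nodeos-standby-configs/",
	}
	// Print the command lines instead of launching nodeos.
	fmt.Println(c.NodeosPath, c.nodeosArgs(Standby))
	fmt.Println(c.NodeosPath, c.nodeosArgs(Active))
}
```

In a real loop, `switchTo` would be called with the previously returned `*exec.Cmd` each time the desired state changes.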
A nodeos subprocess is considered "active" when nodeos-monitor holds a distributed lock on an Etcd key. If the lock cannot be acquired, or if it is lost at any point, nodeos-monitor switches nodeos to the standby configuration.
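The lock-to-state rule above can be sketched as a small event loop. This is a hedged illustration rather than nodeos-monitor's implementation: lock ownership arrives as booleans on a channel (`true` means this node holds the lock), which stands in for a real Etcd watch. With the Etcd Go client, such updates could come from the `concurrency` package, for example a `Mutex` held under a `Session` whose `Done()` channel closes when the lease expires, but this sketch does not claim nodeos-monitor works that way. The `monitor` function and `restart` callback are hypothetical.

```go
package main

import "fmt"

type State string

const (
	Standby State = "standby"
	Active  State = "active"
)

// monitor consumes lock-ownership updates (true = this node holds the
// Etcd lock, false = lock not acquired or lost) and calls restart only
// when the desired state actually changes. It returns the final state
// once the events channel is closed.
func monitor(events <-chan bool, restart func(State)) State {
	cur := Standby
	restart(cur) // run as a validator until the lock is won
	for held := range events {
		want := Standby
		if held {
			want = Active
		}
		if want != cur {
			restart(want)
			cur = want
		}
	}
	return cur
}

func main() {
	events := make(chan bool, 4)
	events <- false // lock held by another node
	events <- true  // lock acquired: become the block producer
	events <- true  // duplicate notification: no restart
	events <- false // lock lost, e.g. the Etcd session expired
	close(events)

	final := monitor(events, func(s State) {
		fmt.Println("restarting nodeos as", s)
	})
	fmt.Println("final state:", final)
}
```

Restarting only on a state change matters because every restart interrupts nodeos; duplicate "still holding the lock" notifications should be no-ops.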
- Etcd cluster
An Etcd cluster is needed for distributed locking. If the failover group spans multiple data centers, the Etcd cluster must span all of them as well.
See this guide for building out an Etcd cluster.
See this guide for tuning an Etcd cluster. Tuning is especially important for clusters that span multiple data centers, since Etcd's default heartbeat and election timeouts assume a low-latency network.
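On a machine with the Go toolchain installed, run
$ go get -u github.com/activeeos/nodeos-monitor/cmd/nodeos-monitor
Since Go 1.18, go get no longer builds or installs commands in module mode; the closest equivalent there is
$ go install github.com/activeeos/nodeos-monitor/cmd/nodeos-monitor@latest
GitHub releases will be created in the future.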
nodeos-monitor is configured via command-line flags. Its built-in help lists them:
    nodeos-monitor provides failover for EOS nodes

    Usage:
      nodeos-monitor [flags]

    Flags:
          --active-config-dir string     the directory containing the configs for an active nodeos process (default "/etc/nodeos-active-configs/")
          --debug                        print debug logs
          --etcd-ca string               the Etcd CA to use
          --etcd-cert string             the Etcd client certificate
          --etcd-endpoints stringArray   the endpoints to Etcd (default [http://127.0.0.1:2379])
          --etcd-key string              the Etcd client key
          --failover-group string        the identifier for the group of nodes involved in the failover process (default "eos")
      -h, --help                         help for nodeos-monitor
          --log-format string            log format (one of 'text' or 'json') (default "text")
          --metrics-addr string          where to expose the HTTP Prometheus metrics (default ":3000")
          --nodeos string                the path to the nodeos binary (default "/opt/eosio/bin/nodeos")
          --nodeos-args stringArray      additional arguments to pass to nodeos
          --standby-config-dir string    the directory containing the configs for a standby nodeos process (default "/etc/nodeos-standby-configs/")
Here are the most useful options:
--active-config-dir
This is the directory containing the config.ini file for the active nodeos process, usually a block producer.
--standby-config-dir
This is the directory containing the config.ini file for the standby nodeos process, usually a validator node.
--failover-group
A failover group is a unique identifier for the block producer that all nodes in the group are competing to run. It is used to build the distributed lock key in Etcd, so every node in the group must be configured with an identical --failover-group setting.
nodeos-monitor keeps an instance of nodeos running as a subprocess. nodeos-monitor itself, however, should be supervised by something like systemd, Docker, or Kubernetes in case it malfunctions. A docker-compose lab environment is currently available that should help with this configuration; see docker-compose.yaml for details.
More configuration examples should be coming soon!