etcd High Availability Cluster
This template deploys a highly available etcd cluster with 3 nodes for distributed consensus and configuration management.
What is etcd?
etcd is a distributed, reliable key-value store for the most critical data of a distributed system. It's used by:
- Kubernetes: For cluster configuration and service discovery
- Cloud Foundry: For application configuration
- CoreOS: For distributed system coordination
- Many others: As a configuration backend
Architecture
- 3-node cluster: Provides fault tolerance and high availability
- Raft consensus: Ensures data consistency across nodes
- Automatic leader election: The cluster self-heals by electing a new leader when the current one fails (see the status check below)
- Data persistence: Each node has dedicated storage
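To see leader election in practice, ask every member for its status; exactly one node should report itself as leader. A quick check using this template's node names:

```bash
# Ask every member for its Raft status; exactly one should show IS LEADER = true
etcdctl --endpoints=http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 \
  endpoint status -w table
```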
Connection Information
You can connect to any node in the cluster:
- etcd1: Port 2379 (client), 2380 (peer)
- etcd2: Port 2379 (client), 2380 (peer)
- etcd3: Port 2379 (client), 2380 (peer)
All nodes serve the same data through Raft replication, so clients can read and write against any of them; the example below shows how to configure a client for failover.
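A minimal sketch using the node names from this template (the ETCDCTL_ENDPOINTS variable is just one way to pass multiple endpoints):

```bash
# Give the client every node so a single failure does not break connectivity
export ETCDCTL_ENDPOINTS=http://etcd1:2379,http://etcd2:2379,http://etcd3:2379

# Every endpoint should report healthy
etcdctl endpoint health

# A write sent to one node is readable from the others
etcdctl put /demo/replication "visible-everywhere"
etcdctl --endpoints=http://etcd3:2379 get /demo/replication
```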
Usage Examples
Using etcdctl
```bash
# Set a value
etcdctl put mykey "myvalue"

# Get a value
etcdctl get mykey

# List all keys
etcdctl get "" --prefix

# Check cluster health
etcdctl endpoint health --cluster

# List cluster members
etcdctl member list
```
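A few more everyday operations that pair with the commands above (mykey and /config/ are just example names):

```bash
# Read a key with full metadata (key/value are returned base64-encoded)
etcdctl get mykey -w json

# Delete a single key, or everything under a prefix
etcdctl del mykey
etcdctl del --prefix /config/
```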
Using HTTP API
```bash
# Put a key
curl http://etcd1:2379/v3/kv/put \
  -X POST -d '{"key":"Zm9v","value":"YmFy"}'

# Get a key
curl http://etcd1:2379/v3/kv/range \
  -X POST -d '{"key":"Zm9v"}'
```
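Note that the v3 HTTP gateway expects base64-encoded keys and values; Zm9v and YmFy above are simply foo and bar encoded:

```bash
# Encode a key/value for the JSON payload
printf foo | base64        # Zm9v
printf bar | base64        # YmFy

# Decode a value returned by /v3/kv/range
printf YmFy | base64 -d    # bar
```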
Features
- High Availability: The 3-node cluster tolerates the failure of any single node while keeping quorum
- Strong Consistency: The Raft consensus algorithm keeps data consistent across all nodes
- Watch Support: Get notified of key changes in real time (see the examples after this list)
- Lease/TTL: Automatic key expiration
- Transaction Support: Atomic multi-key operations
- Authentication & RBAC: Secure access control (configurable)
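The watch, lease, and transaction features can all be exercised from etcdctl. A rough sketch follows; key names and the TTL are arbitrary, and the lease ID placeholder must be replaced with the ID printed by lease grant:

```bash
# Watch: stream changes for a key or an entire prefix (runs until interrupted)
etcdctl watch --prefix /config/

# Lease/TTL: keys attached to an expired lease are deleted automatically
etcdctl lease grant 60                        # prints a lease ID, e.g. 694d77aa9e38260f
etcdctl put --lease=<lease-id> /session/worker1 alive
etcdctl lease keep-alive <lease-id>           # refresh the lease so the key survives

# Transaction: compare, then apply the success or failure branch atomically
etcdctl put stock "10"
etcdctl txn <<'EOF'
value("stock") = "10"

put stock "9"
put order-status "accepted"

put order-status "rejected"

EOF
```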
Use Cases
- Service Discovery: Register and discover microservices
- Configuration Management: Centralized configuration store
- Distributed Locking: Coordinate distributed systems (see the examples after this list)
- Leader Election: Automatic failover coordination
- Message Queue: Lightweight coordination primitives
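For example, locking, leader election, and a simple service-discovery pattern map directly onto etcdctl commands (names such as my-lock, my-election, and /services/api/ are placeholders):

```bash
# Distributed lock: blocks until acquired, held until the command is interrupted
etcdctl lock my-lock

# Leader election: campaign with a proposal value; -l observes the current leader
etcdctl elect my-election node-1
etcdctl elect -l my-election

# Service discovery: register instances under a prefix, then discover them
etcdctl put /services/api/instance-1 "10.0.0.5:8080"
etcdctl get --prefix /services/api/
```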
Monitoring
Each node exposes metrics and health endpoints:
- Readiness Check: http://etcd1:2379/readyz (is the node ready to serve traffic?)
- Liveness Check: http://etcd1:2379/livez (does the node need a restart?)
- Health Check: http://etcd1:2379/health (legacy, general health status)
- Metrics: http://etcd1:2379/metrics (Prometheus metrics; see the example below)
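For example, the metrics endpoint can be scraped by Prometheus or spot-checked with curl; the metric names below are standard etcd server metrics:

```bash
# Spot-check leadership metrics on one node
curl -s http://etcd1:2379/metrics | grep -E 'etcd_server_has_leader|etcd_server_leader_changes_seen_total'

# Quick readiness probe, suitable for a load-balancer health check
curl -fsS http://etcd1:2379/readyz
```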
Health Check Endpoints (v3.4.29+)
- /readyz - Check if the node is ready to serve traffic (recommended for load balancers)
- /livez - Check if the process is alive (recommended for container orchestration)
- /health - Legacy health check (available since v3.3.0)
Use the ?verbose query parameter for detailed per-check output:

```bash
curl 'http://etcd1:2379/readyz?verbose'
# Output:
# [+]data_corruption ok
# [+]serializable_read ok
# [+]linearizable_read ok
# ok
```
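The other endpoints behave similarly: /livez answers with a plain ok when the process is healthy, while the legacy /health endpoint returns a small JSON document (sample output, which may vary by version):

```bash
curl 'http://etcd1:2379/livez'
# ok

curl 'http://etcd1:2379/health'
# {"health":"true"}
```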
Best Practices
- Use all endpoints: Configure clients with all node addresses for failover
- Monitor cluster health: Regularly check the /health endpoint
- Backup regularly: Use the snapshot command for data backup (see the example after this list)
- Keep cluster odd-sized: 3, 5, or 7 nodes for proper quorum
- Watch resource usage: Monitor disk I/O and network latency
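As a sketch of the backup recommendation above (snapshot save talks to a single endpoint; the file name is arbitrary, and restores use etcdutl on etcd 3.5+):

```bash
# Save a snapshot from a single healthy member
etcdctl --endpoints=http://etcd1:2379 snapshot save backup.db

# Verify the snapshot's hash, revision, and size
etcdctl snapshot status backup.db -w table
```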
Documentation