etcd High Availability Cluster
This template deploys a highly available etcd cluster with 3 nodes for distributed consensus and configuration management.
What is etcd?
etcd is a distributed, reliable key-value store for the most critical data of a distributed system. It's used by:
- Kubernetes: For cluster configuration and service discovery
- Cloud Foundry: For application configuration
- CoreOS: For distributed system coordination
- Many others: As a configuration backend
Architecture
- 3-node cluster: Provides fault tolerance and high availability
- Raft consensus: Ensures data consistency across nodes
- Automatic leader election: The cluster self-heals by electing a new leader when the current one fails (see the status check below)
- Data persistence: Each node has dedicated storage
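To see leader election in practice, ask every member for its status; exactly one node should report itself as leader. A quick check using this template's node names:

```bash
# Ask every member for its Raft status; exactly one should show IS LEADER = true
etcdctl --endpoints=http://etcd1:2379,http://etcd2:2379,http://etcd3:2379 \
  endpoint status -w table
```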
Connection Information
You can connect to any node in the cluster:
- etcd1: Port 2379 (client), 2380 (peer)
- etcd2: Port 2379 (client), 2380 (peer)
- etcd3: Port 2379 (client), 2380 (peer)
All nodes serve the same data through Raft replication, so clients can read and write against any of them; the example below shows how to configure a client for failover.
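A minimal sketch using the node names from this template (the ETCDCTL_ENDPOINTS variable is just one way to pass multiple endpoints):

```bash
# Give the client every node so a single failure does not break connectivity
export ETCDCTL_ENDPOINTS=http://etcd1:2379,http://etcd2:2379,http://etcd3:2379

# Every endpoint should report healthy
etcdctl endpoint health

# A write sent to one node is readable from the others
etcdctl put /demo/replication "visible-everywhere"
etcdctl --endpoints=http://etcd3:2379 get /demo/replication
```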
Usage Examples
Using etcdctl
```bash
# Set a value
etcdctl put mykey "myvalue"

# Get a value
etcdctl get mykey

# List all keys
etcdctl get "" --prefix

# Check cluster health
etcdctl endpoint health --cluster

# List cluster members
etcdctl member list
```
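A few more everyday operations that pair with the commands above (mykey and /config/ are just example names):

```bash
# Read a key with full metadata (key/value are returned base64-encoded)
etcdctl get mykey -w json

# Delete a single key, or everything under a prefix
etcdctl del mykey
etcdctl del --prefix /config/
```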
Using HTTP API
```bash
# Put a key
curl http://etcd1:2379/v3/kv/put \
  -X POST -d '{"key":"Zm9v","value":"YmFy"}'

# Get a key
curl http://etcd1:2379/v3/kv/range \
  -X POST -d '{"key":"Zm9v"}'
```
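Note that the v3 HTTP gateway expects base64-encoded keys and values; Zm9v and YmFy above are simply foo and bar encoded:

```bash
# Encode a key/value for the JSON payload
printf foo | base64        # Zm9v
printf bar | base64        # YmFy

# Decode a value returned by /v3/kv/range
printf YmFy | base64 -d    # bar
```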
Features
- High Availability: The 3-node cluster tolerates the failure of any single node while keeping quorum
- Strong Consistency: The Raft consensus algorithm keeps data consistent across all nodes
- Watch Support: Get notified of key changes in real time (see the examples after this list)
- Lease/TTL: Automatic key expiration
- Transaction Support: Atomic multi-key operations
- Authentication & RBAC: Secure access control (configurable)
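The watch, lease, and transaction features can all be exercised from etcdctl. A rough sketch follows; key names and the TTL are arbitrary, and the lease ID placeholder must be replaced with the ID printed by lease grant:

```bash
# Watch: stream changes for a key or an entire prefix (runs until interrupted)
etcdctl watch --prefix /config/

# Lease/TTL: keys attached to an expired lease are deleted automatically
etcdctl lease grant 60                        # prints a lease ID, e.g. 694d77aa9e38260f
etcdctl put --lease=<lease-id> /session/worker1 alive
etcdctl lease keep-alive <lease-id>           # refresh the lease so the key survives

# Transaction: compare, then apply the success or failure branch atomically
etcdctl put stock "10"
etcdctl txn <<'EOF'
value("stock") = "10"

put stock "9"
put order-status "accepted"

put order-status "rejected"

EOF
```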
Use Cases
- Service Discovery: Register and discover microservices
- Configuration Management: Centralized configuration store
- Distributed Locking: Coordinate distributed systems (see the examples after this list)
- Leader Election: Automatic failover coordination
- Message Queue: Lightweight coordination primitives
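For example, locking, leader election, and a simple service-discovery pattern map directly onto etcdctl commands (names such as my-lock, my-election, and /services/api/ are placeholders):

```bash
# Distributed lock: blocks until acquired, held until the command is interrupted
etcdctl lock my-lock

# Leader election: campaign with a proposal value; -l observes the current leader
etcdctl elect my-election node-1
etcdctl elect -l my-election

# Service discovery: register instances under a prefix, then discover them
etcdctl put /services/api/instance-1 "10.0.0.5:8080"
etcdctl get --prefix /services/api/
```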
Monitoring
Each node exposes metrics and health endpoints:
- Readiness Check: http://etcd1:2379/readyz (is the node ready to serve traffic?)
- Liveness Check: http://etcd1:2379/livez (does the node need a restart?)
- Health Check: http://etcd1:2379/health (legacy, general health status)
- Metrics: http://etcd1:2379/metrics (Prometheus metrics; see the example below)
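For example, the metrics endpoint can be scraped by Prometheus or spot-checked with curl; the metric names below are standard etcd server metrics:

```bash
# Spot-check leadership metrics on one node
curl -s http://etcd1:2379/metrics | grep -E 'etcd_server_has_leader|etcd_server_leader_changes_seen_total'

# Quick readiness probe, suitable for a load-balancer health check
curl -fsS http://etcd1:2379/readyz
```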
Health Check Endpoints (v3.4.29+)
- /readyz - Check if the node is ready to serve traffic (recommended for load balancers)
- /livez - Check if the process is alive (recommended for container orchestration)
- /health - Legacy health check (available since v3.3.0)
Use the ?verbose query parameter for detailed per-check output:

```bash
curl 'http://etcd1:2379/readyz?verbose'
# Output:
# [+]data_corruption ok
# [+]serializable_read ok
# [+]linearizable_read ok
# ok
```
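The other endpoints behave similarly: /livez answers with a plain ok when the process is healthy, while the legacy /health endpoint returns a small JSON document (sample output, which may vary by version):

```bash
curl 'http://etcd1:2379/livez'
# ok

curl 'http://etcd1:2379/health'
# {"health":"true"}
```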
Best Practices
- Use all endpoints: Configure clients with all node addresses for failover
- Monitor cluster health: Regularly check the /health endpoint
- Backup regularly: Use the snapshot command for data backup (see the example after this list)
- Keep cluster odd-sized: 3, 5, or 7 nodes for proper quorum
- Watch resource usage: Monitor disk I/O and network latency
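As a sketch of the backup recommendation above (snapshot save talks to a single endpoint; the file name is arbitrary, and restores use etcdutl on etcd 3.5+):

```bash
# Save a snapshot from a single healthy member
etcdctl --endpoints=http://etcd1:2379 snapshot save backup.db

# Verify the snapshot's hash, revision, and size
etcdctl snapshot status backup.db -w table
```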
Documentation