Production architecture
Running Keycloak in production requires correctly configured clustering, a performant distributed cache, and full observability through metrics and structured logs.
Keycloak clustering
# Production mode with clustering
/opt/keycloak/bin/kc.sh start \
--hostname=auth.mondomaine.com \
--db=postgres \
--db-url=jdbc:postgresql://db:5432/keycloak \
--db-username=keycloak \
--db-password=secret \
--cache=ispn \
--cache-config-file=cache-ispn.xml \
--http-enabled=false \
--proxy-headers=xforwarded
# Key points:
# - Behind a TLS reverse proxy, keep --http-enabled=false so Keycloak never serves plain HTTP (the proxy then has to reach Keycloak over HTTPS)
# - --proxy-headers=xforwarded makes Keycloak read the X-Forwarded-* headers set by the proxy
# - --cache=ispn enables the distributed Infinispan cache
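The command above passes the database password as a command-line flag, where it is visible in the process list. As a minimal sketch, the same options can be assembled programmatically while the password is supplied through the `KC_DB_PASSWORD` environment variable, which Keycloak reads as the equivalent of `--db-password`. The helper name `build_start_command` and the `OPTIONS` dictionary are illustrative assumptions, not part of Keycloak.

```python
import os
import shlex

KC_BIN = "/opt/keycloak/bin/kc.sh"

# Options taken from the start command above, minus the password.
OPTIONS = {
    "hostname": "auth.mondomaine.com",
    "db": "postgres",
    "db-url": "jdbc:postgresql://db:5432/keycloak",
    "db-username": "keycloak",
    "cache": "ispn",
    "cache-config-file": "cache-ispn.xml",
    "http-enabled": "false",
    "proxy-headers": "xforwarded",
}


def build_start_command(options, env=None):
    """Return the argv list for `kc.sh start` (hypothetical helper).

    Refuses to put the database password on the command line and checks
    that KC_DB_PASSWORD is present in the environment instead.
    """
    env = os.environ if env is None else env
    if "db-password" in options:
        raise ValueError("set KC_DB_PASSWORD instead of passing --db-password")
    if "KC_DB_PASSWORD" not in env:
        raise RuntimeError("KC_DB_PASSWORD is not set")
    return [KC_BIN, "start"] + [f"--{name}={value}" for name, value in options.items()]


# Demonstration with a stub environment; omit `env` to check os.environ.
command = build_start_command(OPTIONS, env={"KC_DB_PASSWORD": "stub"})
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command)` in a launcher script; the password itself never appears in the argument list.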
Infinispan cache configuration
<!-- Custom cache-ispn.xml -->
<infinispan>
  <jgroups>
    <!-- DNS-based discovery for Kubernetes -->
    <stack name="kubernetes">
      <TCP bind_port="7800"/>
      <dns.DNS_PING
          dns_query="keycloak-headless.keycloak.svc.cluster.local"/>
    </stack>
  </jgroups>
  <cache-container>
    <!-- Sessions: replicated across all nodes -->
    <replicated-cache name="sessions">
      <expiration max-idle="900000"/> <!-- 15 min -->
    </replicated-cache>
    <!-- Authorizations: distributed for performance -->
    <distributed-cache name="authorizationCache"
        owners="2">
      <expiration max-idle="300000"/> <!-- 5 min -->
    </distributed-cache>
    <!-- Realms: local cache with invalidation -->
    <invalidation-cache name="realms">
      <expiration max-idle="-1"/>
    </invalidation-cache>
  </cache-container>
</infinispan>
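Because `max-idle` is expressed in milliseconds, it is easy to misread a timeout by a factor of 1000. The following sketch parses a cache file shaped like the one above and reports each cache's mode and idle timeout in minutes. It assumes the file declares no XML namespace, as in the snippet; a real Infinispan file with an `xmlns` attribute would need namespace-aware lookups. The function name `summarize_caches` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the cache-container section shown above.
CACHE_XML = """
<infinispan>
  <cache-container>
    <replicated-cache name="sessions">
      <expiration max-idle="900000"/>
    </replicated-cache>
    <distributed-cache name="authorizationCache" owners="2">
      <expiration max-idle="300000"/>
    </distributed-cache>
    <invalidation-cache name="realms">
      <expiration max-idle="-1"/>
    </invalidation-cache>
  </cache-container>
</infinispan>
"""


def summarize_caches(xml_text):
    """Map each cache name to its element type, owners, and idle timeout."""
    root = ET.fromstring(xml_text)
    container = root.find("cache-container")
    summary = {}
    for cache in container:
        expiration = cache.find("expiration")
        max_idle_ms = -1
        if expiration is not None:
            max_idle_ms = int(expiration.get("max-idle", "-1"))
        summary[cache.get("name")] = {
            "type": cache.tag,
            "owners": cache.get("owners"),
            # A negative max-idle means entries never expire when idle.
            "max_idle_minutes": None if max_idle_ms < 0 else max_idle_ms / 60000,
        }
    return summary


for name, info in summarize_caches(CACHE_XML).items():
    print(name, info)
```

A check like this can run in CI before the file is shipped to the cluster, so that a mistyped timeout is caught before sessions start expiring unexpectedly.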
Prometheus metrics
# Enable metrics and health checks
/opt/keycloak/bin/kc.sh start \
--metrics-enabled=true \
--health-enabled=true
# Available endpoints
GET /metrics # Prometheus metrics
GET /health # Overall health check
GET /health/live # Liveness probe
GET /health/ready # Readiness probe
# Key metrics to monitor (exact names depend on the Keycloak version and any metrics extension installed)
keycloak_logins_total # Total logins
keycloak_failed_login_attempts_total # Failed login attempts
keycloak_registrations_total # Registrations
keycloak_active_sessions # Active sessions
http_server_requests_seconds_bucket # HTTP latency
jvm_memory_used_bytes # JVM memory
jvm_gc_pause_seconds_sum # GC pauses
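To make the `/metrics` output concrete, here is a minimal sketch that parses Prometheus text exposition and sums each metric across its label sets. The sample payload and its labels (`realm`, `provider`, `error`, `area`) are invented for illustration, and the parser is deliberately simplified: it assumes no trailing timestamps on sample lines. For real use, a maintained parser such as the one in the `prometheus_client` package is a safer choice.

```python
# Hypothetical scrape output; metric names follow the list above.
SAMPLE = """\
# HELP keycloak_logins_total Total successful logins
# TYPE keycloak_logins_total counter
keycloak_logins_total{realm="master",provider="keycloak"} 120
keycloak_logins_total{realm="demo",provider="keycloak"} 30
keycloak_failed_login_attempts_total{realm="demo",error="invalid_user_credentials"} 7
jvm_memory_used_bytes{area="heap"} 2.5e8
"""


def parse_metrics(text):
    """Return {metric_name: value summed over all label sets}."""
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        totals[name] = totals.get(name, 0.0) + float(value)
    return totals


totals = parse_metrics(SAMPLE)
logins = totals["keycloak_logins_total"]
failures = totals["keycloak_failed_login_attempts_total"]
print(f"logins={logins:.0f} failures={failures:.0f}")
print(f"failure ratio={failures / (logins + failures):.3f}")
```

The same function can be pointed at a live response body fetched from the metrics endpoint to spot-check counters before dashboards are in place.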
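Prometheus + Grafana configuration
# prometheus.yml
scrape_configs:
  - job_name: keycloak
    metrics_path: /metrics
    # Adjust the port to wherever your Keycloak version serves /metrics
    # (recent releases use a separate management port, 9000 by default).
    static_configs:
      - targets:
          - keycloak-0:8080
          - keycloak-1:8080
          - keycloak-2:8080
    scrape_interval: 15s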
# Recommended alerting rules (Prometheus rule file; Alertmanager handles routing)
groups:
  - name: keycloak
    rules:
      - alert: HighFailedLogins
        # rate() is per second: this fires above 10 failed logins per second
        expr: rate(keycloak_failed_login_attempts_total[5m]) > 10
        for: 2m
        labels:
          severity: warning
      - alert: KeycloakDown
        expr: up{job="keycloak"} == 0
        for: 1m
        labels:
          severity: critical
Important: always configure alerts on failed login attempts. A sudden spike can indicate a brute-force attack.
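The threshold in `HighFailedLogins` is easier to tune once the unit is clear: `rate()` returns a per-second value, so `> 10` over a 5-minute window corresponds to more than 10 × 300 = 3000 failed logins in that window. The sketch below approximates this computation on a list of `(timestamp, counter)` samples. It is a simplification of what Prometheus does (no extrapolation to the window edges, and only a basic counter-reset rule), and the function names are hypothetical.

```python
def per_second_rate(samples):
    """Approximate per-second increase of a counter.

    `samples` is a time-ordered list of (timestamp_seconds, counter_value).
    A drop in the counter is treated as a reset to zero, so the new value
    is counted as the increase for that step.
    """
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        increase += (current - previous) if current >= previous else current
    return increase / elapsed


def should_alert(samples, threshold_per_second=10.0):
    """Mirror the `> 10` comparison of the HighFailedLogins rule."""
    rate = per_second_rate(samples)
    return rate is not None and rate > threshold_per_second


# 3600 failed logins over 300 seconds is 12 per second: above the threshold.
burst = [(0, 0), (150, 1800), (300, 3600)]
print(per_second_rate(burst), should_alert(burst))
```

If a few failed logins per second would already be abnormal for a given realm, the threshold in the rule can be lowered accordingly; the arithmetic stays the same.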