Monitoring architecture
Overview of the monitoring stack running in a Kubernetes cluster:
+------------------------------------------------------------------+
|                        KUBERNETES CLUSTER                        |
|                                                                  |
|  +----------+   +----------+   +----------+   +----------+       |
|  | App Pod  |   | App Pod  |   | Keycloak |   |  GitLab  |       |
|  | /metrics |   | /metrics |   | /metrics |   | /metrics |       |
|  +----+-----+   +----+-----+   +----+-----+   +----+-----+       |
|       |              |              |              |             |
|       +--------------+------+-------+--------------+             |
|                             |                                    |
|                             v                                    |
|  +----------------------------------------------------------+    |
|  |  PROMETHEUS (collection)                                 |    |
|  |  - Scrapes the /metrics endpoints                        |    |
|  |  - Stores metrics (TSDB)                                 |    |
|  |  - Evaluates alerting rules                              |    |
|  +----------------------------------------------------------+    |
|          |                                 |                     |
|          v                                 v                     |
|  +-------------------+         +-----------------------+         |
|  | ALERTMANAGER      |         | GRAFANA               |         |
|  | - Routes          |         | - Dashboards          |         |
|  | - Email, Slack    |         | - Visualization       |         |
|  | - PagerDuty       |         | - Visual alerts       |         |
|  +-------------------+         +-----------------------+         |
|                                                                  |
|  +----------------------------------------------------------+    |
|  |  LOKI (centralized logs)                                 |    |
|  |  - Collects logs from all pods                           |    |
|  |  - LogQL queries                                         |    |
|  |  - Integrated into Grafana                               |    |
|  +----------------------------------------------------------+    |
+------------------------------------------------------------------+
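In this pull model, each component only has to serve a plain-text page on /metrics; Prometheus fetches that page on a schedule and turns each line into a time-series sample. The sketch below is a deliberately simplified parser for that text format (no escaped quotes, timestamps, or exemplars) to show what a scrape actually reads. The function name `parse_exposition` and the sample payload are hypothetical, and this is not how Prometheus itself is implemented.

```python
import re
from typing import Dict, List, Tuple

# One sample line: metric_name{label="value",...} 42.0
# Simplified: no escaped quotes inside label values, no trailing timestamp.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Parse a simplified Prometheus text exposition payload into samples."""
    samples: List[Sample] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):  # skip blank lines and HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            raise ValueError(f'unparseable line: {line!r}')
        name, label_str, value = match.groups()
        labels = dict(LABEL_RE.findall(label_str or ''))
        samples.append((name, labels, float(value)))  # float() also accepts "+Inf"
    return samples


# Hypothetical payload, shaped like what an app pod serves on /metrics
PAYLOAD = """
# HELP app_requests_total Total number of HTTP requests
# TYPE app_requests_total counter
app_requests_total{method="GET",endpoint="/api/users",status="200"} 3.0
app_requests_total{method="GET",endpoint="/api/users",status="500"} 1.0
"""

if __name__ == '__main__':
    for name, labels, value in parse_exposition(PAYLOAD):
        print(name, labels, value)
```

Each label combination becomes its own series, which is why high-cardinality labels (user IDs, full URLs) are best kept out of metric labels.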
Deploying the monitoring stack with Helm
# Install kube-prometheus-stack (Prometheus + Grafana + Alertmanager)
# "admin-password" is a placeholder: set your own Grafana admin password
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--set grafana.adminPassword="admin-password" \
--set prometheus.prometheusSpec.retention=30d \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi
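# Install Loki for logs (the loki-stack chart bundles Loki + Promtail;
# it is deprecated upstream, and newer setups use the grafana/loki chart)
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update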
helm install loki grafana/loki-stack \
--namespace monitoring \
--set promtail.enabled=true \
--set loki.persistence.enabled=true
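Once the stack is running, a quick sanity check is to ask Prometheus which targets it scrapes successfully, using its HTTP API (`GET /api/v1/query`) and the built-in `up` metric (1 = last scrape succeeded, 0 = failed). The sketch below assumes Prometheus is reachable on `http://localhost:9090`, for example after `kubectl port-forward -n monitoring svc/prometheus-operated 9090`; the helper names are hypothetical and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple

PROMETHEUS_URL = 'http://localhost:9090'  # assumption: port-forwarded Prometheus


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({'query': promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def targets_up(response: dict) -> List[Tuple[str, str, bool]]:
    """Turn the JSON answer to the query `up` into (job, instance, is_up) tuples."""
    if response.get('status') != 'success':
        raise RuntimeError(f"query failed: {response.get('error')}")
    rows = []
    for series in response['data']['result']:
        metric: Dict[str, str] = series['metric']
        _timestamp, value = series['value']  # instant vector: [unix_ts, "value as string"]
        rows.append((metric.get('job', ''), metric.get('instance', ''), value == '1'))
    return rows


def fetch_up(base_url: str = PROMETHEUS_URL) -> List[Tuple[str, str, bool]]:
    """Query a live Prometheus server (needs network access to base_url)."""
    with urllib.request.urlopen(build_query_url(base_url, 'up'), timeout=5) as resp:
        return targets_up(json.load(resp))
```

For example, `[t for t in fetch_up() if not t[2]]` lists the targets whose last scrape failed, which is usually the first thing to check when a dashboard stays empty.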
Application metrics (instrumentation)
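# Expose metrics from your application (Python + Flask + prometheus_client)
from flask import Flask, jsonify
from prometheus_client import Counter, Histogram, start_http_server

app = Flask(__name__)

# Define the metrics
REQUEST_COUNT = Counter(
    'app_requests_total',
    'Total number of HTTP requests',
    ['method', 'endpoint', 'status']
)
REQUEST_LATENCY = Histogram(
    'app_request_duration_seconds',
    'HTTP request duration in seconds',
    ['method', 'endpoint']
)

# Instrument the code
@app.route('/api/users')
def get_users():
    status = '500'  # assume failure until the handler succeeds
    try:
        with REQUEST_LATENCY.labels('GET', '/api/users').time():
            users = db.get_users()  # `db`: your data-access layer (not defined here)
        status = '200'
        return jsonify(users)
    finally:
        # Count every request, including failed ones, so errors show up
        REQUEST_COUNT.labels('GET', '/api/users', status).inc()

if __name__ == '__main__':
    # Serve metrics on a separate port. 8000 is an example; 9090 is avoided
    # because it is Prometheus's own default port.
    start_http_server(8000)  # scraped at http://<pod-ip>:8000/metrics
    app.run(host='0.0.0.0', port=5000)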
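A `Histogram` such as `app_request_duration_seconds` does not store every latency: it counts observations into cumulative buckets (`le`, "less than or equal"), plus a running sum and count, and Prometheus estimates percentiles from those counts. The simplified model below mimics that behavior with the standard library only. The class name `ToyHistogram` is hypothetical, the bucket bounds are an assumption (prometheus_client's defaults range from 5 ms to 10 s), and this is not prometheus_client's actual implementation.

```python
import bisect
import math
from typing import Dict, Sequence


class ToyHistogram:
    """Simplified model of a Prometheus histogram (illustration only)."""

    def __init__(self, bounds: Sequence[float] = (0.05, 0.1, 0.25, 0.5, 1.0)):
        self.bounds = sorted(bounds) + [math.inf]  # the +Inf bucket catches everything
        self.counts = [0] * len(self.bounds)        # per-bucket (non-cumulative) counts
        self.sum = 0.0
        self.count = 0

    def observe(self, seconds: float) -> None:
        # Smallest bucket whose upper bound is >= the observed value
        index = bisect.bisect_left(self.bounds, seconds)
        self.counts[index] += 1
        self.sum += seconds
        self.count += 1

    def cumulative(self) -> Dict[str, int]:
        """Bucket counts as exposed on /metrics: each `le` bucket includes the smaller ones."""
        result, running = {}, 0
        for bound, n in zip(self.bounds, self.counts):
            running += n
            label = '+Inf' if math.isinf(bound) else repr(bound)
            result[label] = running
        return result
```

Prometheus then works from these cumulative counts, e.g. `histogram_quantile(0.95, rate(app_request_duration_seconds_bucket[5m]))`, so the bucket bounds should bracket the latencies you actually care about.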
Google's four golden signals: latency, traffic, errors, and saturation. According to Google's SRE book, if you can only measure four metrics of a user-facing system, these are the ones to focus on.
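To make the four signals concrete, the sketch below derives each of them from one window of request records. It is a standard-library illustration under stated assumptions: the `Request` record and `golden_signals` function are hypothetical, an error is assumed to be any 5xx status, and saturation is approximated as in-flight requests over a fixed capacity. In a real cluster, latency would come from `app_request_duration_seconds` and traffic and errors from `app_requests_total`, queried with PromQL.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    duration_s: float  # how long the request took, in seconds
    status: int        # HTTP status code


def golden_signals(requests: List[Request], window_s: float,
                   in_flight: int, capacity: int) -> Dict[str, float]:
    """Compute the four golden signals over one observation window (illustrative)."""
    durations = sorted(r.duration_s for r in requests)
    if durations:
        # Nearest-rank 95th percentile, using integer ceil(0.95 * n)
        rank = -(-95 * len(durations) // 100)
        p95 = durations[rank - 1]
    else:
        p95 = 0.0
    errors = sum(1 for r in requests if r.status >= 500)  # assumption: 5xx = error
    return {
        'latency_p95_s': p95,                                         # latency
        'traffic_rps': len(requests) / window_s,                      # traffic
        'error_ratio': errors / len(requests) if requests else 0.0,   # errors
        'saturation': in_flight / capacity,                           # saturation
    }
```

For example, 20 requests in a 10-second window, one of them a 503, with 3 of 4 worker slots busy, gives 2 requests/s, an error ratio of 0.05, and a saturation of 0.75.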