Monitoring & Alerts
Voxeltron ships with built-in monitoring, no external services required. It collects CPU, memory, and network metrics from every Docker container, aggregates logs, and fires alerts when thresholds are breached.
Metrics Collection
The MetricsCollector polls the Docker stats API at a configurable interval, capturing per-container resource usage. Data is persisted in the MetricsStore (backed by SQLite) for historical queries and dashboard rendering.
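The collector's internals aren't shown here, but one poll cycle can be sketched in Python. The `cpu_percent` and `store_sample` helpers below are hypothetical stand-ins, not Voxeltron's actual code; the CPU formula, however, is the standard one for the Docker stats API, which reports cumulative counters and therefore needs two consecutive samples:

```python
import sqlite3

def cpu_percent(prev: dict, curr: dict) -> float:
    """CPU usage % derived from two consecutive Docker stats snapshots."""
    cpu_delta = (curr["cpu_stats"]["cpu_usage"]["total_usage"]
                 - prev["cpu_stats"]["cpu_usage"]["total_usage"])
    sys_delta = (curr["cpu_stats"]["system_cpu_usage"]
                 - prev["cpu_stats"]["system_cpu_usage"])
    if sys_delta <= 0:
        return 0.0
    # Can exceed 100% on multi-core hosts, matching `docker stats` output.
    return cpu_delta / sys_delta * curr["cpu_stats"]["online_cpus"] * 100.0

def store_sample(db, container: str, cpu: float, mem_bytes: int) -> None:
    db.execute(
        "INSERT INTO metrics (container, cpu_percent, memory_bytes) VALUES (?, ?, ?)",
        (container, cpu, mem_bytes),
    )

# One simulated poll cycle against an in-memory store:
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE metrics (container TEXT, cpu_percent REAL, memory_bytes INTEGER)")

prev = {"cpu_stats": {"cpu_usage": {"total_usage": 100_000_000},
                      "system_cpu_usage": 1_000_000_000, "online_cpus": 4}}
curr = {"cpu_stats": {"cpu_usage": {"total_usage": 150_000_000},
                      "system_cpu_usage": 3_000_000_000, "online_cpus": 4}}

cpu = cpu_percent(prev, curr)          # 50e6 / 2e9 * 4 * 100 = 10.0
store_sample(db, "my-app", cpu, 256 * 1024 * 1024)
```

The SQLite schema above is illustrative; the real MetricsStore layout may differ.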
Collected Metrics
- CPU — usage percentage per core and aggregate
- Memory — RSS, cache, swap, and limit
- Network — bytes and packets in/out per interface
- Disk I/O — read/write bytes and operations
- Container state — running, stopped, restarting, OOM-killed
Log Aggregation
Container logs are streamed in real time via the Docker API and indexed for fast retrieval. Logs are queryable by deployment, time range, and level (debug, info, warn, error).
```shell
# Query logs via the TUI
voxeltron logs my-app --since 1h --level error

# Stream live logs
voxeltron logs my-app --follow
```

Alerting
Threshold-based alerts fire when resource usage exceeds configured limits. Each alert rule targets a specific metric and can route notifications to one or more channels.
CPU Alert
Fires when CPU usage exceeds a threshold (e.g. > 90%) for a sustained duration.
Memory Alert
Fires when memory usage exceeds a threshold (e.g. > 85%) to catch leaks before OOM.
Error Rate Alert
Fires when the error log rate exceeds a count per window (e.g. > 50 errors/min).
Disk Alert
Fires when disk usage crosses a threshold to prevent storage exhaustion.
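The "sustained duration" condition shared by these rules can be sketched as follows. `rule_fires` is a hypothetical helper, not Voxeltron's rule engine; it shows one plausible semantics, where every sample in the trailing window must breach the threshold, so a momentary spike never fires an alert:

```python
def rule_fires(samples, threshold, duration_s, interval_s):
    """samples: newest-last metric readings taken every interval_s seconds."""
    needed = max(1, duration_s // interval_s)
    if len(samples) < needed:
        # Not enough history to cover the duration window yet.
        return False
    return all(v > threshold for v in samples[-needed:])

# cpu_percent sampled every 15 s, evaluated against a 60 s duration:
readings = [40.0, 95.0, 96.0, 97.0, 98.0]
print(rule_fires(readings, 90.0, 60, 15))      # last four samples > 90 -> True
print(rule_fires([40.0, 95.0], 90.0, 60, 15))  # window not yet full -> False
```

An alternative semantics (average over the window, rather than every sample) is equally common in alerting systems; the source doesn't specify which Voxeltron uses.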
Alert Channels
- Webhook — POST JSON payloads to any HTTP endpoint
- Email — SMTP-based notifications with configurable recipients
- Slack — delivered via WASM plugins for full customization
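For the webhook channel, a delivery looks like a JSON POST. The payload schema below is purely illustrative, since the actual field names Voxeltron sends are not documented here; the sketch only builds the request rather than sending it:

```python
import json
from urllib import request

def build_webhook(url: str, alert: dict) -> request.Request:
    """Construct (but don't send) a JSON POST for an alert notification."""
    body = json.dumps(alert).encode("utf-8")
    return request.Request(url, data=body,
                           headers={"Content-Type": "application/json"},
                           method="POST")

req = build_webhook("https://hooks.example.com/voxeltron", {
    "rule": "high-cpu",       # field names are assumptions, not the real schema
    "metric": "cpu_percent",
    "value": 94.2,
    "threshold": 90.0,
    "status": "firing",
})
# Sending would be: urllib.request.urlopen(req)
```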
Configuration
Configure monitoring in /etc/voxeltron/config.toml under the [monitoring] section:
```toml
[monitoring]
enabled = true
interval_seconds = 15       # Metrics polling interval
retention_days = 30         # How long to keep historical data
log_level_filter = "info"   # Minimum log level to index

[monitoring.alerts]
enabled = true
evaluation_interval = "60s" # How often alert rules are evaluated

[[monitoring.alerts.rules]]
name = "high-cpu"
metric = "cpu_percent"
threshold = 90.0
duration = "5m"
channels = ["webhook", "email"]

[[monitoring.alerts.rules]]
name = "high-memory"
metric = "memory_percent"
threshold = 85.0
duration = "2m"
channels = ["webhook"]

[[monitoring.alerts.rules]]
name = "error-spike"
metric = "error_rate"
threshold = 50.0            # errors per minute
duration = "1m"
channels = ["webhook", "email"]

[monitoring.alerts.channels.webhook]
url = "https://hooks.example.com/voxeltron"

[monitoring.alerts.channels.email]
smtp_host = "smtp.example.com"
smtp_port = 587
from = "alerts@example.com"
to = ["ops@example.com"]
```

AI Integration
The built-in AI DevOps agent can interact with the monitoring subsystem through dedicated tools:
- query_metrics — retrieve CPU, memory, and network metrics for a deployment over a time range
- query_logs — search and filter container logs by deployment, level, and time window
- list_alerts — enumerate active and resolved alerts, including firing status and history
These tools allow the AI agent to diagnose issues, correlate metrics with log events, and surface actionable insights without manual investigation.
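One way the agent might chain these tools can be sketched as below. The tool names match the list above, but the signatures, arguments, and return shapes are assumptions for illustration, not Voxeltron's actual tool API; the stub lambdas stand in for the real agent runtime:

```python
def diagnose(deployment, query_metrics, query_logs, list_alerts):
    """Pair each firing alert with error logs and metrics from its window."""
    findings = []
    for alert in list_alerts(deployment):
        if alert["status"] != "firing":
            continue
        logs = query_logs(deployment, level="error", since=alert["since"])
        metrics = query_metrics(deployment, metric=alert["metric"],
                                since=alert["since"])
        findings.append({"alert": alert["name"],
                         "peak": max(metrics),
                         "errors": len(logs)})
    return findings

# Stub tools in place of the real monitoring subsystem:
findings = diagnose(
    "my-app",
    query_metrics=lambda d, metric, since: [91.0, 95.5, 93.2],
    query_logs=lambda d, level, since: ["OOM near limit", "worker timeout"],
    list_alerts=lambda d: [{"name": "high-cpu", "metric": "cpu_percent",
                            "status": "firing",
                            "since": "2024-01-01T00:00:00Z"}],
)
print(findings)  # [{'alert': 'high-cpu', 'peak': 95.5, 'errors': 2}]
```

The correlation step (peak metric value alongside error count in the same window) is the kind of joined view the agent can surface without manual investigation.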