
Monitoring

Several of Kargo's long-running components expose Prometheus metrics through the underlying controller-runtime metrics server:

  • The controller
  • The management controller
  • The (internal) webhooks server

By default, each component's metrics server is disabled. The Kargo Helm chart can enable it, expose the metrics through a Service, and, for users of the Prometheus Operator, create a matching ServiceMonitor.

Info: For complete parameter documentation, refer to the chart documentation.

Enabling Metrics

Setting a component's metrics.enabled flag to true does three things:

  1. Enables the component's metrics server by setting its bind address (METRICS_BIND_ADDRESS).
  2. Declares a metrics container port on the component's pods.
  3. Creates a metrics Service that targets that port.

For example, to expose metrics for all three supported components:

controller:
  metrics:
    enabled: true
managementController:
  metrics:
    enabled: true
webhooksServer:
  metrics:
    enabled: true
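For orientation, the metrics Service created for the controller should look roughly like the manifest below. This is a sketch, not the chart's actual template: the Service name, namespace, and selector label are hypothetical placeholders, and the assumption that targetPort refers to the pods' container port by name is ours. Running helm template against your values shows what your chart version really renders.

```yaml
# Approximate shape only; NOT rendered from the Kargo chart.
apiVersion: v1
kind: Service
metadata:
  name: kargo-controller-metrics   # hypothetical name
  namespace: kargo                 # hypothetical namespace
spec:
  selector:
    app.kubernetes.io/component: controller   # hypothetical selector label
  ports:
    - name: http-metrics         # default port name
      port: 9090                 # default metrics port
      targetPort: http-metrics   # assumed: the metrics container port declared on the pods
      protocol: TCP
```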

By default, each component serves its metrics over plain HTTP on port 9090, through a port named http-metrics, so scrapers do not need to be configured for TLS. The port name and number are configurable per component, for example:

controller:
  metrics:
    enabled: true
    service:
      servicePort: 8080
      portName: telemetry

Note: The default port name (http-metrics) carries an http- prefix so that service meshes that infer a port's protocol from its name treat it as HTTP. If you rename the port in a meshed cluster, keep the http- prefix.
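For example, a meshed cluster that wants a custom port name could keep the prefix as follows. The name http-telemetry is purely illustrative; any name works as long as it starts with http-.

```yaml
controller:
  metrics:
    enabled: true
    service:
      # Illustrative name; the http- prefix lets name-based protocol detection pick HTTP.
      portName: http-telemetry
```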

Scraping With the Prometheus Operator

If your cluster runs the Prometheus Operator, Kargo can create a ServiceMonitor for each component. Enable it alongside metrics.enabled:

controller:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      # Often required so your Prometheus instance selects the ServiceMonitor.
      additionalLabels:
        release: prometheus
      interval: 30s

The ServiceMonitor is rendered only when metrics.enabled and metrics.serviceMonitor.enabled are both true and the Prometheus Operator CRDs (monitoring.coreos.com/v1) are present in the cluster. If the CRDs are absent, the chart silently skips the ServiceMonitor so that installation does not fail.

Info: A ServiceMonitor does not scrape the Service's cluster IP. The Prometheus Operator discovers the Service's endpoints and scrapes each backing pod individually, so per-replica metrics are preserved even when a component runs more than one replica.
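To show where the per-pod behavior comes from, here is an approximate ServiceMonitor of the kind the operator consumes for the controller. Its endpoints entry names a Service port, and the operator turns every ready endpoint behind that port into its own scrape target. The metadata name, namespace, and selector label are hypothetical; the release label and interval mirror the values example earlier in this section. This is a sketch, not the chart's rendered output.

```yaml
# Approximate shape only; names and selector labels are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kargo-controller      # hypothetical name
  namespace: kargo            # hypothetical namespace
  labels:
    release: prometheus       # from additionalLabels
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: controller   # hypothetical label on the metrics Service
  namespaceSelector:
    matchNames:
      - kargo
  endpoints:
    - port: http-metrics      # Service port *name*, not number
      interval: 30s
      scheme: http
```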

Additional serviceMonitor fields are available for tuning the scrape, such as scheme, tlsConfig, relabelings, metricRelabelings, and namespace. Refer to the chart documentation for the full list.
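As an illustration, the values below attach a node label to each scraped target and drop Go garbage-collector series before ingestion. This assumes the chart copies relabelings and metricRelabelings into the ServiceMonitor endpoint unchanged, so entries use the Prometheus Operator's camelCase relabel fields; the target label name node and the go_gc_ pattern are arbitrary examples.

```yaml
controller:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      # Assumed to be passed through to the ServiceMonitor endpoint as-is.
      relabelings:
        # Applied before the scrape: copy the pod's node name into a "node" label.
        - sourceLabels: [__meta_kubernetes_pod_node_name]
          targetLabel: node
      metricRelabelings:
        # Applied after the scrape: discard series whose name starts with go_gc_.
        - sourceLabels: [__name__]
          regex: "go_gc_.*"
          action: drop
```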

Scraping Without the Prometheus Operator

If you collect metrics with a tool that performs its own endpoint discovery (for example, an annotation-based scrape config), set metrics.enabled to true and point your scraper at the metrics Service.

A headless Service is often convenient in this case, because its DNS name resolves directly to the individual pod IPs. Set clusterIP to None:

controller:
  metrics:
    enabled: true
    service:
      clusterIP: "None"
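With a headless Service in place, a plain Prometheus installation can discover the controller replicas through DNS. A minimal sketch, assuming Prometheus runs inside the cluster and that the metrics Service is named kargo-controller-metrics in the kargo namespace (both hypothetical; check the names your release actually creates):

```yaml
scrape_configs:
  - job_name: kargo-controller
    dns_sd_configs:
      - names:
          # Hypothetical Service name and namespace.
          - kargo-controller-metrics.kargo.svc.cluster.local
        type: A      # an A lookup on a headless Service returns one record per ready pod
        port: 9090   # required for A records; must match the metrics port
```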

You can also attach annotations and labels to the metrics Service to drive your scraper's discovery:

controller:
  metrics:
    enabled: true
    service:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
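The prometheus.io/* annotations are a community convention rather than a built-in Prometheus feature; they only take effect if your scrape config reads them. A minimal sketch of such a config, modeled on the common endpoints-role example and assuming Kargo is installed in a namespace named kargo (hypothetical):

```yaml
scrape_configs:
  - job_name: kargo-annotated-services
    kubernetes_sd_configs:
      # The endpoints role yields one target per backing pod, so per-replica metrics are kept.
      - role: endpoints
        namespaces:
          names:
            - kargo   # hypothetical namespace
    relabel_configs:
      # Keep only targets whose Service is annotated prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Replace the target's port with the value of the prometheus.io/port annotation.
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
```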