Check Managed Inference endpoint health

Check the status, success rate, and latency of a Managed Inference endpoint from the web interface or the CLI.

Prerequisites

You need the following before you start:

A running Managed Inference Job with a serving endpoint. See Create a Managed Inference Job.
Access to the CosmicAC web interface, for the web method.
The CosmicAC CLI installed and configured, for the CLI method. See Install the CLI.

Steps

Run the health check

Use the web interface or the CLI. Both report every deployed endpoint.

From the web interface: in the left sidebar, open Model Health, then click Run health check.

Read the results

Each endpoint reports its health and performance:

Status shows Healthy, Degraded, or an error state.
Success rate shows the share of requests served successfully.
Traffic and Failures show the total requests handled and how many failed.
Avg latency and P95 latency show response times in milliseconds.
In flight shows the requests currently being processed.
Timestamp shows when CosmicAC last captured the metrics.
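To see how these figures relate, here is a minimal sketch that derives a success rate, an average latency, and a P95 latency from raw numbers. The function names and sample values are hypothetical, and the nearest-rank percentile is an assumption for illustration; CosmicAC may calculate its metrics differently.

```python
from typing import Optional


def success_rate(traffic: int, failures: int) -> Optional[float]:
    """Percentage of requests served successfully, or None with no traffic."""
    if traffic == 0:
        return None  # no requests yet, so the rate is undefined
    return 100.0 * (traffic - failures) / traffic


def avg_latency(latencies_ms: list[float]) -> float:
    """Arithmetic mean of the response times, in milliseconds."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    return sum(latencies_ms) / len(latencies_ms)


def p95_latency(latencies_ms: list[float]) -> float:
    """95th percentile by the nearest-rank method, in milliseconds."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    # Integer form of ceil(0.95 * n), which avoids floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical sample: nine fast responses and one slow outlier.
samples = [120, 130, 125, 140, 135, 128, 132, 138, 126, 900]
rate = success_rate(traffic=200, failures=5)
mean_ms = avg_latency(samples)
tail_ms = p95_latency(samples)
```

In this sample, one slow request pulls P95 latency far above the average, which is why the two values are worth reading together.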

The CLI also shows Uptime and Uptime %, and prints a per-replica table so you can spot an unhealthy replica. The web interface shows a replica health count on each endpoint card.

The metrics come from the running vLLM service and reset when it restarts, so they cover only the period since the last restart and are not historical. For historical data, connect an observability stack such as Prometheus.
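If you record the counters yourself between checks, the resets need handling. The sketch below assumes you have saved successive readings of a cumulative counter such as Traffic; the function name and sample data are hypothetical. It treats any drop in the value as a restart, which is similar in spirit to how Prometheus handles counter resets.

```python
def total_increase(readings: list[int]) -> int:
    """Total growth of a cumulative counter across readings, tolerating resets."""
    total = 0
    for previous, current in zip(readings, readings[1:]):
        if current >= previous:
            total += current - previous
        else:
            # The counter dropped, so assume the service restarted
            # and counted up from zero to the current value.
            total += current
    return total


# Hypothetical Traffic readings: the service restarted between 150 and 20.
requests_seen = total_increase([100, 150, 20, 60])
```

Requests served between the last reading before a restart and the restart itself are lost to this approach, so the result is a lower bound.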
