CosmicAC Logo

Connect to a Managed Inference endpoint (vLLM)

Send a chat-completion request to a Managed Inference endpoint from the CLI or an OpenAI-compatible client.

Send a chat-completion request to a Managed Inference endpoint from the CLI or any OpenAI-compatible client.

Prerequisites

You need the following before you start:

Steps

Find your endpoint name

List your endpoints:

cosmicac models healthcheck

Each endpoint appears as Endpoint: <endpoint-name>. Copy the one you want to call.

Send a request

Use the CLI or any OpenAI-compatible client. Replace <endpoint-name> with the endpoint name from the previous step, and <api-key> with the key you created.

You need the API key only if you enabled Require Authorization header when you created the job. Otherwise, omit --api-key and the Authorization header.

cosmicac inference chat \
  --endpoint-id <endpoint-name> \
  --api-key <api-key> \
  --message "Hello"

The CLI reads the inference URL from your config. Omit --message for an interactive session, or add --stream for streaming output.

Next steps

On this page