Reading Usage and Cost

Every successful AI Hub response carries two things you usually need: how many tokens were consumed, and how much the request cost. This page shows where they live so you can wire them into dashboards, set per-user budgets, or just sanity-check pricing as you experiment.

Two ways to get the data

| Source | What you get | Best for |
| --- | --- | --- |
| Response headers | Cost in USD, key spend so far, call ID | Fastest; no body parsing |
| Response body usage | Tokens, prompt cache breakdown, reasoning tokens | Same place the OpenAI SDK looks |

You don’t have to choose — every successful response has both.

Reading the headers

Every response includes a set of headers prefixed with x-litellm-:

| Header | Meaning |
| --- | --- |
| x-litellm-response-cost | USD cost of this request, as a float |
| x-litellm-key-spend | Total USD spent by this API key so far |
| x-litellm-call-id | Internal per-request trace ID used during routing; not persisted to spend logs, so use the body's id for lookups |
| x-litellm-model-id | Internal model deployment ID |
| x-litellm-model-group | The public model name you requested |

curl

curl -i https://hnd1.aihub.zeabur.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_ZEABUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 32
  }'

The -i flag prints the response headers — look for the x-litellm-response-cost line.

Python

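By default the official OpenAI SDK returns only the parsed body, so response headers are not visible to user code (its with_raw_response accessor is the opt-in way to reach them). If you need the cost from the headers, the simplest route is to call the endpoint directly with httpx: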

import httpx
 
r = httpx.post(
    "https://hnd1.aihub.zeabur.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_ZEABUR_API_KEY"},
    json={
        "model": "claude-sonnet-4-5",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 32,
    },
    timeout=60.0,
)
print("cost (USD):", r.headers.get("x-litellm-response-cost"))
print("usage:", r.json()["usage"])
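
Header values arrive as strings, so it can help to convert them once before they reach a dashboard or budget check. The following is a minimal sketch, not part of AI Hub or any SDK: CostHeaders and parse_cost_headers are hypothetical names, and only the header names come from the table above.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class CostHeaders:
    """Typed view of the x-litellm-* headers (hypothetical helper type)."""
    response_cost_usd: Optional[float]
    key_spend_usd: Optional[float]
    call_id: Optional[str]
    model_group: Optional[str]


def _to_float(value: Optional[str]) -> Optional[float]:
    # Header values are strings; return None if missing or not numeric.
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def parse_cost_headers(headers: Mapping[str, str]) -> CostHeaders:
    # Lower-case the keys so a plain dict behaves like a case-insensitive
    # header mapping such as httpx.Headers.
    lowered = {name.lower(): value for name, value in headers.items()}
    return CostHeaders(
        response_cost_usd=_to_float(lowered.get("x-litellm-response-cost")),
        key_spend_usd=_to_float(lowered.get("x-litellm-key-spend")),
        call_id=lowered.get("x-litellm-call-id"),
        model_group=lowered.get("x-litellm-model-group"),
    )
```

With the httpx example above, parse_cost_headers(r.headers).response_cost_usd would give the request cost as a float, or None if the header is absent.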

Reading the body usage object

The response body’s usage object is OpenAI-compatible. For Anthropic and OpenAI models, extra fields appear when relevant:

{
  "usage": {
    "prompt_tokens": 1024,
    "completion_tokens": 256,
    "total_tokens": 1280,
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0
    },
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0
  }
}

| Field | Models | Meaning |
| --- | --- | --- |
| prompt_tokens / completion_tokens / total_tokens | All | Standard OpenAI fields |
| prompt_tokens_details.cached_tokens | OpenAI | Portion of prompt_tokens served from prompt cache; already counted inside prompt_tokens |
| cache_creation_input_tokens | Anthropic | Tokens written into Anthropic’s prompt cache on this request |
| cache_read_input_tokens | Anthropic | Tokens read back from Anthropic’s prompt cache |
| completion_tokens_details.reasoning_tokens | Reasoning models (e.g. gpt-5, gpt-5-mini) | Tokens spent on internal reasoning |

💡 Anthropic and OpenAI count cached tokens differently. Anthropic’s cache_creation_input_tokens and cache_read_input_tokens are reported outside prompt_tokens. OpenAI’s cached_tokens is a subset of prompt_tokens. The body shows both shapes faithfully, so pick the field that matches the provider you called.
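
The two counting rules matter when you want one "total input tokens" figure across providers. Here is a minimal sketch, assuming the rules stated above hold for the model you called. The function name, the provider labels, and the output keys are all hypothetical, and the caller has to supply the provider because the usage object does not identify it.

```python
from typing import Any, Dict


def input_token_breakdown(usage: Dict[str, Any], provider: str) -> Dict[str, int]:
    """Split input tokens into uncached, cache-read and cache-write parts.

    `provider` is a caller-supplied label: "openai" or "anthropic".
    """
    prompt = usage.get("prompt_tokens", 0) or 0

    if provider == "anthropic":
        # Cache counters are reported outside prompt_tokens, so add them.
        written = usage.get("cache_creation_input_tokens", 0) or 0
        read = usage.get("cache_read_input_tokens", 0) or 0
        return {
            "uncached": prompt,
            "cache_read": read,
            "cache_write": written,
            "total_input": prompt + written + read,
        }

    if provider == "openai":
        # cached_tokens is a subset of prompt_tokens, so subtract it.
        details = usage.get("prompt_tokens_details") or {}
        cached = details.get("cached_tokens", 0) or 0
        return {
            "uncached": prompt - cached,
            "cache_read": cached,
            "cache_write": 0,
            "total_input": prompt,
        }

    raise ValueError(f"unknown provider label: {provider!r}")
```

Keeping the provider as an explicit argument avoids guessing from which fields happen to be present, since the sample body above shows both shapes at once.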

Streaming responses

When you stream, the usage object only appears in a final chunk if you opt in. Set stream_options.include_usage: true in the request:

import json

import httpx
 
with httpx.stream(
    "POST",
    "https://hnd1.aihub.zeabur.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_ZEABUR_API_KEY"},
    json={
        "model": "claude-sonnet-4-5",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 32,
        "stream": True,
        "stream_options": {"include_usage": True},
    },
    timeout=60.0,
) as r:
    final_usage = None
    for line in r.iter_lines():
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            final_usage = chunk["usage"]
        # ...your existing token-by-token handling...
    print("final usage:", final_usage)

Without stream_options.include_usage, the streaming response carries no usage object at all.
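
To check the parsing logic without making a network call, the loop above can be factored into a function that accepts any iterable of lines. This is a sketch only: usage_from_sse_lines is a hypothetical helper, and the sample lines are hand-written to resemble the stream format, not captured from a real response.

```python
import json
from typing import Any, Dict, Iterable, Optional


def usage_from_sse_lines(lines: Iterable[str]) -> Optional[Dict[str, Any]]:
    """Return the last non-empty usage object seen in the stream, or None."""
    usage = None
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines and anything that is not data
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


# Hand-written sample lines for illustration only.
sample = [
    'data: {"choices": [{"delta": {"content": "Hi"}}]}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 8, "completion_tokens": 2, "total_tokens": 10}}',
    "",
    "data: [DONE]",
]
print(usage_from_sse_lines(sample))
```

Because the function takes an iterable, passing r.iter_lines() from the httpx example would work the same way as passing the sample list.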

Tracing a specific call

To investigate a particular request after the fact — for example, when diagnosing an unexpected response, reconciling a charge, or correlating with application-side logs — use the completion ID returned in the response body as id:

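from openai import OpenAI

# base_url is inferred from the endpoint used in the examples above
client = OpenAI(base_url="https://hnd1.aihub.zeabur.ai/v1", api_key="YOUR_ZEABUR_API_KEY")
resp = client.chat.completions.create(...)
print("request id:", resp.id)  # e.g. chatcmpl-715764bf-675d-4d95-bbdc-80c037c8ce3f

⚠️ Always use response.id (resp.id in the snippet above). Do not use the x-litellm-call-id header: that value is an internal trace identifier used during request routing and is not persisted to spend logs, so lookups by it will yield no result.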

In the Zeabur Dashboard, open the AI Hub page, expand the relevant daily spend log, and click the info icon on any row. The detail dialog displays the Request ID alongside a copy action; this value is identical to response.id.

Cache-served responses carry a derived suffix

When LiteLLM serves an entire response from its response cache (indicated by a “served from cache” badge in the Dashboard and cost = 0), the recorded ID is the original ID with a _cache_hit<timestamp> suffix appended:

chatcmpl-2ec370dd-83a6-41b4-817c-1646c57034bc_cache_hit1779350469.4126954

The original request and the cache-served replay are recorded as separate entries in the spend log, sharing the underlying UUID prefix. This allows aggregation by prefix to determine how many times a given prompt was served from cache.
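
The prefix aggregation described above could be sketched as follows. It assumes the recorded IDs have already been exported into a list of strings and that the only derived form is the _cache_hit suffix shown in the example. Both function names are hypothetical, and the suffix is kept as a string instead of being interpreted.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

_CACHE_MARKER = "_cache_hit"


def split_request_id(recorded_id: str) -> Tuple[str, Optional[str]]:
    """Return (original_id, cache_hit_suffix), with None for a non-cached entry."""
    base, marker, suffix = recorded_id.partition(_CACHE_MARKER)
    if not marker:
        return recorded_id, None
    return base, suffix


def cache_hits_per_request(recorded_ids: Iterable[str]) -> Counter:
    """Count cache-served replays per original request ID."""
    hits: Counter = Counter()
    for recorded_id in recorded_ids:
        base, suffix = split_request_id(recorded_id)
        if suffix is not None:
            hits[base] += 1
    return hits
```

Entries without the suffix are the original requests and are not counted, so an ID with zero hits never appears in the result.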

Historical usage

For per-day totals and a searchable log of every call, use the AI Hub page on the Zeabur Dashboard. The headers and body fields above are for in-process accounting; the Dashboard is for after-the-fact inspection.