Skip to content

Manual Spans — Context Manager

Use context managers when you need dynamic span properties or need to set metadata after the LLM responds (such as actual tokens consumed, custom session IDs, or embedding vectors).


Token Pre-Counting & Logging Pipeline

When using llm_span_with_tokens, the SDK performs token counting before making the outbound LLM call:

[Prompt Passed to Context Manager]
      [Tiktoken Matcher] ──(Fails/Unknown Model)──► [Estimate (Char Heuristic)]
                │                                             │
                ├──(Success)                                  │
                ▼                                             ▼
       [Exact Token Count]                           [Heuristic Token Count]
                │                                             │
                └───────────────┬─────────────────────────────┘
                   [Span Metadata Recorded]
                    - prompt_tokens
                    - token_count_method
                      [Outbound LLM Call]

llm_span — Basic Usage

The standard context manager lets you set metadata dynamically using span.set_metadata(key, value).

from instrumentation_sdk import llm_span

async def handle_request(user_id: str, prompt: str):
    async with llm_span(model="gpt-4o", provider="openai", user_id=user_id) as span:
        # Outbound call to LLM
        response = await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}]
        )

        # Log details retrieved from the response object
        span.set_metadata("prompt_tokens", response.usage.prompt_tokens)
        span.set_metadata("completion_tokens", response.usage.completion_tokens)
        span.set_metadata("actual_model", response.model)

    return response.choices[0].message.content

Metadata Keys

set_metadata accepts any string key. Standard dashboard keys are:

Key Type Description
prompt_tokens int Input token count
completion_tokens int Output token count
actual_model str Exact model name returned by the provider
session_id str Logical conversation/thread grouping
finish_reason str e.g. stop, length, content_filter
retry_count int Number of attempts made

llm_span_with_tokens — Automatic Token Pre-Counting

Use llm_span_with_tokens to count prompt tokens locally before the LLM call is executed. This records prompt_tokens and token_count_method automatically.

from instrumentation_sdk import llm_span_with_tokens

async def handle(prompt: str):
    async with llm_span_with_tokens(
        model="gpt-4o",
        provider="openai",
        prompt=prompt  # String prompt passed directly
    ) as span:
        response = await client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}]
        )
        # prompt_tokens is already computed — just log completion tokens
        span.set_metadata("completion_tokens", response.usage.completion_tokens)

Chat Message Support

llm_span_with_tokens handles both plain strings and structured message lists:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain transformers in a sentence."}
]

async with llm_span_with_tokens(
    model="gpt-4o",
    provider="openai",
    prompt=messages  # Supported automatically
) as span:
    ...

Synchronous Context Manager

If you are not using asyncio, both context managers can be used in synchronous blocks without the async prefix:

from instrumentation_sdk import llm_span

with llm_span(model="gpt-4o", provider="openai") as span:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    span.set_metadata("prompt_tokens", response.usage.prompt_tokens)

Error Handling

If an exception occurs within the context block, the span is automatically finalized, marked with status="error", and reported before the exception bubbles up:

async with llm_span(model="gpt-4o", provider="openai") as span:
    # This will trigger an error span to be emitted
    raise TimeoutError("Provider failed to respond")

Nested Context Managers

Context managers can be nested to capture hierarchical multi-agent or router-worker topologies:

async with llm_span(model="router-v2", service_name="agent-router") as parent:
    # Router logic
    async with llm_span(model="gpt-4o", service_name="writer-agent") as child:
        # Worker logic
        response = await client.chat.completions.create(...)
        child.set_metadata("completion_tokens", response.usage.completion_tokens)

Token Counting Endpoint

You can query the SDK's token counting engine directly via REST:

curl -X POST http://localhost:8002/v1/token-counting/count \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello world!", "model": "gpt-4o"}'

Response:

{"tokens": 3, "method": "tiktoken"}


Next Steps