Installation & Quick Start
Capture your first span in under 5 minutes.
Install
pip install instrumentation-sdk
How It Fits Together
    pip install instrumentation-sdk
                   │
                   ▼
            Your Python App
+ one line: init_auto_instrumentation()
                   │
                   ▼
          instrumentation-sdk
                   │
           ┌───────┴───────┐
           ▼               ▼
          API           Grafana
    localhost:8002  localhost:3002
Option A: Auto-Instrumentation (Zero Code Changes)
Drop this at the top of your app. Every OpenAI / Anthropic / LiteLLM / LangChain call is tracked automatically.
import asyncio
from instrumentation_sdk import init_auto_instrumentation

init_auto_instrumentation()  # call once, at startup

# From here, all LLM calls are tracked; no other changes needed
import openai

async def main():
    client = openai.AsyncOpenAI()
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )

asyncio.run(main())  # `await` must run inside a coroutine (or an async REPL)
Best for
Existing apps where you don't want to change any LLM call code.
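
To make "tracked automatically" more concrete, here is a minimal sketch of the general monkey-patching technique that auto-instrumentation tools commonly rely on. It is not the SDK's actual implementation: `FakeChatClient`, `patch_method`, and `recorded_spans` are hypothetical names invented for this illustration.

```python
import functools
import time


# Hypothetical stand-in for an LLM client class; not a real SDK type.
class FakeChatClient:
    def create(self, model, messages):
        return {"model": model, "reply": "Hello!"}


recorded_spans = []  # illustrative in-memory sink for captured spans


def patch_method(cls, method_name):
    """Replace cls.method_name with a wrapper that records one span per call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return original(self, *args, **kwargs)
        finally:
            # Runs on success and on error, so failed calls are recorded too.
            recorded_spans.append({
                "method": f"{cls.__name__}.{method_name}",
                "model": kwargs.get("model"),
                "duration_s": time.perf_counter() - start,
            })

    setattr(cls, method_name, wrapper)


patch_method(FakeChatClient, "create")  # done once, at startup

# The calling code is unchanged, yet the call is now recorded.
result = FakeChatClient().create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

Because the wrapper replaces the method on the class itself, patching at startup covers every client instance, including ones created before the patch. That is one plausible reason a single call at startup can be enough.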
Option B: Manual Span (Decorator)
from instrumentation_sdk import llm_observe

@llm_observe(service="my-app", endpoint="chat")
async def ask_llm(prompt: str):
    # your existing LLM code here; it should assign `response`
    return response
Best for
When you want to tag specific functions with a service name and endpoint.
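
As a rough illustration of how a tagging decorator for async functions can work, the sketch below wraps a coroutine and records the service name, endpoint, status, and duration of each call. It is an assumption-laden stand-in, not the source of `llm_observe`: the `observe` decorator and the `captured` list are hypothetical.

```python
import asyncio
import functools
import time

captured = []  # illustrative in-memory sink; a real SDK would export spans


def observe(service, endpoint):
    """Hypothetical decorator factory that tags each call with service/endpoint."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        async def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "error"
            try:
                result = await func(*args, **kwargs)
                status = "ok"
                return result
            finally:
                captured.append({
                    "service": service,
                    "endpoint": endpoint,
                    "function": func.__name__,
                    "status": status,
                    "duration_s": time.perf_counter() - start,
                })
        return wrapper
    return decorator


@observe(service="my-app", endpoint="chat")
async def ask_llm(prompt: str) -> str:
    await asyncio.sleep(0)  # stands in for the real LLM call
    return f"echo: {prompt}"


answer = asyncio.run(ask_llm("Hello!"))
```

The status starts as `"error"` and is only set to `"ok"` after the awaited call returns, so an exception inside the function still yields a span with the right status.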
Option C: Manual Span (Context Manager)
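from instrumentation_sdk import llm_span

# Run inside an async function; `client` is an async client such as the
# openai.AsyncOpenAI instance from Option A.
async with llm_span(model="gpt-4o", provider="openai") as span:
    response = await client.chat.completions.create(...)
    span.set_metadata("prompt_tokens", response.usage.prompt_tokens)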
Best for
When you need to attach token counts or custom metadata from the actual response.
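
The context-manager pattern can be sketched with the standard library's `contextlib.asynccontextmanager`. This only illustrates the idea of a span that collects metadata while the block runs and is finalized on exit. The `Span` class, `span_scope`, and `finished_spans` are hypothetical names, and the token count is a made-up value.

```python
import asyncio
import time
from contextlib import asynccontextmanager

finished_spans = []  # illustrative sink for completed spans


class Span:
    """Hypothetical span object holding attributes set during the call."""

    def __init__(self, model, provider):
        self.model = model
        self.provider = provider
        self.metadata = {}
        self.duration_s = None

    def set_metadata(self, key, value):
        self.metadata[key] = value


@asynccontextmanager
async def span_scope(model, provider):
    span = Span(model, provider)
    start = time.perf_counter()
    try:
        yield span  # the body of the `async with` block runs here
    finally:
        # Finalize the span even if the body raised.
        span.duration_s = time.perf_counter() - start
        finished_spans.append(span)


async def main():
    async with span_scope(model="gpt-4o", provider="openai") as span:
        await asyncio.sleep(0)  # stands in for the real LLM call
        span.set_metadata("prompt_tokens", 12)  # made-up value for illustration


asyncio.run(main())
```

Yielding the span object is what lets the caller attach values that only exist after the response arrives, such as token counts.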
Start the Observability Stack
Starting the stack launches the all-in-one container, which exposes these services:
| Service | URL |
|---|---|
| FastAPI REST API | http://localhost:8002 |
| Grafana Dashboards | http://localhost:3002 |
| Prometheus | http://localhost:9090 |
| OTLP / Tempo | localhost:4317 |
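
Once the container is running, a quick way to confirm that the ports in the table are accepting connections is a plain TCP check. The sketch below uses only the standard library. The helper names `port_is_open` and `check_services` are invented here, and an open port only shows that something is listening, not that the service behind it is healthy.

```python
import socket

# Host/port pairs copied from the table above.
SERVICES = {
    "FastAPI REST API": ("localhost", 8002),
    "Grafana Dashboards": ("localhost", 3002),
    "Prometheus": ("localhost", 9090),
    "OTLP / Tempo": ("localhost", 4317),
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {
        name: port_is_open(host, port)
        for name, (host, port) in services.items()
    }


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'up' if up else 'DOWN'}")
```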
Verify It's Working
1. Trigger a test span
curl -X POST http://localhost:8002/v1/instrumentation/test-call \
  -H "Content-Type: application/json" \
  -d '{"method": "httpx", "provider": "openai"}'
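
If you prefer to trigger the test span from Python, the same request can be built with the standard library. This is a sketch that mirrors the curl command above. The function names are made up for illustration, and nothing is assumed about the shape of the response body.

```python
import json
import urllib.request


def build_test_call_request(base_url="http://localhost:8002"):
    """Build the POST request that the curl command above sends."""
    payload = {"method": "httpx", "provider": "openai"}
    return urllib.request.Request(
        url=f"{base_url}/v1/instrumentation/test-call",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_call(base_url="http://localhost:8002", timeout=10):
    """Send the request; requires the observability stack to be running."""
    request = build_test_call_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")
```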
Expected response:
2. Check Grafana
Open http://localhost:3002 → LLM Observability folder → pick any dashboard. Spans appear within 5–10 seconds.
[Figure: Grafana showing LLM latency, token usage, and cost metrics]
Supported LLM Providers
| Provider | Client | What's patched |
|---|---|---|
| OpenAI | openai.AsyncOpenAI / openai.OpenAI | chat.completions.create |
| Anthropic | anthropic.AsyncAnthropic / anthropic.Anthropic | messages.create |
| LiteLLM | litellm module | acompletion / completion |
| LangChain | Any BaseChatModel subclass | ainvoke / invoke |
| Generic HTTP | httpx, requests | Any call matching known LLM URLs |
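
The "Generic HTTP" row implies some form of URL matching. One simple way such matching could work is a hostname lookup, sketched below. The host list is purely illustrative: the SDK's real list of known LLM URLs and its matching rules are not documented here, and `provider_for_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit

# Illustrative host list only; the SDK's actual "known LLM URLs" are not given here.
KNOWN_LLM_HOSTS = {
    "api.openai.com": "openai",
    "api.anthropic.com": "anthropic",
}


def provider_for_url(url):
    """Return the provider name for a known LLM host, or None otherwise."""
    host = urlsplit(url).hostname  # hostname is lowercased by urlsplit
    return KNOWN_LLM_HOSTS.get(host)
```

A generic `httpx` or `requests` hook could call a helper like this on each outgoing request and only create a span when it returns a provider.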
Next Steps
- Auto-Instrumentation — detailed provider config
- Manual Spans — Context Manager — set metadata mid-call
- Docker & CLI Deployment — production deployment