Prometheus Metrics & Grafana¶

The SDK ships four pre-built Grafana dashboards and a Prometheus scrape pipeline. Everything is auto-provisioned inside the all-in-one container.

Architecture¶

instrumentation-sdk
        │
        ▼
  FastAPI REST API ──► Tempo  (traces)
        │
        ▼
   Prometheus ──────► Grafana
  localhost:9090        localhost:3002

Quick Start¶

llm-observe start
open http://localhost:3002   # admin / admin

Navigate to Dashboards → LLM Observability to see all four dashboards.

Dashboards¶

1. LLM Latency & TTFT¶

http://localhost:3002/d/llm-latency-ttft-dashboard

Panel	What it shows
Latency Percentiles (p50/p95/p99)	End-to-end call latency by model
TTFT Percentiles	Time to first token for streaming
Avg Latency vs Avg TTFT	Bar chart comparison per model

2. LLM Cost Dashboard¶

http://localhost:3002/d/llm-cost-dashboard

Panel	What it shows
Cumulative Cost Over Time	USD cost per service + model
Cost Distribution by Model	Donut chart
Cost Distribution by Service	Horizontal bar chart

3. LLM Error & Retry¶

http://localhost:3002/d/llm-error-retry-dashboard

Panel	What it shows
Success vs Error Rate	Stacked time series
Finish Reason Distribution	`stop`, `length`, `content_filter`
Retry Rate (%)	Gauge

4. LLM Security & Safety¶

http://localhost:3002/d/llm-security-safety-dashboard

Panel	What it shows
PII Detection Rate	Detections/sec by service
Injection Attempts Rate	Attempts/sec by service
Total Security Violations	Cumulative count

Screenshots¶

Prometheus Metrics Prometheus scraping LLM metrics every 5 seconds

Distributed Tracing Grafana Tempo showing distributed traces from LLM calls

Loki Logs Dashboard Structured logs aggregated via Loki

Initialize the Metrics Pipeline¶

curl -X POST http://localhost:8002/v1/metrics/init \
  -H "Content-Type: application/json" \
  -d '{"port": 9464}'

{"initialized": true, "message": "Metrics pipeline initialized"}

curl http://localhost:8002/v1/metrics/health

{"initialized": true, "message": "Metrics pipeline is active"}

Record a Span Manually¶

curl -X POST http://localhost:8002/v1/metrics/record \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o", "provider": "openai", "service_name": "chat-api",
    "prompt_tokens": 150, "completion_tokens": 80,
    "latency_ms_total": 420, "latency_ms_ttft": 95,
    "finish_reason": "stop", "status": "success",
    "pii_detected": false, "injection_attempt": false, "retry_count": 0
  }'

{"recorded": true, "cost_usd_micro": 1950, "price_version": "2025-01-15"}

Prometheus Metrics Reference¶

Scraped from http://localhost:9464/metrics every 5 seconds.

Metric	Type	Labels
`llm_tokens_total`	Counter	`model`, `provider`, `service_name`, `token_type`
`llm_cost_usd_micro_total`	Counter	`model`, `provider`, `service_name`
`llm_latency_ms_total`	Histogram	`model`, `provider`, `service_name`
`llm_latency_ms_ttft`	Histogram	`model`, `provider`, `service_name`
`llm_pii_detected_total`	Counter	`service_name`
`llm_injection_attempts_total`	Counter	`service_name`
`llm_finish_reason_total`	Counter	`model`, `provider`, `service_name`, `finish_reason`
`llm_spans_total`	Counter	`model`, `provider`, `service_name`, `status`, `has_retries`

Add a New Model Price¶

Edit config/model_prices.yaml:

- model: gpt-5
  provider: openai
  input_price_per_1m: 10.00
  output_price_per_1m: 30.00
  version: "2026-01-01"

Then restart:

docker restart instrumentation-sdk-api

Dashboard Hot-Reload¶

File changed	Action
`config/model_prices.yaml`	Restart container
`config/patterns.yaml`	Restart container
`build/dashboards/*.json`	Auto hot-reloaded ✅ (every 30s)