REST Management API
Full reference for all HTTP endpoints exposed by the observability container. By default, the API is served at http://localhost:8002/v1.
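As a minimal sketch of how a client might address these endpoints, the helper below joins an endpoint path onto the default base URL and prepares a JSON POST request using only the Python standard library. The names `BASE_URL`, `endpoint_url`, and `build_post` are illustrative and not part of the API. The base URL is the default stated above and may differ in your deployment.

```python
import json
import urllib.request
from typing import Optional

# Default base URL from this page; adjust if the container is exposed elsewhere.
BASE_URL = "http://localhost:8002/v1"


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Join an endpoint path such as '/metrics/health' onto the base URL."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def build_post(path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Prepare (but do not send) a POST request, with a JSON body if one is given."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        endpoint_url(path), data=data, headers=headers, method="POST"
    )
```

Passing the prepared request to `urllib.request.urlopen` sends it to a running container. For example, `build_post("/instrumentation/init")` prepares a body-less POST for one of the endpoints documented here with no request body.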
Endpoint Feature Map
FastAPI REST API (localhost:8002)
│
├── /instrumentation
│   ├── POST /init → Enable monkey-patching
│   ├── POST /uninstrument → Remove active patches
│   ├── POST /detect → Discover provider/model
│   └── POST /test-call → Verify trace output
│
├── /token-counting
│   └── POST /count → Local token evaluation
│
├── /streaming
│   └── POST /test-stream-call → Server-Sent Events test
│
├── /pii-injection
│   └── POST /scan → Aho-Corasick & regex match
│
├── /sampling
│   └── POST /should-sample → Evaluate modulo gate
│
├── /embeddings
│   └── POST /embed → Vector conversion
│
└── /metrics
    ├── POST /init → Start scrape endpoint
    ├── GET /health → Check metrics status
    ├── POST /record → Log single span
    └── POST /record-batch → Log bulk spans
1. Instrumentation Management
POST /instrumentation/init
Enable auto-instrumentation globally in the application runtime.
- Request Body: None
- Response (application/json):
POST /instrumentation/uninstrument
Remove all active auto-instrumentation monkey-patches.
- Request Body: None
- Response (application/json):
POST /instrumentation/detect
Parse a sample request body to discover the provider name and model.
- Request Body:
- Response:
POST /instrumentation/test-call
Trigger an outbound call to verify metrics and tracing flow.
- Request Body:
Allowed method values: httpx, requests, sdk. Allowed provider values: openai, anthropic.
- Response:
2. Utility Engine
POST /token-counting/count
Count prompt tokens locally without contacting the LLM provider.
- Request Body (Plain String):
- Request Body (Chat Messages):
{
  "prompt": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize this document."}
  ],
  "model": "gpt-4o"
}
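The chat-messages body above can be assembled and checked on the client before it is sent. The following is a hedged sketch using only the standard library: `COUNT_URL` assumes the default base URL, `build_count_request` is a hypothetical helper name, and the role/content check is a client-side convenience rather than documented server behavior. The response shape is not shown on this page, so the sketch only prints whatever JSON comes back.

```python
import json
import urllib.request

# Assumes the default base URL given at the top of this page.
COUNT_URL = "http://localhost:8002/v1/token-counting/count"


def build_count_request(messages: list, model: str) -> urllib.request.Request:
    """Prepare a POST for /token-counting/count using the chat-messages body."""
    for message in messages:
        # Client-side check (an assumption): each message carries the two
        # keys used in the example body, "role" and "content".
        if not {"role", "content"} <= message.keys():
            raise ValueError(f"message missing 'role' or 'content': {message!r}")
    body = json.dumps({"prompt": messages, "model": model}).encode("utf-8")
    return urllib.request.Request(
        COUNT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_count_request(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this document."},
    ],
    model="gpt-4o",
)

# To send it against a running container:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```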
POST /pii-injection/scan
Scan inputs for PII and prompt-injection attempts.
- Request Body:
- Response:
POST /embeddings/embed
Convert text into a 384-dimensional MiniLM-L6-v2 vector embedding.
- Request Body:
- Response:
POST /sampling/should-sample
Check whether a span ID passes the deterministic modulo-100 gate, which samples 1% of spans.
- Request Body:
- Response:
3. Prometheus Metrics & Records
POST /metrics/init
Initialize the local Prometheus scrape endpoint.
- Request Body:
- Response:
POST /metrics/record
Record the fields of a single span to generate metrics and compute its cost in USD.
- Request Body:
{
  "model": "gpt-4o",
  "provider": "openai",
  "service_name": "chat-api",
  "prompt_tokens": 120,
  "completion_tokens": 60,
  "latency_ms_total": 380,
  "latency_ms_ttft": 90,
  "finish_reason": "stop",
  "status": "success",
  "pii_detected": false,
  "injection_attempt": false,
  "retry_count": 0
}
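One way a client could produce this body is to mirror its fields in a small dataclass and serialize it. The sketch below is illustrative only: `SpanRecord`, `build_record_request`, the default values, and the sanity checks are assumptions made for the example, not part of the documented API. It also assumes `latency_ms_ttft` means time to first token and that the API is served at the default base URL.

```python
import json
import urllib.request
from dataclasses import asdict, dataclass

# Assumes the default base URL given at the top of this page.
RECORD_URL = "http://localhost:8002/v1/metrics/record"


@dataclass
class SpanRecord:
    """Hypothetical client-side mirror of the request body fields shown above."""

    model: str
    provider: str
    service_name: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms_total: int
    latency_ms_ttft: int
    # The defaults below are illustrative choices, not documented defaults.
    finish_reason: str = "stop"
    status: str = "success"
    pii_detected: bool = False
    injection_attempt: bool = False
    retry_count: int = 0

    def validate(self) -> None:
        """Client-side sanity checks (assumptions, not server rules)."""
        if min(self.prompt_tokens, self.completion_tokens, self.retry_count) < 0:
            raise ValueError("token and retry counts must be non-negative")
        if self.latency_ms_ttft > self.latency_ms_total:
            raise ValueError("time to first token cannot exceed total latency")


def build_record_request(record: SpanRecord) -> urllib.request.Request:
    """Validate the record and prepare (but do not send) the POST request."""
    record.validate()
    body = json.dumps(asdict(record)).encode("utf-8")
    return urllib.request.Request(
        RECORD_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


record = SpanRecord(
    model="gpt-4o",
    provider="openai",
    service_name="chat-api",
    prompt_tokens=120,
    completion_tokens=60,
    latency_ms_total=380,
    latency_ms_ttft=90,
)
request = build_record_request(record)
```

Sending `request` with `urllib.request.urlopen` against a running container would post the same body as the example above.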
Next Steps
- Docker & CLI Deployment - Run and expose the API container.
- Config Files Reference - Learn how prices and regex patterns are configured.