Changelog
All notable changes to the instrumentation-sdk package will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[1.8.2] - 2026-05-23
Added
- Multi-Model Fallback Chain Tracking: Correlates multiple retry attempts under a single request trace by passing `attempted_models` and `retry_count` through the SDK manual span enrichment pipeline and the REST API span ingestion endpoint.
- Tool-Call Chain Linking: Exposed the `/v1/tool-call/track` and `/v1/tool-call/clear` API endpoints. Intercepts the `tool_calls` finish reason to emit intermediate transition spans and track cumulative trace cost.
- Kafka Reporter WAL Optimization: Offloaded Protobuf serialization to a background thread to prevent thread contention, and optimized local SQLite Write-Ahead Log (WAL) fallback writing.
- OTel Performance Benchmarking: Added profiling tools to benchmark memory/CPU footprints of the telemetry pipeline.
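As a rough illustration of the fallback chain tracking entry above, the sketch below records every attempted model and the retry count for one logical request. Only the attribute names `attempted_models` and `retry_count` come from this changelog; `FallbackTrace` and `call_with_fallback` are hypothetical names, and the SDK's actual enrichment pipeline may work differently.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class FallbackTrace:
    """Hypothetical container for the attributes attached to one request trace."""
    attempted_models: List[str] = field(default_factory=list)
    retry_count: int = 0


def call_with_fallback(
    models: Sequence[str],
    call: Callable[[str], str],
) -> Tuple[str, FallbackTrace]:
    """Try each model in order and record all attempts under a single trace."""
    trace = FallbackTrace()
    last_error: Optional[Exception] = None
    for model in models:
        trace.attempted_models.append(model)
        # The first attempt is not a retry, so retries = attempts - 1.
        trace.retry_count = len(trace.attempted_models) - 1
        try:
            return call(model), trace
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
    raise RuntimeError(f"all models failed: {trace.attempted_models}") from last_error
```

Under these assumptions, a caller would copy the two fields onto the span for the request once the chain resolves, so that a single trace shows the whole sequence of attempts.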
Fixed
- Docker Startup Delay: Delayed startup of downstream services in the all-in-one `entrypoint.sh` to avoid resource contention during Grafana database migrations.
[1.8.1] - 2026-05-22
Fixed
- Bumped version to 1.8.1 to resolve PyPI package release collision.
[1.7.0] - 2026-05-21
Added
- MiniLM Embedding Integration: Fetches MiniLM embeddings for prompts asynchronously and without blocking, concurrently with span finalization, using `asyncio.create_task()`.
- REST API Endpoint: Exposed `POST /v1/embeddings/embed` to generate embeddings for arbitrary text via the embedding worker.
- Contract-First Support (Embedding):
  - OpenAPI: Added `/embeddings/embed` endpoint to `v1.yaml`.
  - GraphQL: Added `getEmbedding` query and `GetEmbeddingPayload` to `v1.graphql`.
  - Protobuf: Added `GetEmbedding` RPC, request, and response message schemas to `instrumentation.proto`.
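The concurrency pattern named in the MiniLM entry can be sketched as follows. This is a minimal illustration, not the SDK's code: `fetch_embedding`, `finalize_span`, and `finish_call` are hypothetical stand-ins, and the sleeps only simulate I/O latency.

```python
import asyncio
from typing import List


async def fetch_embedding(prompt: str) -> List[float]:
    """Hypothetical stand-in for the call to the embedding worker."""
    await asyncio.sleep(0.01)  # simulate network latency
    return [float(len(prompt)), 0.0]


async def finalize_span(span: dict) -> None:
    """Hypothetical stand-in for span finalization work."""
    await asyncio.sleep(0.01)
    span["finalized"] = True


async def finish_call(prompt: str) -> dict:
    span: dict = {"prompt_chars": len(prompt)}
    # Schedule the fetch first so it runs while the span is being finalized.
    embedding_task = asyncio.create_task(fetch_embedding(prompt))
    await finalize_span(span)
    try:
        span["embedding"] = await embedding_task
    except Exception:
        # An embedding failure should not lose the span itself.
        span["embedding"] = None
    return span


if __name__ == "__main__":
    print(asyncio.run(finish_call("hello")))
```

Creating the task before awaiting finalization is what makes the two operations overlap; awaiting the fetch directly would serialize them.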
[1.6.0] - 2026-05-21
Added
- Deterministic Sampling Gate: Hashes `span_id` with SHA256 and takes the result modulo 100 to decide whether a span is sampled (`is_sampled = True`). Skips expensive operations (hashing/embeddings) for unsampled spans.
- REST API Endpoint: Exposed `POST /v1/sampling/should-sample` to check if a span should be sampled.
- Contract-First Support (Sampling):
  - OpenAPI: Added `/sampling/should-sample` path definition to `v1.yaml`.
  - GraphQL: Added `shouldSample` query and `SamplingGatePayload` structure to `v1.graphql`.
  - Protobuf: Added `ShouldSample` RPC, request, and response message schemas to `instrumentation.proto`.
- Performance Load Test Suite (`tests/performance/test_metrics_load.py`): 7 pytest cases covering 100 individual spans, 10×50 batch spans, mixed error ratios, PII/injection flags, all model/provider combos, and high token counts. Marked `@pytest.mark.performance` and excluded from unit/integration CI by default.
- Grafana Config CI (`.github/workflows/grafana-config-validate.yml`): File-targeted CI that triggers only on changes to these exact 10 files: `grafana-datasource.yaml`, `grafana-dashboard-provider.yaml`, `prometheus.yml`, `tempo-config.yaml`, all 4 dashboard JSON files, `model_prices.yaml`, and `patterns.yaml`. Validates YAML syntax, required fields, dashboard UID uniqueness, regex compilability, and price-entry integrity. Does not require Docker; runs in ~60 s.
- Prometheus Metrics Collection & Scraping: Integrated OpenTelemetry Prometheus adapter to collect operational metrics from LLM call lifecycles.
- REST Metrics API: Exposed endpoints `POST /v1/metrics/init`, `GET /v1/metrics/health`, `POST /v1/metrics/record`, and `POST /v1/metrics/record-batch` for metrics orchestration.
- Grafana Dashboard: Provisioned Grafana dashboard visualizing LLM latency, TTFT, token usage, cost, and error rates.
- Contract-First Support (Metrics):
  - OpenAPI: Added OpenAPI routes and schemas for the metrics endpoints.
  - GraphQL: Exposed `initMetrics`, `recordMetrics`, `recordMetricsBatch` mutations and `metricsHealth` query.
  - Protobuf: Added `InitMetrics`, `GetMetricsHealth`, `RecordMetrics`, `RecordMetricsBatch` RPCs to `InstrumentationControlService`.
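The deterministic sampling gate in this release can be approximated with the sketch below. The SHA256-modulo-100 step follows the entry above; the function name `should_sample`, the `sample_rate_percent` parameter, its default of 10, and the choice to reduce the full digest are assumptions made for illustration.

```python
import hashlib


def should_sample(span_id: str, sample_rate_percent: int = 10) -> bool:
    """Map a span ID to a stable bucket in [0, 100) and compare to the rate."""
    digest = hashlib.sha256(span_id.encode("utf-8")).digest()
    # Interpret the digest as a big integer, then reduce it to a bucket.
    bucket = int.from_bytes(digest, "big") % 100
    return bucket < sample_rate_percent


if __name__ == "__main__":
    for span_id in ("span-0001", "span-0002", "span-0003"):
        print(span_id, should_sample(span_id))
```

Because the decision depends only on the span ID, every component that sees the same ID reaches the same answer without coordination, which is what allows later stages to skip costly work for unsampled spans.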
Fixed
- Grafana "database is locked" crash: Set `GF_DATABASE_WAL=true` permanently in `entrypoint.sh`. Root cause: a manual debug run had initialised the SQLite DB in WAL mode; subsequent container starts without WAL=true caused the migration service to crash silently, leaving port 3000 unreachable.
- Proto `buf lint` failures: Split shared `MetricsStatusResponse` into `InitMetricsResponse` and `GetMetricsHealthResponse` so each RPC has a uniquely named response type, satisfying `buf lint` naming and reuse rules.
Runbook: Config File Changes Require Container Restart
| File changed | Action required |
|---|---|
| `config/model_prices.yaml` | Restart container: `docker restart instrumentation-sdk-api` |
| `config/patterns.yaml` | Restart container: `docker restart instrumentation-sdk-api` |
| `build/grafana-datasource.yaml` | Restart container |
| `build/grafana-dashboard-provider.yaml` | Restart container |
| `build/prometheus.yml` | Restart container |
| `build/tempo-config.yaml` | Restart container |
| `build/dashboards/*.json` | Hot-reloaded by Grafana every 30 s; no restart needed |
[1.5.0] - 2026-05-19
Added
- PII & Injection Scanning (Aho-Corasick & Regex Fallback): Added prompt scanning for structural PII patterns and jailbreak/injection phrases.
- REST API Endpoint: Exposed `POST /v1/pii-injection/scan` to allow remote scanning.
- Contract-First Support:
  - OpenAPI: Added `/pii-injection/scan` path definition to the OpenAPI `v1.yaml` contract.
  - GraphQL: Added `scanPiiInjection` query and `PiiInjectionScanPayload` structure to `v1.graphql`.
  - Protobuf: Added `ScanPiiInjection` RPC, request, and response message schemas to `instrumentation.proto`.
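To make the regex fallback path of the scanner concrete, here is a minimal sketch that uses only the standard library. The patterns, phrase list, function name `scan_prompt`, and result keys are all hypothetical; the shipped scanner reads its patterns from configuration and prefers Aho-Corasick matching, neither of which is shown here.

```python
import re

# Hypothetical structural PII patterns, for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Hypothetical injection phrases, combined into one case-insensitive regex.
INJECTION_PHRASES = ("ignore previous instructions", "disregard the system prompt")
INJECTION_RE = re.compile(
    "|".join(re.escape(phrase) for phrase in INJECTION_PHRASES),
    re.IGNORECASE,
)


def scan_prompt(text: str) -> dict:
    """Return which PII types and injection phrases appear in the prompt."""
    pii_types = sorted(
        name for name, pattern in PII_PATTERNS.items() if pattern.search(text)
    )
    injection_hits = sorted(
        {match.group(0).lower() for match in INJECTION_RE.finditer(text)}
    )
    return {
        "has_pii": bool(pii_types),
        "pii_types": pii_types,
        "has_injection": bool(injection_hits),
        "injection_phrases": injection_hits,
    }
```

Reporting only the matched types and phrases, rather than the matched values, keeps the scan result itself free of the sensitive text it detected.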
[1.4.0] - 2026-05-18
Added
- All-in-One Standalone Telemetry & API Container: Bundled the FastAPI API server, Grafana, and Tempo inside a single Docker image.
  - Automatically provisions Tempo as a read-only trace datasource at container startup.
  - Ephemeral block and WAL storage configured under `/tmp/tempo` inside the container.
  - Orchestrates background Tempo, Grafana, and frontend Uvicorn processes via `entrypoint.sh`.
  - Pushed to Docker Hub registry as a production-ready image under the tag `chiefj/instrumentation-sdk-api:unstable` for one-command user deployment.
[1.3.0] - 2026-05-18
Added
- Streaming Observability (TTFT & Token Tracking): Implemented utilities for tracking streaming LLM calls (both sync and async generators).
  - Automatically captures Time-to-First-Token (TTFT) latency at the moment the first chunk is yielded.
  - Accumulates yielded chunks and calculates the completion token count upon stream completion, close, or failure.
  - Defers manual span finalization until the stream has completed, so early consumer aborts (`.close()` and `.aclose()`) still produce a finalized span.
- REST API Endpoint: Exposed `POST /v1/streaming/test-stream-call` to verify streaming and TTFT tracking.
- Contract-First Support:
  - GraphQL: Added `triggerTestStreamCall` mutation.
  - Protobuf: Added `TriggerTestStreamCall` RPC and `TriggerTestStreamCallRequest` message structure.
  - OpenAPI: Added `/streaming/test-stream-call` path definition to `v1.yaml`.
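The streaming behaviour described in this release can be sketched for the synchronous case as a generator wrapper. The name `track_stream`, the span-as-dict representation, and the whitespace-based token count are assumptions for illustration; the SDK's utilities and its token counting are not reproduced here.

```python
import time
from typing import Iterable, Iterator, List


def track_stream(chunks: Iterable[str], span: dict) -> Iterator[str]:
    """Yield chunks unchanged while recording TTFT and finalizing the span."""
    start = time.perf_counter()
    collected: List[str] = []
    try:
        for chunk in chunks:
            if "ttft_seconds" not in span:
                # First chunk: record time-to-first-token just before yielding it.
                span["ttft_seconds"] = time.perf_counter() - start
            collected.append(chunk)
            yield chunk
    finally:
        # Runs on normal completion, on upstream errors, and on .close(),
        # which raises GeneratorExit at the suspended yield.
        # Whitespace splitting is a crude stand-in for real token counting.
        span["completion_tokens"] = len("".join(collected).split())
        span["finalized"] = True
```

Putting finalization in `finally` is the design choice that covers early aborts: a consumer that stops after the first chunk and calls `.close()` still leaves a finalized span with the tokens seen so far. An async variant would follow the same shape with `async for` and `.aclose()`.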
[1.2.0] - 2026-05-18
Added
- Token Counting: Implemented pre-call token counting using `tiktoken` with fallback character-based heuristics.
  - Supports plain text string prompts, complex nested chat message lists, and OpenAI tile-based vision token calculation with pure-Python PNG/JPEG/GIF dimension parsing.
  - Automatically records `prompt_tokens` and `token_count_method` inside manual spans via `llm_span_with_tokens`.
- REST API Endpoint: Exposed `POST /v1/token-counting/count` to allow remote token calculation.
- Contract-First Support: Added `/token-counting/count` path definition to the OpenAPI `v1.yaml` contract.
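The tokenizer-with-fallback idea for plain text prompts can be sketched as below. This is not the package's `count_tokens`: the function name `estimate_prompt_tokens`, the method labels, and the four-characters-per-token rule of thumb are assumptions, and the encoder is injected so the example runs without `tiktoken` installed.

```python
import math
from typing import Optional, Protocol, Tuple


class Encoder(Protocol):
    """Anything with an encode(text) method that returns a token list."""

    def encode(self, text: str) -> list: ...


def estimate_prompt_tokens(
    text: str, encoder: Optional[Encoder] = None
) -> Tuple[int, str]:
    """Return (token_count, method), preferring a real tokenizer when given."""
    if encoder is not None:
        try:
            return len(encoder.encode(text)), "tokenizer"
        except Exception:
            pass  # fall through to the heuristic if the tokenizer fails
    # Assumed heuristic: roughly four characters per token for English text.
    return math.ceil(len(text) / 4), "char_heuristic"
```

With `tiktoken` available, an encoding object such as the one returned by `tiktoken.get_encoding("cl100k_base")` could be passed as `encoder`, since it provides an `encode` method. Returning the method alongside the count mirrors the idea of recording how a count was obtained, so downstream consumers can tell exact counts from estimates.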
Changed
- Public API Namespace: Exposed `count_tokens` and `llm_span_with_tokens` directly at the package root level.
[1.1.0] - 2026-05-15
Added
- REST API Layer: Implemented a FastAPI-based management API for remote orchestration.
  - `POST /instrumentation/init`: Remotely enable auto-instrumentation.
  - `POST /instrumentation/uninstrument`: Disable all active patchers.
  - `POST /instrumentation/detect`: Dry-run detection of LLM providers from request samples.
  - `POST /instrumentation/test-call`: End-to-end tracing verification.
- Observability Integration: Injected OpenTelemetry (OTEL) middleware for standardized trace collection.
- Contract-First Support:
  - Protobuf: Added `InstrumentationService` with `Init`, `Uninstrument`, `DetectProvider`, and `TriggerTestCall` RPCs.
  - GraphQL: Added `initInstrumentation`, `uninstrument`, and `detectProvider` mutations.
  - AsyncAPI: Introduced `llm.instrumentation.events` channel for state change notifications.
- Docker Infrastructure:
  - Production, Development, and Testing Dockerfiles with optimized layer caching.
  - Multi-environment Docker Compose configurations (`dev`, `prod`, `test`).
  - Automated contract and integration tests.
Changed
- Architecture: Migrated to a feature-isolated structure following Hexagonal Architecture principles.
- Middleware: Refactored `tracer_provider` isolation to support side-effect-free testing.
Fixed
- Kafka Provisioning: Added missing `llm.instrumentation.events` topic to the automated setup scripts and docker-compose configurations, resolving integration test failures.
- API Initialization: Fixed a critical bug where the FastAPI `app` instance was not defined when `SKIP_APP_INIT` was set, preventing the container from starting.
Security
- Added manual span attribute injection for protected services.
- Isolated test telemetry using `InMemorySpanExporter` to prevent global state contamination.
[1.0.0] - 2026-05-10
- Initial release of the `instrumentation-sdk`.
- Core auto-instrumentation patchers for OpenAI, Anthropic, LiteLLM, and LangChain.
- `@llm_observe` decorator and `llm_span` context manager.
- Background worker for span enrichment and Cloudflare AI integration.