Architecture & Internals
This project uses a service-oriented layout with ports-and-adapters-inspired seams. Some boundaries are formalized through abstractions such as BaseLLM, while others are still concrete implementations that can be refactored further as the template matures.
The goal is practical separation rather than architecture theater:
- Request handling stays in FastAPI routes and middleware.
- Business logic lives in services and domain models.
- External integrations sit behind infrastructure adapters or provider abstractions.
- Reliability concerns such as tracing, logging, rate limiting, and circuit breaking are treated as part of the application design, not bolt-ons.
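As a rough, hypothetical sketch of that layering (the names and signatures below are illustrative, not the template's actual modules):

```python
# Illustrative only: hypothetical names, not the template's actual code.
from fastapi import APIRouter, Depends

class UserRepository:
    """Driven adapter: owns the storage details."""

    async def get_email(self, user_id: int) -> str | None:
        return None  # storage lookup elided in this sketch

class AuthService:
    """Business logic: depends on the adapter, knows nothing about HTTP."""

    def __init__(self, repo: UserRepository) -> None:
        self.repo = repo

    async def lookup(self, user_id: int) -> dict:
        return {"user_id": user_id, "email": await self.repo.get_email(user_id)}

def get_service() -> AuthService:
    return AuthService(UserRepository())

router = APIRouter()

@router.get("/users/{user_id}")
async def get_user(user_id: int, service: AuthService = Depends(get_service)) -> dict:
    # The route only translates HTTP to domain calls and back.
    return await service.lookup(user_id)
```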
## 📐 System Diagram
```mermaid
graph TD
    subgraph "External World"
        Client[Client Request]
        Jaeger[Jaeger Tracing]
        Prom[Prometheus]
    end

    subgraph "Infrastructure Adapters (Driving)"
        API["FastAPI Entrypoint<br/>(src.main)"]
    end

    subgraph "Core Application (The Domain)"
        direction TB
        Service["Auth Service<br/>(src.services.auth_service)"]
        Domain["Domain Models<br/>(src.domain.models)"]
    end

    subgraph "Infrastructure Adapters (Driven)"
        Repo["User Repository<br/>(src.infrastructure.user_repository)"]
        LLM["AI Adapter<br/>(src.core.llm/)"]
        Logger["Structlog Adapter<br/>(src.core.logging)"]
        HTTP["Instrumented HTTP Client<br/>(src.infrastructure.http_client)"]
    end

    subgraph "Observability Sidecars"
        OTel[OpenTelemetry Collector]
        Metrics[Metrics Instrumentator]
    end

    Client --> API
    API --> Service
    Service --> Domain
    Service --> Repo
    API -.-> OTel
    API -.-> Metrics
    LLM -.-> API
    HTTP -.-> OTel
    OTel --> Jaeger
    Metrics --> Prom
```
## Design Principles
### Service Layer + Adapter Seams
We decouple the core decision-making paths from the external systems they depend on.
Why it helps:
- Testability: Services and helper modules can be exercised without booting the full observability stack.
- Provider Switching: Moving between supported LLM providers is mostly a configuration and adapter concern.
- Refactor Path: More boundaries can be promoted into explicit protocols or ports as the template grows.
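As an example of such a seam, here is a minimal sketch in the spirit of the `BaseLLM` abstraction; the method name below is an assumption for illustration, not the template's actual interface (see `src/core/llm/` for that):

```python
from abc import ABC, abstractmethod

class BaseLLM(ABC):
    """Provider seam: services depend on this, never on a vendor SDK.

    The method shown is an assumption for illustration; check
    src/core/llm/ for the template's real interface.
    """

    @abstractmethod
    async def complete(self, prompt: str) -> str: ...

class FakeLLM(BaseLLM):
    """Test double: exercises services without any provider or network."""

    async def complete(self, prompt: str) -> str:
        return f"stubbed reply to: {prompt[:40]}"
```

In tests, a double like `FakeLLM` stands in for a real provider adapter, which is what makes the testability and provider-switching points above cheap in practice.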
### Reliability as a First-Class Concern
Observability and failure handling are part of the runtime design, not just deployment garnish.
Current examples in the codebase:
- Distributed Tracing: Requests carry correlation metadata and export spans when `OTLP_ENDPOINT` is configured.
- Circuit Breaker: The demo upstream path fails fast after repeated errors and returns a degraded fallback response.
- AI Log Triage: The summarizer reads filtered local error logs and produces a first-pass explanation plus remediation hints.
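For the tracing bullet, gating span export on `OTLP_ENDPOINT` can look roughly like this; the snippet assumes the standard `opentelemetry-sdk` and OTLP exporter packages rather than the template's exact wiring:

```python
import os

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

def configure_tracing() -> None:
    endpoint = os.getenv("OTLP_ENDPOINT")
    if not endpoint:
        return  # tracing stays a no-op when no collector is configured
    provider = TracerProvider()
    provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(endpoint=endpoint)))
    trace.set_tracer_provider(provider)
```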
## 🛠️ The Codebase Explained
| Path | Feature | Why it matters |
|---|---|---|
| `src/core/middleware.py` | Correlation ID | Injects a unique `X-Correlation-ID` into every request for distributed tracing. |
| `src/core/config.py` | Fail-Fast Settings | Uses Pydantic v2 strict mode to validate environment variables on startup. |
| `src/core/circuit_breaker.py` | Fault Tolerance | Implementation of the Circuit Breaker pattern with state management (Closed, Open, Half-Open). |
| `src/core/rate_limit.py` | Fixed-Window Rate Limiting | SlowAPI + `limits` with a configurable storage backend. |
| `src/services/` | Service Layer | Business logic orchestration that stays separate from request handling. |
| `src/domain/models.py` | Data Integrity | Pydantic models ensuring "Garbage In" never reaches our core logic. |
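As an illustration of the first row, a correlation-ID middleware typically reduces to something like this sketch (assumed Starlette `BaseHTTPMiddleware` style, not necessarily the template's exact code):

```python
import uuid

from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request

class CorrelationIdMiddleware(BaseHTTPMiddleware):
    """Reuses an inbound X-Correlation-ID or mints one, and echoes it back."""

    async def dispatch(self, request: Request, call_next):
        cid = request.headers.get("X-Correlation-ID") or str(uuid.uuid4())
        request.state.correlation_id = cid  # available to handlers and log bindings
        response = await call_next(request)
        response.headers["X-Correlation-ID"] = cid
        return response
```

Registered with `app.add_middleware(CorrelationIdMiddleware)`, every request then carries an ID that can be threaded through logs and outbound calls.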
## ⚡ Resilience Implementation
### Circuit Breakers
Our implementation tracks consecutive failures.

- Trip Condition: 5 consecutive failures.
- Cooldown: 60 seconds.
- Metric: Each state change is exported as a Prometheus metric for real-time monitoring.
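Boiled down to those numbers, the state machine can be sketched like this (metrics export omitted; see `src/core/circuit_breaker.py` for the real implementation):

```python
import time

class CircuitBreaker:
    """Minimal sketch of the pattern with the thresholds listed above."""

    def __init__(self, threshold: int = 5, cooldown: float = 60.0) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # Closed: traffic flows normally
        if time.monotonic() - self.opened_at >= self.cooldown:
            return True  # Half-Open: allow probe traffic through
        return False  # Open: fail fast

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # close the circuit again

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()  # trip to Open
```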
### Trace Propagation

We provide an instrumented HTTP client in `src.infrastructure.http_client` that propagates trace context and correlation IDs on outbound calls.
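A minimal sketch of that idea, assuming `httpx` event hooks and the standard `opentelemetry` propagation helper (not the actual `src.infrastructure.http_client` code):

```python
import httpx
from opentelemetry.propagate import inject

async def _propagate(request: httpx.Request) -> None:
    # Runs before each outbound request: copies the active trace context
    # (the W3C traceparent header) into the request headers.
    inject(request.headers)

def build_outbound_client(correlation_id: str) -> httpx.AsyncClient:
    return httpx.AsyncClient(
        headers={"X-Correlation-ID": correlation_id},
        event_hooks={"request": [_propagate]},
    )
```

Injecting per request, rather than once at client construction, matters because the active span changes between calls.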
## ADRs
For high-level design rationale, refer to the Decision Log (ADRs).