ai-tax-agent/tools/agent_tools.json
harkon b324ff09ef
Initial commit
2025-10-11 08:41:36 +01:00

# ROLE
You are a **Solution Architect + Ontologist + Data Engineer + Platform/SRE** delivering a **production-grade accounting knowledge system** that ingests documents, fuses a **Knowledge Graph (KG)** with a **Vector DB (Qdrant)** for RAG, integrates with **Firm Databases**, and powers **AI agents** to complete workflows like **UK Self Assessment** with **auditable provenance**.
**Authentication & authorization are centralized at the edge:** **Traefik** gateway + **Authentik** SSO (OIDC/ForwardAuth). **Backend services trust Traefik** on an internal network and consume user/role claims from forwarded headers/JWT.
# OBJECTIVE
Deliver a complete, implementable solution (ontology, extraction pipeline, RAG+KG retrieval, deterministic calculators, APIs, validations, **architecture & stack**, infra-as-code, CI/CD, observability, security/governance, test plan, and a worked example) so agents can:
1. read documents (and scrape portals via RPA),
2. populate/maintain a compliant accounting/tax KG,
3. retrieve firm knowledge via RAG (vector + keyword + graph),
4. compute/validate schedules and fill forms,
5. submit (stub/sandbox/live),
6. justify every output with **traceable provenance** (doc/page/bbox) and citations.
# SCOPE & VARIABLES
- **Jurisdiction:** {{jurisdiction}} (default: UK)
- **Tax regime / forms:** {{forms}} (default: SA100 + SA102, SA103, SA105, SA110; optional SA108)
- **Accounting basis:** {{standards}} (default: UK GAAP; support IFRS/XBRL mapping)
- **Document types:** bank statements, invoices, receipts, P&L, balance sheet, payslips, dividend vouchers, property statements, prior returns, letters, certificates.
- **Primary stores:** KG = Neo4j; RAG = Qdrant; Objects = MinIO; Secrets = Vault; IdP/SSO = Authentik; **API Gateway = Traefik**.
- **PII constraints:** GDPR/UK-GDPR; **no raw PII in vector DB** (de-identify before indexing); role-based access; encryption; retention; right-to-erasure.
---
# ARCHITECTURE & STACK (LOCAL-FIRST; SCALE-OUT READY)
## Edge & Identity (centralized)
- **Traefik** (reverse proxy & ingress) terminates TLS, does **AuthN/AuthZ via Authentik**:
  - Use **Authentik Outpost (ForwardAuth)** middleware in Traefik.
  - Traefik injects verified headers/JWT to upstream services: `X-Authenticated-User`, `X-Authenticated-Email`, `X-Authenticated-Groups`, `Authorization: Bearer <jwt>`.
- **Per-route RBAC** via Traefik middlewares (group/claim checks); services only enforce **fine-grained, app-level authorization** using forwarded claims (no OIDC in each service).
- All services are **private** (only reachable behind Traefik on an internal Docker/K8s network). Direct access is denied.
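A minimal sketch of how a backend service might consume the forwarded claims for app-level authorization. The header names come from the list above; the `Claims` type, the function names, and the group separator are assumptions for illustration (the separator depends on how the outpost is configured).

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Claims:
    user: str
    email: str
    groups: frozenset[str]


def claims_from_headers(headers: Mapping[str, str], group_sep: str = ",") -> Claims:
    """Build Claims from the headers Traefik forwards after ForwardAuth.

    Assumption: groups arrive as one delimited string; the separator is
    deployment-specific, so it is a parameter here.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    user = lowered.get("x-authenticated-user", "").strip()
    if not user:
        raise PermissionError("missing X-Authenticated-User")
    raw_groups = lowered.get("x-authenticated-groups", "")
    groups = frozenset(g.strip() for g in raw_groups.split(group_sep) if g.strip())
    email = lowered.get("x-authenticated-email", "").strip()
    return Claims(user=user, email=email, groups=groups)


def require_group(claims: Claims, group: str) -> None:
    """Fine-grained check; raises if the caller is not in `group`."""
    if group not in claims.groups:
        raise PermissionError(f"{claims.user} lacks group {group!r}")
```

This is only safe when the service is unreachable except through Traefik; the headers themselves carry no proof of origin.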
## Services (independent deployables; Python 3.12 unless stated)
1. **svc-ingestion**: uploads/URLs; checksum; MinIO write; emits `doc.ingested`.
2. **svc-rpa**: Playwright RPA for firm/client portals; Prefect-scheduled; emits `doc.ingested`.
3. **svc-ocr**: Tesseract (local) or Textract (scale); de-skew/rotation/layout; emits `doc.ocr_ready`.
4. **svc-extract**: LLM + rules + table detectors → **schema-constrained JSON** (kv + tables + bbox/page); emits `doc.extracted`.
5. **svc-normalize-map**: normalize currency/dates; entity resolution; assign tax year; map to KG nodes/edges with **Evidence** anchors; emits `kg.upserted`.
6. **svc-kg**: Neo4j DDL + **SHACL** validation; **bitemporal** writes `{valid_from, valid_to, asserted_at}`; RDF export.
7. **svc-rag-indexer**: chunk/de-identify/embed; upsert **Qdrant** collections (firm knowledge, legislation, best practices, glossary).
8. **svc-rag-retriever**: **hybrid retrieval** (dense + sparse) + rerank + **KG-fusion**; returns chunks + citations + KG join hints.
9. **svc-reason**: deterministic calculators (employment, self-employment, property, dividends/interest, allowances, NIC, HICBC, student loans); Cypher materializers; explanations.
10. **svc-forms**: fill PDFs; ZIP evidence bundle (signed manifest).
11. **svc-hmrc**: submit stub|sandbox|live; rate-limit & retries; submission audit.
12. **svc-firm-connectors**: read-only connectors to Firm Databases; sync to **Secure Client Data Store** with lineage.
13. **ui-review**: Next.js reviewer portal (SSO via Traefik+Authentik); reviewers accept/override extractions.
## Orchestration & Messaging
- **Prefect 2.x** for local orchestration; **Temporal** for production scale (sagas, retries, idempotency).
- Events (Kafka or SQS/SNS): `doc.ingested`, `doc.ocr_ready`, `doc.extracted`, `kg.upserted`, `rag.indexed`, `calc.schedule_ready`, `form.filled`, `hmrc.submitted`, `review.requested`, `review.completed`, `firm.sync.completed`.
## Concrete Stack (pin/assume unless replaced)
- **Languages:** Python **3.12**, TypeScript 5/Node 20
- **Frameworks:** FastAPI, Pydantic v2, SQLAlchemy 2 (ledger), Prefect 2.x (local), Temporal (scale)
- **Gateway:** **Traefik** 3.x with **Authentik Outpost** (ForwardAuth)
- **Identity/SSO:** **Authentik** (OIDC/OAuth2)
- **Secrets:** **Vault** (AppRole/JWT; Transit for envelope encryption)
- **Object Storage:** **MinIO** (S3 API)
- **Vector DB:** **Qdrant** 1.x (dense + sparse hybrid)
- **Embeddings/Rerankers (local-first):**
  Dense: `bge-m3` or `bge-small-en-v1.5`; Sparse: BM25/SPLADE (Qdrant sparse); Reranker: `cross-encoder/ms-marco-MiniLM-L-6-v2`
- **Datastores:**
  - **Secure Client Data Store:** PostgreSQL 15 (encrypted; RLS; pgcrypto)
  - **KG:** Neo4j 5.x
  - **Cache/locks:** Redis
- **Infra:** **Docker-Compose** for local; **Kubernetes** for scale (Helm, ArgoCD optional later)
- **CI/CD:** **Gitea** + Gitea Actions (or Drone) → container registry → deploy
## Data Layer (three pillars + fusion)
1. **Firm Databases** → **Firm Connectors** (read-only) → **Secure Client Data Store (Postgres)** with lineage.
2. **Vector DB / Knowledge Base (Qdrant)**: internal knowledge, legislation, best practices, glossary; **no PII** (placeholders + hashes).
3. **Knowledge Graph (Neo4j)**: accounting/tax ontology with evidence anchors and rules/calculations.
**Fusion strategy:** Query → RAG retrieve (Qdrant) + KG traverse → **fusion** scoring (α·dense + β·sparse + γ·KG-link-boost) → results with citations (URL/doc_id+page/anchor) and graph paths.
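The fusion score above can be sketched as a weighted sum. This is one possible reading, assuming dense and sparse scores are already normalised to [0, 1] and the KG-link boost is a binary flag; the `Candidate` type and the default weights are illustrative, not values taken from this spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    chunk_id: str
    dense: float     # dense similarity, assumed normalised to [0, 1]
    sparse: float    # sparse (BM25/SPLADE) score, assumed normalised to [0, 1]
    kg_linked: bool  # True if the chunk links to an applicable Rule/Calculation/Evidence


def fusion_score(c: Candidate, alpha: float = 0.5, beta: float = 0.3, gamma: float = 0.2) -> float:
    """alpha*dense + beta*sparse + gamma*KG-link-boost (weights are placeholders)."""
    return alpha * c.dense + beta * c.sparse + gamma * (1.0 if c.kg_linked else 0.0)


def rank(candidates: list[Candidate], **weights: float) -> list[Candidate]:
    """Order candidates by descending fusion score."""
    return sorted(candidates, key=lambda c: fusion_score(c, **weights), reverse=True)
```

With these placeholder weights, a KG-linked chunk with moderate similarity can outrank an unlinked chunk with higher dense similarity, which is the intended effect of the boost.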
## Non-functional Targets
- SLOs: ingest→extract p95 ≤ 3m; reconciliation ≥ 98%; lineage coverage ≥ 99%; schedule error ≤ 1/1k
- Throughput: local 2 docs/s; scale 5 docs/s sustained; burst 20 docs/s
- Idempotency: `sha256(doc_checksum + extractor_version)`
- Retention: raw images 7y; derived text 2y; vectors (non-PII) 7y; PII-min logs 90d
- Erasure: per `client_id` across MinIO, KG, Qdrant (payload filter), Postgres rows
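The idempotency key above can be computed with the standard library. A minimal sketch: the function names are hypothetical, and the plain concatenation follows the formula as written (a delimiter between the two parts would avoid ambiguous joins, but that is a design choice this spec does not state).

```python
import hashlib


def doc_checksum(content: bytes) -> str:
    """SHA-256 hex digest of the raw document bytes."""
    return hashlib.sha256(content).hexdigest()


def idempotency_key(checksum: str, extractor_version: str) -> str:
    """sha256(doc_checksum + extractor_version), as a hex string."""
    return hashlib.sha256((checksum + extractor_version).encode("utf-8")).hexdigest()


def should_process(key: str, seen: set[str]) -> bool:
    """Return True the first time a key is seen; False on replays."""
    if key in seen:
        return False
    seen.add(key)
    return True
```

Re-running the same document through the same extractor version yields the same key, so the work can be skipped; bumping the extractor version forces re-extraction.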
---
# REPOSITORY LAYOUT (monorepo, local-first)
```
repo/
  apps/
    svc-ingestion/ svc-rpa/ svc-ocr/ svc-extract/
    svc-normalize-map/ svc-kg/ svc-rag-indexer/ svc-rag-retriever/
    svc-reason/ svc-forms/ svc-hmrc/ svc-firm-connectors/
    ui-review/
  kg/
    ONTOLOGY.md
    schemas/{nodes_and_edges.schema.json, context.jsonld, shapes.ttl}
    db/{neo4j_schema.cypher, seed.cypher}
    reasoning/schedule_queries.cypher
  retrieval/
    chunking.yaml qdrant_collections.json indexer.py retriever.py fusion.py
  config/{heuristics.yaml, mapping.json}
  prompts/{doc_classify.txt, kv_extract.txt, table_extract.txt, entity_link.txt, rag_answer.txt}
  pipeline/etl.py
  infra/
    compose/{docker-compose.local.yml, traefik.yml, traefik-dynamic.yml, env.example}
    k8s/ (optional later: Helm charts)
  security/{dpia.md, ropa.md, retention_policy.md, threat_model.md}
  ops/
    runbooks/{ingest.md, calculators.md, hmrc.md, vector-indexing.md, dr-restore.md}
    dashboards/grafana.json
    alerts/prometheus-rules.yaml
  tests/{unit, integration, e2e, data/{synthetic, golden}}
  Makefile
  .gitea/workflows/ci.yml
  mkdocs.yml
```
---
# DELIVERABLES (RETURN ALL AS MARKED CODE BLOCKS)
1. **Ontology** (Concept model; JSON-Schema; JSON-LD; Neo4j DDL)
2. **Heuristics & Rules (YAML)**
3. **Extraction pipeline & prompts**
4. **RAG & Retrieval Layer** (chunking, Qdrant collections, indexer, retriever, fusion)
5. **Reasoning layer** (deterministic calculators + Cypher + tests)
6. **Agent interface (Tooling API)**
7. **Quality & Safety** (datasets, metrics, tests, red-team)
8. **Graph Constraints** (SHACL, IDs, bitemporal)
9. **Security & Compliance** (DPIA, ROPA, encryption, auditability)
10. **Worked Example** (end-to-end UK SA sample)
11. **Observability & SRE** (SLIs/SLOs, tracing, idempotency, DR, cost controls)
12. **Architecture & Local Infra** (**docker-compose** with Traefik + Authentik + Vault + MinIO + Qdrant + Neo4j + Postgres + Redis + Prometheus/Grafana + Loki + Unleash + services)
13. **Repo Scaffolding & Makefile** (dev tasks, lint, test, build, run)
14. **Firm Database Connectors** (data contracts, sync jobs, lineage)
15. **Traefik & Authentik configs** (static+dynamic, ForwardAuth, route labels)
---
# ONTOLOGY REQUIREMENTS (as before + RAG links)
- Nodes: `TaxpayerProfile`, `TaxYear`, `Jurisdiction`, `TaxForm`, `Schedule`, `FormBox`, `Document`, `Evidence`, `Party`, `Account`, `IncomeItem`, `ExpenseItem`, `PropertyAsset`, `BusinessActivity`, `Allowance`, `Relief`, `PensionContribution`, `StudentLoanPlan`, `Payment`, `ExchangeRate`, `Calculation`, `Rule`, `NormalizationEvent`, `Reconciliation`, `Consent`, `LegalBasis`, `ImportJob`, `ETLRun`
- Relationships: `BELONGS_TO`, `OF_TAX_YEAR`, `IN_JURISDICTION`, `HAS_SECTION`, `HAS_BOX`, `REPORTED_IN`, `COMPUTES`, `DERIVED_FROM`, `SUPPORTED_BY`, `PAID_BY`, `PAID_TO`, `OWNS`, `RENTED_BY`, `EMPLOYED_BY`, `APPLIES_TO`, `APPLIES`, `VIOLATES`, `NORMALIZED_FROM`, `HAS_VALID_BASIS`, `PRODUCED_BY`, **`CITES`**, **`DESCRIBES`**
- **Bitemporal** and **provenance** mandatory.
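To illustrate the bitemporal requirement, here is a small sketch of an "as of" lookup over facts carrying `valid_from`, `valid_to`, and `asserted_at`. The `Fact` type and the half-open validity interval are assumptions; the actual KG would express this in Cypher rather than in-memory Python.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Fact:
    value: str
    valid_from: datetime            # when the fact starts being true in the world
    valid_to: Optional[datetime]    # None means still valid (interval is half-open)
    asserted_at: datetime           # when the system recorded the fact


def as_of(facts: list[Fact], valid_at: datetime, known_at: datetime) -> Optional[Fact]:
    """Return the fact valid at `valid_at` as the system knew it at `known_at`.

    If several recorded facts qualify, the most recently asserted one wins.
    """
    visible = [
        f for f in facts
        if f.asserted_at <= known_at
        and f.valid_from <= valid_at
        and (f.valid_to is None or valid_at < f.valid_to)
    ]
    return max(visible, key=lambda f: f.asserted_at, default=None)
```

Keeping both time axes lets a reviewer reproduce what a schedule looked like before a later correction was recorded.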
---
# UK-SPECIFIC REQUIREMENTS
- Year boundary 6 Apr to 5 Apr; basis period reform toggle
- Employment aggregation, BIK, PAYE offsets
- Self-employment: allowable/disallowable, capital allowances (AIA/WDA/SBA), loss rules, **NIC Class 2 & 4**
- Property: FHL tests, **mortgage interest 20% credit**, Rent-a-Room, joint splits
- Savings/dividends: allowances & rate bands; ordering
- Personal allowance tapering; Gift Aid & pension gross-up; **HICBC**; **Student Loan** plans 1/2/4/5 & PGL
- Rounding per `FormBox.rounding_rule`
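The year-boundary rule can be sketched as a date-to-tax-year mapping. The function names and the `YYYY-YY` label format are assumptions; the basis period reform toggle is not modelled here.

```python
from datetime import date


def uk_tax_year_start(d: date) -> int:
    """Calendar year in which the UK tax year containing `d` begins.

    The UK tax year runs from 6 April to 5 April of the following year.
    """
    return d.year if (d.month, d.day) >= (4, 6) else d.year - 1


def uk_tax_year_label(d: date) -> str:
    """Label such as '2024-25' for the tax year containing `d`."""
    start = uk_tax_year_start(d)
    return f"{start}-{(start + 1) % 100:02d}"
```

For example, a payslip dated 5 April 2025 falls in the year starting 2024, while one dated 6 April 2025 falls in the year starting 2025.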
---
# YAML HEURISTICS (KEEP SEPARATE FILE)
- document_kinds, field_normalization, line_item_mapping
- period_inference (UK boundary + reform), dedupe_rules
- **validation_rules:** `utr_checksum`, `ni_number_regex`, `iban_check`, `vat_gb_mod97`, `rounding_policy: "HMRC"`, `numeric_tolerance: 0.01`
- **entity_resolution:** blocking keys, fuzzy thresholds, canonical source priority
- **privacy_redaction:** `mask_except_last4` for NI/UTR/IBAN/sort_code/phone/email
- **jurisdiction_overrides:** by {{jurisdiction}} and {{tax_year}}
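As an illustration of the `mask_except_last4` redaction policy, the sketch below masks every alphanumeric character except the last four and leaves separators in place. The exact behaviour (mask character, treatment of separators, handling of short values) is an assumption, and emails or phone numbers would probably need their own rules.

```python
def mask_except_last4(value: str, mask_char: str = "*") -> str:
    """Mask all alphanumeric characters except the last four.

    Separators such as spaces and hyphens are preserved so the shape of
    the identifier stays recognisable. Values with four or fewer
    alphanumeric characters are returned unmasked, which a real policy
    may want to treat differently.
    """
    alnum_positions = [i for i, ch in enumerate(value) if ch.isalnum()]
    keep = set(alnum_positions[-4:])
    return "".join(
        ch if (i in keep or not ch.isalnum()) else mask_char
        for i, ch in enumerate(value)
    )
```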
---
# EXTRACTION PIPELINE (SPECIFY CODE & PROMPTS)
- ingest → classify → OCR/layout → extract (schema-constrained JSON with bbox/page) → validate → normalize → map_to_graph → post-checks
- Prompts: `doc_classify`, `kv_extract`, `table_extract` (multi-page), `entity_link`
- Contract: **JSON schema enforcement** with retry/validator loop; temperature guidance
- Reliability: de-skew/rotation/language/handwriting policy
- Mapping config: JSON mapping to nodes/edges + provenance (doc_id/page/bbox/text_hash)
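The retry/validator loop in the contract above might look like the following sketch. Everything here is illustrative: `call_model` stands in for whatever LLM client is used, the required keys are a hypothetical contract, and a real implementation would more likely validate with Pydantic or a JSON Schema library than with hand-written checks.

```python
import json
from typing import Callable

# Hypothetical contract: every extracted field carries provenance.
REQUIRED_FIELD_KEYS = {"name", "value", "page", "bbox"}


def validate_extraction(payload: dict) -> list[str]:
    """Return a list of contract violations (empty list means valid)."""
    fields = payload.get("fields")
    if not isinstance(fields, list):
        return ["'fields' must be a list"]
    errors: list[str] = []
    for i, field in enumerate(fields):
        if not isinstance(field, dict):
            errors.append(f"fields[{i}] must be an object")
            continue
        missing = REQUIRED_FIELD_KEYS - field.keys()
        if missing:
            errors.append(f"fields[{i}] missing {sorted(missing)}")
        bbox = field.get("bbox")
        if bbox is not None and not (isinstance(bbox, list) and len(bbox) == 4):
            errors.append(f"fields[{i}].bbox must be a list of 4 numbers")
    return errors


def extract_with_retry(call_model: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Call the model, validate its JSON, and retry with the errors appended."""
    feedback = ""
    errors: list[str] = []
    for _ in range(max_attempts):
        raw = call_model(prompt + feedback)
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors = [f"invalid JSON: {exc.msg}"]
        else:
            if isinstance(payload, dict):
                errors = validate_extraction(payload)
            else:
                errors = ["top level must be an object"]
            if not errors:
                return payload
        feedback = "\n\nPrevious output was rejected:\n- " + "\n- ".join(errors)
    raise ValueError(f"extraction failed after {max_attempts} attempts: {errors}")
```

Feeding the validator's errors back into the next attempt gives the model a concrete target, and the hard attempt limit keeps failures bounded so the document can be routed to review instead.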
---
# RAG & RETRIEVAL LAYER (Qdrant + KG Fusion)
- Collections: `firm_knowledge`, `legislation`, `best_practices`, `glossary` (payloads include jurisdiction, tax_years, topic_tags, version, `pii_free:true`)
- Chunking: layout-aware; tables serialized; ~1.5k token chunks, 10-15% overlap
- Indexer: de-identify PII; placeholders only; embeddings (dense) + sparse; upsert with payload
- Retriever: hybrid scoring (α·dense + β·sparse), filters (jurisdiction/tax_year), rerank; return **citations** + **KG hints**
- Fusion: boost results linked to applicable `Rule`/`Calculation`/`Evidence` for current schedule
- Right-to-erasure: purge vectors via payload filter (`client_id?` only for client-authored knowledge)
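The chunk size and overlap targets above can be sketched as a sliding window. This version is deliberately simplified: it works on a pre-tokenized list (a stand-in for the embedding model's tokenizer) and ignores layout and tables; the function name and defaults are illustrative.

```python
def chunk_tokens(tokens: list[str], target: int = 1500, overlap_ratio: float = 0.125) -> list[list[str]]:
    """Split tokens into windows of `target` tokens with fractional overlap.

    With the defaults, consecutive chunks share int(1500 * 0.125) = 187 tokens.
    """
    if target <= 0 or not 0 <= overlap_ratio < 1:
        raise ValueError("target must be positive and overlap_ratio in [0, 1)")
    overlap = int(target * overlap_ratio)
    step = target - overlap  # always >= 1 because overlap < target
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + target])
        if start + target >= len(tokens):
            break  # this window already reached the end of the document
    return chunks
```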
---
# REASONING & CALCULATION (DETERMINISTIC)
- Order: incomes → allowances/capital allowances → loss offsets → personal allowance → savings/dividend bands → HICBC & student loans → NIC Class 2/4 → property 20% credit/FHL/Rent-a-Room
- Cypher materializers per schedule/box; explanations via `DERIVED_FROM` and RAG `CITES`
- Unit tests per rule; golden files; property-based tests
---
# AGENT TOOLING API (JSON SCHEMAS)
1. `ComputeSchedule({tax_year, taxpayer_id, schedule_id}) -> {boxes[], totals[], explanations[]}`
2. `PopulateFormBoxes({tax_year, taxpayer_id, form_id}) -> {fields[], pdf_fields[], confidence, calibrated_confidence}`
3. `AskClarifyingQuestion({gap, candidate_values, evidence}) -> {question_text, missing_docs}`
4. `GenerateEvidencePack({scope}) -> {bundle_manifest, signed_hashes}`
5. `ExplainLineage({node_id|field}) -> {chain:[evidence], graph_paths}`
6. `CheckDocumentCoverage({tax_year, taxpayer_id}) -> {required_docs[], missing[], blockers[]}`
7. `SubmitToHMRC({tax_year, taxpayer_id, dry_run}) -> {status, submission_id?, errors[]}`
8. `ReconcileBank({account_id, period}) -> {unmatched_invoices[], unmatched_bank_lines[], deltas}`
9. `RAGSearch({query, tax_year?, jurisdiction?, k?}) -> {chunks[], citations[], kg_hints[], calibrated_confidence}`
10. `SyncFirmDatabases({since}) -> {objects_synced, errors[]}`
**Env flags:** `HMRC_MTD_ITSA_MODE`, `RATE_LIMITS`, `RAG_EMBEDDING_MODEL`, `RAG_RERANKER_MODEL`, `RAG_ALPHA_BETA_GAMMA`
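One way to guard tool calls before dispatch is to check argument names against the signatures listed above. The sketch below covers three of the ten tools; the registry shape and function name are hypothetical, and a full implementation would validate types through the JSON Schemas rather than names alone.

```python
# Tool name -> (required argument names, optional argument names),
# transcribed from the signatures in the list above.
TOOL_ARGS: dict[str, tuple[set[str], set[str]]] = {
    "ComputeSchedule": ({"tax_year", "taxpayer_id", "schedule_id"}, set()),
    "RAGSearch": ({"query"}, {"tax_year", "jurisdiction", "k"}),
    "SubmitToHMRC": ({"tax_year", "taxpayer_id", "dry_run"}, set()),
}


def validate_call(tool: str, args: dict) -> list[str]:
    """Return a list of problems with a tool call (empty list means OK)."""
    if tool not in TOOL_ARGS:
        return [f"unknown tool: {tool}"]
    required, optional = TOOL_ARGS[tool]
    errors = [f"missing argument: {a}" for a in sorted(required - args.keys())]
    errors += [f"unexpected argument: {a}" for a in sorted(args.keys() - required - optional)]
    return errors
```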
---
# SECURITY & COMPLIANCE
- **Traefik + Authentik SSO at edge** (ForwardAuth); per-route RBAC; inject verified claims headers/JWT
- **Vault** for secrets (AppRole/JWT, Transit for envelope encryption)
- **PII minimization:** no PII in Qdrant; placeholders; PII mapping only in Secure Client Data Store
- **Auditability:** tamper-evident logs (hash chain), signer identity, time sync
- **DPIA, ROPA, retention policy, right-to-erasure** workflows
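The tamper-evident log requirement can be illustrated with a simple hash chain, where each entry's hash covers the previous entry's hash plus its own canonical JSON body. The entry layout and function names are assumptions; signer identity and time sync are out of scope for this sketch.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so the hash is stable.
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], event: dict) -> dict:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev_hash": prev_hash, "hash": _entry_hash(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edit, insertion, or reorder breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev or _entry_hash(prev, entry["event"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Truncating the tail of such a log is not detectable from the log alone, so the latest hash would need to be anchored somewhere outside the log store.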
---
# CI/CD (Gitea)
- Gitea Actions: `lint` (ruff/mypy/eslint), `test` (pytest+coverage, e2e), `build` (Docker), `scan` (Trivy/SAST), `push` (registry), `deploy` (compose up or K8s apply)
- SemVer tags; SBOM (Syft); OpenAPI + MkDocs publish; pre-commit hooks
---
# OBSERVABILITY & SRE
- SLIs/SLOs: ingest_time_p50, extract_precision@field ≥ 0.97, reconciliation_pass_rate ≥ 0.98, lineage_coverage ≥ 0.99, time_to_review_p95
- Dashboards: ingestion throughput, OCR error rates, extraction precision, mapping latency, calculator failures, HMRC submits, **RAG recall/precision & faithfulness**
- Alerts: OCR 5xx spike, extraction precision dip, reconciliation failures, HMRC rate-limit breaches, RAG drift
- Backups/DR: Neo4j dump (daily), Postgres PITR, Qdrant snapshot, MinIO versioning; quarterly restore test
- Cost controls: embedding cache, incremental indexing, compaction/TTL for stale vectors, cold archive for images
---
# OUTPUT FORMAT (STRICT)
Return results in the following order, each in its own fenced code block **with the exact language tag**:
```md
<!-- FILE: ONTOLOGY.md -->
# Concept Model
...
```
```json
// FILE: schemas/nodes_and_edges.schema.json
{ ... }
```
```json
// FILE: schemas/context.jsonld
{ ... }
```
```turtle
# FILE: schemas/shapes.ttl
# SHACL shapes for node/edge integrity
...
```
```cypher
// FILE: db/neo4j_schema.cypher
CREATE CONSTRAINT ...
```
```yaml
# FILE: config/heuristics.yaml
document_kinds: ...
```
```json
// FILE: config/mapping.json
{ "mappings": [ ... ] }
```
```yaml
# FILE: retrieval/chunking.yaml
# Layout-aware chunking, tables, overlap, token targets
```
```json
// FILE: retrieval/qdrant_collections.json
{
  "collections": [
    { "name": "firm_knowledge", "dense": {"size": 1024}, "sparse": true, "payload_schema": { ... } },
    { "name": "legislation", "dense": {"size": 1024}, "sparse": true, "payload_schema": { ... } },
    { "name": "best_practices", "dense": {"size": 1024}, "sparse": true, "payload_schema": { ... } },
    { "name": "glossary", "dense": {"size": 768}, "sparse": true, "payload_schema": { ... } }
  ]
}
```
```python
# FILE: retrieval/indexer.py
# De-identify -> embed dense/sparse -> upsert to Qdrant with payload
...
```
```python
# FILE: retrieval/retriever.py
# Hybrid retrieval (alpha,beta), rerank, filters, return citations + KG hints
...
```
```python
# FILE: retrieval/fusion.py
# Join RAG chunks to KG rules/calculations/evidence; boost linked results
...
```
```txt
# FILE: prompts/rag_answer.txt
[Instruction: cite every claim; forbid PII; return calibrated_confidence; JSON contract]
```
```python
# FILE: pipeline/etl.py
def ingest(...): ...
```
```txt
# FILE: prompts/kv_extract.txt
[Prompt with JSON contract + examples]
```
```cypher
// FILE: reasoning/schedule_queries.cypher
// SA105: compute property income totals
MATCH ...
```
```json
// FILE: tools/agent_tools.json
{ ... }
```
```yaml
# FILE: infra/compose/docker-compose.local.yml
# Traefik (with Authentik ForwardAuth), Authentik, Vault, MinIO, Qdrant, Neo4j, Postgres, Redis, Prometheus/Grafana, Loki, Unleash, all services
```
```yaml
# FILE: infra/compose/traefik.yml
# Static config: entryPoints, providers, certificates, access logs
entryPoints:
  web:
    address: ":80"
  websecure:
    address: ":443"
providers:
  docker: {}
  file:
    filename: /etc/traefik/traefik-dynamic.yml
api:
  dashboard: true
log:
  level: INFO
accessLog: {}
```
```yaml
# FILE: infra/compose/traefik-dynamic.yml
# Dynamic config: Authentik ForwardAuth middleware + routers per service
http:
  middlewares:
    authentik-forwardauth:
      forwardAuth:
        address: "http://authentik-outpost:9000/outpost.goauthentik.io/auth/traefik"
        trustForwardHeader: true
        authResponseHeaders:
          - X-Authenticated-User
          - X-Authenticated-Email
          - X-Authenticated-Groups
          - Authorization
    rate-limit:
      rateLimit:
        average: 50
        burst: 100
  routers:
    svc-extract:
      rule: "Host(`api.local`) && PathPrefix(`/extract`)"
      entryPoints: ["websecure"]
      service: svc-extract
      middlewares: ["authentik-forwardauth", "rate-limit"]
      tls: {}
  services:
    svc-extract:
      loadBalancer:
        servers:
          - url: "http://svc-extract:8000"
```
```yaml
# FILE: infra/compose/env.example
DOMAIN=local
EMAIL=admin@local
MINIO_ROOT_USER=minio
MINIO_ROOT_PASSWORD=miniopass
POSTGRES_PASSWORD=postgres
NEO4J_PASSWORD=neo4jpass
QDRANT__SERVICE__GRPC_PORT=6334
VAULT_DEV_ROOT_TOKEN_ID=root
AUTHENTIK_SECRET_KEY=changeme
RAG_EMBEDDING_MODEL=bge-small-en-v1.5
RAG_RERANKER_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2
```
```yaml
# FILE: .gitea/workflows/ci.yml
# Lint → Test → Build → Scan → Push → Deploy (compose up)
```
```makefile
# FILE: Makefile
# bootstrap, run, test, lint, build, deploy, format, seed
...
```
```md
<!-- FILE: TESTPLAN.md -->
## Datasets, Metrics, Acceptance Criteria
- Extraction precision/recall per field
- Schedule-level absolute error
- Reconciliation pass-rate
- Explanation coverage
- RAG retrieval: top-k recall, nDCG, faithfulness, groundedness
- Security: Traefik+Authentik route auth tests, header spoofing prevention (internal network, trusted proxy)
- Red-team cases (OCR noise, conflicting docs, PII leak prevention)
...
```
---
# STYLE & GUARANTEES
- Be **concise but complete**; prefer schemas/code over prose.
- **No chain-of-thought.** Provide final artifacts and brief rationales.
- Every numeric output must include **lineage to Evidence Document (page/bbox/text_hash)** and **citations** for narrative answers.
- Parameterize by {{jurisdiction}} and {{tax_year}}.
- Include **calibrated_confidence** and name calibration method.
- Enforce **SHACL** on KG writes; reject/queue fixes on violation.
- **No PII** in Qdrant. Use de-ID placeholders; keep mappings only in Secure Client Data Store.
- Deterministic IDs; reproducible builds; version-pinned dependencies.
- **Trust boundary:** only Traefik exposes ports; all services on a private network; services accept only requests with Traefik's network identity; **never trust client-supplied auth headers**.
# START
Produce the deliverables now, in the exact order and file/block structure above, implementing the **local-first stack (Python 3.12, Prefect, Vault, MinIO, Playwright, Qdrant, Authentik, Traefik, Docker-Compose, Gitea)** with optional **scale-out** notes (Temporal, K8s) where specified.