ROLE
You are a Senior Backend Engineer working inside an existing monorepo that already contains the services and libraries described previously (Traefik+Authentik SSO at the edge; Python 3.12; FastAPI microservices; Vault, MinIO, Neo4j, Postgres, Redis, Qdrant; Prefect; Docker-Compose; Gitea CI).
OBJECTIVE
Integrate the new coverage policy (config/coverage.yaml) so agents can:
- call CheckDocumentCoverage({tax_year, taxpayer_id}) and get a precise, machine-readable coverage matrix (required/conditional/optional evidence per schedule, with status and citations), and
- call AskClarifyingQuestion(gap, context) to receive a ready-to-send user question with why and citations.
You will implement policy loading with overlays + hot reload, runtime evaluation against the KG, citations via KG or RAG, validation, tests, CI, and deploy assets.
SCOPE (DO EXACTLY THIS)
A) New service: svc-coverage
Create a dedicated microservice to encapsulate policy loading and coverage evaluation (keeps svc-reason calculators clean).
Endpoints (FastAPI):
- POST /v1/coverage/check
  - Body: {"tax_year": "YYYY-YY", "taxpayer_id": "T-xxx"}
  - Returns: full coverage report (shape below).
- POST /v1/coverage/clarify
  - Body: {"gap": {...}, "context": {"tax_year": "...", "taxpayer_id": "...", "jurisdiction": "UK"}}
  - Returns: {question_text, why_it_is_needed, citations[], options_to_provide[], blocking, boxes_affected[]}.
- POST /admin/coverage/reload
  - Reloads policy from files/overrides/feature flags. Require admin group via forwarded header.
- GET /v1/coverage/policy
  - Returns current compiled policy (no secrets, no PII), with version & sources.
- GET /v1/coverage/validate
  - Runs cross-checks (see Validation section). Returns {ok: bool, errors[]}.
Security:
- All routes behind Traefik+Authentik.
- /admin/* additionally checks X-Authenticated-Groups contains admin.
- Use the existing TrustedProxyMiddleware.
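The admin check can be sketched as a small pure function, which keeps it testable apart from FastAPI. The header format is an assumption: this treats X-Authenticated-Groups as a single string of group names separated by commas or pipes, so adjust it to whatever the proxy actually forwards. has_admin_group is a hypothetical helper name, not something the existing TrustedProxyMiddleware is known to provide.

```python
from typing import Optional


def has_admin_group(groups_header: Optional[str], required: str = "admin") -> bool:
    """Return True when the forwarded groups header lists `required` exactly."""
    if not groups_header:
        return False
    # Assumption: one header value, group names separated by "," or "|".
    names = {g.strip() for g in groups_header.replace("|", ",").split(",")}
    # Exact match only, so "administrators" does not pass as "admin".
    return required in names
```

A route dependency for /admin/* would call this with the header value and respond 403 when it returns False.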
Observability:
- OTel tracing, Prometheus metrics at /metrics (internal CIDR only), structured logs.
B) Libraries & shared code (create/update)
libs/policy.py (new)
- Functions:
  - load_policy(baseline_path, jurisdiction, tax_year, tenant_id|None) -> CoveragePolicy
  - merge_overlays(base, *overlays) -> CoveragePolicy
  - apply_feature_flags(policy) -> CoveragePolicy (optional Unleash)
  - compile_predicates(policy) -> CompiledCoveragePolicy (turn condition: DSL into callables; see DSL below)
  - watch_and_reload() (optional watchdog; otherwise /admin/coverage/reload)
- Validate against JSON Schema (below). Raise PolicyError on failure.
libs/coverage_models.py (new)
- Pydantic v2 models mirroring config/coverage.yaml: CoveragePolicy, SchedulePolicy, EvidenceItem, Validity, StatusClassifier, QuestionTemplates, ConflictRules, GuidanceRef, Trigger, CoverageReport, CoverageItem, Citation, ClarifyResponse.
- Enums: Role = REQUIRED|CONDITIONALLY_REQUIRED|OPTIONAL, Status = present_verified|present_unverified|missing|conflicting.
libs/coverage_eval.py (new)
- Core runtime:
  - infer_required_schedules(taxpayer_id, tax_year, policy, kg) -> list[str]
  - find_evidence_docs(taxpayer_id, tax_year, evidence_ids, thresholds, kg) -> list[FoundEvidence]
  - classify_status(found, thresholds, tax_year_bounds, conflicts_rules) -> Status
  - build_reason_and_citations(schedule_id, evidence_item, status, taxpayer_id, tax_year, kg, rag) -> (str, list[Citation])
  - check_document_coverage(...) -> CoverageReport (implements the A→D steps we defined)
- Uses:
  - libs/neo.py for Cypher helpers (see queries below)
  - libs/rag.py for fallback citations (filters {jurisdiction:'UK', tax_year} and pii_free:true)
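As a minimal sketch of the classification step, the function below maps found evidence to one of the four Status values. It is simplified relative to the signature listed above: the threshold is a single min_ocr float, and conflict detection is assumed to have run upstream and set a hypothetical conflicts flag on each item. The order of the checks (missing, then conflicting, then verified) is also an assumption; the status classifier in config/coverage.yaml is the authority.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Tuple


class Status(str, Enum):
    PRESENT_VERIFIED = "present_verified"
    PRESENT_UNVERIFIED = "present_unverified"
    MISSING = "missing"
    CONFLICTING = "conflicting"


@dataclass(frozen=True)
class FoundEvidence:
    doc_id: str
    kind: str
    confidence: float        # OCR confidence in [0, 1]
    doc_date: date
    conflicts: bool = False  # hypothetical flag, assumed set by conflict rules upstream


def classify_status(
    found: List[FoundEvidence],
    min_ocr: float,
    tax_year_bounds: Tuple[date, date],
) -> Status:
    start, end = tax_year_bounds
    # Only documents dated inside the tax year count as evidence for it.
    in_year = [f for f in found if start <= f.doc_date <= end]
    if not in_year:
        return Status.MISSING
    if any(f.conflicts for f in in_year):
        return Status.CONFLICTING
    # One document at or above the OCR threshold is enough to verify.
    if any(f.confidence >= min_ocr for f in in_year):
        return Status.PRESENT_VERIFIED
    return Status.PRESENT_UNVERIFIED
```

With UK 2024-25 bounds of 6 April 2024 to 5 April 2025, a P60 at confidence 0.81 against a 0.82 threshold classifies as present_unverified, which matches the reason string in the example response.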
libs/coverage_schema.json (new)
- JSON Schema for validating coverage.yaml. Include:
  - enum checks (role, status keys)
  - boxes[] contains only non-empty strings
  - every evidence.id and every acceptable_alternatives[] entry points to a kind declared in document_kinds
  - triggers exist for each schedule referenced under schedules
libs/neo.py (update)
- Add helpers:
  - kg_boxes_exist(box_ids: list[str]) -> dict[str,bool]
  - kg_find_evidence(taxpayer_id, tax_year, kinds: list[str], min_ocr: float, date_window) -> list[FoundEvidence]
  - kg_rule_citations(schedule_id, boxes: list[str]) -> list[Citation]
libs/rag.py (update)
- Add rag_search_for_citations(query, filters) -> list[Citation] (ensure pii_free:true and include doc_id/url, locator).
C) Coverage DSL for conditions (compile in compile_predicates)
Supported condition atoms (map to KG checks):
- exists(Entity[filters]), e.g., exists(ExpenseItem[category='FinanceCosts'])
- property_joint_ownership (bool from KG PropertyAsset links)
- candidate_FHL (bool property on PropertyAsset or derived)
- claims_FTCR, claims_remittance_basis (flags on TaxpayerProfile)
- turnover_lt_vat_threshold / turnover_ge_vat_threshold (computed from IncomeItem aggregates)
- received_estate_income, BenefitInKind=true, etc.
Implementation: parse simple strings with a tiny hand-rolled parser or declarative mapping table; do not eval raw strings. Return callables fn(taxpayer_id, tax_year) -> bool.
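A minimal sketch of such a compiler, using regular expressions instead of eval, might look like the following. The kg argument and its two methods, exists(...) and has_flag(...), are hypothetical stand-ins for whatever libs/neo.py ends up exposing. The sketch covers only exists(Entity[key='value', ...]), bare flag atoms, and Name=true/false; filter values containing commas are not handled.

```python
import re
from typing import Callable, Dict

Predicate = Callable[[str, str], bool]

_EXISTS = re.compile(r"^exists\((\w+)\[(.*)\]\)$")
_FILTER = re.compile(r"^\s*(\w+)\s*=\s*'([^']*)'\s*$")
_FLAG = re.compile(r"^(\w+)(?:\s*=\s*(true|false))?$")


def compile_condition(condition: str, kg) -> Predicate:
    """Turn one condition atom into fn(taxpayer_id, tax_year) -> bool."""
    text = condition.strip()
    m = _EXISTS.match(text)
    if m:
        entity, raw = m.group(1), m.group(2)
        filters: Dict[str, str] = {}
        for part in (raw.split(",") if raw.strip() else []):
            fm = _FILTER.match(part)
            if fm is None:
                raise ValueError(f"unsupported filter in condition: {condition!r}")
            filters[fm.group(1)] = fm.group(2)
        # kg.exists is a hypothetical helper that runs the entity lookup.
        return lambda tid, year: kg.exists(tid, year, entity, filters)
    m = _FLAG.match(text)
    if m:
        # A bare flag means "flag is true"; "Flag=false" inverts the check.
        flag, expected = m.group(1), m.group(2) != "false"
        return lambda tid, year: kg.has_flag(tid, year, flag) == expected
    # Anything else is rejected rather than evaluated.
    raise ValueError(f"unsupported condition: {condition!r}")
```

Because unknown strings raise at compile time, a malformed condition surfaces during policy load or reload instead of during a coverage check.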
D) Database migrations (Postgres; Alembic)
Create two tables (new apps/svc-coverage/alembic):
- coverage_versions
  - id (serial pk), version (text), jurisdiction (text), tax_year (text), tenant_id (text null), source_files (jsonb), compiled_at (timestamptz), hash (text)
- coverage_audit
  - id (serial pk), taxpayer_id (text), tax_year (text), policy_version (text), overall_status (text), blocking_items (jsonb), created_at (timestamptz), trace_id (text)
Write to coverage_versions on reload; write to coverage_audit on each /v1/coverage/check.
E) API Contracts (exact shapes)
1) /v1/coverage/check (request)
{ "tax_year": "2024-25", "taxpayer_id": "T-001" }
1) /v1/coverage/check (response)
{
  "tax_year": "2024-25",
  "taxpayer_id": "T-001",
  "schedules_required": ["SA102", "SA105", "SA110"],
  "overall_status": "blocking", // ok | partial | blocking
  "coverage": [
    {
      "schedule_id": "SA102",
      "status": "partial",
      "evidence": [
        {
          "id": "P60",
          "role": "REQUIRED",
          "status": "present_unverified",
          "boxes": ["SA102_b1", "SA102_b2"],
          "found": [
            {
              "doc_id": "DOC-123",
              "kind": "P60",
              "confidence": 0.81,
              "pages": [2]
            }
          ],
          "acceptable_alternatives": ["FinalPayslipYTD", "P45"],
          "reason": "P60 present but OCR confidence 0.81 < 0.82 threshold.",
          "citations": [
            {
              "rule_id": "UK.SA102.P60.Required",
              "doc_id": "SA102-Notes-2025",
              "locator": "p.3 §1.1"
            }
          ]
        }
      ]
    }
  ],
  "blocking_items": [
    { "schedule_id": "SA105", "evidence_id": "LettingAgentStatements" }
  ]
}
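The response shape does not say how overall_status and blocking_items are derived from per-item statuses, so the roll-up below is only one plausible reading and every rule in it is an assumption: OPTIONAL items are ignored, a non-optional item that is missing or conflicting blocks, and any other non-optional item short of present_verified makes the result partial. roll_up is a hypothetical helper name.

```python
from typing import Dict, List, Tuple

# Assumption: these two statuses block when the evidence is not OPTIONAL.
BLOCKING_STATUSES = {"missing", "conflicting"}


def roll_up(coverage: List[dict]) -> Tuple[str, List[Dict[str, str]]]:
    """Derive (overall_status, blocking_items) from the coverage list."""
    blocking_items: List[Dict[str, str]] = []
    all_verified = True
    for schedule in coverage:
        for item in schedule["evidence"]:
            if item["role"] == "OPTIONAL":
                continue  # optional evidence never affects the overall status
            if item["status"] in BLOCKING_STATUSES:
                blocking_items.append(
                    {"schedule_id": schedule["schedule_id"], "evidence_id": item["id"]}
                )
            if item["status"] != "present_verified":
                all_verified = False
    if blocking_items:
        return "blocking", blocking_items
    return ("ok" if all_verified else "partial"), blocking_items
```

Under these rules, an unverified P60 alone yields partial, and adding a missing REQUIRED LettingAgentStatements item yields blocking with that item listed.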
2) /v1/coverage/clarify (request)
{
  "gap": {
    "schedule_id": "SA105",
    "evidence_id": "LettingAgentStatements",
    "role": "REQUIRED",
    "reason": "No rent/fees statements for 2024–25.",
    "boxes": ["SA105_b5", "SA105_b20", "SA105_b29"],
    "citations": [
      {
        "rule_id": "UK.SA105.RentEvidence",
        "doc_id": "SA105-Notes-2025",
        "locator": "p.4 §2.1"
      }
    ],
    "acceptable_alternatives": ["TenancyLedger", "BankStatements"]
  },
  "context": {
    "tax_year": "2024-25",
    "taxpayer_id": "T-001",
    "jurisdiction": "UK"
  }
}
2) /v1/coverage/clarify (response)
{
  "question_text": "To complete the UK Property pages (SA105) for 2024–25, we need your letting agent statements showing total rents received, fees and charges. These support boxes SA105:5, SA105:20 and SA105:29. If you don’t have agent statements, you can provide a tenancy income ledger instead.",
  "why_it_is_needed": "HMRC guidance requires evidence of gross rents and allowable expenses for SA105 (see notes p.4 §2.1).",
  "citations": [
    {
      "rule_id": "UK.SA105.RentEvidence",
      "doc_id": "SA105-Notes-2025",
      "locator": "p.4 §2.1"
    }
  ],
  "options_to_provide": [
    {
      "label": "Upload agent statements (PDF/CSV)",
      "accepted_formats": ["pdf", "csv"],
      "upload_endpoint": "/v1/ingest/upload?tag=LettingAgentStatements"
    },
    {
      "label": "Upload tenancy income ledger (XLSX/CSV)",
      "accepted_formats": ["xlsx", "csv"],
      "upload_endpoint": "/v1/ingest/upload?tag=TenancyLedger"
    }
  ],
  "blocking": true,
  "boxes_affected": ["SA105_b5", "SA105_b20", "SA105_b29"]
}
F) KG & RAG integration (implement exactly)
Neo4j Cypher helpers (in libs/neo.py)
- Presence of evidence
MATCH (p:TaxpayerProfile {taxpayer_id:$tid})-[:OF_TAX_YEAR]->(y:TaxYear {label:$tax_year})
MATCH (ev:Evidence)-[:DERIVED_FROM]->(d:Document)
WHERE ((ev)-[:SUPPORTS]->(p) OR (d)-[:BELONGS_TO]->(p))
AND d.kind IN $kinds
AND date(d.date) >= date(y.start_date) AND date(d.date) <= date(y.end_date)
RETURN d.doc_id AS doc_id, d.kind AS kind, ev.page AS page, ev.bbox AS bbox, ev.ocr_confidence AS conf;
- Rule citations for schedule/boxes
MATCH (fb:FormBox)-[:GOVERNED_BY]->(r:Rule)-[:CITES]->(doc:Document)
WHERE fb.box_id IN $box_ids
RETURN r.rule_id AS rule_id, doc.doc_id AS doc_id, doc.locator AS locator LIMIT 10;
- Check boxes exist
UNWIND $box_ids AS bid
OPTIONAL MATCH (fb:FormBox {box_id: bid})
RETURN bid, fb IS NOT NULL AS exists;
RAG fallback (in libs/rag.py)
- rag_search_for_citations(query, filters={'jurisdiction':'UK','tax_year':'2024-25','pii_free':true}) -> list[Citation]
  - Use Qdrant hybrid search + rerank; return doc_id/url and a best-effort locator (heading/page).
G) Validation & policy correctness
Implement /v1/coverage/validate to run checks:
- YAML schema (libs/coverage_schema.json) passes.
- Every boxes[] entry exists in the KG (FormBox).
- Every evidence.id and each acceptable_alternatives[] is in document_kinds.
- Every schedule referenced under schedules has a triggers entry.
- Simulate a set of synthetic profiles (unit fixtures) to ensure conditional paths are exercised (e.g., with/without BIK, FHL candidate, remittance).
Return {ok: true} or {ok:false, errors:[...]}.
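The structural checks above (kinds, triggers, boxes) can be sketched over a plain dict. The policy layout used here is an assumption, since config/coverage.yaml is not reproduced in this brief: document_kinds is taken to be a list, triggers a mapping keyed by schedule id, and schedules a mapping whose values hold an evidence list. boxes_in_kg stands for the result of kg_boxes_exist.

```python
from typing import Dict, List


def cross_check_policy(policy: dict, boxes_in_kg: Dict[str, bool]) -> dict:
    """Run kind, trigger and box cross-checks; return {ok, errors}."""
    errors: List[str] = []
    kinds = set(policy.get("document_kinds", []))
    triggers = policy.get("triggers", {})
    for schedule_id, schedule in policy.get("schedules", {}).items():
        if schedule_id not in triggers:
            errors.append(f"{schedule_id}: no triggers entry")
        for item in schedule.get("evidence", []):
            # The evidence id and each alternative must be a declared kind.
            for kind in [item["id"], *item.get("acceptable_alternatives", [])]:
                if kind not in kinds:
                    errors.append(f"{schedule_id}: '{kind}' not in document_kinds")
            # Every referenced box must exist as a FormBox in the KG.
            for box in item.get("boxes", []):
                if not boxes_in_kg.get(box, False):
                    errors.append(f"{schedule_id}/{item['id']}: box '{box}' not in KG")
    return {"ok": not errors, "errors": errors}
```

Collecting all errors before returning lets the CI job report every policy problem in one run.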
H) Config loading, overlays & hot reload
Load order:
- config/coverage.yaml (baseline)
- config/coverage.{jurisdiction}.{tax_year}.yaml (if present)
- config/overrides/{tenant_id}.yaml (if present)
- Apply feature flags (if Unleash present)
- Compile predicates; compute hash of concatenated files.
Expose /admin/coverage/reload to recompile; write an entry in coverage_versions.
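The overlay merge and version hash can be sketched as follows, working on plain dicts parsed from YAML instead of the CoveragePolicy model. Two choices here are assumptions: overlays replace lists wholesale and only merge nested mappings key by key, and the hash is SHA-256 over the file contents concatenated in load order.

```python
import hashlib
from copy import deepcopy
from typing import Iterable


def merge_overlays(base: dict, *overlays: dict) -> dict:
    """Deep-merge overlays onto base, later overlays winning."""
    merged = deepcopy(base)  # never mutate the baseline policy
    for overlay in overlays:
        _merge_into(merged, overlay)
    return merged


def _merge_into(target: dict, overlay: dict) -> None:
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _merge_into(target[key], value)  # recurse into nested mappings
        else:
            target[key] = deepcopy(value)    # scalars and lists are replaced


def policy_hash(file_contents: Iterable[bytes]) -> str:
    """SHA-256 hex digest of the source files, concatenated in load order."""
    digest = hashlib.sha256()
    for blob in file_contents:
        digest.update(blob)
    return digest.hexdigest()
```

Hashing in load order means that swapping the order of two overlay files changes the recorded hash, which matches the fact that merge order changes the compiled policy.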
I) Compose & Traefik
Add container svc-coverage to infra/compose/docker-compose.local.yml:
- Port 8000, labels:
  - "traefik.enable=true"
  - "traefik.http.routers.svc-coverage.rule=Host(`api.local`) && PathPrefix(`/coverage`)"
  - "traefik.http.routers.svc-coverage.entrypoints=websecure"
  - "traefik.http.routers.svc-coverage.tls=true"
  - "traefik.http.routers.svc-coverage.middlewares=authentik-forwardauth,rate-limit"
  - "traefik.http.services.svc-coverage.loadbalancer.server.port=8000"
- Mount ./config:/app/config:ro so policy can be hot-reloaded.
J) CI (Gitea) additions
- Add a job policy-validate that runs:
  - yamllint config/coverage.yaml
  - Policy JSON Schema validation
  - Box existence check (calls a local Neo4j with seeded FormBox registry or mocks via snapshot)
- Make pipeline fail if any validation fails.
- Ensure unit/integration tests for svc-coverage push coverage ≥ 90%.
K) Tests (create all)
- Unit (tests/unit/coverage/):
  - test_policy_load_and_merge.py
  - test_predicate_compilation.py (conditions DSL)
  - test_status_classifier.py (present_verified/unverified/missing/conflicting)
  - test_question_templates.py (string assembly, alternatives)
- Integration (tests/integration/coverage/):
  - Spin up Neo4j with fixtures (seed form boxes + minimal rules/docs).
  - test_check_document_coverage_happy_path.py
  - test_check_document_coverage_blocking_gaps.py
  - test_clarify_generates_citations_kg_then_rag.py (mock RAG)
- E2E (tests/e2e/test_coverage_to_compute_flow.py):
  - Ingest → OCR → Extract (mock) → Map → /coverage/check (expect blocking) → /coverage/clarify → upload alt doc → /coverage/check now ok → compute schedule.
L) Error handling & codes
- Use RFC7807 Problem+JSON; standardize types:
  - /errors/policy-invalid, /errors/policy-reload-failed, /errors/kg-query-failed, /errors/rag-citation-failed
- Include trace_id in all errors; log with warn/error and span attributes {taxpayer_id, tax_year, schedule}.
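A minimal sketch of the error body follows. The members type, title, status and detail come from RFC 7807 itself, which also permits extension members such as trace_id. The helper name problem and the example title text are hypothetical.

```python
from typing import Any, Dict, Optional

# Media type defined by RFC 7807 for JSON problem details.
PROBLEM_MEDIA_TYPE = "application/problem+json"


def problem(
    type_path: str,
    title: str,
    status: int,
    trace_id: str,
    detail: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an RFC 7807 body with trace_id as an extension member."""
    body: Dict[str, Any] = {
        "type": type_path,     # e.g. "/errors/policy-invalid"
        "title": title,
        "status": status,
        "trace_id": trace_id,  # extension member, present on every error
    }
    if detail is not None:
        body["detail"] = detail
    return body
```

A FastAPI exception handler would return this dict with the matching HTTP status code and PROBLEM_MEDIA_TYPE as the response media type.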
M) Acceptance criteria (DoD)
- docker compose up brings up svc-coverage.
- POST /v1/coverage/check returns correct overall_status and blocking_items for synthetic fixtures.
- /v1/coverage/clarify returns a polite, specific question with boxes listed, upload endpoints, and citations.
- /admin/coverage/reload picks up edited YAML without restart and logs a new coverage_versions row.
- /v1/coverage/validate returns {ok:true} on the provided policy; CI fails if not.
- No PII enters RAG queries (enforce pii_free:true filter).
- Coverage ≥ 90% on svc-coverage; policy validation job green.
OUTPUT (FILES TO CREATE/UPDATE)
Generate the following files with production-quality code and docs:
libs/policy.py
libs/coverage_models.py
libs/coverage_schema.json
libs/coverage_eval.py
libs/neo.py # update with helpers shown
libs/rag.py # update with citation search
apps/svc-coverage/main.py
apps/svc-coverage/alembic/versions/*.py
infra/compose/docker-compose.local.yml # add service & volume
.gitea/workflows/ci.yml # add policy-validate job
tests/unit/coverage/*.py
tests/integration/coverage/*.py
tests/e2e/test_coverage_to_compute_flow.py
README.md # add section: Coverage Policy & Hot Reload
Use the policy file at config/coverage.yaml we already drafted. Do not change its content; only read and validate it.
START
Proceed to implement and output the listed files in the order above.