full ingestion -> OCR -> extraction flow is now working correctly.
Some checks failed
CI/CD Pipeline / Build Docker Images (svc-rpa) (push) Has been cancelled
CI/CD Pipeline / Code Quality & Linting (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-kg) (push) Has been cancelled
CI/CD Pipeline / Policy Validation (push) Has been cancelled
CI/CD Pipeline / Test Suite (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-coverage) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-extract) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-firm-connectors) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-forms) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-hmrc) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-ingestion) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-kg) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-normalize-map) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-ocr) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-rag-indexer) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-rag-retriever) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-reason) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (ui-review) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-coverage) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-extract) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-rag-retriever) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (ui-review) (push) Has been cancelled
CI/CD Pipeline / Generate SBOM (push) Has been cancelled
CI/CD Pipeline / Deploy to Staging (push) Has been cancelled
CI/CD Pipeline / Deploy to Production (push) Has been cancelled
CI/CD Pipeline / Notifications (push) Has been cancelled

harkon
2025-11-26 15:46:59 +00:00
parent fdba81809f
commit db61b05c80
17 changed files with 170 additions and 553 deletions


@@ -64,28 +64,6 @@ Return a JSON object with the extracted fields and confidence scores.
"""
-# Create app and settings
-app, settings = create_app(
-    service_name="svc-extract",
-    title="Tax Agent Extraction Service",
-    description="LLM-based field extraction service",
-    settings_class=ExtractionSettings,
-)
-# Add middleware
-middleware_factory = create_trusted_proxy_middleware(settings.internal_cidrs)
-app.add_middleware(middleware_factory)
-# Global clients
-storage_client: StorageClient | None = None
-document_storage: DocumentStorage | None = None
-event_bus: EventBus | None = None
-confidence_calibrator: ConfidenceCalibrator | None = None
-tracer = get_tracer("svc-extract")
-metrics = get_metrics()
-@app.on_event("startup")
 async def startup_event() -> None:
     """Initialize service dependencies"""
     global storage_client, document_storage, event_bus, confidence_calibrator
@@ -116,7 +94,6 @@ async def startup_event() -> None:
     logger.info("Extraction service started successfully")
-@app.on_event("shutdown")
 async def shutdown_event() -> None:
     """Cleanup service dependencies"""
     global event_bus
@@ -129,6 +106,29 @@ async def shutdown_event() -> None:
     logger.info("Extraction service shutdown complete")
+# Create app and settings
+app, settings = create_app(
+    service_name="svc-extract",
+    title="Tax Agent Extraction Service",
+    description="LLM-based field extraction service",
+    settings_class=ExtractionSettings,
+    startup_hooks=[startup_event],
+    shutdown_hooks=[shutdown_event],
+)
+# Add middleware
+middleware_factory = create_trusted_proxy_middleware(settings.internal_cidrs)
+app.add_middleware(middleware_factory)
+# Global clients
+storage_client: StorageClient | None = None
+document_storage: DocumentStorage | None = None
+event_bus: EventBus | None = None
+confidence_calibrator: ConfidenceCalibrator | None = None
+tracer = get_tracer("svc-extract")
+metrics = get_metrics()
 @app.post("/extract/{doc_id}", response_model=ExtractionResponse)
 async def extract_fields(
     doc_id: str,
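This hunk moves `create_app` below `startup_event` and `shutdown_event`. It passes them as `startup_hooks` and `shutdown_hooks`, which replaces the `@app.on_event(...)` decorators removed in the earlier hunks. The move is required because the hook lists are evaluated when `create_app` is called, so both coroutine functions must already be defined at that point. The diff does not show how `create_app` consumes the hooks. The sketch below assumes it combines them into a FastAPI-style lifespan context manager. FastAPI accepts such a factory through its `lifespan` parameter and lists the `on_event` handlers as deprecated. `build_lifespan` is a hypothetical helper, not part of the project. The example drives the lifespan directly with `asyncio` instead of a real server, so it runs with the standard library alone.

```python
import asyncio
from contextlib import AbstractAsyncContextManager, asynccontextmanager
from typing import Any, AsyncIterator, Awaitable, Callable, Sequence

# A lifecycle hook is a zero-argument coroutine function, like
# startup_event and shutdown_event in the diff.
Hook = Callable[[], Awaitable[None]]


def build_lifespan(
    startup_hooks: Sequence[Hook] = (),
    shutdown_hooks: Sequence[Hook] = (),
) -> Callable[[Any], AbstractAsyncContextManager[None]]:
    """Hypothetical helper: fold hook lists into a lifespan factory.

    With FastAPI, the result would be passed as FastAPI(lifespan=...).
    """

    @asynccontextmanager
    async def lifespan(app: Any) -> AsyncIterator[None]:
        # Startup hooks run in registration order before serving begins.
        for hook in startup_hooks:
            await hook()
        try:
            yield  # the application handles requests while suspended here
        finally:
            # Shutdown hooks run even if serving ended with an exception.
            for hook in shutdown_hooks:
                await hook()

    return lifespan


# Usage: record the order in which the hooks fire.
calls: list[str] = []


async def startup_event() -> None:
    calls.append("startup")


async def shutdown_event() -> None:
    calls.append("shutdown")


lifespan = build_lifespan([startup_event], [shutdown_event])


async def simulate_server() -> None:
    # Stand-in for an ASGI server entering and leaving the lifespan.
    async with lifespan(app=None):
        calls.append("serving")


asyncio.run(simulate_server())
print(calls)  # ['startup', 'serving', 'shutdown']
```

The sketch runs the shutdown hooks in a `finally` block. This is a design choice: clients such as the event bus are still released when the server stops because of an exception.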
@@ -334,13 +334,14 @@ async def _extract_fields_async(
         )
         # Update metrics
-        metrics.counter("extractions_completed_total").labels(
-            tenant_id=tenant_id, strategy=strategy
-        ).inc()
+        metrics.counter(
+            "extract_extractions_completed_total",
+            labelnames=["tenant_id", "strategy"],
+        ).labels(tenant_id=tenant_id, strategy=strategy).inc()
-        metrics.histogram("extraction_confidence").labels(
-            strategy=strategy
-        ).observe(calibrated_confidence)
+        metrics.histogram(
+            "extract_extraction_confidence", labelnames=["strategy"]
+        ).labels(strategy=strategy).observe(calibrated_confidence)
         # Publish completion event
         event_payload = EventPayload(
@@ -371,7 +372,10 @@ async def _extract_fields_async(
         logger.error("Field extraction failed", doc_id=doc_id, error=str(e))
         # Update error metrics
-        metrics.counter("extraction_errors_total").labels(
+        metrics.counter(
+            "extract_extraction_errors_total",
+            labelnames=["tenant_id", "strategy", "error_type"],
+        ).labels(
             tenant_id=tenant_id, strategy=strategy, error_type=type(e).__name__
         ).inc()
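The last two hunks rename the metrics with an `extract_` prefix and declare `labelnames` each time a counter or histogram is requested. The diff does not show how `get_metrics()` is implemented. If it wraps prometheus_client, the declaration matters: prometheus_client fixes a metric's label names when the metric is constructed. It raises `ValueError` when `.labels()` is called on a metric created without them. The old `metrics.counter("extractions_completed_total").labels(...)` shape would trigger that error if the wrapper created the metric on first use. The prefix also follows the Prometheus naming guidance that a metric name should carry a single-word application prefix.

The sketch below is a hypothetical stand-in that uses only the standard library. `Metrics`, `LabeledCounter`, `CounterChild`, the tenant `"t1"`, and the `"llm"`/`"rules"` strategy values are all invented for illustration. It imitates the counter side of those rules with a get-or-create registry, so calling `metrics.counter(...)` on every extraction reuses one counter instead of creating a new one.

```python
from typing import Dict, Sequence, Tuple


class LabeledCounter:
    """Counter whose label names are fixed when it is created (hypothetical)."""

    def __init__(self, name: str, labelnames: Sequence[str] = ()) -> None:
        self.name = name
        self.labelnames: Tuple[str, ...] = tuple(labelnames)
        self._values: Dict[Tuple[str, ...], float] = {}

    def labels(self, **labels: str) -> "CounterChild":
        # Like prometheus_client: label names must have been declared up front.
        if not self.labelnames:
            raise ValueError(f"no label names declared for {self.name}")
        if set(labels) != set(self.labelnames):
            raise ValueError(f"{self.name} expects labels {self.labelnames}")
        return CounterChild(self, tuple(str(labels[n]) for n in self.labelnames))

    def value(self, **labels: str) -> float:
        key = tuple(str(labels[n]) for n in self.labelnames)
        return self._values.get(key, 0.0)


class CounterChild:
    """One labelled time series of a LabeledCounter."""

    def __init__(self, parent: LabeledCounter, key: Tuple[str, ...]) -> None:
        self._parent = parent
        self._key = key

    def inc(self, amount: float = 1.0) -> None:
        values = self._parent._values
        values[self._key] = values.get(self._key, 0.0) + amount


class Metrics:
    """Get-or-create registry: the same name always returns the same counter."""

    def __init__(self) -> None:
        self._counters: Dict[str, LabeledCounter] = {}

    def counter(self, name: str, labelnames: Sequence[str] = ()) -> LabeledCounter:
        existing = self._counters.get(name)
        if existing is None:
            existing = self._counters[name] = LabeledCounter(name, labelnames)
        elif existing.labelnames != tuple(labelnames):
            raise ValueError(f"{name} already registered with {existing.labelnames}")
        return existing


metrics = Metrics()

# New call shape from the diff: labels are declared, so .labels() succeeds.
for strategy in ["llm", "llm", "rules"]:
    metrics.counter(
        "extract_extractions_completed_total",
        labelnames=["tenant_id", "strategy"],
    ).labels(tenant_id="t1", strategy=strategy).inc()

# Error counter keyed by the exception class name, as in the except block.
try:
    raise TimeoutError("LLM call timed out")
except Exception as e:
    metrics.counter(
        "extract_extraction_errors_total",
        labelnames=["tenant_id", "strategy", "error_type"],
    ).labels(tenant_id="t1", strategy="llm", error_type=type(e).__name__).inc()

# Old call shape: no label names are declared, so .labels() is rejected.
try:
    metrics.counter("extractions_completed_total").labels(
        tenant_id="t1", strategy="llm"
    )
except ValueError as exc:
    print(exc)  # no label names declared for extractions_completed_total
```

Reusing the existing counter matters with a real prometheus_client backend. prometheus_client raises a duplicate-registration `ValueError` when a second metric with the same name is registered in the same registry.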