Encryption Strategy

AI Tax Agent System

Document Version: 1.0
Date: 2024-01-31
Owner: Security Architecture Team

1. Executive Summary

This document defines the comprehensive encryption strategy for the AI Tax Agent System, covering data at rest, in transit, and in use. The strategy implements defense-in-depth with multiple encryption layers and key management best practices.

2. Encryption Requirements

2.1 Regulatory Requirements

  • GDPR Article 32: Appropriate technical measures including encryption
  • UK Data Protection Act 2018: Security of processing requirements
  • HMRC Security Standards: Government security classifications
  • ISO 27001: Information security management requirements
  • SOC 2 Type II: Security and availability controls

2.2 Business Requirements

  • Client Data Protection: Financial and personal information
  • Intellectual Property: Proprietary algorithms and models
  • Regulatory Compliance: Audit trail and evidence integrity
  • Business Continuity: Key recovery and disaster recovery

3. Encryption Architecture

3.1 Encryption Layers

graph TB
    A[Client Browser] -->|TLS 1.3| B[Traefik Gateway]
    B -->|mTLS| C[Application Services]
    C -->|Application-Level| D[Database Layer]
    D -->|Transparent Data Encryption| E[Storage Layer]
    E -->|Volume Encryption| F[Disk Storage]
    
    G[Key Management] --> H[Vault HSM]
    H --> I[Encryption Keys]
    I --> C
    I --> D
    I --> E

3.2 Encryption Domains

| Domain | Technology | Key Size | Algorithm | Rotation |
|---|---|---|---|---|
| Transport | TLS 1.3 | 256-bit | AES-GCM, ChaCha20-Poly1305 | Annual |
| Application | AES-GCM | 256-bit | AES-256-GCM | Quarterly |
| Database | TDE | 256-bit | AES-256-CBC | Quarterly |
| Storage | LUKS/dm-crypt | 256-bit | AES-256-XTS | Annual |
| Backup | GPG | 4096-bit | RSA-4096 + AES-256 | Annual |

4. Data Classification and Encryption

4.1 Data Classification Matrix

| Classification | Examples | Encryption Level | Key Access |
|---|---|---|---|
| PUBLIC | Marketing materials, documentation | TLS only | Public |
| INTERNAL | System logs, metrics | TLS + Storage | Service accounts |
| CONFIDENTIAL | Client names, addresses | TLS + App + Storage | Authorized users |
| RESTRICTED | Financial data, UTR, NI numbers | TLS + App + Field + Storage | Need-to-know |
| SECRET | Encryption keys, certificates | HSM + Multiple layers | Key custodians |
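
The matrix above can be expressed as a lookup so that a service can check which layers are still missing for a given classification. The following is a minimal sketch, not part of the original design: `REQUIRED_LAYERS` and `missing_layers` are hypothetical names, and the layer labels simply mirror the "Encryption Level" column. SECRET is left out because "HSM + Multiple layers" does not enumerate specific layers.

```python
# Hypothetical encoding of the classification matrix (names are illustrative).
REQUIRED_LAYERS = {
    "PUBLIC": {"tls"},
    "INTERNAL": {"tls", "storage"},
    "CONFIDENTIAL": {"tls", "app", "storage"},
    "RESTRICTED": {"tls", "app", "field", "storage"},
}


def missing_layers(classification: str, applied: set) -> set:
    """Return the layers the matrix requires that `applied` does not contain."""
    try:
        required = REQUIRED_LAYERS[classification]
    except KeyError:
        raise ValueError(f"unknown classification: {classification}") from None
    return required - set(applied)
```

An empty result means the data path meets the minimum for its classification.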

4.2 Field-Level Encryption

Sensitive Fields Requiring Field-Level Encryption:

ENCRYPTED_FIELDS = {
    'taxpayer_profile': ['utr', 'ni_number', 'full_name', 'address'],
    'financial_data': ['account_number', 'sort_code', 'iban', 'amount'],
    'document_content': ['ocr_text', 'extracted_fields'],
    'authentication': ['password_hash', 'api_keys', 'tokens']
}
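
One way to apply this mapping is to walk a record and encrypt only the listed fields. The sketch below assumes a caller-supplied `encrypt` callable (for example, a wrapper around the Vault transit engine). `encrypt_record` and its signature are hypothetical, and the mapping is repeated only so that the block is self-contained.

```python
from typing import Callable

ENCRYPTED_FIELDS = {
    'taxpayer_profile': ['utr', 'ni_number', 'full_name', 'address'],
    'financial_data': ['account_number', 'sort_code', 'iban', 'amount'],
    'document_content': ['ocr_text', 'extracted_fields'],
    'authentication': ['password_hash', 'api_keys', 'tokens'],
}


def encrypt_record(table: str, record: dict,
                   encrypt: Callable[[str, str], str]) -> dict:
    """Return a copy of `record` with the sensitive fields of `table` encrypted.

    `encrypt(field_name, value)` is supplied by the caller (assumption).
    Fields not listed for the table, and None values, pass through unchanged.
    """
    sensitive = set(ENCRYPTED_FIELDS.get(table, []))
    result = {}
    for name, value in record.items():
        if name in sensitive and value is not None:
            result[name] = encrypt(name, str(value))
        else:
            result[name] = value
    return result
```

Keeping the field list in one place means a new sensitive column only needs to be added to `ENCRYPTED_FIELDS`.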

Implementation Example:

import base64

# The Vault client wrapper is passed in by the caller; its encrypt()/decrypt()
# methods are expected to return the raw transit engine response.

class FieldEncryption:
    def __init__(self, vault_client):
        self.vault = vault_client

    def encrypt_field(self, field_name: str, value: str) -> str:
        """Encrypt sensitive field using Vault transit engine"""
        key_name = f"field-{field_name}"
        response = self.vault.encrypt(
            mount_point='transit',
            name=key_name,
            # The transit engine expects base64-encoded plaintext
            plaintext=base64.b64encode(value.encode()).decode()
        )
        return response['data']['ciphertext']

    def decrypt_field(self, field_name: str, ciphertext: str) -> str:
        """Decrypt sensitive field using Vault transit engine"""
        key_name = f"field-{field_name}"
        response = self.vault.decrypt(
            mount_point='transit',
            name=key_name,
            ciphertext=ciphertext
        )
        # The transit engine returns base64-encoded plaintext
        return base64.b64decode(response['data']['plaintext']).decode()

5. Key Management Strategy

5.1 Key Hierarchy

Root Key (HSM)
├── Master Encryption Key (MEK)
│   ├── Data Encryption Keys (DEK)
│   │   ├── Database DEK
│   │   ├── Application DEK
│   │   └── Storage DEK
│   └── Key Encryption Keys (KEK)
│       ├── Field Encryption KEK
│       ├── Backup KEK
│       └── Archive KEK
└── Signing Keys
    ├── JWT Signing Key
    ├── Document Signing Key
    └── API Signing Key

5.2 HashiCorp Vault Configuration

Vault Policies:

# Database encryption policy
path "transit/encrypt/database-*" {
  capabilities = ["create", "update"]
}

path "transit/decrypt/database-*" {
  capabilities = ["create", "update"]
}

# Application encryption policy
path "transit/encrypt/app-*" {
  capabilities = ["create", "update"]
}

path "transit/decrypt/app-*" {
  capabilities = ["create", "update"]
}

# Field encryption policy (restricted)
path "transit/encrypt/field-*" {
  capabilities = ["create", "update"]
  allowed_parameters = {
    "plaintext" = []
  }
  denied_parameters = {
    "batch_input" = []
  }
}

Key Rotation Policy:

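# Automatic key rotation
# These are transit key configuration parameters, not ACL policy stanzas,
# so they are written to the key's config endpoint.
# min_encryption_version=2 assumes the key has already been rotated at least once.
vault write transit/keys/database-primary/config \
    min_decryption_version=1 \
    min_encryption_version=2 \
    deletion_allowed=false \
    auto_rotate_period=2160h  # 90 days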
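
Transit ciphertexts carry the key version in their prefix (`vault:v<N>:<payload>`), which makes it possible to find stored values that still need re-wrapping after a rotation. A minimal sketch follows. The helper names are hypothetical, and the code assumes the standard transit ciphertext format.

```python
import re

# Standard transit ciphertext layout: "vault:v<version>:<base64 payload>"
_CIPHERTEXT_RE = re.compile(r"^vault:v(\d+):.+$")


def ciphertext_version(ciphertext: str) -> int:
    """Extract the key version a transit ciphertext was encrypted with."""
    match = _CIPHERTEXT_RE.match(ciphertext)
    if match is None:
        raise ValueError("not a Vault transit ciphertext")
    return int(match.group(1))


def needs_rewrap(ciphertext: str, min_encryption_version: int) -> bool:
    """True if the value was encrypted with an older key version than required."""
    return ciphertext_version(ciphertext) < min_encryption_version
```

Values flagged by `needs_rewrap` can then be sent to the transit rewrap endpoint, which re-encrypts them under the latest key version without exposing the plaintext.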

5.3 Hardware Security Module (HSM)

HSM Configuration:

  • Type: AWS CloudHSM / Azure Dedicated HSM
  • FIPS Level: FIPS 140-2 Level 3
  • High Availability: Multi-AZ deployment
  • Backup: Encrypted key backup to secure offline storage

6. Transport Layer Security

6.1 TLS Configuration

Traefik TLS Configuration:

tls:
  options:
    default:
      minVersion: "VersionTLS13"
      maxVersion: "VersionTLS13"
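      # Note: Traefik does not allow TLS 1.3 cipher suites to be configured.
      # cipherSuites only applies to TLS 1.2 and below, so this list has no
      # effect while minVersion is VersionTLS13.
      cipherSuites:
        - "TLS_AES_256_GCM_SHA384"
        - "TLS_CHACHA20_POLY1305_SHA256"
        - "TLS_AES_128_GCM_SHA256"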
      curvePreferences:
        - "X25519"
        - "secp384r1"
      sniStrict: true
      
  certificates:
    - certFile: /certs/wildcard.crt
      keyFile: /certs/wildcard.key
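
Internal Python services that call the gateway can enforce the same TLS 1.3 floor on the client side. The following sketch uses only the standard library `ssl` module. The function name is illustrative and not part of the existing codebase.

```python
import ssl


def make_tls13_context() -> ssl.SSLContext:
    """Build a client-side context that refuses anything below TLS 1.3."""
    # create_default_context() enables certificate and hostname verification
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx
```

The resulting context can be handed to an HTTP client that accepts an `ssl.SSLContext`, for example through the `verify` argument of `httpx.AsyncClient`.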

6.2 Certificate Management

Certificate Lifecycle:

  • Issuance: Let's Encrypt with DNS challenge
  • Rotation: Automated 30-day renewal
  • Monitoring: Certificate expiry alerts
  • Backup: Encrypted certificate backup
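
Expiry monitoring can be sketched with the standard library alone. The example below assumes the `notAfter` string format returned by `SSLSocket.getpeercert()` and reads the 30-day figure above as a renewal window before expiry, which is an interpretation rather than something this document states. The function names are hypothetical.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the getpeercert() format, e.g. "Jan 31 00:00:00 2024 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def needs_renewal(not_after: str, now: datetime, threshold_days: int = 30) -> bool:
    """True when the certificate expires within `threshold_days` (assumed window)."""
    return days_until_expiry(not_after, now) <= threshold_days
```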

Internal PKI:

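# Vault PKI setup
vault secrets enable -path=pki-root pki
vault secrets tune -max-lease-ttl=87600h pki-root

vault write pki-root/root/generate/internal \
    common_name="AI Tax Agent Root CA" \
    ttl=87600h \
    key_bits=4096

vault secrets enable -path=pki-int pki
vault secrets tune -max-lease-ttl=43800h pki-int

# Generate the intermediate key pair and CSR
vault write -field=csr pki-int/intermediate/generate/internal \
    common_name="AI Tax Agent Intermediate CA" \
    key_bits=4096 > pki-int.csr

# Sign the CSR with the root CA; the TTL is set at signing time
vault write -field=certificate pki-root/root/sign-intermediate \
    csr=@pki-int.csr \
    format=pem_bundle \
    ttl=43800h > pki-int.pem

# Install the signed certificate on the intermediate mount
vault write pki-int/intermediate/set-signed certificate=@pki-int.pem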

7. Database Encryption

7.1 PostgreSQL Encryption

Transparent Data Encryption (TDE):

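-- Enable pgcrypto extension
-- Note: pgcrypto provides explicit functions for encrypting column data;
-- it does not encrypt data files transparently.
CREATE EXTENSION IF NOT EXISTS pgcrypto;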

-- Create encrypted table
CREATE TABLE taxpayer_profiles (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    utr_encrypted BYTEA NOT NULL,
    ni_number_encrypted BYTEA NOT NULL,
    name_encrypted BYTEA NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

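-- Encryption functions
-- Note: vault_encrypt() is not provided by PostgreSQL or pgcrypto; it is
-- assumed to be a separately installed function that calls the Vault transit engine.
CREATE OR REPLACE FUNCTION encrypt_pii(data TEXT, key_id TEXT)
RETURNS BYTEA AS $$
BEGIN
    -- Use Vault transit engine for encryption
    RETURN vault_encrypt(data, key_id);
END;
$$ LANGUAGE plpgsql;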

Column-Level Encryption:

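import uuid

from sqlalchemy import Column, LargeBinary
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.orm import declarative_base

import vault_client  # project Vault wrapper (not shown in this document)

Base = declarative_base()

class EncryptedTaxpayerProfile(Base):
    __tablename__ = 'taxpayer_profiles'

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    utr_encrypted = Column(LargeBinary, nullable=False)
    ni_number_encrypted = Column(LargeBinary, nullable=False)

    @hybrid_property
    def utr(self):
        # Decrypts on attribute access; only the ciphertext is stored
        return vault_client.decrypt('field-utr', self.utr_encrypted)

    @utr.setter
    def utr(self, value):
        self.utr_encrypted = vault_client.encrypt('field-utr', value)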

7.2 Neo4j Encryption

Enterprise Edition Features:

// Enable encryption at rest
CALL dbms.security.setConfigValue('dbms.security.encryption.enabled', 'true');

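// Require the encrypted property to be present
CREATE CONSTRAINT encrypted_utr IF NOT EXISTS
FOR (tp:TaxpayerProfile)
REQUIRE tp.utr_encrypted IS NOT NULL;

// Keyed digest UDF
// Caution: apoc.util.md5 returns a one-way MD5 digest, not ciphertext, so the
// output of this function cannot be decrypted. Values that must be recoverable
// should be encrypted in the application before they are written to the graph.
CALL apoc.custom.asFunction(
    'encrypt',
    'RETURN apoc.util.md5([$text, $key])',
    'STRING',
    [['text', 'STRING'], ['key', 'STRING']]
);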

8. Application-Level Encryption

8.1 Microservice Encryption

Service-to-Service Communication:

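import base64

import httpx
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

class SecureServiceClient:
    def __init__(self, service_url: str, private_key: rsa.RSAPrivateKey):
        self.service_url = service_url
        self.private_key = private_key

    # encrypt_payload() and decrypt_response() are not shown in this document.
    # encrypt_payload() is assumed to return JSON-safe text (for example base64).

    def sign_request(self, payload: str) -> str:
        """Sign the encrypted payload with RSA-PSS/SHA-256; return base64 text"""
        signature = self.private_key.sign(
            payload.encode(),
            padding.PSS(
                mgf=padding.MGF1(hashes.SHA256()),
                salt_length=padding.PSS.MAX_LENGTH,
            ),
            hashes.SHA256(),
        )
        return base64.b64encode(signature).decode()

    async def make_request(self, endpoint: str, data: dict):
        # Encrypt request payload
        encrypted_data = self.encrypt_payload(data)

        # Sign request
        signature = self.sign_request(encrypted_data)

        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"{self.service_url}/{endpoint}",
                json={"data": encrypted_data, "signature": signature},
                headers={"Content-Type": "application/json"}
            )
            # Fail fast on HTTP errors before attempting to decrypt
            response.raise_for_status()

        # Decrypt response
        return self.decrypt_response(response.json())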

8.2 Document Encryption

Document Storage Encryption:

from cryptography.fernet import Fernet

class DocumentEncryption:
    def __init__(self, vault_client):
        self.vault = vault_client

    def encrypt_document(self, document_content: bytes, doc_id: str) -> dict:
        """Encrypt document with unique DEK (envelope encryption)"""
        # Generate document-specific DEK
        dek = self.vault.generate_data_key('document-master-key')

        # Encrypt document with DEK
        # Fernet requires a URL-safe base64-encoded 32-byte key, so
        # dek['plaintext_key'] must already be in that form
        cipher = Fernet(dek['plaintext_key'])
        encrypted_content = cipher.encrypt(document_content)

        # Store only the Vault-encrypted DEK; the plaintext DEK is not persisted
        encrypted_dek = dek['ciphertext_key']

        return {
            'encrypted_content': encrypted_content,
            'encrypted_dek': encrypted_dek,
            'key_version': dek['key_version']
        }

9. Backup and Archive Encryption

9.1 Backup Encryption Strategy

Multi-Layer Backup Encryption:

#!/bin/bash
# Backup encryption script
set -euo pipefail

# Compute the date stamp once so every artifact shares the same suffix
STAMP=$(date +%Y%m%d)

# Note: gpg --symmetric prompts for a passphrase. For unattended runs, supply
# it with --batch --pinentry-mode loopback --passphrase-file <file>.

# 1. Database dump with encryption
pg_dump tax_system | gpg --cipher-algo AES256 --compress-algo 2 \
    --symmetric --output "backup_${STAMP}.sql.gpg"

# 2. Neo4j backup with encryption
# (confirm that the installed neo4j-admin version supports --encrypt)
neo4j-admin backup --backup-dir=/backups/neo4j \
    --name="graph_${STAMP}" --encrypt

# 3. Document backup with encryption
tar -czf - /data/documents | gpg --cipher-algo AES256 \
    --symmetric --output "documents_${STAMP}.tar.gz.gpg"

# 4. Upload to encrypted cloud storage
aws s3 cp "backup_${STAMP}.sql.gpg" \
    s3://tax-agent-backups/ --sse aws:kms --sse-kms-key-id alias/backup-key

9.2 Archive Encryption

Long-Term Archive Strategy:

  • Encryption: AES-256 with 10-year key retention
  • Integrity: SHA-256 checksums with digital signatures
  • Storage: Geographically distributed encrypted storage
  • Access: Multi-person authorization for archive access
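
The integrity check can be illustrated with a streaming SHA-256 digest. This is a sketch under stated assumptions: the function names are hypothetical, and the digital signature over the checksum is out of scope here.

```python
import hashlib
import hmac


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file without loading it into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with the recorded checksum in constant time."""
    return hmac.compare_digest(sha256_file(path), expected_hex.lower())
```

Reading in chunks keeps memory use flat for large archive files.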

10. Key Rotation and Recovery

10.1 Automated Key Rotation

Rotation Schedule:

from datetime import timedelta

ROTATION_SCHEDULE = {
    'transport_keys': timedelta(days=365),       # Annual
    'application_keys': timedelta(days=90),      # Quarterly
    'database_keys': timedelta(days=90),         # Quarterly
    'field_encryption_keys': timedelta(days=30), # Monthly
    'signing_keys': timedelta(days=180),         # Semi-annual
}

class KeyRotationManager:
    def __init__(self, vault_client):
        self.vault = vault_client
        
    async def rotate_keys(self):
        """Automated key rotation process"""
        for key_type, rotation_period in ROTATION_SCHEDULE.items():
            keys = await self.get_keys_due_for_rotation(key_type, rotation_period)
            
            for key in keys:
                await self.rotate_key(key)
                await self.update_applications(key)
                await self.verify_rotation(key)
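
The due-date test behind `get_keys_due_for_rotation` is not shown in this document. A minimal sketch of what it might compute follows, assuming each key exposes a timezone-aware creation timestamp. The function name and signature are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_due_for_rotation(created_at: datetime, period: timedelta,
                        now: Optional[datetime] = None) -> bool:
    """True once a key's age has reached its rotation period.

    `created_at` and `now` must both be timezone-aware (assumed UTC).
    """
    now = now or datetime.now(timezone.utc)
    return now - created_at >= period
```

Passing `now` explicitly keeps the check deterministic in tests and scheduled jobs.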

10.2 Key Recovery Procedures

Emergency Key Recovery:

  1. Multi-Person Authorization: Require 3 of 5 key custodians
  2. Secure Communication: Use encrypted channels for coordination
  3. Audit Trail: Log all recovery activities
  4. Verification: Verify key integrity before use
  5. Re-encryption: Re-encrypt data with new keys if compromise suspected
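
The 3-of-5 rule in step 1 can be sketched as a quorum check over distinct, registered custodians. The function name and the way approvals are represented are assumptions made for illustration only.

```python
def has_quorum(approvals, custodians, threshold: int = 3) -> bool:
    """True when at least `threshold` distinct registered custodians approved.

    Duplicate approvals and approvals from unknown identities are ignored.
    """
    return len(set(approvals) & set(custodians)) >= threshold
```

For example, two approvals from the same custodian count once, so recovery would not proceed until a third distinct custodian approves.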

11. Monitoring and Compliance

11.1 Encryption Monitoring

Key Metrics:

  • Key rotation compliance rate
  • Encryption coverage percentage
  • Failed encryption/decryption attempts
  • Key access patterns and anomalies
  • Certificate expiry warnings

Alerting Rules:

groups:
  - name: encryption_alerts
    rules:
      - alert: KeyRotationOverdue
        expr: vault_key_age_days > 90
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Encryption key rotation overdue"
          
      - alert: EncryptionFailure
        expr: rate(encryption_errors_total[5m]) > 0.1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High encryption failure rate detected"

11.2 Compliance Reporting

Quarterly Encryption Report:

  • Encryption coverage by data classification
  • Key rotation compliance status
  • Security incidents related to encryption
  • Vulnerability assessment results
  • Compliance gap analysis

12. Incident Response

12.1 Key Compromise Response

Response Procedures:

  1. Immediate: Revoke compromised keys
  2. Assessment: Determine scope of compromise
  3. Containment: Isolate affected systems
  4. Recovery: Generate new keys and re-encrypt data
  5. Lessons Learned: Update procedures and controls

12.2 Encryption Failure Response

Failure Scenarios:

  • HSM hardware failure
  • Key corruption or loss
  • Encryption service outage
  • Certificate expiry

Recovery Procedures:

  • Activate backup HSM
  • Restore keys from secure backup
  • Implement manual encryption processes
  • Emergency certificate issuance

Document Classification: CONFIDENTIAL
Next Review Date: 2024-07-31
Approval: Security Architecture Team