# Encryption Strategy
## AI Tax Agent System
**Document Version:** 1.0
**Date:** 2024-01-31
**Owner:** Security Architecture Team
## 1. Executive Summary
This document defines the encryption strategy for the AI Tax Agent System, covering data at rest, in transit, and in use. The strategy applies defense in depth: transport, application, field-level, database, and storage encryption are layered, and keys are managed centrally in HashiCorp Vault backed by an HSM.
## 2. Encryption Requirements
### 2.1 Regulatory Requirements
- **GDPR Article 32**: Appropriate technical measures including encryption
- **UK Data Protection Act 2018**: Security of processing requirements
- **HMRC Security Standards**: Government security classifications
- **ISO 27001**: Information security management requirements
- **SOC 2 Type II**: Security and availability controls
### 2.2 Business Requirements
- **Client Data Protection**: Financial and personal information
- **Intellectual Property**: Proprietary algorithms and models
- **Regulatory Compliance**: Audit trail and evidence integrity
- **Business Continuity**: Key recovery and disaster recovery
## 3. Encryption Architecture
### 3.1 Encryption Layers
```mermaid
graph TB
A[Client Browser] -->|TLS 1.3| B[Traefik Gateway]
B -->|mTLS| C[Application Services]
C -->|Application-Level| D[Database Layer]
D -->|Transparent Data Encryption| E[Storage Layer]
E -->|Volume Encryption| F[Disk Storage]
G[Key Management] --> H[Vault HSM]
H --> I[Encryption Keys]
I --> C
I --> D
I --> E
```
### 3.2 Encryption Domains
| Domain | Technology | Key Size | Algorithm | Rotation |
|--------|------------|----------|-----------|----------|
| **Transport** | TLS 1.3 | 256-bit | AES-GCM, ChaCha20-Poly1305 | Annual |
| **Application** | AES-GCM | 256-bit | AES-256-GCM | Quarterly |
| **Database** | TDE | 256-bit | AES-256-CBC | Quarterly |
| **Storage** | LUKS/dm-crypt | 256-bit | AES-256-XTS | Annual |
| **Backup** | GPG | 4096-bit | RSA-4096 + AES-256 | Annual |
## 4. Data Classification and Encryption
### 4.1 Data Classification Matrix
| Classification | Examples | Encryption Level | Key Access |
|----------------|----------|------------------|------------|
| **PUBLIC** | Marketing materials, documentation | TLS only | Public |
| **INTERNAL** | System logs, metrics | TLS + Storage | Service accounts |
| **CONFIDENTIAL** | Client names, addresses | TLS + App + Storage | Authorized users |
| **RESTRICTED** | Financial data, UTR, NI numbers | TLS + App + Field + Storage | Need-to-know |
| **SECRET** | Encryption keys, certificates | HSM + Multiple layers | Key custodians |
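
The matrix above can be checked mechanically. The following is a minimal sketch, assuming each stored item carries a classification label and a list of the layers already applied to it; the layer names (`tls`, `app`, `field`, `storage`, `hsm`) and the `missing_layers` helper are illustrative, not part of the system.

```python
from enum import Enum


class Classification(Enum):
    PUBLIC = "PUBLIC"
    INTERNAL = "INTERNAL"
    CONFIDENTIAL = "CONFIDENTIAL"
    RESTRICTED = "RESTRICTED"
    SECRET = "SECRET"


# Layer names mirror the "Encryption Level" column of the matrix.
REQUIRED_LAYERS = {
    Classification.PUBLIC: {"tls"},
    Classification.INTERNAL: {"tls", "storage"},
    Classification.CONFIDENTIAL: {"tls", "app", "storage"},
    Classification.RESTRICTED: {"tls", "app", "field", "storage"},
    # The matrix says "HSM + Multiple layers" without itemising them,
    # so only the HSM requirement is modelled here.
    Classification.SECRET: {"hsm"},
}


def missing_layers(classification, applied):
    """Return the required layers that have not been applied yet."""
    return REQUIRED_LAYERS[classification] - set(applied)


# A RESTRICTED record protected only in transit and on disk
gaps = missing_layers(Classification.RESTRICTED, ["tls", "storage"])
```

A check like this could run in CI or in a periodic audit job to flag data stores whose protection falls short of their classification.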
### 4.2 Field-Level Encryption
**Sensitive Fields Requiring Field-Level Encryption:**
```python
ENCRYPTED_FIELDS = {
    'taxpayer_profile': ['utr', 'ni_number', 'full_name', 'address'],
    'financial_data': ['account_number', 'sort_code', 'iban', 'amount'],
    'document_content': ['ocr_text', 'extracted_fields'],
    'authentication': ['password_hash', 'api_keys', 'tokens'],
}
```
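
One way such a field map might be applied is to pass each record through a helper before it is persisted. The sketch below assumes the encryptor is injected as a callable taking `(field_name, value)`; `encrypt_record` and `fake_encrypt` are hypothetical names, and `fake_encrypt` is a visible stand-in that performs no real encryption.

```python
from typing import Callable, Dict, List

# Subset of the field map, repeated so the sketch runs on its own
ENCRYPTED_FIELDS: Dict[str, List[str]] = {
    "taxpayer_profile": ["utr", "ni_number", "full_name", "address"],
    "financial_data": ["account_number", "sort_code", "iban", "amount"],
}


def encrypt_record(table: str, record: dict, encrypt: Callable[[str, str], str]) -> dict:
    """Return a copy of `record` with the table's sensitive fields encrypted."""
    sensitive = set(ENCRYPTED_FIELDS.get(table, []))
    protected = {}
    for name, value in record.items():
        if name in sensitive and value is not None:
            protected[name] = encrypt(name, str(value))
        else:
            protected[name] = value
    return protected


def fake_encrypt(field_name: str, value: str) -> str:
    """Demonstration stand-in only: reverses the value, it does NOT encrypt."""
    return f"enc({field_name}):{value[::-1]}"


row = {"id": 7, "utr": "1234567890", "full_name": "A Person", "notes": "n/a"}
protected = encrypt_record("taxpayer_profile", row, fake_encrypt)
```

In practice the callable would be the `encrypt_field` method of a Vault-backed encryptor, so the mapping stays the single source of truth for which fields are protected.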
**Implementation Example:**
```python
import base64


class FieldEncryption:
    """Field-level encryption backed by the Vault transit engine.

    `vault_client` is assumed to be a thin wrapper exposing encrypt() and
    decrypt() with the keyword arguments used below.
    """

    def __init__(self, vault_client):
        self.vault = vault_client

    def encrypt_field(self, field_name: str, value: str) -> str:
        """Encrypt sensitive field using Vault transit engine"""
        key_name = f"field-{field_name}"
        response = self.vault.encrypt(
            mount_point='transit',
            name=key_name,
            # Transit expects base64-encoded plaintext
            plaintext=base64.b64encode(value.encode()).decode()
        )
        return response['data']['ciphertext']

    def decrypt_field(self, field_name: str, ciphertext: str) -> str:
        """Decrypt sensitive field using Vault transit engine"""
        key_name = f"field-{field_name}"
        response = self.vault.decrypt(
            mount_point='transit',
            name=key_name,
            ciphertext=ciphertext
        )
        # Transit returns base64-encoded plaintext
        return base64.b64decode(response['data']['plaintext']).decode()
```
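
The example above calls `encrypt()` and `decrypt()` on an injected client. If the underlying client is `hvac`, whose transit methods are `client.secrets.transit.encrypt_data` and `decrypt_data`, a small adapter could bridge the two shapes. This is a sketch under that assumption; `TransitAdapter` and `FakeTransit` are hypothetical names, and the fake only tags the payload so the round trip can be exercised without a Vault server.

```python
import base64
from types import SimpleNamespace


class TransitAdapter:
    """Expose encrypt()/decrypt() on top of an hvac-style transit client."""

    def __init__(self, client):
        self._transit = client.secrets.transit

    def encrypt(self, mount_point: str, name: str, plaintext: str) -> dict:
        return self._transit.encrypt_data(
            name=name, plaintext=plaintext, mount_point=mount_point
        )

    def decrypt(self, mount_point: str, name: str, ciphertext: str) -> dict:
        return self._transit.decrypt_data(
            name=name, ciphertext=ciphertext, mount_point=mount_point
        )


class FakeTransit:
    """In-memory stand-in for tests: tags the payload, does NOT encrypt."""

    def encrypt_data(self, name, plaintext, mount_point="transit"):
        return {"data": {"ciphertext": f"vault:v1:{plaintext}"}}

    def decrypt_data(self, name, ciphertext, mount_point="transit"):
        return {"data": {"plaintext": ciphertext.split(":", 2)[2]}}


fake_client = SimpleNamespace(secrets=SimpleNamespace(transit=FakeTransit()))
vault = TransitAdapter(fake_client)

# Same base64 handling as the field encryption example
encoded = base64.b64encode("1234567890".encode()).decode()
ciphertext = vault.encrypt(
    mount_point="transit", name="field-utr", plaintext=encoded
)["data"]["ciphertext"]
response = vault.decrypt(mount_point="transit", name="field-utr", ciphertext=ciphertext)
recovered = base64.b64decode(response["data"]["plaintext"]).decode()
```

Keeping the adapter separate means the field encryption code can be unit-tested against the fake while production wiring passes a real authenticated client.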
## 5. Key Management Strategy
### 5.1 Key Hierarchy
```
Root Key (HSM)
├── Master Encryption Key (MEK)
│ ├── Data Encryption Keys (DEK)
│ │ ├── Database DEK
│ │ ├── Application DEK
│ │ └── Storage DEK
│ └── Key Encryption Keys (KEK)
│ ├── Field Encryption KEK
│ ├── Backup KEK
│ └── Archive KEK
└── Signing Keys
├── JWT Signing Key
├── Document Signing Key
└── API Signing Key
```
### 5.2 HashiCorp Vault Configuration
**Vault Policies:**
```hcl
# Database encryption policy
path "transit/encrypt/database-*" {
  capabilities = ["create", "update"]
}

path "transit/decrypt/database-*" {
  capabilities = ["create", "update"]
}

# Application encryption policy
path "transit/encrypt/app-*" {
  capabilities = ["create", "update"]
}

path "transit/decrypt/app-*" {
  capabilities = ["create", "update"]
}

# Field encryption policy (restricted)
path "transit/encrypt/field-*" {
  capabilities = ["create", "update"]

  allowed_parameters = {
    "plaintext" = []
  }

  denied_parameters = {
    "batch_input" = []
  }
}
```
**Key Rotation Policy:**
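```bash
# Automatic key rotation
# These are key configuration parameters, so they are written to the key's
# config endpoint rather than declared in a policy.
# min_encryption_version=2 assumes the key has already been rotated once.
vault write transit/keys/database-primary/config \
    min_decryption_version=1 \
    min_encryption_version=2 \
    deletion_allowed=false \
    auto_rotate_period=2160h  # 90 days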
```
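
After a rotation, ciphertexts written under older key versions stay readable until `min_decryption_version` is raised, so they need to be found and re-encrypted first. Transit ciphertexts carry their key version in a `vault:vN:` prefix, which makes that check cheap. The helpers below are a minimal sketch; `ciphertext_key_version` and `needs_rewrap` are hypothetical names.

```python
def ciphertext_key_version(ciphertext: str) -> int:
    """Extract N from a transit ciphertext of the form 'vault:vN:<base64>'."""
    prefix, version, _payload = ciphertext.split(":", 2)
    if prefix != "vault" or not version.startswith("v") or not version[1:].isdigit():
        raise ValueError(f"not a transit ciphertext: {ciphertext[:16]!r}")
    return int(version[1:])


def needs_rewrap(ciphertext: str, min_version: int) -> bool:
    """True when the ciphertext was produced by a key version below min_version."""
    return ciphertext_key_version(ciphertext) < min_version


stored = ["vault:v1:AAAA", "vault:v2:BBBB", "vault:v3:CCCC"]
stale = [c for c in stored if needs_rewrap(c, min_version=2)]
```

Stale values would then be passed to the transit rewrap endpoint, which re-encrypts under the latest key version without exposing the plaintext to the caller.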
### 5.3 Hardware Security Module (HSM)
**HSM Configuration:**
- **Type**: AWS CloudHSM / Azure Dedicated HSM
- **FIPS Level**: FIPS 140-2 Level 3
- **High Availability**: Multi-AZ deployment
- **Backup**: Encrypted key backup to secure offline storage
## 6. Transport Layer Security
### 6.1 TLS Configuration
**Traefik TLS Configuration:**
```yaml
tls:
  options:
    default:
      minVersion: "VersionTLS13"
      maxVersion: "VersionTLS13"
      # With TLS 1.3, Traefik does not apply a configured cipher list;
      # the entries below record the suites expected in negotiation.
      cipherSuites:
        - "TLS_AES_256_GCM_SHA384"
        - "TLS_CHACHA20_POLY1305_SHA256"
        - "TLS_AES_128_GCM_SHA256"
      curvePreferences:
        - "X25519"
        - "secp384r1"
      sniStrict: true
  certificates:
    - certFile: /certs/wildcard.crt
      keyFile: /certs/wildcard.key
```
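
Clients that call the gateway can enforce the same floor on their side. The sketch below uses Python's standard `ssl` module and assumes an interpreter built against OpenSSL 1.1.1 or later; `tls13_client_context` is a hypothetical helper name, and the optional `ca_file` would point at the internal CA bundle when one is used.

```python
import ssl
from typing import Optional


def tls13_client_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client context that verifies certificates and refuses TLS below 1.3."""
    # create_default_context() enables hostname checking and CERT_REQUIRED
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


ctx = tls13_client_context()
```

The resulting context can be handed to an HTTP client (for example through its `verify` or `ssl_context` option) so that a misconfigured server offering only TLS 1.2 fails the handshake instead of silently downgrading.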
### 6.2 Certificate Management
**Certificate Lifecycle:**
- **Issuance**: Let's Encrypt with DNS challenge
- **Rotation**: Automated renewal 30 days before expiry
- **Monitoring**: Certificate expiry alerts
- **Backup**: Encrypted certificate backup
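
The renewal and expiry-alert rules in this lifecycle come down to a date comparison. The sketch below reads the 30-day figure as a renewal window before expiry, which is an assumption, and uses a 90-day certificate lifetime purely as an illustrative value; `should_renew` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

RENEW_BEFORE = timedelta(days=30)  # assumed renewal window


def should_renew(not_after: datetime, now: datetime,
                 renew_before: timedelta = RENEW_BEFORE) -> bool:
    """True once the certificate is within `renew_before` of its expiry."""
    return not_after - now <= renew_before


issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(days=90)  # illustrative lifetime

early = should_renew(expires, issued + timedelta(days=59))   # 31 days left
on_time = should_renew(expires, issued + timedelta(days=60))  # 30 days left
```

The same predicate can back the expiry alert by calling it with a shorter window, so renewal and monitoring share one definition of "close to expiry".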
**Internal PKI:**
```bash
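# Vault PKI setup
# Root CA (10-year TTL)
vault secrets enable -path=pki-root pki
vault secrets tune -max-lease-ttl=87600h pki-root
vault write pki-root/root/generate/internal \
    common_name="AI Tax Agent Root CA" \
    ttl=87600h \
    key_bits=4096

# Intermediate CA (5-year TTL): generate a CSR, sign it with the root,
# then import the signed certificate. Output file names are illustrative.
vault secrets enable -path=pki-int pki
vault secrets tune -max-lease-ttl=43800h pki-int
vault write -field=csr pki-int/intermediate/generate/internal \
    common_name="AI Tax Agent Intermediate CA" \
    key_bits=4096 > pki-int.csr
vault write -field=certificate pki-root/root/sign-intermediate \
    csr=@pki-int.csr ttl=43800h > pki-int.pem
vault write pki-int/intermediate/set-signed certificate=@pki-int.pem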
```
## 7. Database Encryption
### 7.1 PostgreSQL Encryption
**Transparent Data Encryption (TDE):**
```sql
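-- Enable pgcrypto extension
-- (provides column-level crypto functions, and gen_random_uuid() on
-- PostgreSQL versions before 13; it does not encrypt storage transparently)
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Create encrypted table
CREATE TABLE taxpayer_profiles (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    utr_encrypted BYTEA NOT NULL,
    ni_number_encrypted BYTEA NOT NULL,
    name_encrypted BYTEA NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Encryption functions
-- vault_encrypt() is assumed to be a site-specific function that calls the
-- Vault transit engine; it is not part of PostgreSQL or pgcrypto.
CREATE OR REPLACE FUNCTION encrypt_pii(data TEXT, key_id TEXT)
RETURNS BYTEA AS $$
BEGIN
    -- Use Vault transit engine for encryption
    RETURN vault_encrypt(data, key_id);
END;
$$ LANGUAGE plpgsql;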
```
**Column-Level Encryption:**
```python
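import uuid

from sqlalchemy import Column, LargeBinary
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.ext.hybrid import hybrid_property
from sqlalchemy.orm import declarative_base

Base = declarative_base()
# `vault_client` is assumed to be a module-level client whose
# encrypt()/decrypt() take (key_name, value).


class EncryptedTaxpayerProfile(Base):
    __tablename__ = 'taxpayer_profiles'

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    utr_encrypted = Column(LargeBinary, nullable=False)
    ni_number_encrypted = Column(LargeBinary, nullable=False)

    @hybrid_property
    def utr(self):
        return vault_client.decrypt('field-utr', self.utr_encrypted)

    @utr.setter
    def utr(self, value):
        self.utr_encrypted = vault_client.encrypt('field-utr', value)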
```
### 7.2 Neo4j Encryption
**Enterprise Edition Features:**
```cypher
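// Enable encryption at rest
// (assumed setting name; verify it against the Neo4j version in use.
// Store files are otherwise protected by the encrypted volume, section 3.2)
CALL dbms.security.setConfigValue('dbms.security.encryption.enabled', 'true');

// Require the encrypted property on every TaxpayerProfile node
CREATE CONSTRAINT encrypted_utr IF NOT EXISTS
FOR (tp:TaxpayerProfile)
REQUIRE tp.utr_encrypted IS NOT NULL;

// Store ciphertext produced by the application (Vault transit).
// A digest such as apoc.util.md5 is one-way and cannot serve as encryption.
// $id and $utr_ciphertext are illustrative query parameters.
MERGE (tp:TaxpayerProfile {id: $id})
SET tp.utr_encrypted = $utr_ciphertext;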
```
## 8. Application-Level Encryption
### 8.1 Microservice Encryption
**Service-to-Service Communication:**
```python
import httpx
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding


class SecureServiceClient:
    def __init__(self, service_url: str, private_key: rsa.RSAPrivateKey):
        self.service_url = service_url
        self.private_key = private_key

    # encrypt_payload(), sign_request() and decrypt_response() are assumed to
    # be implemented elsewhere in this class (using `hashes` and `padding`).

    async def make_request(self, endpoint: str, data: dict):
        # Encrypt request payload
        encrypted_data = self.encrypt_payload(data)
        # Sign request
        signature = self.sign_request(encrypted_data)
        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"{self.service_url}/{endpoint}",
                json={"data": encrypted_data, "signature": signature},
                headers={"Content-Type": "application/json"}
            )
        # Fail on HTTP errors before attempting to decrypt
        response.raise_for_status()
        # Decrypt response
        return self.decrypt_response(response.json())
```
### 8.2 Document Encryption
**Document Storage Encryption:**
```python
from cryptography.fernet import Fernet


class DocumentEncryption:
    def __init__(self, vault_client):
        # `vault_client` is assumed to wrap the transit datakey endpoint and
        # to return a Fernet-compatible (base64-encoded, 32-byte) plaintext key.
        self.vault = vault_client

    def encrypt_document(self, document_content: bytes, doc_id: str) -> dict:
        """Encrypt document with unique DEK"""
        # Generate document-specific DEK
        dek = self.vault.generate_data_key('document-master-key')
        # Encrypt document with DEK (Fernet: AES-128-CBC with HMAC-SHA256)
        cipher = Fernet(dek['plaintext_key'])
        encrypted_content = cipher.encrypt(document_content)
        # Store only the wrapped DEK; the plaintext key is never persisted
        encrypted_dek = dek['ciphertext_key']
        return {
            'doc_id': doc_id,
            'encrypted_content': encrypted_content,
            'encrypted_dek': encrypted_dek,
            'key_version': dek['key_version']
        }
```
## 9. Backup and Archive Encryption
### 9.1 Backup Encryption Strategy
**Multi-Layer Backup Encryption:**
```bash
#!/bin/bash
# Backup encryption script
set -euo pipefail

# Compute the date stamp once so every artefact shares it
STAMP=$(date +%Y%m%d)

# 1. Database dump with encryption
pg_dump tax_system | gpg --cipher-algo AES256 --compress-algo 2 \
    --symmetric --output "backup_${STAMP}.sql.gpg"

# 2. Neo4j backup, encrypted with gpg like the other artefacts
neo4j-admin backup --backup-dir=/backups/neo4j \
    --name="graph_${STAMP}"
tar -czf - "/backups/neo4j/graph_${STAMP}" | gpg --cipher-algo AES256 \
    --symmetric --output "graph_${STAMP}.tar.gz.gpg"

# 3. Document backup with encryption
tar -czf - /data/documents | gpg --cipher-algo AES256 \
    --symmetric --output "documents_${STAMP}.tar.gz.gpg"

# 4. Upload to encrypted cloud storage
aws s3 cp "backup_${STAMP}.sql.gpg" \
    s3://tax-agent-backups/ --sse aws:kms --sse-kms-key-id alias/backup-key
```
### 9.2 Archive Encryption
**Long-Term Archive Strategy:**
- **Encryption**: AES-256 with 10-year key retention
- **Integrity**: SHA-256 checksums with digital signatures
- **Storage**: Geographically distributed encrypted storage
- **Access**: Multi-person authorization for archive access
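
The integrity requirement above can be illustrated with a streaming SHA-256 check. This is a minimal sketch using only the standard library; `sha256_file` and `verify_file` are hypothetical helper names, the temporary file stands in for an archive, and the digital signature over the checksum is out of scope here.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: str, expected_hex: str) -> bool:
    """Compare against a recorded checksum using a constant-time comparison."""
    return hmac.compare_digest(sha256_file(path), expected_hex.lower())


# Demonstration with a throwaway file standing in for an archive
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
archive_path = tmp.name

checksum = sha256_file(archive_path)
intact = verify_file(archive_path, checksum)
os.remove(archive_path)
```

Recording the checksum at archive time and verifying it on every restore test gives early warning of silent corruption in long-term storage.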
## 10. Key Rotation and Recovery
### 10.1 Automated Key Rotation
**Rotation Schedule:**
```python
from datetime import timedelta

ROTATION_SCHEDULE = {
    'transport_keys': timedelta(days=365),        # Annual
    'application_keys': timedelta(days=90),       # Quarterly
    'database_keys': timedelta(days=90),          # Quarterly
    'field_encryption_keys': timedelta(days=30),  # Monthly
    'signing_keys': timedelta(days=180),          # Semi-annual
}


class KeyRotationManager:
    def __init__(self, vault_client):
        self.vault = vault_client

    # get_keys_due_for_rotation(), rotate_key(), update_applications() and
    # verify_rotation() are assumed to be implemented elsewhere in this class.

    async def rotate_keys(self):
        """Automated key rotation process"""
        for key_type, rotation_period in ROTATION_SCHEDULE.items():
            keys = await self.get_keys_due_for_rotation(key_type, rotation_period)
            for key in keys:
                await self.rotate_key(key)
                await self.update_applications(key)
                await self.verify_rotation(key)
```
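
The "due for rotation" decision in the manager above is a comparison of each key's last rotation time against its period. The sketch below assumes the last-rotated timestamps are already available as a mapping; `keys_due` and the key names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Subset of the schedule, repeated so the sketch runs on its own
ROTATION_SCHEDULE = {
    "application_keys": timedelta(days=90),
    "field_encryption_keys": timedelta(days=30),
}


def keys_due(key_type: str, last_rotated: dict, now: datetime) -> list:
    """Return the names of keys whose age has reached the rotation period."""
    period = ROTATION_SCHEDULE[key_type]
    return sorted(
        name for name, rotated_at in last_rotated.items()
        if now - rotated_at >= period
    )


now = datetime(2024, 3, 1, tzinfo=timezone.utc)
last_rotated = {
    "field-utr": datetime(2024, 1, 15, tzinfo=timezone.utc),   # 46 days old
    "field-iban": datetime(2024, 2, 15, tzinfo=timezone.utc),  # 15 days old
}
due = keys_due("field_encryption_keys", last_rotated, now)
```

Passing `now` in explicitly keeps the function deterministic, which makes the rotation logic straightforward to test.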
### 10.2 Key Recovery Procedures
**Emergency Key Recovery:**
1. **Multi-Person Authorization**: Require 3 of 5 key custodians
2. **Secure Communication**: Use encrypted channels for coordination
3. **Audit Trail**: Log all recovery activities
4. **Verification**: Verify key integrity before use
5. **Re-encryption**: Re-encrypt data with new keys if compromise suspected
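
The 3-of-5 authorization in step 1 can be expressed as a quorum check over distinct, recognised custodians. This is a minimal sketch; the custodian identifiers and the `quorum_met` helper are hypothetical, and it covers only the counting rule, not authentication of the approvers.

```python
CUSTODIANS = frozenset({
    "custodian-1", "custodian-2", "custodian-3", "custodian-4", "custodian-5",
})
QUORUM = 3


def quorum_met(approvals, custodians=CUSTODIANS, quorum=QUORUM) -> bool:
    """True when at least `quorum` distinct registered custodians approved."""
    distinct = set(approvals) & custodians
    return len(distinct) >= quorum


# Duplicate and unknown approvers do not count towards the quorum
too_few = quorum_met(["custodian-1", "custodian-1", "custodian-2", "intruder"])
enough = quorum_met(["custodian-1", "custodian-4", "custodian-5"])
```

Each evaluated request, whether approved or rejected, would also be written to the audit trail required in step 3.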
## 11. Monitoring and Compliance
### 11.1 Encryption Monitoring
**Key Metrics:**
- Key rotation compliance rate
- Encryption coverage percentage
- Failed encryption/decryption attempts
- Key access patterns and anomalies
- Certificate expiry warnings
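
As an illustration of the coverage metric, the sketch below computes the percentage of encrypted items per data classification. It assumes an inventory of `(classification, is_encrypted)` pairs is available from a data catalogue; `coverage_by_classification` is a hypothetical helper.

```python
from collections import defaultdict


def coverage_by_classification(records):
    """Return {classification: percent of items encrypted}, to one decimal."""
    totals = defaultdict(int)
    encrypted = defaultdict(int)
    for classification, is_encrypted in records:
        totals[classification] += 1
        if is_encrypted:
            encrypted[classification] += 1
    return {c: round(100.0 * encrypted[c] / totals[c], 1) for c in totals}


inventory = [
    ("RESTRICTED", True), ("RESTRICTED", True),
    ("RESTRICTED", True), ("RESTRICTED", False),
    ("CONFIDENTIAL", True), ("CONFIDENTIAL", True),
]
report = coverage_by_classification(inventory)
```

Anything below 100% for RESTRICTED or CONFIDENTIAL data would be a finding for the quarterly report.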
**Alerting Rules:**
```yaml
groups:
  - name: encryption_alerts
    rules:
      - alert: KeyRotationOverdue
        expr: vault_key_age_days > 90
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Encryption key rotation overdue"
      - alert: EncryptionFailure
        expr: rate(encryption_errors_total[5m]) > 0.1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High encryption failure rate detected"
```
### 11.2 Compliance Reporting
**Quarterly Encryption Report:**
- Encryption coverage by data classification
- Key rotation compliance status
- Security incidents related to encryption
- Vulnerability assessment results
- Compliance gap analysis
## 12. Incident Response
### 12.1 Key Compromise Response
**Response Procedures:**
1. **Immediate**: Revoke compromised keys
2. **Assessment**: Determine scope of compromise
3. **Containment**: Isolate affected systems
4. **Recovery**: Generate new keys and re-encrypt data
5. **Lessons Learned**: Update procedures and controls
### 12.2 Encryption Failure Response
**Failure Scenarios:**
- HSM hardware failure
- Key corruption or loss
- Encryption service outage
- Certificate expiry
**Recovery Procedures:**
- Activate backup HSM
- Restore keys from secure backup
- Implement manual encryption processes
- Emergency certificate issuance
---
**Document Classification**: CONFIDENTIAL
**Next Review Date**: 2024-07-31
**Approval**: Security Architecture Team