Initial commit
Some checks failed
CI/CD Pipeline / Code Quality & Linting (push) Has been cancelled
CI/CD Pipeline / Policy Validation (push) Has been cancelled
CI/CD Pipeline / Test Suite (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-coverage) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-extract) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-firm-connectors) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-forms) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-hmrc) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-ingestion) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-kg) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-normalize-map) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-ocr) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-rag-indexer) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-rag-retriever) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-reason) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-rpa) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (ui-review) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-coverage) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-extract) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-kg) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-rag-retriever) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (ui-review) (push) Has been cancelled
CI/CD Pipeline / Generate SBOM (push) Has been cancelled
CI/CD Pipeline / Deploy to Staging (push) Has been cancelled
CI/CD Pipeline / Deploy to Production (push) Has been cancelled
CI/CD Pipeline / Notifications (push) Has been cancelled
Some checks failed
CI/CD Pipeline / Code Quality & Linting (push) Has been cancelled
CI/CD Pipeline / Policy Validation (push) Has been cancelled
CI/CD Pipeline / Test Suite (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-coverage) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-extract) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-firm-connectors) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-forms) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-hmrc) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-ingestion) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-kg) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-normalize-map) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-ocr) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-rag-indexer) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-rag-retriever) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-reason) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (svc-rpa) (push) Has been cancelled
CI/CD Pipeline / Build Docker Images (ui-review) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-coverage) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-extract) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-kg) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (svc-rag-retriever) (push) Has been cancelled
CI/CD Pipeline / Security Scanning (ui-review) (push) Has been cancelled
CI/CD Pipeline / Generate SBOM (push) Has been cancelled
CI/CD Pipeline / Deploy to Staging (push) Has been cancelled
CI/CD Pipeline / Deploy to Production (push) Has been cancelled
CI/CD Pipeline / Notifications (push) Has been cancelled
This commit is contained in:
97
prompts/kv_extract.txt
Normal file
97
prompts/kv_extract.txt
Normal file
@@ -0,0 +1,97 @@
|
||||
# FILE: prompts/kv_extract.txt
|
||||
|
||||
You are an expert document analysis AI specializing in extracting structured financial and tax information from UK documents. Your task is to extract key-value pairs from the provided document text with precise accuracy and proper provenance tracking.
|
||||
|
||||
## INSTRUCTIONS
|
||||
|
||||
1. **Extract only factual information** present in the document text
|
||||
2. **Maintain exact numerical precision** - do not round or approximate
|
||||
3. **Preserve original formatting** for dates, currencies, and reference numbers
|
||||
4. **Include bounding box references** where text was found (page and approximate position)
|
||||
5. **Assign confidence scores** based on text clarity and context
|
||||
6. **Follow the JSON schema** provided exactly
|
||||
|
||||
## DOCUMENT TEXT
|
||||
```
|
||||
{document_text}
|
||||
```
|
||||
|
||||
## EXTRACTION SCHEMA
|
||||
```json
|
||||
{schema}
|
||||
```
|
||||
|
||||
## OUTPUT REQUIREMENTS
|
||||
|
||||
Return a valid JSON object that conforms to the provided schema. Include:
|
||||
|
||||
- **extracted_fields**: Key-value pairs of identified information
|
||||
- **confidence_scores**: Confidence (0.0-1.0) for each extracted field
|
||||
- **provenance**: Page and position information for each field
|
||||
- **document_type**: Your assessment of the document type
|
||||
- **extraction_notes**: Any ambiguities or assumptions made
|
||||
|
||||
## CONFIDENCE SCORING GUIDELINES
|
||||
|
||||
- **0.9-1.0**: Clear, unambiguous text with proper formatting
|
||||
- **0.7-0.8**: Readable text with minor OCR artifacts
|
||||
- **0.5-0.6**: Partially unclear text requiring interpretation
|
||||
- **0.3-0.4**: Heavily degraded text with significant uncertainty
|
||||
- **0.0-0.2**: Illegible or highly uncertain text
|
||||
|
||||
## VALIDATION RULES
|
||||
|
||||
- **Currency amounts**: Must include currency symbol or code
|
||||
- **Dates**: Prefer DD/MM/YYYY format for UK documents
|
||||
- **Reference numbers**: Preserve exact formatting including hyphens/spaces
|
||||
- **Names**: Use title case, remove extra whitespace
|
||||
- **Addresses**: Include postcode if present
|
||||
|
||||
## RETRY LOGIC
|
||||
|
||||
If extraction fails validation:
|
||||
1. Re-examine the document text more carefully
|
||||
2. Look for alternative representations of required fields
|
||||
3. Adjust confidence scores based on text quality
|
||||
4. Include detailed notes about extraction challenges
|
||||
|
||||
## EXAMPLE OUTPUT
|
||||
|
||||
```json
|
||||
{
|
||||
"extracted_fields": {
|
||||
"document_date": "15/03/2024",
|
||||
"total_amount": "£1,234.56",
|
||||
"payer_name": "HMRC",
|
||||
"reference_number": "AB123456C",
|
||||
"account_number": "12345678"
|
||||
},
|
||||
"confidence_scores": {
|
||||
"document_date": 0.95,
|
||||
"total_amount": 0.92,
|
||||
"payer_name": 0.88,
|
||||
"reference_number": 0.90,
|
||||
"account_number": 0.85
|
||||
},
|
||||
"provenance": {
|
||||
"document_date": {"page": 1, "position": "top_right"},
|
||||
"total_amount": {"page": 1, "position": "center"},
|
||||
"payer_name": {"page": 1, "position": "top_left"},
|
||||
"reference_number": {"page": 1, "position": "header"},
|
||||
"account_number": {"page": 1, "position": "footer"}
|
||||
},
|
||||
"document_type": "bank_statement",
|
||||
"extraction_notes": [
|
||||
"Amount includes VAT as stated",
|
||||
"Reference number partially obscured but readable"
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## TEMPERATURE GUIDANCE
|
||||
|
||||
- **First attempt**: Use temperature 0.1 for maximum consistency
|
||||
- **Retry attempts**: Use temperature 0.3 for alternative interpretations
|
||||
- **Final attempt**: Use temperature 0.5 for creative problem-solving
|
||||
|
||||
Extract the information now, ensuring strict adherence to the schema and validation rules.
|
||||
Reference in New Issue
Block a user