Sluble — AI-Powered Workforce Management Platform

When you're onboarding 50 new employees a week, manual document verification becomes a serious bottleneck. HR teams spend hours checking driver's licenses, work permits, Social Insurance Number (SIN) cards, and compliance documents. The work is repetitive and error-prone, and the vast majority of cases don't require human judgment.

We automated it with AWS Textract and a custom LLM verification layer. Here's exactly how it works.

The Architecture

The verification pipeline has five stages:

Step 1 — Document ingestion: Documents arrive as PDFs or images. PDFs are converted to PNG using PyMuPDF — one page at a time, up to 2 pages to control Textract API costs. Each page is processed independently.
Step 2 — Textract analysis: Each page is sent to AWS Textract's AnalyzeID API, the Textract operation built specifically for identity documents. Textract extracts structured fields: name, date of birth, document number, expiry date, and more.
Step 3 — LLM screening: The extracted data is passed to AWS Bedrock with a custom screening prompt. The LLM validates the content against the expected document type and flags anomalies. Each field type can have its own screening prompt.
Step 4 — SIN validation: For Canadian SIN cards, we run a Luhn checksum on the extracted number. This catches most OCR misreads, as well as fraudulent documents that look convincing but carry a number that fails the checksum.
Step 5 — Result merging: For multi-page documents, we merge results across pages — the page with the highest confidence score wins, but extracted data is unioned across all pages so nothing is lost.
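Steps 2 and 5 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the production code: it assumes each page goes to Textract's AnalyzeID operation through boto3 (`client.analyze_id(DocumentPages=[{"Bytes": ...}])`), that a page's confidence score is the mean confidence of its extracted fields, and that conflicting values are resolved in favour of the higher-scoring page. `analyze_page`, `page_score`, `merge_pages`, and `verify_pages` are hypothetical names. The Textract client is passed in, so the merge logic can be exercised with a stub instead of live AWS calls.

```python
from statistics import mean
from typing import Any

MAX_PAGES = 2  # mirrors the two-page cap from Step 1

PageFields = dict[str, tuple[str, float]]  # field type -> (value, confidence %)


def analyze_page(client: Any, png_bytes: bytes) -> PageFields:
    """Send one rendered page to Textract AnalyzeID and flatten the detected fields."""
    # In production `client` would be boto3.client("textract").
    response = client.analyze_id(DocumentPages=[{"Bytes": png_bytes}])
    fields: PageFields = {}
    for document in response.get("IdentityDocuments", []):
        for item in document.get("IdentityDocumentFields", []):
            detection = item.get("ValueDetection", {})
            value = detection.get("Text", "")
            if value:  # skip field types that came back without a value
                fields[item["Type"]["Text"]] = (value, detection.get("Confidence", 0.0))
    return fields


def page_score(fields: PageFields) -> float:
    """Hypothetical page confidence: mean confidence of the extracted fields."""
    return mean(conf for _, conf in fields.values()) if fields else 0.0


def merge_pages(pages: list[PageFields]) -> dict[str, Any]:
    """Union fields across pages; on conflicts the higher-scoring page wins."""
    if not pages:
        return {"best_page": None, "confidence": 0.0, "fields": {}}
    scores = [page_score(p) for p in pages]

    def rank(i: int) -> tuple[float, int]:
        # Order by score, breaking ties in favour of the earlier page.
        return scores[i], -i

    best = max(range(len(pages)), key=rank)
    merged: dict[str, str] = {}
    # Apply pages from worst to best so better pages overwrite conflicting values.
    for i in sorted(range(len(pages)), key=rank):
        merged.update({name: value for name, (value, _) in pages[i].items()})
    return {"best_page": best, "confidence": scores[best], "fields": merged}


def verify_pages(client: Any, page_images: list[bytes]) -> dict[str, Any]:
    """Analyze each page independently (at most MAX_PAGES), then merge."""
    return merge_pages([analyze_page(client, img) for img in page_images[:MAX_PAGES]])
```

Taking the union first and then letting the best page overwrite conflicts keeps fields that appear on only one page (for example, the back of a card) while still preferring values from the clearest scan.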

The SIN Luhn Check

The Luhn algorithm is a simple checksum formula used to validate Canadian Social Insurance Numbers. Digits at even positions (1-indexed) are doubled; if a doubled value exceeds 9, subtract 9 from it. The SIN is valid if the sum of the resulting nine values is divisible by 10.
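Worked through on the illustrative number 046 454 286, the check looks like this:

```latex
\begin{array}{l|ccccccccc}
\text{digit}  & 0 & 4 & 6 & 4 & 5 & 4 & 2 & 8 & 6 \\
\text{weight} & 1 & 2 & 1 & 2 & 1 & 2 & 1 & 2 & 1 \\
\text{value}  & 0 & 8 & 6 & 8 & 5 & 8 & 2 & 16-9=7 & 6
\end{array}
\qquad
0+8+6+8+5+8+2+7+6 = 50 \equiv 0 \pmod{10}
```

If OCR misreads the final 6 as a 7, the total becomes 51, which is not divisible by 10, so the number is rejected.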

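This catches two important failure modes: OCR errors where a digit is misread, and fraudulent documents whose SIN doesn't pass the checksum. The Luhn check detects every single-digit error and most swaps of adjacent digits, but a forged card printed with a checksum-valid number will still pass, so it complements the Textract and LLM stages rather than replacing them. We run this check after Textract extraction, before the document is marked as verified.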
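A minimal Python version of the check, assuming the extracted SIN arrives as a string that may include the spaces or hyphens printed on the card. `is_valid_sin` is a hypothetical helper name, not part of any AWS SDK.

```python
def is_valid_sin(raw: str) -> bool:
    """Luhn check for a 9-digit Canadian SIN; tolerates spaces and hyphens."""
    digits = raw.replace(" ", "").replace("-", "")
    # Reject anything that is not exactly nine ASCII digits (e.g. an OCR'd 'O' for '0').
    if len(digits) != 9 or not all(c in "0123456789" for c in digits):
        return False
    total = 0
    for position, ch in enumerate(digits, start=1):
        d = int(ch)
        if position % 2 == 0:  # even positions (1-indexed) are doubled
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

For example, `is_valid_sin("046 454 286")` passes, while the adjacent-digit swap `"046 454 268"` sums to 48 and is rejected.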

Handling Edge Cases

Low confidence (<70%): Document goes to manual review queue with the specific failure reason
Failed verification: Employee is notified and asked to resubmit with a clearer photo
Escalation: HR is notified with the specific failure reason and the extracted data for review
Audit trail: Every verification decision is logged with timestamp, confidence score, and extracted data
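The rules above could be encoded roughly as follows. The 70% threshold comes from the list; the `Route` values, the `VerificationResult` shape, the order in which the rules are checked, and the JSON audit format are assumptions for illustration, and the employee and HR notifications are left out.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional

LOW_CONFIDENCE_THRESHOLD = 70.0  # percent, per the low-confidence rule


class Route(Enum):
    VERIFIED = "verified"
    MANUAL_REVIEW = "manual_review"  # low confidence: queued for a human reviewer
    RESUBMIT = "resubmit"            # failed verification: employee asked for a clearer photo


@dataclass
class VerificationResult:
    confidence: float                # 0-100, e.g. the best page's score
    passed: bool                     # combined outcome of LLM screening and SIN check
    fields: dict = field(default_factory=dict)
    failure_reason: Optional[str] = None


def route(result: VerificationResult) -> Route:
    """Pick the next step; low confidence is checked first (an assumption)."""
    if result.confidence < LOW_CONFIDENCE_THRESHOLD:
        return Route.MANUAL_REVIEW
    if not result.passed:
        return Route.RESUBMIT
    return Route.VERIFIED


def audit_record(result: VerificationResult, decision: Route,
                 now: Optional[datetime] = None) -> str:
    """Serialize one decision with timestamp, confidence, and extracted data."""
    if now is None:
        now = datetime.now(timezone.utc)
    return json.dumps({
        "timestamp": now.isoformat(),
        "decision": decision.value,
        "confidence": result.confidence,
        "failure_reason": result.failure_reason,
        "extracted": result.fields,
    }, sort_keys=True)
```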

Results

97%+ accuracy on standard identity documents
Average verification time: 3.2 seconds per document
Manual review rate: less than 8% of documents
Fraud detection rate significantly higher than manual review

Document Verification at Scale: How We Use AWS Textract + LLMs
