Sovereign AI + RAG
Private AI on your own infrastructure — financial records, client data, legal documents, government materials. Zero third-party data exposure. CBUAE Sovereign Cloud aligned. UAE PDPL native.
The Problem
Cloud-only AI is a non-starter for regulated industries: banks bound by CBUAE rules, healthcare bound by HIPAA, government bound by data residency mandates, family offices bound by privacy. Generic LLM API calls expose your data, your users' identities, and your queries to a third party in a foreign jurisdiction. The compliance team blocks the deployment, and the AI initiative dies in a meeting room.
The Outcome
An LLM and RAG system you fully own. Open-weight models (Llama, Qwen, Mistral, Falcon-H1 Arabic, DeepSeek) running on your infrastructure or in a sovereign-cloud region. Audit trails. Air-gapped option. Compliance documentation that maps to ISO 42001, PDPL, GDPR, HIPAA, and the EU AI Act.
Packages
Foundations
From AED 150,000
Single-corpus deployment, on-premise inference on Mac Studio M2/M4 or A100. PDPL native, basic eval coverage.
- 1 knowledge corpus (≤500K documents)
- Open-weight model: Llama 3.3 70B / Qwen 2.5 / Falcon-H1 Arabic
- Hardware procurement and setup
- Basic eval harness, weekly QA
- 8 weeks to production
Sovereign
From AED 280,000
Multi-corpus deployment, CBUAE Sovereign Cloud aligned. Governance documentation structured around ISO 42001 expectations. Arabic + English.
- Up to 5 knowledge corpora
- Multi-language: Arabic (Falcon-H1 / Jais 2) + English
- CBUAE Sovereign Financial Cloud architecture pattern
- Governance documentation pack (ISO 42001-shaped)
- Audit logging, RBAC, SSO integration
- 11 weeks to production
Government
From AED 480,000
Air-gapped, zero external network, NVIDIA H100 hardware procurement, full audit trail, 12-month SLA.
- Air-gapped, zero internet egress
- NVIDIA H100 / H200 hardware procurement and rack install
- Multi-tenant or fully isolated deployment
- Full audit trail, exportable to SIEM
- 12-month SLA with on-site support
- 14 weeks to production
In Scope
Architecture
Weeks 1–3
- Hardware sizing based on user count, corpus size, and latency target (see the sizing sketch below)
- Model selection — Llama 3.3, Qwen 2.5, Falcon-H1 Arabic, DeepSeek
- Network and access architecture (on-prem, VPC, air-gapped)
- Compliance mapping against the frameworks you operate under (PDPL, ISO 42001, HIPAA, GDPR, EU AI Act as applicable)
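For a sense of what the sizing exercise looks like, here is a back-of-envelope heuristic. Every constant in it is an illustrative assumption, not a quote; real sizing comes from profiling your workload.

```python
def estimate_vram_gb(
    params_b: float,         # model size in billions of parameters
    bytes_per_param: float,  # ~2.0 for BF16 weights, ~0.55 for 4-bit quantised
    concurrent_users: int,
    kv_gb_per_user: float = 1.5,  # assumed average KV-cache load per active user
    overhead: float = 1.2,        # activations, CUDA context, fragmentation
) -> float:
    """Back-of-envelope VRAM estimate. Real sizing comes from load-testing."""
    weights_gb = params_b * bytes_per_param
    kv_gb = concurrent_users * kv_gb_per_user
    return (weights_gb + kv_gb) * overhead

# Llama 3.3 70B in BF16 with 20 concurrent users:
print(f"{estimate_vram_gb(70, 2.0, 20):.0f} GB")   # ~204 GB: a multi-GPU A100/H100 node
# The same model 4-bit quantised:
print(f"{estimate_vram_gb(70, 0.55, 20):.0f} GB")  # ~82 GB: two 80 GB GPUs, or trim concurrency
```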
Deployment
Weeks 4–9
- Hardware procurement and rack-and-stack (Sovereign / Government tiers)
- Model deployment — vLLM, Ollama, or custom inference server
- RAG ingestion pipeline with chunking, embedding, and hybrid retrieval (see the sketch after this list)
- Authentication, RBAC, and audit logging integration
- API surface for your applications
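A minimal sketch of the pipeline above, assuming a vLLM server exposing an OpenAI-compatible endpoint inside your network. The endpoint URL, model name, embedder, chunk sizes, and fusion weights are illustrative placeholders, not the production configuration.

```python
from openai import OpenAI
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer
import numpy as np

embedder = SentenceTransformer("BAAI/bge-m3")  # multilingual embedder: Arabic + English

def chunk(text: str, size: int = 800, overlap: int = 120) -> list[str]:
    """Fixed-size character chunks with overlap; production uses structure-aware splitting."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

docs = ["First sample document...", "Second sample document..."]  # stand-in corpus
chunks = [c for d in docs for c in chunk(d)]
vecs = embedder.encode(chunks, normalize_embeddings=True)
bm25 = BM25Okapi([c.split() for c in chunks])

def retrieve(query: str, k: int = 5) -> list[str]:
    """Hybrid retrieval: dense cosine scores fused with BM25 lexical scores."""
    dense = vecs @ embedder.encode([query], normalize_embeddings=True)[0]
    sparse = np.asarray(bm25.get_scores(query.split()))
    if sparse.max() > 0:
        sparse = sparse / sparse.max()  # crude normalisation onto ~[0, 1]
    fused = 0.6 * dense + 0.4 * sparse  # illustrative weighting
    return [chunks[i] for i in np.argsort(fused)[::-1][:k]]

# The inference server (e.g. vLLM serving Llama 3.3 70B) exposes an
# OpenAI-compatible endpoint that never leaves your network.
client = OpenAI(base_url="http://inference.internal:8000/v1", api_key="unused")

def answer(query: str) -> str:
    context = "\n\n".join(retrieve(query))
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct",
        messages=[
            {"role": "system", "content": f"Answer only from this context:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content
```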
Production
Weeks 10–14
- Acceptance testing against a golden set and an adversarial set (see the evaluation sketch after this list)
- Compliance documentation handover (AI inventory, risk register, monitoring plan — ISO 42001-shaped)
- Operations runbook and incident response
- Knowledge transfer to your infrastructure team
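A minimal sketch of the golden-set acceptance gate, reusing the answer() helper from the deployment sketch above. The cases, the scoring method, and the 0.90 threshold are illustrative; a real golden set is hundreds of curated Q/A pairs, and the adversarial set adds injection, jailbreak, and off-corpus probes.

```python
from statistics import mean

GOLDEN = [  # illustrative cases only
    {"q": "What is the retention limit for closed-account records?",
     "must_include": ["retention"]},
    {"q": "Which approvals are needed to export client data?",
     "must_include": ["approval", "export"]},
]

def fact_coverage(text: str, must_include: list[str]) -> float:
    """Fraction of required facts present verbatim; real harnesses add
    LLM-as-judge and semantic scoring on top of this simplest possible gate."""
    return mean(f.lower() in text.lower() for f in must_include)

scores = [fact_coverage(answer(c["q"]), c["must_include"]) for c in GOLDEN]
assert mean(scores) >= 0.90, f"Acceptance gate failed: mean coverage {mean(scores):.2f}"
```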
How We Engage
01
Discovery call — we map data classification, latency requirements, regulatory constraints, and existing infrastructure. 60 minutes.
02
Architecture proposal — fixed-price SoW with named hardware, model choice, deployment topology, and compliance package. Delivered within 7 business days.
03
Hardware-first kickoff — for Sovereign and Government tiers we order hardware on contract signing to compress the timeline. Software work begins in parallel.
Why Codenovai
We're an operator-first agency. The hardware sizing, inference stack, and deployment patterns we recommend are ones we'd deploy on our own infrastructure — not theoretical reference architectures from a slide deck.
FAQ
- Why Falcon-H1 Arabic and Jais 2 instead of GPT-4 or Claude?
- For Arabic-first workloads, Falcon-H1 Arabic and Jais 2 outperform Western frontier models on Modern Standard Arabic (MSA) and major Gulf dialects, and they're available as open weights. They also stay on your infrastructure. For English-only workloads we recommend Llama 3.3 70B or Qwen 2.5: both are strong, both are open weights, both run on the hardware tiers above. We benchmark for your specific corpus before selecting.
- What ongoing costs should I expect after the build?
- Electricity and cooling for Foundations-tier inference typically run AED 4,000–8,000 per month. The Sovereign tier with H100 GPUs runs AED 12,000–25,000 per month depending on utilisation. We offer an optional managed-operations retainer (separate SoW) starting at AED 18,000/month for monitoring, eval drift detection, and quarterly model upgrades.
- Can we swap models later — say, when Llama 4 ships?
- Yes. The deployment architecture is model-agnostic — your application code calls a stable internal API, and the inference server underneath is swappable. We handle major model upgrades on the managed-operations retainer; if you self-manage, the swap is typically a 2–3 day exercise.
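As a sketch of what "stable internal API" means in practice (the wrapper, environment variables, and names here are illustrative): application code depends on a thin client, and the model name lives only in configuration, so a swap is a config change plus a re-run of the eval gate.

```python
import os
from openai import OpenAI

_client = OpenAI(
    base_url=os.environ["INFERENCE_BASE_URL"],  # e.g. http://inference.internal:8000/v1
    api_key=os.environ.get("INFERENCE_API_KEY", "unused"),
)
_MODEL = os.environ["INFERENCE_MODEL"]  # the only place a model name appears

def complete(prompt: str, **kwargs) -> str:
    """The stable surface your applications call; the backend is swappable."""
    resp = _client.chat.completions.create(
        model=_MODEL,
        messages=[{"role": "user", "content": prompt}],
        **kwargs,
    )
    return resp.choices[0].message.content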
- How does this compare to Azure OpenAI or AWS Bedrock?
- Bedrock and Azure OpenAI are managed cloud services — your data still leaves your environment to call the model, even if the cloud provider promises not to retain it. The Sovereign tier here keeps everything inside your infrastructure. For workloads where contractual or regulatory language requires zero third-party data exposure, Bedrock and Azure OpenAI fail the test. Where they pass, our Agentic Pilot offer is usually a better fit.
- Do you support hybrid deployments — sovereign for regulated workloads, cloud for general?
- Yes. A common pattern is Foundations or Sovereign tier for regulated data (client records, financial details), with cloud-routed Claude/GPT for general productivity tasks (drafting, summarisation of public material). We design the routing layer so the same end-user interface uses the appropriate backend transparently.
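A simplified sketch of such a routing layer, assuming both backends speak the OpenAI-compatible API. The keyword check is a stub standing in for a real data-classification lookup; all names are illustrative.

```python
from openai import OpenAI

SOVEREIGN = OpenAI(base_url="http://inference.internal:8000/v1", api_key="unused")
CLOUD = OpenAI()  # standard cloud credentials; general-productivity traffic only

SENSITIVE_MARKERS = ("client", "account", "iban", "diagnosis")  # stub only

def backend_for(request_text: str, data_labels: set[str]) -> OpenAI:
    """Pick a backend per request. Production routing keys off your data
    classification labels, not keyword matching."""
    if "regulated" in data_labels or any(m in request_text.lower() for m in SENSITIVE_MARKERS):
        return SOVEREIGN  # never leaves your infrastructure
    return CLOUD          # public or general material only
```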