Architecture

Deploy Delta Forge Your Way

A lightweight, self-contained data platform that runs wherever your data lives. No Spark clusters. No Hadoop dependencies. Just three components that fit into any infrastructure.

Three Components. Complete Platform.

Delta Forge keeps it simple — a control plane for governance, compute workers for queries, and your own storage for data.

[Architecture diagram: users and tools (Desktop GUI, VS Code, CLI, Arrow Flight SQL) connect to stateless compute workers (SQL engine, query execution, graph & UDF runtime) that scale horizontally and to zero. The control plane provides the data catalog, security & RBAC, credential vault, user management, and Git source control. Workers federate reads and writes across data sources (PostgreSQL/MySQL, SQL Server, CSV/JSON/Excel, HL7/EDI/FHIR) and store Delta tables on your own storage: ADLS Gen2 (abfss://), AWS S3 (s3://), GCS (gs://), local/NAS (file://), and MinIO. Open Delta Lake format (Parquet files + JSON transaction log), no vendor lock-in.]

Control Plane

The brain of the platform. Manages metadata, security and credentials so your data stays governed.

  • Data catalog (schemas, tables, views)
  • Role-based access & row-level security
  • Credential vault for storage accounts
  • Git source control for pipelines
  • User management & JWT authentication

Compute Workers

Stateless query engines that scale horizontally. Add more workers for more concurrency.

  • Vectorised SQL execution engine
  • Desktop GUI, VS Code extension, and CLI
  • Arrow Flight SQL for high-speed transfers
  • Auto-registers with the control plane
  • Scales from 1 to hundreds of nodes

Your Storage

Data stays in Delta Lake format on storage you already own and control. No lock-in.

  • Azure Blob Storage (ADLS Gen2)
  • AWS S3 & S3-compatible (MinIO)
  • Google Cloud Storage
  • Local filesystem & NAS
  • Open Delta Lake format (Parquet files)
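The "no lock-in" claim rests on the Delta Lake layout itself: a table is just a directory of Parquet files plus a `_delta_log/` folder of numbered JSON commit files. A minimal sketch of what one commit looks like (the field names follow the open Delta protocol; the file name and stats are illustrative):

```python
import json

# One commit in a Delta table is a numbered JSON file, e.g.
# my_table/_delta_log/00000000000000000000.json, containing one
# action per line. An "add" action registers a new Parquet data file.
commit_actions = [
    {"commitInfo": {"operation": "WRITE", "timestamp": 1700000000000}},
    {"add": {
        "path": "part-00000-example.snappy.parquet",  # illustrative file name
        "size": 1024,
        "modificationTime": 1700000000000,
        "dataChange": True,
    }},
]

# Delta logs are newline-delimited JSON: one action object per line.
log_entry = "\n".join(json.dumps(a) for a in commit_actions)

# Any engine that understands this layout (Spark, Trino, Databricks, ...)
# can reconstruct the table state by replaying the log.
actions = [json.loads(line) for line in log_entry.splitlines()]
data_files = [a["add"]["path"] for a in actions if "add" in a]
print(data_files)  # ['part-00000-example.snappy.parquet']
```

Because the log is plain JSON next to plain Parquet, nothing in the format ties the data to any one vendor's engine.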

Deployment Scenarios

See how Delta Forge fits into real infrastructure

[Deployment diagram: inside your private network, users connect from the Desktop GUI or VS Code to a Kubernetes cluster (or Docker Compose) running the control plane (catalog & security, user management, credential vault, event streaming) and N workers (SQL engine, query execution, PostgreSQL wire protocol, Arrow Flight SQL). Delta Lake tables are stored as Parquet files on local NAS/SAN or MinIO (S3-compatible).]

Private Data Centre

Run Delta Forge entirely within your network perimeter. Data never leaves your infrastructure. Deploy on bare-metal servers, VMware, or an on-premise Kubernetes cluster.

Runs on Kubernetes or Docker Compose

Two container images: control plane and worker. Orchestrate with Helm charts or a simple compose file.
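With only two images, a compose file stays short. A sketch of what one might look like (image names, the port, and environment variables are illustrative assumptions, not the product's actual configuration):

```yaml
# docker-compose.yml — hypothetical names and ports for illustration
services:
  control-plane:
    image: deltaforge/control-plane:latest   # assumed image name
    ports:
      - "8080:8080"                          # assumed API/auth port
    volumes:
      - catalog-data:/var/lib/deltaforge     # embedded catalog survives restarts

  worker:
    image: deltaforge/worker:latest          # assumed image name
    environment:
      CONTROL_PLANE_URL: http://control-plane:8080  # workers auto-register here
    deploy:
      replicas: 2                            # add replicas for more concurrency
    depends_on:
      - control-plane

volumes:
  catalog-data:
```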

Scale-out with auto-scaling

Kubernetes HPA scales workers based on CPU and memory. KEDA support for scale-to-zero when idle.
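On the Kubernetes side, CPU-based worker scaling is standard HPA machinery. A sketch, assuming the worker Deployment is named `deltaforge-worker` (the name and thresholds are illustrative):

```yaml
# Hypothetical HorizontalPodAutoscaler for the worker deployment
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: deltaforge-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deltaforge-worker        # assumed deployment name
  minReplicas: 1                   # KEDA would replace this for scale-to-zero
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # illustrative threshold
```

HPA cannot scale below one replica, which is why scale-to-zero goes through KEDA instead.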

MinIO for S3-compatible storage

Use MinIO, Ceph, or any NAS mount. Delta Lake tables are standard Parquet — no proprietary formats.

Automated setup

Bootstraps its own embedded catalog on first run. No manual database provisioning or configuration required.

[Deployment diagram: within an Azure subscription, users connect from the Desktop GUI or VS Code to AKS, which runs the control plane (catalog & security, user management, credential vault, event streaming) and workers that auto-scale from 2 to 10 pods via HPA, with KEDA scale-to-zero. Integrations: Azure Key Vault for secrets and credentials, Entra ID for Managed Identity/SSO, ADLS Gen2 (abfss://) for Delta Lake tables as Parquet, and external sources such as SQL Server, PostgreSQL, and APIs.]

Azure Cloud

Deploy on Azure Kubernetes Service with native integration into ADLS Gen2, Azure Key Vault, and Entra ID. Scale compute independently from storage.

Azure Data Lake Storage Gen2

Store Delta Lake tables on ADLS Gen2 with hierarchical namespace. Native abfss:// protocol support.

AKS with auto-scaling

Workers auto-scale based on CPU/memory demand. KEDA integration enables scale-to-zero for cost savings.

Azure Key Vault integration

Storage credentials and secrets managed via Key Vault. Rotate keys without redeploying workers.

Entra ID single sign-on

Authenticate users via Managed Identity or service principals. Map Azure AD groups to Delta Forge roles.

Connects to Everything You Already Use

Delta Forge extends the PostgreSQL dialect with powerful new commands and provides purpose-built interfaces to use them

Delta Forge Interfaces

The Delta Forge GUI, VS Code extension, and CLI — built for extended SQL including PIPELINE, VACUUM, OPTIMIZE, and time travel.

BI & Analytics

Power BI, Tableau, Looker, and Metabase connect via the PostgreSQL wire protocol or Arrow Flight SQL.

Data Sources

Federate queries to PostgreSQL, MySQL, SQL Server, plus CSV, JSON, Parquet, Excel.

Industry Formats

Native parsing of HL7 (healthcare), FHIR, and EDI (X12, EDIFACT, TRADACOMS).
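As an illustration of what parsing pipe-delimited HL7 v2 involves (the engine handles this natively; this stdlib sketch and the synthetic message are purely illustrative):

```python
# HL7 v2 messages are segments separated by carriage returns; fields
# within a segment are separated by "|", components by "^". This toy
# parser extracts the patient name from a synthetic ADT message.
sample_adt = "\r".join([
    "MSH|^~\\&|SENDING_APP|SENDING_FAC|RECV_APP|RECV_FAC|202401011200||ADT^A01|MSG00001|P|2.5",
    "PID|1||12345^^^HOSP^MR||DOE^JANE||19800101|F",
])

def parse_hl7(message: str) -> dict:
    """Split an HL7 v2 message into {segment_id: [list of field lists]}."""
    segments = {}
    for raw in message.split("\r"):
        fields = raw.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments

segments = parse_hl7(sample_adt)
# PID-5 is the patient name; its components are separated by "^".
family, given = segments["PID"][0][5].split("^")
print(family, given)  # DOE JANE
```

Real messages add escaping rules, repeating fields, and sub-components, which is exactly the complexity a native parser absorbs for you.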

Enterprise Security Built In

Governance and compliance are not add-ons — they are part of every query

Role-Based Access Control

Granular RBAC with role inheritance, row-level security filters, and column-level masking. Enforce who sees what at the query engine level.

GDPR Pseudonymisation

Built-in pseudonymisation engine with keyed hashing, AES encryption, and redaction transforms. Deterministic output supports joins across datasets.
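The deterministic property is what makes keyed hashing join-safe: the same key and the same input always yield the same token, so two independently pseudonymised datasets still join on the token. A stdlib sketch of the idea (the key and field values are illustrative; the product's actual transforms may differ):

```python
import hmac
import hashlib

SECRET_KEY = b"rotate-me-outside-source-control"  # illustrative key

def pseudonymise(value: str, key: bytes = SECRET_KEY) -> str:
    """Keyed (HMAC-SHA256) hash: irreversible without the key,
    but deterministic, so equal inputs map to equal tokens."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Two datasets pseudonymised with the same key still join on the token:
patients = {pseudonymise("jane.doe@example.com"): {"age": 44}}
visits = [(pseudonymise("jane.doe@example.com"), "2024-01-01")]

token, visit_date = visits[0]
print(token in patients)  # True — the join key survives pseudonymisation
```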

Audit & Compliance

Every query, credential access, and permission change is logged. Full audit trail for SOC 2, HIPAA, and GDPR compliance requirements.

Why This Architecture Matters

What sets Delta Forge apart from legacy data platforms

No Spark. No Hadoop.

Purpose-built on Apache Arrow. A single worker binary replaces an entire Spark cluster — dramatically reducing infrastructure cost and operational complexity.

Open Format, Zero Lock-in

All data is stored as standard Delta Lake tables (Parquet files + JSON transaction log). Compatible with Databricks, Apache Spark, Trino, and any tool that reads Delta Lake.

PostgreSQL Compatible

Workers speak the PostgreSQL wire protocol, so existing drivers and BI tools such as Power BI, Tableau, Looker, and Metabase connect without custom connectors. Arrow Flight SQL is available where high-speed transfer matters.

Desktop to Data Centre

The same engine runs as a desktop application for development and as a distributed cluster for production. One platform from prototyping to scale.

Lightweight Footprint

Control plane needs just 512 MB RAM with an embedded SQLite catalog. No external PostgreSQL, Redis, or ZooKeeper. Workers start in seconds and scale to zero when idle.
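An embedded SQLite catalog is a large part of why the footprint is small: metadata lives in a single file with no server process to run. A sketch of what such a catalog might hold (the schema is a hypothetical illustration, not Delta Forge's actual tables):

```python
import sqlite3

# In-memory here for illustration; the control plane would use a file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical catalog schema: tables and their storage locations
    CREATE TABLE catalog_tables (
        schema_name TEXT NOT NULL,
        table_name  TEXT NOT NULL,
        location    TEXT NOT NULL,   -- e.g. abfss://, s3://, file://
        PRIMARY KEY (schema_name, table_name)
    );
""")
conn.execute(
    "INSERT INTO catalog_tables VALUES (?, ?, ?)",
    ("sales", "orders", "s3://my-bucket/sales/orders"),
)
row = conn.execute(
    "SELECT location FROM catalog_tables WHERE table_name = 'orders'"
).fetchone()
print(row[0])  # s3://my-bucket/sales/orders
```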

Separation of Compute & Storage

Compute workers are stateless — scale them independently from storage. Pay for compute only when queries are running. Storage costs stay predictable.

See it running in your environment

Get a guided deployment in your private data centre or Azure subscription.