Architecture

Deploy Delta Forge Your Way

A lightweight, self-contained data platform that runs wherever your data lives. No Spark clusters. No Hadoop dependencies. Just three components that fit into any infrastructure.

Three Components. Complete Platform.

Delta Forge keeps it simple — a control plane for governance, compute workers for queries, and your own storage for data.

[Architecture diagram: users and tools (Desktop GUI, VS Code, CLI, Arrow Flight SQL) connect to stateless compute workers (SQL engine, query execution, graph & UDF runtime) that scale horizontally and to zero. The control plane provides the data catalog, security & RBAC, credential vault, user management, and Git source control. Workers federate reads and writes across data sources (PostgreSQL/MySQL, SQL Server, CSV/JSON/Excel, HL7/EDI/FHIR) and store Delta tables on your own storage: ADLS Gen2 (abfss://), AWS S3 (s3://), GCS (gs://), local/NAS (file://), and MinIO. Open Delta Lake format (Parquet files + JSON transaction log), no vendor lock-in.]

Control Plane

The brain of the platform. Manages metadata, security and credentials so your data stays governed.

  • Data catalog (schemas, tables, views)
  • Role-based access & row-level security
  • Credential vault for storage accounts
  • Git source control for pipelines
  • User management & JWT authentication

Compute Workers

Stateless query engines that scale horizontally. Add more workers for more concurrency.

  • Vectorised SQL execution engine
  • Desktop GUI, VS Code extension, and CLI
  • Arrow Flight SQL for high-speed transfers
  • Auto-registers with the control plane
  • Scales from 1 to hundreds of nodes

Your Storage

Data stays in Delta Lake format on storage you already own and control. No lock-in.

  • Azure Blob Storage (ADLS Gen2)
  • AWS S3 & S3-compatible (MinIO)
  • Google Cloud Storage
  • Local filesystem & NAS
  • Open Delta Lake format (Parquet files)
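The "no lock-in" claim rests on the Delta Lake layout itself: a table is just a directory of Parquet files plus a `_delta_log/` folder of numbered JSON commit files. A minimal sketch of what one commit looks like (the field names follow the open Delta protocol; the file name and stats are illustrative):

```python
import json

# One commit in a Delta table is a numbered JSON file, e.g.
# my_table/_delta_log/00000000000000000000.json, containing one
# action per line. An "add" action registers a new Parquet data file.
commit_actions = [
    {"commitInfo": {"operation": "WRITE", "timestamp": 1700000000000}},
    {"add": {
        "path": "part-00000-example.snappy.parquet",  # illustrative file name
        "size": 1024,
        "modificationTime": 1700000000000,
        "dataChange": True,
    }},
]

# Delta logs are newline-delimited JSON: one action object per line.
log_entry = "\n".join(json.dumps(a) for a in commit_actions)

# Any engine that understands this layout (Spark, Trino, Databricks, ...)
# can reconstruct the table state by replaying the log.
actions = [json.loads(line) for line in log_entry.splitlines()]
data_files = [a["add"]["path"] for a in actions if "add" in a]
print(data_files)  # ['part-00000-example.snappy.parquet']
```

Because the log is plain JSON next to plain Parquet, nothing in the format ties the data to any one vendor's engine.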

Deployment Scenarios

See how Delta Forge fits into real infrastructure

[Deployment diagram: inside your private network, users connect from the Desktop GUI or VS Code to a Kubernetes cluster (or Docker Compose) running the control plane (catalog & security, user management, credential vault, event streaming) and N workers (SQL engine, query execution, PostgreSQL wire protocol, Arrow Flight SQL). Delta Lake tables are stored as Parquet files on local NAS/SAN or MinIO (S3-compatible).]

Private Data Centre

Run Delta Forge entirely within your network perimeter. Data never leaves your infrastructure. Deploy on bare-metal servers, VMware, or an on-premise Kubernetes cluster.

Runs on Kubernetes or Docker Compose

Two container images: control plane and worker. Orchestrate with Helm charts or a simple compose file.
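With only two images, a compose file stays short. A sketch of what one might look like (image names, the port, and environment variables are illustrative assumptions, not the product's actual configuration):

```yaml
# docker-compose.yml — hypothetical names and ports for illustration
services:
  control-plane:
    image: deltaforge/control-plane:latest   # assumed image name
    ports:
      - "8080:8080"                          # assumed API/auth port
    volumes:
      - catalog-data:/var/lib/deltaforge     # embedded catalog survives restarts

  worker:
    image: deltaforge/worker:latest          # assumed image name
    environment:
      CONTROL_PLANE_URL: http://control-plane:8080  # workers auto-register here
    deploy:
      replicas: 2                            # add replicas for more concurrency
    depends_on:
      - control-plane

volumes:
  catalog-data:
```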

Scale-out with auto-scaling

Kubernetes HPA scales workers based on CPU and memory. KEDA support for scale-to-zero when idle.
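On the Kubernetes side, CPU-based worker scaling is standard HPA machinery. A sketch, assuming the worker Deployment is named `deltaforge-worker` (the name and thresholds are illustrative):

```yaml
# Hypothetical HorizontalPodAutoscaler for the worker deployment
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: deltaforge-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deltaforge-worker        # assumed deployment name
  minReplicas: 1                   # KEDA would replace this for scale-to-zero
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # illustrative threshold
```

HPA cannot scale below one replica, which is why scale-to-zero goes through KEDA instead.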

MinIO for S3-compatible storage

Use MinIO, Ceph, or any NAS mount. Delta Lake tables are standard Parquet — no proprietary formats.

Automated setup

Bootstraps its own embedded catalog on first run. No manual database provisioning or configuration required.

[Deployment diagram: within an Azure subscription, users connect from the Desktop GUI or VS Code to AKS, which runs the control plane (catalog & security, user management, credential vault, event streaming) and workers that auto-scale from 2 to 10 pods via HPA, with KEDA scale-to-zero. Integrations: Azure Key Vault for secrets and credentials, Entra ID for Managed Identity/SSO, ADLS Gen2 (abfss://) for Delta Lake tables as Parquet, and external sources such as SQL Server, PostgreSQL, and APIs.]

Azure Cloud

Deploy on Azure Kubernetes Service with native integration into ADLS Gen2, Azure Key Vault, and Entra ID. Scale compute independently from storage.

Azure Data Lake Storage Gen2

Store Delta Lake tables on ADLS Gen2 with hierarchical namespace. Native abfss:// protocol support.

AKS with auto-scaling

Workers auto-scale based on CPU/memory demand. KEDA integration enables scale-to-zero for cost savings.

Azure Key Vault integration

Storage credentials and secrets managed via Key Vault. Rotate keys without redeploying workers.

Entra ID single sign-on

Authenticate users via Managed Identity or service principals. Map Azure AD groups to Delta Forge roles.

Connects to Everything You Already Use

Delta Forge extends the PostgreSQL dialect with powerful new commands and provides purpose-built interfaces to use them

Delta Forge Interfaces

The Delta Forge GUI, VS Code extension, and CLI — built for extended SQL including PIPELINE, VACUUM, OPTIMIZE, and time travel.

BI & Analytics

Power BI, Tableau, Looker, and Metabase connect via the PostgreSQL wire protocol or Arrow Flight SQL.

Data Sources

Federate queries to PostgreSQL, MySQL, SQL Server, plus CSV, JSON, Parquet, Excel.

Industry Formats

Native parsing of HL7 (healthcare), FHIR, and EDI (X12, EDIFACT, TRADACOMS).
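As an illustration of what parsing pipe-delimited HL7 v2 involves (the engine handles this natively; this stdlib sketch and the synthetic message are purely illustrative):

```python
# HL7 v2 messages are segments separated by carriage returns; fields
# within a segment are separated by "|", components by "^". This toy
# parser extracts the patient name from a synthetic ADT message.
sample_adt = "\r".join([
    "MSH|^~\\&|SENDING_APP|SENDING_FAC|RECV_APP|RECV_FAC|202401011200||ADT^A01|MSG00001|P|2.5",
    "PID|1||12345^^^HOSP^MR||DOE^JANE||19800101|F",
])

def parse_hl7(message: str) -> dict:
    """Split an HL7 v2 message into {segment_id: [list of field lists]}."""
    segments = {}
    for raw in message.split("\r"):
        fields = raw.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments

segments = parse_hl7(sample_adt)
# PID-5 is the patient name; its components are separated by "^".
family, given = segments["PID"][0][5].split("^")
print(family, given)  # DOE JANE
```

Real messages add escaping rules, repeating fields, and sub-components, which is exactly the complexity a native parser absorbs for you.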

Enterprise Security Built In

Governance and compliance are not add-ons — they are part of every query

Role-Based Access Control

Granular RBAC with role inheritance, row-level security filters, and column-level masking. Enforce who sees what at the query engine level.

GDPR Pseudonymisation

Built-in pseudonymisation engine with keyed hashing, AES encryption, and redaction transforms. Deterministic output supports joins across datasets.
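The deterministic property is what makes keyed hashing join-safe: the same key and the same input always yield the same token, so two independently pseudonymised datasets still join on the token. A stdlib sketch of the idea (the key and field values are illustrative; the product's actual transforms may differ):

```python
import hmac
import hashlib

SECRET_KEY = b"rotate-me-outside-source-control"  # illustrative key

def pseudonymise(value: str, key: bytes = SECRET_KEY) -> str:
    """Keyed (HMAC-SHA256) hash: irreversible without the key,
    but deterministic, so equal inputs map to equal tokens."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Two datasets pseudonymised with the same key still join on the token:
patients = {pseudonymise("jane.doe@example.com"): {"age": 44}}
visits = [(pseudonymise("jane.doe@example.com"), "2024-01-01")]

token, visit_date = visits[0]
print(token in patients)  # True — the join key survives pseudonymisation
```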

Audit & Compliance

Every query, credential access, and permission change is logged. Full audit trail for SOC 2, HIPAA, and GDPR compliance requirements.

Why This Architecture Matters

What sets Delta Forge apart from legacy data platforms

No Spark. No Hadoop.

Purpose-built on Apache Arrow. A single worker binary replaces an entire Spark cluster — dramatically reducing infrastructure cost and operational complexity.

Open Format, Zero Lock-in

All data is stored as standard Delta Lake tables (Parquet files + JSON transaction log). Compatible with Databricks, Apache Spark, Trino, and any tool that reads Delta Lake.

PostgreSQL Compatible

Workers speak the PostgreSQL wire protocol, so existing drivers and BI tools such as Power BI, Tableau, Looker, and Metabase connect without custom connectors. Arrow Flight SQL is available where high-speed transfer matters.

Desktop to Data Centre

The same engine runs as a desktop application for development and as a distributed cluster for production. One platform from prototyping to scale.

Lightweight Footprint

Control plane needs just 512 MB RAM with an embedded SQLite catalog. No external PostgreSQL, Redis, or ZooKeeper. Workers start in seconds and scale to zero when idle.
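An embedded SQLite catalog is a large part of why the footprint is small: metadata lives in a single file with no server process to run. A sketch of what such a catalog might hold (the schema is a hypothetical illustration, not Delta Forge's actual tables):

```python
import sqlite3

# In-memory here for illustration; the control plane would use a file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical catalog schema: tables and their storage locations
    CREATE TABLE catalog_tables (
        schema_name TEXT NOT NULL,
        table_name  TEXT NOT NULL,
        location    TEXT NOT NULL,   -- e.g. abfss://, s3://, file://
        PRIMARY KEY (schema_name, table_name)
    );
""")
conn.execute(
    "INSERT INTO catalog_tables VALUES (?, ?, ?)",
    ("sales", "orders", "s3://my-bucket/sales/orders"),
)
row = conn.execute(
    "SELECT location FROM catalog_tables WHERE table_name = 'orders'"
).fetchone()
print(row[0])  # s3://my-bucket/sales/orders
```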

Separation of Compute & Storage

Compute workers are stateless — scale them independently from storage. Pay for compute only when queries are running. Storage costs stay predictable.

See it running in your environment

Get a guided deployment in your private data centre or Azure subscription.