Table Format

Complete Delta Lake Protocol Implementation

The most complete native Delta Lake engine. Full protocol support with ACID transactions, deletion vectors, change data capture, and predictive optimization. No managed runtime required.

Full Reader V1-V3 & Writer V2-V7
UniForm Iceberg interoperability
Zero Spark dependency
[Diagram: anatomy of a Delta table. The transaction log (_delta_log/) holds JSON commits (000.json, 001.json, 002.json) plus checkpoint.parquet; Parquet data files (~128-256 MB each) hold the rows; deletion vectors (dv-0002, dv-0005, dv-0007) mask deleted rows; column statistics (min/max, nullCount, histogram) drive data skipping. Together these enable ACID, time travel, schema evolution, and clustering.]

Delta Protocol Support

Complete implementation of Delta Lake protocol specifications

Reader V1

Basic Reader Features

  • Add/Remove file actions
  • Partition pruning
  • Schema evolution
  • Time travel
Reader V2

Column Mapping

  • Column mapping (id, name)
  • Rename without data rewrite
  • Drop without data rewrite
  • Special characters in names
Reader V3

Table Features

  • Deletion vectors
  • Timestamp without timezone
  • V2 checkpoints
  • Vacuum protocol check
  • Type widening
Writer V2-V7

Full Writer Support

  • Append-only tables
  • Invariants & constraints
  • Change Data Feed
  • Generated & identity columns
  • Clustering & Z-order

Deletion Vectors

Surgical row-level deletes without file rewrites

Deletion vectors represent a fundamental advancement in lakehouse architecture. Instead of rewriting entire Parquet files to delete rows, Delta Forge tracks deleted row positions in compact bitmap structures.

How It Works

  1. DELETE Statement - Identify rows matching predicate
  2. Bitmap Creation - Record row positions in a compact bitmap
  3. DV File Write - Store compact deletion vector file
  4. Transaction Log - Link DV to original data file
  5. Read Filtering - Apply DV during scan to skip deleted rows
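The read-side filtering in step 5 can be sketched in Python. A production engine stores the positions in a compressed roaring bitmap rather than a plain set, but the logic is the same (illustrative only, not Delta Forge's actual implementation):

```python
def scan_with_deletion_vector(rows, deleted_positions):
    """Yield only rows whose position is not marked in the deletion vector."""
    for pos, row in enumerate(rows):
        if pos not in deleted_positions:
            yield row

rows = ["r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7"]
dv = {1, 3, 6}  # bitmap of deleted row positions
live = list(scan_with_deletion_vector(rows, dv))
```

Because the Parquet file is never rewritten, the DELETE commits as soon as the tiny DV file and log entry are written.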

Performance Impact

  • 10-100x faster deletes vs file rewrite
  • Minimal storage overhead - DVs are highly compressed
  • Instant commits - No Parquet file I/O for delete
  • Efficient MERGE - Delete side uses DVs automatically
[Diagram: a Parquet file with rows 0-7, where rows 1, 3, and 6 are marked deleted. The deletion vector stores only the bitmap {1, 3, 6}, roughly 12 bytes.]

Time Travel

Query any historical state of your data

Version-Based Access

  • Query specific version numbers
  • Compare data between versions
  • Restore to previous versions
  • Clone tables at any version
SELECT * FROM events VERSION AS OF 42

Timestamp-Based Access

  • Query data as of specific timestamp
  • Point-in-time recovery
  • Audit trail reconstruction
  • Compliance reporting
SELECT * FROM events TIMESTAMP AS OF '2024-01-15 10:30:00'
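Timestamp-based access resolves to the latest commit at or before the requested time. A minimal sketch of that resolution over the transaction log (commit timestamps and versions here are illustrative):

```python
import bisect

def version_at(commits, ts):
    """commits: list of (commit_timestamp, version) sorted by timestamp.
    Return the latest version committed at or before ts."""
    timestamps = [c[0] for c in commits]
    i = bisect.bisect_right(timestamps, ts)
    if i == 0:
        raise ValueError("timestamp predates table creation")
    return commits[i - 1][1]

commits = [(100, 0), (200, 1), (350, 2)]
v = version_at(commits, 300)  # falls between versions 1 and 2
```

The query then reads exactly the files listed as live at that version, which is why time travel needs no snapshots or copies.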

Transaction Log

  • Complete operation history
  • User and application tracking
  • Operation metrics
  • Schema evolution history
DESCRIBE HISTORY events

Restore Operations

  • Instant restore to any version
  • Selective table restore
  • Schema-aware restore
  • Restore with constraints
RESTORE TABLE events TO VERSION AS OF 100

Schema Evolution

Evolve your schema without breaking pipelines

Add Columns

Add new columns at any position. Existing files return NULL for new columns.

ALTER TABLE t ADD COLUMN new_col STRING

Rename Columns

Rename columns while preserving column IDs. Zero data movement.

ALTER TABLE t RENAME COLUMN old_name TO new_name

Drop Columns

Remove columns from schema. Data remains until compaction.

ALTER TABLE t DROP COLUMN deprecated_col

Change Types

Widen column types (int → bigint, float → double).

ALTER TABLE t ALTER COLUMN amount TYPE DECIMAL(20,4)

Reorder Columns

Change column order for better organization.

ALTER TABLE t ALTER COLUMN col FIRST

Nested Evolution

Evolve struct and map types. Add fields to nested structures.

ALTER TABLE t ADD COLUMN address STRUCT<zip: STRING, city: STRING>

Change Data Feed

Track every row-level change for downstream processing

Change Types

  • insert - New rows added
  • update_preimage - Row before update
  • update_postimage - Row after update
  • delete - Rows removed

Change Metadata

  • _change_type - Type of change
  • _commit_version - Transaction version
  • _commit_timestamp - When change occurred

Query Changes

  • Version range queries
  • Timestamp range queries
  • Incremental processing
  • Streaming consumption

Use Cases

  • ETL pipeline triggers
  • Real-time analytics sync
  • Audit trail generation
  • Cache invalidation
Change Data Feed Query
-- Get changes between versions 100 and 150
SELECT * FROM table_changes('customers', 100, 150)
WHERE _change_type IN ('update_postimage', 'insert');

-- Get changes in time range
SELECT customer_id, email, _change_type, _commit_timestamp
FROM table_changes('customers',
    '2024-01-01 00:00:00',
    '2024-01-31 23:59:59')
ORDER BY _commit_timestamp;
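A downstream consumer applies change rows in commit order; here is a minimal sketch of the cache-invalidation use case above (row shape and column names are illustrative):

```python
def apply_changes(cache, changes):
    """Apply CDF rows (dicts carrying _change_type) to a key -> email cache."""
    for row in sorted(changes, key=lambda r: r["_commit_version"]):
        key = row["customer_id"]
        if row["_change_type"] in ("insert", "update_postimage"):
            cache[key] = row["email"]
        elif row["_change_type"] == "delete":
            cache.pop(key, None)
        # update_preimage rows carry the old value; nothing to apply
    return cache

changes = [
    {"customer_id": "c1", "email": "a@x.io", "_change_type": "insert", "_commit_version": 101},
    {"customer_id": "c1", "email": "b@x.io", "_change_type": "update_postimage", "_commit_version": 105},
    {"customer_id": "c2", "email": "c@x.io", "_change_type": "insert", "_commit_version": 106},
    {"customer_id": "c2", "email": None, "_change_type": "delete", "_commit_version": 110},
]
cache = apply_changes({}, changes)
```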

UniForm: Delta + Iceberg Interoperability

Write once as Delta. Read anywhere as Iceberg. No data duplication.

Enable UniForm compatibility and Delta Forge automatically generates Iceberg metadata alongside the Delta transaction log. The same physical data files are readable by both Delta and Iceberg clients. No ETL pipeline to maintain a second copy, no storage duplication, no synchronization overhead.

How UniForm Works

  • Single Write Path - Data is written once as Delta Parquet files
  • Automatic Metadata - Iceberg manifest and metadata files generated on commit
  • Zero Data Duplication - Both formats point to the same physical files
  • Full Version Support - Compatible with Iceberg format V1, V2, and V3
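With UniForm enabled, a table directory might look like this (layout illustrative; exact file names vary):

```
events/
├── _delta_log/                   # Delta transaction log (source of truth)
│   ├── 00000000000000000000.json
│   └── 00000000000000000001.json
├── metadata/                     # Iceberg metadata, generated on commit
│   ├── v1.metadata.json
│   └── snapshot manifests
└── part-0001.parquet             # shared data files, written once
```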

Cross-Engine Access

Any Iceberg-compatible query engine can read your Delta tables directly. Use Delta Forge for writes and maintenance, while downstream consumers use whichever engine fits their workflow.

ALTER TABLE events SET TBLPROPERTIES ('delta.universalFormat.enabledFormats' = 'iceberg')

Learn more about Apache Iceberg support →

[Diagram: the Delta table's _delta_log/ and the Iceberg metadata/ generated by the UniForm translation layer both point to the same shared Parquet data files. Same physical files, no duplication.]

Variant Type for Semi-Structured Data

JSON-like flexibility with columnar performance

Store semi-structured data natively in Delta tables using the Variant type. Unlike raw JSON strings, Variant uses an efficient binary encoding with automatic shredding into columnar storage, delivering up to 10x better query performance while preserving full schema flexibility.

Key Capabilities

  • Automatic Shredding - Frequently accessed fields are extracted into columnar storage for fast reads
  • Path-Based Extraction - Access nested fields without parsing the entire document
  • Zero Precision Loss - Numeric types preserve exact values in binary encoding
  • Schema Discovery - Automatic inference of structure from semi-structured data
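Shredding can be pictured as hoisting frequently accessed paths out of each document into their own columns. A toy sketch in pure Python (the real format uses a binary encoding, not dicts):

```python
def shred(docs, paths):
    """Extract the given dotted paths from each document into columnar lists."""
    def get(doc, path):
        for part in path.split("."):
            if not isinstance(doc, dict) or part not in doc:
                return None
            doc = doc[part]
        return doc
    return {path: [get(d, path) for d in docs] for path in paths}

docs = [
    {"user": {"name": "ada", "region": "us-east"}, "action": "click"},
    {"user": {"name": "lin"}, "action": "view"},
]
cols = shred(docs, ["user.name", "user.region"])
```

Once a path lives in its own column, a predicate on it reads that column alone instead of decoding every document.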

Use Cases

  • Event streams with varying payloads
  • API response archival with evolving schemas
  • IoT sensor data with heterogeneous formats
  • Log aggregation across diverse sources
Variant Type Queries
-- Create table with Variant column
CREATE TABLE events (
  id BIGINT,
  event_date DATE,
  payload VARIANT
);

-- Path-based field extraction
SELECT
  payload:user.name AS user_name,
  payload:user.email AS email,
  payload:action AS action_type
FROM events
WHERE payload:user.region = 'us-east';

-- Automatic shredding means this runs
-- at columnar speed, not JSON parsing speed

GDPR Data Erasure

Right-to-be-forgotten compliance built into the storage layer

Delta Lake's combination of targeted DELETE, deletion vectors, and VACUUM gives you a complete, auditable pipeline for GDPR right-to-be-forgotten requests. Delete the logical record, then physically remove the underlying files so the data is unrecoverable from storage.

Compliance Workflow

  1. Targeted DELETE - Remove specific user data by predicate without rewriting unrelated files
  2. Deletion Vector - Rows are immediately invisible to all readers via bitmap
  3. DRY RUN - Preview which files will be physically removed before committing
  4. VACUUM - Permanently erase the old Parquet files containing the deleted data

Why Delta Lake

  • Surgical precision - delete one user without touching millions of unrelated rows
  • Verifiable removal - once VACUUM completes, the old files no longer exist on storage
  • Audit trail - the transaction log records exactly when the deletion occurred
  • No downtime - concurrent readers continue without interruption
-- GDPR: Right to be forgotten
DELETE FROM customers
WHERE customer_id = 'user-12345';

-- Preview files that will be removed
VACUUM customers RETAIN 0 HOURS DRY RUN;

-- Permanently remove old data files
VACUUM customers RETAIN 0 HOURS;

Constraints & Data Quality

Enforce data quality at the storage layer

NOT NULL Constraints

Prevent null values in critical columns.

ALTER TABLE t ALTER COLUMN id SET NOT NULL

CHECK Constraints

Custom validation expressions.

ALTER TABLE t ADD CONSTRAINT valid_price CHECK (price > 0)

Generated Columns

Auto-computed columns from expressions.

year INT GENERATED ALWAYS AS (YEAR(event_date))

Identity Columns

Auto-incrementing unique identifiers.

id BIGINT GENERATED ALWAYS AS IDENTITY

Default Values

Automatic value assignment.

created_at TIMESTAMP DEFAULT current_timestamp()

Column Invariants

Validation enforced on write.

status STRING CHECK (status IN ('active', 'inactive'))

Table Maintenance Operations

Keep tables optimized and performant


OPTIMIZE

Compact small files into larger ones. Improves read performance by reducing file count.

OPTIMIZE events WHERE date > '2024-01-01'
  • Bin-packing algorithm
  • Target file size: 1GB
  • Predicate-based scoping
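Compaction can be sketched with first-fit-decreasing bin packing against the target file size (file sizes below are the illustrative MB figures from the diagram above):

```python
def bin_pack(file_sizes, target):
    """Group files into compaction bins of at most `target` each
    (first-fit decreasing heuristic)."""
    bins = []  # each bin: [remaining_capacity, [sizes]]
    for size in sorted(file_sizes, reverse=True):
        for b in bins:
            if b[0] >= size:
                b[0] -= size
                b[1].append(size)
                break
        else:
            bins.append([target - size, [size]])
    return [b[1] for b in bins]

# Eight small files (MB) packed toward a 1 GB target
groups = bin_pack([128, 256, 192, 210, 180, 220, 145, 200], 1024)
```

Each group is rewritten as one large file, so eight files here collapse into two.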

VACUUM

Remove old files no longer referenced by any version. Reclaim storage space.

VACUUM events RETAIN 168 HOURS
  • Safe retention period
  • Dry-run mode available
  • Respects time travel

ANALYZE

Compute column statistics for query optimization. Updates histograms and NDV estimates.

ANALYZE TABLE events COMPUTE STATISTICS FOR ALL COLUMNS
  • Column-level stats
  • Histogram generation
  • Incremental updates

Z-ORDER

Co-locate related data for better data skipping. Optimizes multi-dimensional queries.

OPTIMIZE events ZORDER BY (user_id, event_type)
  • Space-filling curves
  • Multi-column optimization
  • Improved file pruning
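The space-filling curve behind Z-ordering is classically the Morton (Z) curve, which interleaves the bits of each dimension. A simplified two-column sketch (production engines also range-encode values and handle more types):

```python
def morton2(x, y, bits=16):
    """Interleave the low `bits` of x and y into a single Z-curve key."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x occupies even bit positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y occupies odd bit positions
    return z

# Sorting rows by morton2(user_id, event_type_id) keeps rows that are close
# in either dimension close on disk, so per-file min/max stats prune well
# for predicates on either column.
key = morton2(3, 5)
```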

Proactive Table Intelligence

Preventive monitoring and actionable recommendations, not reactive debugging

Health Score

  • Single 0-100 health score per table
  • Specific issue identification
  • File size distribution analysis
  • Deletion vector density tracking
  • Clustering quality assessment
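A score of this kind can be sketched as a weighted penalty over a few normalized table metrics. The weights and inputs below are purely illustrative, not Delta Forge's actual model:

```python
def health_score(small_file_ratio, dv_density, clustering_quality):
    """Combine three 0-1 metrics into a 0-100 score.
    small_file_ratio:   fraction of files below target size (lower is better)
    dv_density:         fraction of rows masked by DVs (lower is better)
    clustering_quality: layout quality, 0-1 (higher is better)
    """
    penalty = (0.5 * small_file_ratio
               + 0.3 * dv_density
               + 0.2 * (1 - clustering_quality))
    return round(100 * (1 - penalty))

score = health_score(small_file_ratio=0.4, dv_density=0.1, clustering_quality=0.8)
```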

Audit & Integrity

  • Detect corruption early
  • Find orphaned files
  • Missing checkpoint detection
  • Protocol violation alerts
  • On-demand or automated checks

Storage Analytics

  • Storage breakdown by file type
  • Efficiency metrics and trends
  • Small file ratio monitoring
  • DV overhead tracking
  • Cost attribution per table

Recommendations

  • Auto-generated optimization suggestions
  • Priority ranking by estimated benefit
  • Timeline analysis of table evolution
  • Write volume and pattern insights
  • Expected improvement estimates

Most platforms leave table health monitoring to you. Delta Forge monitors continuously and tells you what needs attention before it becomes a problem.

Predictive Optimization

Automatic, workload-aware maintenance scheduling with zero manual tuning

Predictive Optimization analyzes table activity patterns and automatically schedules maintenance operations at the right time. The system estimates the benefit of each operation and prioritizes accordingly. Your tables stay healthy without manual intervention.

Automatic Triggers

  • Auto-VACUUM - Schedules cleanup based on file accumulation rate and retention policy
  • Auto-OPTIMIZE - Triggers compaction when small file ratio exceeds thresholds
  • Auto-ANALYZE - Refreshes statistics after significant data changes
  • Auto-Cluster - Re-clusters data when new writes degrade layout quality

Workload-Aware Scheduling

  • Activity Monitoring - Tracks write volume, query patterns, and access frequency
  • Benefit Estimation - Predicts performance improvement before running operations
  • Priority Ranking - Most impactful operations run first
  • Resource Budgeting - Maintenance respects configurable resource limits
Predictive Optimization
-- Enable predictive optimization
ALTER TABLE events SET TBLPROPERTIES (
  'delta.enablePredictiveOptimization' = 'true'
);

-- The system automatically:
--   Monitors write patterns
--   Triggers OPTIMIZE when small files accumulate
--   Runs VACUUM after retention window passes
--   Refreshes ANALYZE after significant changes
--   Re-clusters when layout quality degrades

-- Check optimization status
DESCRIBE DETAIL events;
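The scheduling loop described above can be pictured as a priority queue of candidate operations ranked by estimated benefit per unit cost, drained within a resource budget (operation names, benefits, and costs are illustrative):

```python
import heapq

def schedule(candidates, budget):
    """Pick maintenance ops in descending benefit/cost order within a budget.
    candidates: list of (name, estimated_benefit, cost)."""
    heap = [(-benefit / cost, cost, name) for name, benefit, cost in candidates]
    heapq.heapify(heap)
    plan, spent = [], 0
    while heap:
        _, cost, name = heapq.heappop(heap)
        if spent + cost <= budget:
            plan.append(name)
            spent += cost
    return plan

candidates = [
    ("OPTIMIZE events", 9.0, 3),  # big win: many small files accumulated
    ("VACUUM events", 2.0, 1),    # retention window just passed
    ("ANALYZE events", 4.0, 1),   # stats stale after a large write
]
plan = schedule(candidates, budget=4)
```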

Enterprise-grade Delta Lake. Native performance. Zero Spark dependency.

Full protocol support. ACID transactions. Predictive optimization. Production-tested and enterprise-ready.