Read from common file formats, relational databases (PostgreSQL, MySQL, SQL Server, Oracle), and cloud services. Write to Delta Lake tables with full schema evolution support.
Turn any nested format into a SQL table — visually configure, automatically flatten
Delta Forge includes a visual schema discovery and configuration tool that transforms complex, nested data formats into flat, queryable SQL tables. One unified experience across six formats: JSON, XML, EDI, HL7, FHIR, and Protobuf.
Scan files and automatically discover all nested paths, types, and sample values
Use an interactive tree view to select which fields to include, exclude, explode, or keep as JSON
Flattened data appears as a standard SQL table. Missing paths become NULL. Configuration persists across queries.
Include: whitelist specific paths into the output
Exclude: remove paths entirely from the output
Explode: create one row per array element (like SQL UNNEST)
Keep as JSON: keep a subtree as a JSON string column instead of flattening it
Automatic flattening behavior — all paths included with standard naming
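To make the explode option concrete, here is a minimal Python sketch of UNNEST-style row expansion. It is an illustration only, not Delta Forge's implementation: the `explode` helper is hypothetical, and keeping a row with NULL when the array is missing or empty is an assumption.

```python
def explode(record, field):
    """Yield one copy of `record` per element of the list stored at `field`."""
    values = record.get(field)
    if not isinstance(values, list) or not values:
        # Assumption: a missing or empty array keeps the row, with NULL (None)
        # in the exploded column.
        yield {**record, field: None}
        return
    for value in values:
        # Every other column is repeated unchanged on each output row.
        yield {**record, field: value}


rows = list(explode({"user_name": "Alice", "tags": ["vip", "active"]}, "tags"))
for row in rows:
    print(row)
```

Each output row carries a single `tags` value, which is what lets downstream SQL filter or group on individual array elements.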
JSON: JSONPath, SIMD-accelerated parsing
XML: XPath-like expressions, attribute handling, namespace support
EDI: segment-based flattening, composite elements
HL7: component flattening, friendly aliases
FHIR: resource type discovery, bundle unbundling
Protobuf: enum decoding, repeated field handling
-- Input (nested JSON):
-- {
-- "user": { "name": "Alice", "contact": { "email": "alice@example.com" } },
-- "tags": ["vip", "active"],
-- "metadata": { "source": "api", "raw": {...} }
-- }
-- Output (flattened SQL table):
-- user_name | user_contact_email | tags | metadata
-- Alice | alice@example.com | ["vip","active"] | (kept as JSON)
SELECT user_name, user_contact_email, tags, metadata
FROM customers; -- flattened table, ready to query
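The flattening shown above can be sketched in a few lines of Python. This is a rough model under stated assumptions, not the product's engine: the `flatten` helper and its parameters are hypothetical, nested keys are joined with underscores to match the column names in the example, arrays are kept as JSON text, and the elided `raw` payload is replaced by an empty placeholder object.

```python
import json


def flatten(record, keep_as_json=(), exclude=(), prefix=""):
    """Flatten nested dicts into one row with underscore-joined column names."""
    row = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if name in exclude:
            continue  # excluded paths never reach the output
        if name in keep_as_json or isinstance(value, list):
            # Keep the subtree (or array) as compact JSON text in one column.
            row[name] = json.dumps(value, separators=(",", ":"))
        elif isinstance(value, dict):
            row.update(flatten(value, keep_as_json, exclude, prefix=f"{name}_"))
        else:
            row[name] = value
    return row


doc = {
    "user": {"name": "Alice", "contact": {"email": "alice@example.com"}},
    "tags": ["vip", "active"],
    "metadata": {"source": "api", "raw": {}},  # {} stands in for the elided payload
}
row = flatten(doc, keep_as_json={"metadata"})

# A path that is absent from this document (hypothetical "user_phone") becomes NULL.
columns = ["user_name", "user_contact_email", "tags", "metadata", "user_phone"]
table_row = {column: row.get(column) for column in columns}
print(table_row)
```

Projecting every record onto the same column list is what keeps the table shape stable when individual documents omit a path.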
All 6 formats share the same tree view — no format-specific tooling needed
Configuration persists to the table — query results are always consistent
500+ MB/s throughput for JSON processing
Point, click, query — flatten nested data without writing any transformation code
Connect to relational and NoSQL databases with predicate pushdown. All connection credentials are stored securely in OS Keychain or Azure Key Vault, never in config files.
Full-featured connectivity with SSL, connection pooling, and predicate pushdown
MySQL 5.7+ and MariaDB support with binary protocol
Microsoft SQL Server with Windows and Azure AD authentication
Oracle 12c+ with TNS and Easy Connect naming
Document database with aggregation pipeline pushdown
Key-value store with cluster and sentinel support
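Predicate pushdown means the filter travels to the source database instead of being applied after every row has been fetched. The sketch below illustrates the idea with Python's built-in `sqlite3` standing in for a remote database. The `scan` helper, the `orders` table, and its columns are all hypothetical, and table and column names are assumed to be trusted identifiers.

```python
import sqlite3

ALLOWED_OPERATORS = {"=", "<>", "<", "<=", ">", ">="}


def scan(conn, table, columns, predicates=()):
    """Build one parameterized query so the source database does the filtering."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = []
    clauses = []
    for column, operator, value in predicates:
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator}")
        clauses.append(f"{column} {operator} ?")  # values are bound, never inlined
        params.append(value)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 120.0), (2, "US", 80.0), (3, "EU", 40.0)],
)

sql, rows = scan(conn, "orders", ["id", "total"], [("region", "=", "EU"), ("total", ">", 100)])
print(sql)
print(rows)
```

Only the matching row leaves the database, so network transfer and local memory scale with the result size instead of the table size.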
Native integration with all major cloud providers
Native support for all major data formats with optimized readers
Query Proto3 binary data with SQL — a capability most engines simply don't have
.proto descriptor
google.protobuf.Timestamp → Arrow TIMESTAMP
google.protobuf.Duration → Arrow INTERVAL
google.protobuf.StringValue & wrapper types
google.protobuf.Struct as JSON columns
-- Read IoT sensor data from Proto3 binary files
SELECT device_id, temperature, humidity, recorded_at
FROM read_protobuf(
'sensors/*.pb',
'sensor.proto',
'SensorReading'
)
WHERE temperature > 35.0
ORDER BY recorded_at DESC;
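A `google.protobuf.Timestamp` is a pair of `seconds` and `nanos` counted from the Unix epoch, so mapping it to a timestamp column is plain arithmetic. The sketch below shows one possible conversion. Microsecond resolution is an assumption (the unit Delta Forge uses is not stated here), and both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_to_micros(seconds, nanos):
    """Convert a Timestamp (seconds, nanos) pair to microseconds since the epoch."""
    if not 0 <= nanos < 1_000_000_000:
        raise ValueError("nanos must be in [0, 999999999]")
    # Sub-microsecond precision is truncated in this sketch.
    return seconds * 1_000_000 + nanos // 1_000


def micros_to_datetime(micros):
    """Render epoch microseconds as a timezone-aware UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)


micros = timestamp_to_micros(1_710_460_800, 500_000_000)
print(micros, micros_to_datetime(micros).isoformat())
```

Storing the value as a single integer keeps comparisons and sorting on the timestamp column cheap, which is what the `ORDER BY recorded_at` in the query above relies on.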
Production-grade ORC reading for Hive data warehouses — battle-tested across 6 industry demos
STRUCT fields mapped to nested Arrow structs
MAP fields as key-value list arrays
ARRAY fields as Arrow list columns
Schema evolution across files with automatic type promotion and null-filling
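Schema evolution with type promotion and null-filling can be pictured as two steps: merge the per-file schemas, then conform each row to the merged result. The following Python sketch models that under assumptions. The widening rules (`int` to `long`, `float` to `double`) are illustrative, schemas are simplified to name-to-type dictionaries, and `merge_schemas` and `conform` are hypothetical helpers.

```python
# Illustrative widening rules; the pairs are order-insensitive.
WIDENING = {
    frozenset({"int", "long"}): "long",
    frozenset({"float", "double"}): "double",
}


def merge_schemas(first, second):
    """Union two {column: type} schemas, widening compatible numeric types."""
    merged = dict(first)
    for name, new_type in second.items():
        old_type = merged.get(name, new_type)
        if old_type == new_type:
            merged[name] = new_type
            continue
        pair = frozenset({old_type, new_type})
        if pair not in WIDENING:
            raise TypeError(f"cannot reconcile {old_type} and {new_type} for {name}")
        merged[name] = WIDENING[pair]
    return merged


def conform(row, schema):
    """Project a row onto the merged schema, filling absent columns with NULL."""
    return {name: row.get(name) for name in schema}


old_file = {"id": "int", "price": "float"}
new_file = {"id": "long", "price": "double", "coupon": "string"}
schema = merge_schemas(old_file, new_file)
print(schema)
print(conform({"id": 1, "price": 9.5}, schema))
```

Rows from the older file gain a NULL `coupon` column, so both files can be read as one table.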
int → long, float → double
date → Arrow DATE32
timestamp-millis / timestamp-micros → Arrow TIMESTAMP
decimal with precision and scale preserved
uuid, time-millis, time-micros
Flexible JSON reading with subtree capture for semi-structured analytics
json_paths
json_extract functions
-- Keep nested 'address' as a JSON blob, extract flat fields normally
SELECT name, email, address
FROM read_json('customers/*.json',
json_paths := '{address}'
);
-- Result: 'address' column contains full JSON objects
-- {"street": "123 Main St", "city": "Denver", "state": "CO", "zip": "80202"}
-- Then query into the captured subtree
SELECT name, json_extract(address, '$.city') AS city
FROM read_json('customers/*.json',
json_paths := '{address}'
);
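As a mental model for querying into a captured subtree, the Python sketch below walks a dotted path such as `$.city` through a JSON string. It is a simplified stand-in written for illustration, not the engine's `json_extract`: it supports only object keys (no array indexes or wildcards) and returns NULL for a missing path, which is an assumption.

```python
import json


def json_extract(text, path):
    """Return the value at a dotted path like '$.a.b', or None if it is absent."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    node = json.loads(text)
    # "$.a.b" -> ["a", "b"]; a bare "$" yields no keys and returns the whole document.
    for key in filter(None, path[1:].split(".")):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


address = '{"street": "123 Main St", "city": "Denver", "state": "CO", "zip": "80202"}'
print(json_extract(address, "$.city"))
print(json_extract(address, "$.country"))
```

Because the subtree stays intact in one column, fields that were never anticipated at load time remain reachable later without reloading the files.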
Structured XML reading with subtree capture and schema evolution
Multi-sheet reading with intelligent header detection and per-sheet type inference
Real-time data ingestion from event streams
High-throughput distributed event streaming with consumer groups
AWS managed streaming with automatic scaling
Azure-native event ingestion at scale
GCP messaging with exactly-once delivery
Automatic type detection across 40+ locales with auto-generated transform views — no manual schema definitions
DD.MM.YYYY, US dates: MM/DD/YYYY
1 234 567,89
1.234.567,89
-- Same column, different locales: Delta Forge infers correctly
-- German (de-DE): period groups, comma decimal
order_total: 1.234.567,89 → DECIMAL
order_date: 15.03.2024 → DATE
-- US English (en-US): comma groups, period decimal
order_total: 1,234,567.89 → DECIMAL
order_date: 03/15/2024 → DATE
-- French (fr-FR): space groups, comma decimal
order_total: 1 234 567,89 → DECIMAL
-- Auto-generated transform view based on inference
CREATE VIEW v_orders AS
SELECT
CAST(order_total AS DECIMAL(12,2)) AS order_total,
CAST(order_date AS DATE) AS order_date,
customer_name
FROM raw_orders;
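Once a locale has been identified, turning the raw text into typed values is mostly a matter of knowing its separators and date order. The Python sketch below covers only that parsing step for the three locales shown above. How Delta Forge detects the locale is not described here, and the lookup tables and helper names are hypothetical.

```python
from datetime import date, datetime
from decimal import Decimal

# Separators and date patterns taken from the examples above.
NUMBER_FORMATS = {
    "de-DE": {"group": ".", "decimal": ","},
    "en-US": {"group": ",", "decimal": "."},
    "fr-FR": {"group": " ", "decimal": ","},
}
DATE_FORMATS = {
    "de-DE": "%d.%m.%Y",
    "en-US": "%m/%d/%Y",
}


def parse_decimal(text, locale):
    """Strip the locale's grouping separator and normalize its decimal mark."""
    fmt = NUMBER_FORMATS[locale]
    # Treat no-break and narrow no-break spaces like ordinary spaces.
    cleaned = text.replace("\u00a0", " ").replace("\u202f", " ")
    cleaned = cleaned.replace(fmt["group"], "").replace(fmt["decimal"], ".")
    return Decimal(cleaned)


def parse_date(text, locale):
    """Parse a date string using the locale's day/month order."""
    return datetime.strptime(text, DATE_FORMATS[locale]).date()


totals = [
    parse_decimal("1.234.567,89", "de-DE"),
    parse_decimal("1,234,567.89", "en-US"),
    parse_decimal("1 234 567,89", "fr-FR"),
]
dates = [parse_date("15.03.2024", "de-DE"), parse_date("03/15/2024", "en-US")]
print(totals, dates)
```

All three totals normalize to the same exact decimal value, and both date strings resolve to the same calendar day.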
Unify your data from any source into Delta Lake tables.