Graph Analytics

Any Delta Table.
Instant Graph.

Transform any Delta Lake table into a graph with simple configuration. Run PageRank, community detection, and pathfinding algorithms directly via SQL.

Mix SQL with Cypher in one query
No separate graph database needed
Results join directly with SQL tables
Delta Lake Table: orders

customer_id  product_id  qty
C-101        P-501       3
C-102        P-501       1
C-101        P-502       2
C-103        P-502       5

CONFIG
source: customer_id
target: product_id

Customer-Product Graph with PageRank scores

C-101  PR: 0.42
C-102  PR: 0.21
C-103  PR: 0.15
P-501  PR: 0.58
P-502  PR: 0.36

SQL Query
SELECT * FROM graph_pagerank('orders', 'customer_id', 'product_id');

Any Delta Table Becomes a Graph

Simply specify which columns represent source and target nodes. Delta Forge builds an optimized CSR (Compressed Sparse Row) graph structure in memory, enabling sub-second algorithm execution on millions of edges.

graph_pagerank('table', 'source_col', 'target_col', damping := 0.85)
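
To make the CSR idea concrete, here is a minimal Python sketch of turning (source, target) rows into a compressed sparse row layout. It is an illustration only, not Delta Forge's internal implementation; the function names build_csr and neighbors are hypothetical, and the sample rows are the orders rows shown earlier.

```python
def build_csr(edges):
    """Build a CSR adjacency structure from (source, target) pairs."""
    # Assign dense integer ids in order of first appearance.
    ids = {}
    for src, dst in edges:
        for node in (src, dst):
            ids.setdefault(node, len(ids))
    n = len(ids)

    # Count out-edges per source, then prefix-sum into offsets.
    # offsets[i]..offsets[i + 1] delimits node i's neighbors in `targets`.
    offsets = [0] * (n + 1)
    for src, _ in edges:
        offsets[ids[src] + 1] += 1
    for i in range(n):
        offsets[i + 1] += offsets[i]

    # Scatter each edge's target into its source's slice.
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()  # next free slot per source node
    for src, dst in edges:
        s = ids[src]
        targets[cursor[s]] = ids[dst]
        cursor[s] += 1
    return ids, offsets, targets


def neighbors(node, ids, offsets, targets):
    """Return the integer ids of `node`'s out-neighbors."""
    i = ids[node]
    return targets[offsets[i]:offsets[i + 1]]


orders = [("C-101", "P-501"), ("C-102", "P-501"),
          ("C-101", "P-502"), ("C-103", "P-502")]
ids, offsets, targets = build_csr(orders)
```

Because each node's neighbors sit in one contiguous slice, iterating over them is a sequential array read, which is the property that makes CSR attractive for iterative algorithms.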

SQL + Cypher: Best of Both Worlds

Write graph patterns in Cypher, combine results with SQL - all in one query

Cypher Pattern: graph traversal with familiar syntax
MATCH (p:Person)-[:FOLLOWS]->(friend:Person)
      -[:PURCHASED]->(product:Product)
WHERE p.name = 'Alice'
RETURN friend.name, product.name, product.price
SQL + Cypher Combined: join graph results with relational data
SELECT c.friend_name, c.product_name,
       i.stock_level, s.supplier_name
FROM cypher('
    MATCH (p:Person)-[:FOLLOWS]->(f:Person)
          -[:PURCHASED]->(prod:Product)
    WHERE p.name = $1
    RETURN f.name AS friend_name,
           prod.id AS product_id,
           prod.name AS product_name
', 'Alice') AS c
JOIN inventory i ON i.product_id = c.product_id
JOIN suppliers s ON s.id = i.supplier_id
WHERE i.stock_level > 0;

Cypher Query Support

  • Full Cypher pattern matching
  • Variable-length paths [:KNOWS*1..5]
  • Named paths and path functions
  • Aggregations within Cypher

SQL Integration

  • Cypher results as table expressions
  • JOIN with any Delta/external table
  • Use in subqueries and CTEs
  • Full SQL filtering and aggregation

Parameter Binding

  • Parameterized Cypher queries
  • Prevent injection attacks
  • Query plan caching
  • Prepared statement support

Why It Matters

  • No ETL to separate graph DB
  • One query interface for both SQL and Cypher
  • Consistent transaction semantics
  • Unified security model

Centrality Algorithms

Identify the most important nodes in your graph

PageRank

Google's algorithm for ranking node importance based on link structure. Ideal for influence analysis and recommendation systems.

Iterative, Damping Factor, Convergence
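
The iterative, damped computation can be sketched in a few lines of Python. This is a textbook power iteration under stated assumptions (ranks normalized to sum to 1, rank from nodes without out-edges spread uniformly); it is not necessarily how graph_pagerank() is implemented, and the tol and max_iter parameters are hypothetical.

```python
def pagerank(edges, damping=0.85, tol=1e-9, max_iter=100):
    """Power-iteration PageRank over directed (source, target) pairs."""
    nodes = sorted({n for e in edges for n in e})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-edges) is spread uniformly.
        dangling = sum(rank[n] for n in nodes if not out[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new = {n: base for n in nodes}
        # Each node passes a damped, equal share of its rank to its targets.
        for src in nodes:
            for dst in out[src]:
                new[dst] += damping * rank[src] / len(out[src])
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:  # converged
            break
    return rank


ranks = pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")])
```

In this example node d has no incoming edges, so it ends up with only the baseline share, while a collects rank from both c and d.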

Betweenness Centrality

Measures how often a node lies on shortest paths between other nodes. Identifies critical bridges and bottlenecks.

Brandes Algorithm, Sampling

Closeness Centrality

Scores a node by the inverse of its average shortest-path distance to all other nodes. Finds nodes that can quickly reach the entire network.

Harmonic, Wasserman-Faust

Degree Centrality

Simple count of connections per node. Supports in-degree, out-degree, and total degree for directed graphs.

In-Degree, Out-Degree
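
As a small illustration of the three degree variants, the following Python sketch counts in-degree, out-degree, and total degree from directed (source, target) pairs. The function name and the shape of the result are assumptions made for this example.

```python
from collections import Counter


def degree_centrality(edges):
    """Return in-, out-, and total degree per node for directed edges."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    nodes = set(out_deg) | set(in_deg)
    # Counter returns 0 for nodes that never appear on one side.
    return {
        n: {"in": in_deg[n], "out": out_deg[n], "total": in_deg[n] + out_deg[n]}
        for n in nodes
    }


orders = [("C-101", "P-501"), ("C-102", "P-501"),
          ("C-101", "P-502"), ("C-103", "P-502")]
degrees = degree_centrality(orders)
```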

Eigenvector Centrality

Assigns importance based on connections to other important nodes. PageRank is a damped variant of the same idea.

Power Iteration, Normalized

HITS (Hubs & Authorities)

Computes hub and authority scores. Hubs point to authorities; authorities are pointed to by hubs.

Hub Score, Authority Score
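
The mutual reinforcement between hubs and authorities can be sketched as an alternating update. This is the standard textbook iteration with L2 normalization, written as an assumption-laden example rather than the product's implementation; the iterations parameter is hypothetical.

```python
import math


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for directed (source, target) pairs."""
    nodes = {n for e in edges for n in e}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of nodes pointing in.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub score: sum of authority scores of nodes pointed to.
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        # Normalize both vectors so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth


hub, auth = hits([("h1", "a"), ("h2", "a"), ("h1", "b")])
```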

Community Detection

Discover clusters and communities within your data


Louvain Algorithm

Fast modularity optimization for large-scale community detection. Hierarchical clustering with configurable resolution.

Modularity, Hierarchical, Resolution

Label Propagation

Semi-supervised community assignment based on neighbor labels. Fast and scalable for massive graphs.

Semi-Supervised, Linear Time

Connected Components

Find weakly and strongly connected components. Essential preprocessing for many graph algorithms.

Weakly Connected, Strongly Connected
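
Weakly connected components ignore edge direction, which makes them a natural fit for a union-find structure. The Python sketch below shows that approach as one possible technique; it does not cover strongly connected components, and the function name is hypothetical.

```python
def weakly_connected_components(edges):
    """Group nodes into components, treating every edge as undirected."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Merge the two endpoints of every edge into one set.
    for src, dst in edges:
        root_s, root_d = find(src), find(dst)
        if root_s != root_d:
            parent[root_s] = root_d

    # Collect nodes by their final representative.
    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


components = weakly_connected_components([("a", "b"), ("b", "c"), ("x", "y")])
```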

Triangle Counting

Count triangles for clustering coefficient calculation. Measures local density and transitivity.

Clustering Coefficient, Transitivity
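
The link between triangles and the local clustering coefficient is easiest to see in code. The sketch below treats edges as undirected and, for each node, counts neighbor pairs that are themselves connected; it is a simple illustration with a hypothetical function name, not an optimized implementation.

```python
from itertools import combinations


def triangles_and_clustering(edges):
    """Return {node: (triangle_count, local_clustering_coefficient)}."""
    adj = {}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)

    result = {}
    for node, nbrs in adj.items():
        # A triangle at `node` is a pair of its neighbors that are adjacent.
        tri = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        pairs = len(nbrs) * (len(nbrs) - 1) // 2
        result[node] = (tri, tri / pairs if pairs else 0.0)
    return result


stats = triangles_and_clustering([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
# Each triangle is seen once from each of its three corners.
total_triangles = sum(t for t, _ in stats.values()) // 3
```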

K-Core Decomposition

Find cohesive subgraphs where each node has at least k neighbors. Identifies dense core structures.

Coreness, Degeneracy
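
Coreness can be computed by repeatedly peeling off the node with the lowest remaining degree. The Python sketch below uses a straightforward quadratic-time version of that peeling procedure for clarity (production implementations typically use bucket queues); the function name is hypothetical.

```python
def core_numbers(edges):
    """Return {node: coreness} for an undirected graph given as edge pairs."""
    adj = {}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)

    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])  # coreness never decreases during peeling
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


core = core_numbers([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
degeneracy = max(core.values())  # the largest k with a non-empty k-core
```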

Modularity Scoring

Measure quality of community partitions. Compare different clustering results objectively.

Quality Metric, Partition Comparison
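
For reference, the standard modularity score of a partition sums, over communities, the fraction of edges inside the community minus the squared fraction of edge endpoints it holds. The Python sketch below computes that for an unweighted, undirected, non-empty edge list; the function name and the community mapping argument are assumptions for this example.

```python
def modularity(edges, community):
    """Modularity Q of a partition; `community` maps node -> community label."""
    m = len(edges)  # assumes at least one edge
    intra = {}       # edges with both endpoints in the community
    degree_sum = {}  # total degree of the community's nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(
        intra.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Two triangles joined by a single bridge edge.
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("d", "e"), ("e", "f"), ("f", "d"),
         ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {n: 0 for n in split}
q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

Comparing q_split with q_merged shows how the score rewards the partition that follows the two triangles over the trivial single-community partition.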

Path Finding Algorithms

Discover routes and relationships between nodes

Shortest Path (Dijkstra)

Find the shortest weighted path between two nodes, or from a single source to all destinations. Requires non-negative edge weights.

Weighted, Single Source
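
A minimal Python sketch of Dijkstra's algorithm with a binary heap is shown below. It assumes non-negative weights and an adjacency mapping of node -> list of (neighbor, weight); the function names are hypothetical and say nothing about the signature of graph_shortest_path().

```python
import heapq


def dijkstra(adj, source):
    """Return (dist, prev) for all nodes reachable from `source`."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for nbr, weight in adj.get(node, []):
            nd = d + weight
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return dist, prev


def path_to(prev, source, target):
    """Rebuild the path by walking predecessors (target must be reachable)."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


adj = {"a": [("b", 1.0), ("c", 5.0)], "b": [("c", 2.0)], "c": []}
dist, prev = dijkstra(adj, "a")
route = path_to(prev, "a", "c")
```

One run from a source yields distances to every reachable node, which is why the same routine serves both the two-node and the single-source case.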

All Pairs Shortest Path

Compute shortest paths between all node pairs. Floyd-Warshall and Johnson's algorithms.

Floyd-Warshall, Distance Matrix

Breadth-First Search

Level-by-level graph traversal. Find nodes within n hops, compute hop distances.

Unweighted, Hop Count
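
Finding nodes within n hops is a breadth-first search that stops expanding at a depth limit. The sketch below is a generic Python illustration over an adjacency mapping; the function name and the max_hops parameter are hypothetical and not taken from graph_neighbors().

```python
from collections import deque


def hop_distances(adj, start, max_hops):
    """Return {node: hop_count} for nodes within `max_hops` of `start`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nbr in adj.get(node, []):
            if nbr not in dist:  # first visit gives the shortest hop count
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
within_two = hop_distances(chain, "a", 2)
```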

A* Search

Heuristic-guided pathfinding for faster results. Well suited to geospatial and game graphs.

Heuristic, Geospatial

Minimum Spanning Tree

Find lowest-cost tree connecting all nodes. Prim's and Kruskal's algorithms available.

Prim, Kruskal
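
Of the two algorithms named, Kruskal's is the shorter to sketch: sort edges by weight and keep each edge that joins two previously separate components. The Python below is a generic illustration with a hypothetical function name, taking edges as (weight, u, v) tuples.

```python
def kruskal(edges):
    """Return the edges of a minimum spanning forest, lightest first."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge connects two separate components
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


mst = kruskal([(3, "a", "c"), (1, "a", "b"), (4, "c", "d"), (2, "b", "c")])
total_cost = sum(w for w, _, _ in mst)
```

On a disconnected graph this yields one tree per component rather than a single spanning tree.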

Random Walk

Simulate random traversals through the graph. Foundation for node2vec embeddings.

Sampling, Embeddings

SQL Table Functions

Use graph algorithms directly in SQL queries

graph_pagerank()

Compute PageRank scores with configurable damping and iterations

graph_shortest_path()

Find shortest path between any two nodes

graph_louvain()

Detect communities using Louvain algorithm

graph_connected_components()

Find all connected components in the graph

graph_betweenness()

Calculate betweenness centrality scores

graph_triangles()

Count triangles and compute clustering coefficients

graph_neighbors()

Get neighbors within n hops of a node

graph_label_propagation()

Assign community labels via propagation

Advanced Graph Features

Enterprise-grade capabilities for production workloads

Property Graphs

  • Node and edge properties
  • Multiple edge types
  • Typed nodes and relationships
  • Property filtering in queries

Graph Storage

  • Delta Lake native storage
  • Adjacency list representation
  • Edge list format support
  • Incremental graph updates

Scalability

  • Distributed computation
  • Approximate algorithms for massive graphs
  • Sampling for large-scale analytics
  • Memory-efficient processing

Graph Metrics

  • Graph density calculation
  • Average path length
  • Diameter computation
  • Degree distribution analysis

Graph Mutations

Create, update, and delete nodes and edges directly via SQL

Create Nodes & Edges: build your graph with familiar INSERT syntax
-- Create nodes
INSERT INTO GRAPH social_network NODE person
VALUES ('Alice', 30);

INSERT INTO GRAPH social_network NODE person
VALUES ('Bob', 28);

-- Create an edge between them
INSERT INTO GRAPH social_network EDGE knows
VALUES ('Alice', 'Bob', '2024-01-15');
Update & Delete: modify properties or remove elements
-- Update a node property
UPDATE GRAPH social_network NODE person
SET age = 31
WHERE name = 'Alice';

-- Delete a relationship
DELETE FROM GRAPH social_network EDGE knows
WHERE src = 'Alice' AND dst = 'Bob';

-- Delete a node (cascades edges)
DELETE FROM GRAPH social_network NODE person
WHERE name = 'Bob';

Node Operations

  • INSERT nodes with typed properties
  • UPDATE properties with SET clauses
  • DELETE nodes with cascading edge removal
  • Bulk insert from SELECT queries

Edge Operations

  • INSERT edges between existing nodes
  • UPDATE edge properties and weights
  • DELETE edges with filtered conditions
  • Upsert semantics with ON CONFLICT

Transactional Safety

  • ACID transactions on all mutations
  • Referential integrity enforcement
  • Atomic multi-element operations
  • Delta Lake time travel on graph state

Batch Mutations

  • Bulk load from CSV and Parquet
  • INSERT INTO GRAPH ... SELECT ...
  • Multi-row VALUES clauses
  • Streaming ingestion support

Graph Storage Modes

Choose the storage strategy that fits your graph workload

Flattened

Nodes and edges stored as flat columns in standard Delta tables. Best for simple graphs with fixed schemas where fast columnar scans matter most.

Fast Scans, Simple Schemas, Columnar

Hybrid

Structured edge columns with property maps for flexible attributes. Balances query performance with schema flexibility for evolving graphs.

Property Maps, Balanced, Evolving Schemas

JSON

Full graph structure stored in JSON columns. Maximum flexibility for complex, deeply nested schemas and heterogeneous property types.

Flexible, Nested Schemas, Heterogeneous

When to Use Each Mode

  • Flattened - fixed-schema graphs, analytics-heavy workloads, maximum scan throughput
  • Hybrid - graphs with optional properties, mix of structured and semi-structured data
  • JSON - highly variable schemas, rapid prototyping, complex nested relationships
  • All modes store data as Delta Lake tables with full versioning and time travel

Real-World Benchmark Datasets

Built-in graph datasets for testing and benchmarking

Karate Club

Zachary's classic social network - 34 nodes, ideal for community detection validation

EU Email

Email communication network from a European research institution - thousands of nodes and edges

NetScience

Co-authorship network of scientists working on network theory - real collaboration patterns

PolBooks

Political books co-purchasing network - classic benchmark for partitioning algorithms

LDBC SNB

Linked Data Benchmark Council Social Network - industry-standard graph benchmark at scale

Unlock insights from connected data

Start analyzing relationships in your data with powerful graph algorithms.