Graph Analytics - Delta Forge

SQL + Cypher: Best of Both Worlds

Write graph patterns in Cypher, combine results with SQL - all in one query

Cypher Pattern: graph traversal with familiar syntax

MATCH (p:Person)-[:FOLLOWS]->(friend:Person)
      -[:PURCHASED]->(product:Product)
WHERE p.name = 'Alice'
RETURN friend.name, product.name, product.price

SQL + Cypher Combined: join graph results with relational data

SELECT c.friend_name, c.product_name,
       i.stock_level, s.supplier_name
FROM cypher('
    MATCH (p:Person)-[:FOLLOWS]->(f:Person)
          -[:PURCHASED]->(prod:Product)
    WHERE p.name = $1
    RETURN f.name AS friend_name,
           prod.id AS product_id,
           prod.name AS product_name
', 'Alice') AS c
JOIN inventory i ON i.product_id = c.product_id
JOIN suppliers s ON s.id = i.supplier_id
WHERE i.stock_level > 0;
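
To make the semantics of the combined query concrete, here is a minimal in-memory model in Python. It assumes made-up data and hypothetical helper names (`cypher_part`, `combined`). It mirrors what the query computes (a two-hop pattern match, then inner joins and a filter), not how Delta Forge executes it.

```python
# Made-up data standing in for the graph and the two relational tables.
follows = {("Alice", "Bob"), ("Alice", "Carol")}                     # (:Person)-[:FOLLOWS]->(:Person)
purchased = {("Bob", "p1"), ("Carol", "p2"), ("Carol", "p3")}        # (:Person)-[:PURCHASED]->(:Product)
products = {"p1": "Lamp", "p2": "Desk", "p3": "Chair"}               # product id -> name
inventory = [("p1", 5, "s1"), ("p2", 0, "s2"), ("p3", 12, "s1")]     # (product_id, stock_level, supplier_id)
suppliers = {"s1": "Acme", "s2": "Globex"}                           # supplier id -> supplier_name


def cypher_part(name):
    """Rows produced by the inner MATCH ... RETURN for p.name = name."""
    return [
        {"friend_name": f, "product_id": pid, "product_name": products[pid]}
        for (p, f) in sorted(follows) if p == name
        for (buyer, pid) in sorted(purchased) if buyer == f
    ]


def combined(name):
    """Outer SQL: join the Cypher rows to inventory and suppliers, keep in-stock items."""
    inv_by_product = {}
    for pid, stock, sid in inventory:
        inv_by_product.setdefault(pid, []).append((stock, sid))
    rows = []
    for c in cypher_part(name):
        for stock, sid in inv_by_product.get(c["product_id"], []):   # JOIN inventory
            if stock > 0 and sid in suppliers:                       # WHERE stock > 0, JOIN suppliers
                rows.append((c["friend_name"], c["product_name"], stock, suppliers[sid]))
    return rows


print(combined("Alice"))
```

Carol's desk drops out because its stock level is 0. This is the same row the SQL `WHERE i.stock_level > 0` filter removes.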

Cypher Query Support

Full Cypher pattern matching
Variable-length paths [:KNOWS*1..5]
Named paths and path functions
Aggregations within Cypher

SQL Integration

Cypher results as table expressions
JOIN with any Delta/external table
Use in subqueries and CTEs
Full SQL filtering and aggregation

Parameter Binding

Parameterized Cypher queries
Prevent injection attacks
Query plan caching
Prepared statement support
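
Injection prevention works the same way for any parameterized query. As a runnable stand-in, the sketch below uses Python's built-in `sqlite3` module, not Delta Forge. Concatenating user input into the query text lets that input rewrite the query, while a bound parameter is only ever treated as a value. Because the query text stays constant across calls, an engine can also reuse a cached plan for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT)")
conn.executemany("INSERT INTO person VALUES (?)", [("Alice",), ("Bob",)])


def find_unsafe(name):
    # String concatenation: the input becomes part of the SQL text itself.
    return conn.execute("SELECT name FROM person WHERE name = '" + name + "'").fetchall()


def find_safe(name):
    # Bound parameter: the input is passed as a value and never parsed as SQL.
    return conn.execute("SELECT name FROM person WHERE name = ?", (name,)).fetchall()


attack = "x' OR '1'='1"
print(find_unsafe(attack))  # the injected OR clause matches every row
print(find_safe(attack))    # matches nothing: no person has that literal name
```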

Why It Matters

No ETL into a separate graph database
One engine for graph and relational queries
Consistent transaction semantics
Unified security model

Centrality Algorithms

Identify the most important nodes in your graph

PageRank

Google's algorithm for ranking node importance based on link structure. Ideal for influence analysis and recommendation systems.

Iterative, Damping Factor, Convergence
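
A minimal power-iteration sketch of PageRank follows. It assumes the graph is given as a Python dict that maps every node to its out-links (a hypothetical input format). Each round, a node keeps a (1 - d)/N baseline and receives damping-weighted shares of its in-neighbors' rank. Rank held by dangling nodes is spread evenly, so the scores keep summing to 1. Iteration stops once the L1 change falls below a tolerance.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank.

    out_links maps every node to the list of nodes it links to; every link
    target must also be a key. Returns a dict of scores that sum to 1.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank sitting on dangling nodes (no out-links) is spread over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            for u in targets:
                new[u] += damping * rank[v] / len(targets)
        change = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if change < tol:  # converged in L1 norm
            break
    return rank


print(pagerank({"a": [], "b": ["a"], "c": ["a"]}))
```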

Betweenness Centrality

Measures how often a node lies on shortest paths between other nodes. Identifies critical bridges and bottlenecks.

Brandes Algorithm, Sampling
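
Here is a sketch of Brandes' algorithm for unweighted graphs. One BFS per source counts shortest paths (sigma) and records predecessors, then dependencies are accumulated in reverse BFS order. The adjacency-dict input and the `directed` flag are assumptions of this sketch. A sampling variant would run the outer loop over a random subset of sources and rescale the totals.

```python
from collections import deque


def betweenness(adj, directed=False):
    """Brandes' betweenness centrality for an unweighted graph.

    adj maps every node to a list of neighbours. For an undirected graph, list each
    edge in both directions; scores are halved at the end so each pair counts once.
    """
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        order, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Phase 2: accumulate dependencies, farthest nodes first.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    if not directed:
        score = {v: x / 2 for v, x in score.items()}
    return score


print(betweenness({0: [1], 1: [0, 2], 2: [1]}))
```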

Closeness Centrality

Measures how close a node is to all other nodes: the reciprocal of its average shortest-path distance to them. Finds nodes that can quickly reach the entire network.

Harmonic, Wasserman-Faust
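
The two closeness variants named above differ in how they treat unreachable nodes. This sketch uses hypothetical helper names, unweighted graphs, and BFS distances. Harmonic closeness sums 1/d over the other nodes, so unreachable ones contribute 0. The Wasserman-Faust form takes classic closeness, (r - 1) / (sum of distances), and scales it by (r - 1)/(n - 1), where r is the number of nodes reachable from v including v itself.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_wf(adj, v):
    """Wasserman-Faust closeness: classic closeness scaled by the reachable fraction."""
    n = len(adj)
    dist = bfs_distances(adj, v)
    reach = len(dist) - 1             # nodes reachable from v, excluding v itself
    total = sum(dist.values())
    if n < 2 or total == 0:
        return 0.0
    return (reach / total) * (reach / (n - 1))


def harmonic(adj, v):
    """Harmonic closeness: unreachable nodes simply contribute 0."""
    dist = bfs_distances(adj, v)
    return sum(1.0 / d for u, d in dist.items() if u != v)


path = {0: [1], 1: [0, 2], 2: [1]}
print(closeness_wf(path, 0), harmonic(path, 0))
```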

Degree Centrality

Simple count of connections per node. Supports in-degree, out-degree, and total degree for directed graphs.

In-Degree, Out-Degree
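
Degree centrality needs only one pass over the edges. This sketch assumes a directed edge list of `(src, dst)` pairs and a hypothetical `degrees` helper. Nodes that appear in no edge are not reported.

```python
from collections import Counter


def degrees(edges):
    """Map each node to (in_degree, out_degree, total_degree) for directed (src, dst) edges."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    nodes = set(out_deg) | set(in_deg)
    return {v: (in_deg[v], out_deg[v], in_deg[v] + out_deg[v]) for v in sorted(nodes)}


print(degrees([("a", "b"), ("a", "c"), ("b", "c")]))
```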

Eigenvector Centrality

Assigns importance based on connections to other important nodes. Foundation of the PageRank algorithm.

Power Iteration, Normalized
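
Here is a power-iteration sketch for an undirected graph given as adjacency lists. Each round replaces a node's score with the sum of its neighbors' scores and renormalizes to unit length. It iterates with A + I rather than A. This common trick stops the iteration from oscillating on bipartite graphs without changing the leading eigenvector. The function name is hypothetical.

```python
import math


def eigenvector_centrality(adj, max_iter=1000, tol=1e-10):
    """Power iteration on (A + I) for an undirected adjacency dict.

    The +I shift adds 1 to every eigenvalue, which keeps the iteration from
    oscillating on bipartite graphs but leaves the eigenvectors unchanged.
    """
    x = {v: 1.0 for v in adj}
    for _ in range(max_iter):
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in new.values())) or 1.0
        new = {v: val / norm for v, val in new.items()}   # unit L2 length
        if sum(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


star = {"c": ["a", "b", "d"], "a": ["c"], "b": ["c"], "d": ["c"]}
print(eigenvector_centrality(star))
```

On this star the hub converges to 1/√2 and each leaf to 1/√6: the hub's score is √3 times a leaf's.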

HITS (Hubs & Authorities)

Computes two scores per node: a good hub points to many good authorities, and a good authority is pointed to by many good hubs.

Hub Score, Authority Score
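
The mutual definition of hubs and authorities translates directly into alternating updates. This sketch assumes a dict of out-links and a hypothetical `hits` helper. Authority scores are sums of incoming hub scores, hub scores are sums of the authority scores a node points to, and both vectors are renormalized every round.

```python
import math


def hits(out_links, max_iter=100, tol=1e-10):
    """Kleinberg's HITS by alternating updates; out_links maps node -> targets."""
    nodes = list(out_links)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}

    def normalize(scores):
        norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
        return {v: s / norm for v, s in scores.items()}

    for _ in range(max_iter):
        # Authority of u: sum of hub scores of the nodes pointing at u.
        new_auth = dict.fromkeys(nodes, 0.0)
        for v in nodes:
            for u in out_links[v]:
                new_auth[u] += hub[v]
        new_auth = normalize(new_auth)
        # Hub of v: sum of authority scores of the nodes v points at.
        new_hub = normalize({v: sum(new_auth[u] for u in out_links[v]) for v in nodes})
        delta = sum(abs(new_hub[v] - hub[v]) + abs(new_auth[v] - auth[v]) for v in nodes)
        hub, auth = new_hub, new_auth
        if delta < tol:
            break
    return hub, auth


hub, auth = hits({"h1": ["a1", "a2"], "h2": ["a1", "a2"], "a1": [], "a2": []})
print(hub, auth)
```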

Community Detection

Discover clusters and communities within your data

Louvain Algorithm

Fast modularity optimization for large-scale community detection. Hierarchical clustering with configurable resolution.

Modularity, Hierarchical, Resolution
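
Louvain alternates two phases: local node moves that increase modularity, then aggregation of each community into a single node. The sketch below implements only the first phase, for an unweighted, undirected graph without self-loops. The function name is hypothetical and the resolution is fixed at 1. The gain from moving node v into community c is proportional to k_v,in(c) - Σ_tot(c) · k_v / 2m. Here k_v,in(c) counts v's edges into c and Σ_tot(c) is the total degree of c.

```python
from collections import defaultdict


def louvain_local_moving(adj):
    """First (local-moving) phase of Louvain; returns node -> community id.

    Each node starts alone and is repeatedly moved to the neighbouring community
    with the largest modularity gain until no move improves modularity. The full
    algorithm would then collapse communities into super-nodes and repeat.
    """
    degree = {v: len(adj[v]) for v in adj}
    two_m = sum(degree.values())
    community = {v: v for v in adj}
    if two_m == 0:
        return community
    total = dict(degree)                   # Σ_tot: sum of degrees per community
    moved = True
    while moved:
        moved = False
        for v in adj:
            old = community[v]
            total[old] -= degree[v]        # take v out of its community
            links = defaultdict(int)       # edges from v into each neighbouring community
            for u in adj[v]:
                links[community[u]] += 1
            # Staying put is the baseline; only a strictly better gain moves v.
            best = old
            best_gain = links.get(old, 0) - total[old] * degree[v] / two_m
            for c, k_in in links.items():
                gain = k_in - total[c] * degree[v] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            community[v] = best
            total[best] += degree[v]
            if best != old:
                moved = True
    return community


# Two triangles joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(louvain_local_moving(adj))
```

Each move strictly increases modularity, so the loop always terminates.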

Label Propagation

Semi-supervised community assignment: each node repeatedly adopts the label most common among its neighbors. Fast and scalable for massive graphs.

Semi-Supervised, Linear Time
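
Here is a sketch of asynchronous label propagation on an adjacency dict; all names are hypothetical. Every node starts with its own label. Nodes are then visited in random order and adopt a label that is most frequent among their neighbors, keeping their current label when it is already among the most frequent. The optional `seeds` mapping pins known labels, which is the semi-supervised use, and a fixed random seed makes runs reproducible. Each pass is linear in the number of edges.

```python
import random
from collections import Counter


def label_propagation(adj, seeds=None, max_passes=100, seed=0):
    """Asynchronous label propagation on an undirected adjacency dict.

    seeds optionally maps some nodes to fixed labels (semi-supervised use);
    those nodes are never updated. Every other node starts with its own id.
    """
    seeds = seeds or {}
    rng = random.Random(seed)
    labels = {v: seeds.get(v, v) for v in adj}
    order = [v for v in adj if v not in seeds]
    for _ in range(max_passes):
        rng.shuffle(order)                 # fresh random visiting order each pass
        changed = False
        for v in order:
            if not adj[v]:
                continue                   # an isolated node keeps its own label
            counts = Counter(labels[u] for u in adj[v])
            top = max(counts.values())
            best = [lab for lab, c in counts.items() if c == top]
            if labels[v] in best:
                continue                   # already holds a most-frequent label
            labels[v] = rng.choice(best)   # break ties at random
            changed = True
        if not changed:
            break
    return labels


adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(label_propagation(adj))
```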

Connected Components

Find weakly and strongly connected components. Essential preprocessing for many graph algorithms.

Weakly Connected, Strongly Connected
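
Both kinds of component can be sketched in plain Python over a node list and a `(src, dst)` edge list (a hypothetical input format). Weakly connected components use union-find, which ignores edge direction. Strongly connected components use Kosaraju's two-pass DFS. The DFS is iterative, so deep graphs do not hit Python's recursion limit.

```python
def weakly_connected_components(nodes, edges):
    """Union-find over (src, dst) pairs, ignoring direction."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


def strongly_connected_components(nodes, edges):
    """Kosaraju: record DFS finish order on G, then collect components on reversed G."""
    fwd = {v: [] for v in nodes}
    rev = {v: [] for v in nodes}
    for a, b in edges:
        fwd[a].append(b)
        rev[b].append(a)

    done_marker = object()
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:                       # iterative DFS
            v, it = stack[-1]
            nxt = next(it, done_marker)
            if nxt is done_marker:
                stack.pop()
                order.append(v)            # v is finished
            elif nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, iter(fwd[nxt])))

    comps, assigned = [], set()
    for start in reversed(order):          # latest finish time first
        if start in assigned:
            continue
        comp, todo = set(), [start]
        assigned.add(start)
        while todo:
            v = todo.pop()
            comp.add(v)
            for u in rev[v]:
                if u not in assigned:
                    assigned.add(u)
                    todo.append(u)
        comps.append(comp)
    return comps


nodes, edges = [1, 2, 3, 4, 5], [(1, 2), (2, 3), (3, 1), (3, 4), (5, 4)]
print(weakly_connected_components(nodes, edges), strongly_connected_components(nodes, edges))
```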

Triangle Counting

Count triangles for clustering coefficient calculation. Measures local density and transitivity.

Clustering Coefficient, Transitivity
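
Per-node triangle counts lead directly to both metrics named above. In this sketch for an undirected adjacency dict (hypothetical helper names), a node with degree k and t triangles has local clustering coefficient 2t / (k(k - 1)). Global transitivity is 3 × (number of triangles) / (number of connected triples). Summing the per-node counts already counts each triangle three times.

```python
from itertools import combinations


def triangles_and_clustering(adj):
    """Per-node triangle counts and local clustering coefficients (undirected graph)."""
    nbrs = {v: set(adj[v]) for v in adj}
    triangles, clustering = {}, {}
    for v in adj:
        # Each pair of v's neighbours that are themselves adjacent closes a triangle.
        t = sum(1 for a, b in combinations(nbrs[v], 2) if b in nbrs[a])
        k = len(nbrs[v])
        triangles[v] = t
        clustering[v] = 2 * t / (k * (k - 1)) if k > 1 else 0.0
    return triangles, clustering


def transitivity(adj):
    """Global clustering: closed triples / connected triples."""
    triangles, _ = triangles_and_clustering(adj)
    closed = sum(triangles.values())       # equals 3 x the number of triangles
    triples = sum(len(set(adj[v])) * (len(set(adj[v])) - 1) // 2 for v in adj)
    return closed / triples if triples else 0.0


adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(triangles_and_clustering(adj), transitivity(adj))
```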

K-Core Decomposition

Find cohesive subgraphs in which every node has at least k neighbors within the subgraph. Identifies dense core structures.

Coreness, Degeneracy
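
Core numbers can be computed by peeling. Repeatedly remove a node of minimum remaining degree, and record the largest minimum degree seen so far as its coreness. The degeneracy is the largest core number. For brevity, this sketch (hypothetical helper names) uses an O(n) scan per removal; bucket-based implementations do the same work in linear time.

```python
def core_numbers(adj):
    """Coreness of every node in an undirected graph by minimum-degree peeling.

    A node's core number is the largest k such that it belongs to the k-core:
    the maximal subgraph in which every node has at least k neighbours.
    """
    nbrs = {v: set(adj[v]) for v in adj}
    degree = {v: len(nbrs[v]) for v in adj}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)   # O(n) scan keeps the sketch short
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in nbrs[v]:
            if u in remaining:
                degree[u] -= 1                       # v no longer counts toward u
    return core


def degeneracy(adj):
    return max(core_numbers(adj).values(), default=0)


# A triangle with one pendant node attached to node 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(core_numbers(adj), degeneracy(adj))
```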

Modularity Scoring

Measure the quality of community partitions and compare different clustering results objectively.

Quality Metric, Partition Comparison
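
For an undirected, unweighted graph with m edges, modularity sums, over communities c, the fraction of edges inside c minus the fraction expected at random: Q = Σ_c [L_c / m - (d_c / 2m)²]. Here L_c is the number of internal edges and d_c the total degree of c. A direct sketch with a hypothetical `modularity` helper:

```python
def modularity(adj, communities):
    """Newman-Girvan modularity of a partition (list of node sets) of an undirected graph."""
    label = {v: i for i, comm in enumerate(communities) for v in comm}
    two_m = sum(len(adj[v]) for v in adj)
    if two_m == 0:
        return 0.0
    m = two_m / 2
    inside = [0] * len(communities)   # internal edge endpoints per community
    degree = [0] * len(communities)   # total degree per community
    for v in adj:
        degree[label[v]] += len(adj[v])
        for u in adj[v]:
            if label[u] == label[v]:
                inside[label[v]] += 1  # each internal edge is seen from both ends
    return sum((inside[c] / 2) / m - (degree[c] / two_m) ** 2 for c in range(len(communities)))


adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(modularity(adj, [{0, 1, 2}, {3, 4, 5}]))
```

Take two triangles joined by a single edge. Splitting along that bridge gives Q = 5/14 ≈ 0.357, while putting every node in one community gives 0.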

Path Finding Algorithms

Discover routes and relationships between nodes

Shortest Path (Dijkstra)

Find the shortest weighted path between two nodes, or from a single source to all destinations. Edge weights must be non-negative.

Weighted, Single Source
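
Here is a heap-based sketch of single-source Dijkstra. It assumes adjacency lists of `(neighbor, weight)` pairs with non-negative weights. Stale heap entries are skipped rather than decreased in place, and a counter breaks ties so nodes never need to be comparable. The hypothetical `path_to` helper rebuilds one source-to-target path from the predecessor map.

```python
import heapq
from itertools import count


def dijkstra(adj, source):
    """Shortest distances and predecessors from source; adj: node -> [(nbr, weight)]."""
    dist, prev = {source: 0}, {}
    tie = count()                         # tie-breaker so nodes are never compared
    heap = [(0, next(tie), source)]
    done = set()
    while heap:
        d, _, v = heapq.heappop(heap)
        if v in done:
            continue                      # stale entry: v was already settled
        done.add(v)
        for u, w in adj.get(v, ()):
            nd = d + w
            if u not in dist or nd < dist[u]:
                dist[u], prev[u] = nd, v
                heapq.heappush(heap, (nd, next(tie), u))
    return dist, prev


def path_to(prev, source, target):
    """Walk predecessors back from target; None if target was never reached."""
    if target != source and target not in prev:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


adj = {"a": [("b", 1), ("c", 4)], "b": [("c", 2), ("d", 5)], "c": [("d", 1)]}
dist, prev = dijkstra(adj, "a")
print(dist, path_to(prev, "a", "d"))
```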

All Pairs Shortest Path

Compute shortest paths between all node pairs. Floyd-Warshall and Johnson's algorithms.

Floyd-Warshall, Distance Matrix
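
Floyd-Warshall fills an n × n distance matrix in O(n³) time by allowing each node k, in turn, as an intermediate stop. This sketch uses a hypothetical input format of `(src, dst, weight)` triples and tolerates negative edges. A negative value on the diagonal afterwards signals a negative cycle. Johnson's algorithm, also listed above, is usually preferred for large sparse graphs.

```python
import math


def floyd_warshall(nodes, edges):
    """All-pairs shortest distances for directed (src, dst, weight) edges.

    Returns a dict-of-dicts matrix; math.inf marks unreachable pairs.
    """
    dist = {a: {b: (0 if a == b else math.inf) for b in nodes} for a in nodes}
    for a, b, w in edges:
        dist[a][b] = min(dist[a][b], w)   # keep the cheapest parallel edge
    for k in nodes:                       # allow k as an intermediate node
        for i in nodes:
            dik = dist[i][k]
            if dik == math.inf:
                continue
            for j in nodes:
                if dik + dist[k][j] < dist[i][j]:
                    dist[i][j] = dik + dist[k][j]
    return dist


d = floyd_warshall(["a", "b", "c"], [("a", "b", 3), ("b", "c", -1), ("a", "c", 5)])
print(d)
```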

Breadth-First Search

Level-by-level graph traversal. Find nodes within n hops, compute hop distances.

Unweighted, Hop Count
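
Hop-bounded BFS takes only a few lines. Nodes are discovered level by level, so the first time a node is reached its hop distance is final, and expansion stops at the hop limit. Here is a sketch over an adjacency dict, with a hypothetical helper name:

```python
from collections import deque


def within_hops(adj, source, max_hops):
    """Level-by-level BFS returning {node: hop distance} for nodes within max_hops."""
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        v = frontier.popleft()
        if dist[v] == max_hops:
            continue                      # do not expand past the hop limit
        for u in adj.get(v, ()):
            if u not in dist:             # first visit fixes the hop distance
                dist[u] = dist[v] + 1
                frontier.append(u)
    return dist


print(within_hops({"a": ["b"], "b": ["c"], "c": ["d"]}, "a", 2))
```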

A* Search

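Heuristic-guided pathfinding that typically explores far fewer nodes than Dijkstra. Well suited to geospatial and game graphs, and returns optimal paths when the heuristic never overestimates the remaining cost.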

Heuristic, Geospatial
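
In this generic A* sketch, the caller supplies `neighbors` and `heuristic` functions (hypothetical names). The usage part runs it on a small 4-connected grid with the Manhattan distance. That heuristic never overestimates and is consistent on unit-cost grids, so the closed set is safe and the returned path is optimal.

```python
import heapq
import math
from itertools import count


def astar(start, goal, neighbors, heuristic):
    """A* search. neighbors(v) yields (u, step_cost); heuristic(v) estimates cost v -> goal.

    Assumes a consistent heuristic, so a node never needs re-expanding once closed.
    Returns (path, cost), or (None, math.inf) if the goal is unreachable.
    """
    tie = count()                          # tie-breaker so nodes are never compared
    open_heap = [(heuristic(start), next(tie), start)]
    g = {start: 0}
    parent = {start: None}
    closed = set()
    while open_heap:
        _, _, v = heapq.heappop(open_heap)
        if v in closed:
            continue                       # stale entry
        if v == goal:
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1], g[goal]
        closed.add(v)
        for u, cost in neighbors(v):
            ng = g[v] + cost
            if u not in g or ng < g[u]:
                g[u], parent[u] = ng, v
                heapq.heappush(open_heap, (ng + heuristic(u), next(tie), u))
    return None, math.inf


# Usage: a 3x4 grid where '#' cells are walls.
GRID = ["....",
        ".##.",
        "...."]
GOAL = (2, 3)


def grid_neighbors(cell):
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] != "#":
            yield (nr, nc), 1


def manhattan(cell):
    return abs(cell[0] - GOAL[0]) + abs(cell[1] - GOAL[1])


path, cost = astar((0, 0), GOAL, grid_neighbors, manhattan)
print(cost, path)
```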

Minimum Spanning Tree

Find the lowest-cost tree connecting all nodes of a connected, undirected graph. Prim's and Kruskal's algorithms available.

Prim, Kruskal
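
Kruskal's algorithm, one of the two listed above, sorts edges by weight and keeps each edge that joins two different trees, using union-find to test that. This sketch takes `(weight, u, v)` triples (a hypothetical format). On a disconnected graph it returns a minimum spanning forest.

```python
def kruskal(nodes, edges):
    """Minimum spanning tree (or forest) from undirected (weight, u, v) triples."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree = []
    for w, a, b in sorted(edges, key=lambda e: e[0]):  # cheapest edges first
        ra, rb = find(a), find(b)
        if ra != rb:                        # edge joins two different trees: keep it
            parent[ra] = rb
            tree.append((w, a, b))
    return tree


edges = [(1, "a", "b"), (4, "a", "c"), (3, "b", "c"), (2, "c", "d"), (5, "b", "d")]
print(kruskal(["a", "b", "c", "d"], edges))
```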

Random Walk

Simulate random traversals through the graph. Foundation for node2vec embeddings.

Sampling, Embeddings
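
This sketch shows uniform random walks and a walk corpus of the kind used to train node embeddings; the helper names are hypothetical. Uniform walks are what DeepWalk uses and correspond to node2vec with p = q = 1. node2vec proper biases each step using the previous node, via its return (p) and in-out (q) parameters.

```python
import random


def random_walk(adj, start, length, rng=None):
    """Uniform random walk of up to `length` steps; stops early at a node with no out-edges."""
    rng = rng or random.Random()
    walk = [start]
    for _ in range(length):
        nbrs = adj.get(walk[-1], [])
        if not nbrs:
            break
        walk.append(rng.choice(nbrs))
    return walk


def walk_corpus(adj, walks_per_node, length, seed=0):
    """Several walks from every node; a skip-gram model would train on these sequences."""
    rng = random.Random(seed)
    return [random_walk(adj, v, length, rng) for v in adj for _ in range(walks_per_node)]


adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
print(walk_corpus(adj, walks_per_node=2, length=4))
```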

SQL Table Functions

Use graph algorithms directly in SQL queries

graph_pagerank()

Compute PageRank scores with configurable damping and iterations

graph_shortest_path()

Find the shortest path between any two nodes

graph_louvain()

Detect communities using the Louvain algorithm

graph_connected_components()

Find all connected components in the graph

graph_betweenness()

Calculate betweenness centrality scores

graph_triangles()

Count triangles and compute clustering coefficients

graph_neighbors()

Get neighbors within n hops of a node

graph_label_propagation()

Assign community labels via propagation

Advanced Graph Features

Enterprise-grade capabilities for production workloads

Property Graphs

Node and edge properties
Multiple edge types
Typed nodes and relationships
Property filtering in queries

Graph Storage

Delta Lake native storage
Adjacency list representation
Edge list format support
Incremental graph updates

Scalability

Distributed computation
Approximate algorithms for massive graphs
Sampling for large-scale analytics
Memory-efficient processing

Graph Metrics

Graph density calculation
Average path length
Diameter computation
Degree distribution analysis
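
On an unweighted graph, the four metrics above reduce to degree counts and BFS distances. This sketch (hypothetical helper names) assumes an undirected graph given as symmetric adjacency lists. Density is 2m / (n(n - 1)). Average path length and diameter are taken over reachable ordered pairs only, and the degree distribution is a histogram of degrees. One BFS per node costs O(n·m), which is why large graphs call for the approximate and sampled methods listed under Scalability.

```python
from collections import Counter, deque


def bfs_distances(adj, s):
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def graph_metrics(adj):
    """Density, average shortest-path length, diameter and degree distribution."""
    n = len(adj)
    m = sum(len(adj[v]) for v in adj) // 2          # each undirected edge listed twice
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    lengths = [d for v in adj for u, d in bfs_distances(adj, v).items() if u != v]
    avg_path = sum(lengths) / len(lengths) if lengths else 0.0
    diameter = max(lengths, default=0)
    degree_dist = Counter(len(adj[v]) for v in adj)  # degree -> number of nodes
    return density, avg_path, diameter, degree_dist


print(graph_metrics({0: [1], 1: [0, 2], 2: [1]}))
```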

Graph Mutations

Create, update, and delete nodes and edges directly via SQL

Create Nodes & Edges: build your graph with familiar INSERT syntax

-- Create nodes (values: name, age)
INSERT INTO GRAPH social_network NODE person
VALUES ('Alice', 30);

INSERT INTO GRAPH social_network NODE person
VALUES ('Bob', 28);

-- Create an edge between them (values: source node, target node, date property)
INSERT INTO GRAPH social_network EDGE knows
VALUES ('Alice', 'Bob', '2024-01-15');

Update & Delete: modify properties or remove elements

-- Update a node property
UPDATE GRAPH social_network NODE person
SET age = 31
WHERE name = 'Alice';

-- Delete a relationship
DELETE FROM GRAPH social_network EDGE knows
WHERE src = 'Alice' AND dst = 'Bob';

-- Delete a node (cascades edges)
DELETE FROM GRAPH social_network NODE person
WHERE name = 'Bob';

Node Operations

INSERT nodes with typed properties
UPDATE properties with SET clauses
DELETE nodes with cascading edge removal
Bulk insert from SELECT queries

Edge Operations

INSERT edges between existing nodes
UPDATE edge properties and weights
DELETE edges with filtered conditions
Upsert semantics with ON CONFLICT

Transactional Safety

ACID transactions on all mutations
Referential integrity enforcement
Atomic multi-element operations
Delta Lake time travel on graph state
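
The cascade, referential-integrity, and upsert rules above can be modeled with two dictionaries. This hypothetical in-memory sketch models the semantics only, not Delta Forge's storage or API. `TinyGraph` and its `on_conflict` parameter are invented for illustration, and the sketch does not model transactions or time travel.

```python
class TinyGraph:
    """Hypothetical in-memory model of the node/edge mutation rules."""

    def __init__(self):
        self.nodes = {}   # name -> properties
        self.edges = {}   # (src, dst) -> properties

    def insert_node(self, name, **props):
        self.nodes[name] = props

    def insert_edge(self, src, dst, on_conflict="error", **props):
        # Referential integrity: both endpoints must already exist.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"missing endpoint for edge {src}->{dst}")
        key = (src, dst)
        if key in self.edges and on_conflict == "error":
            raise ValueError(f"edge {src}->{dst} already exists")
        # Upsert: create the edge, or merge new properties into the existing one.
        self.edges.setdefault(key, {}).update(props)

    def delete_node(self, name):
        # Cascade: drop every edge touching the node, then the node itself.
        self.edges = {k: p for k, p in self.edges.items() if name not in k}
        del self.nodes[name]


g = TinyGraph()
g.insert_node("Alice", age=30)
g.insert_node("Bob", age=28)
g.insert_edge("Alice", "Bob", since="2024-01-15")
g.delete_node("Bob")
print(g.nodes, g.edges)
```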

Batch Mutations

Bulk load from CSV and Parquet
INSERT INTO GRAPH ... SELECT ...
Multi-row VALUES clauses
Streaming ingestion support

Graph Storage Modes

Choose the storage strategy that fits your graph workload

Flattened

Nodes and edges stored as flat columns in standard Delta tables. Best for simple graphs with fixed schemas where fast columnar scans matter most.

Fast Scans, Simple Schemas, Columnar

Hybrid

Structured edge columns with property maps for flexible attributes. Balances query performance with schema flexibility for evolving graphs.

Property Maps, Balanced, Evolving Schemas

JSON

Full graph structure stored in JSON columns. Maximum flexibility for complex, deeply nested schemas and heterogeneous property types.

Flexible, Nested Schemas, Heterogeneous

When to Use Each Mode

Flattened - fixed-schema graphs, analytics-heavy workloads, maximum scan throughput
Hybrid - graphs with optional properties, mix of structured and semi-structured data
JSON - highly variable schemas, rapid prototyping, complex nested relationships
All modes store data as Delta Lake tables with full versioning and time travel
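
To make the three layouts concrete, here is one `knows` edge rendered in each mode. The column names (`src`, `dst`, `type`, `properties`, `doc`) are illustrative assumptions, not Delta Forge's actual schema.

```python
import json

# One edge with two properties; the layout functions below are hypothetical.
edge = {"src": "Alice", "dst": "Bob", "type": "knows",
        "props": {"since": "2024-01-15", "weight": 0.8}}


def flattened(e):
    # Every property is promoted to its own fixed, typed column.
    return {"src": e["src"], "dst": e["dst"], "type": e["type"], **e["props"]}


def hybrid(e):
    # Structural columns stay typed; optional properties live in a map column.
    return {"src": e["src"], "dst": e["dst"], "type": e["type"], "properties": dict(e["props"])}


def json_mode(e):
    # The whole element is serialized into a single JSON column.
    return {"doc": json.dumps(e, sort_keys=True)}


for layout in (flattened, hybrid, json_mode):
    print(layout.__name__, layout(edge))
```

The sketch exposes the trade-off. Flattened rows need a column for every property, and a property named like a structural column would collide with it. The hybrid and JSON forms accept new properties without a schema change.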

Real-World Benchmark Datasets

Built-in graph datasets for testing and benchmarking

Karate Club

Zachary's classic social network - 34 nodes, ideal for community detection validation

EU Email

Email communication network from a European research institution - thousands of nodes and edges

NetScience

Co-authorship network of scientists working on network theory - real collaboration patterns

PolBooks

Political books co-purchasing network - classic benchmark for partitioning algorithms

LDBC SNB

Linked Data Benchmark Council Social Network - industry-standard graph benchmark at scale

Any Delta Table. Instant Graph.

Any Delta Table Becomes a Graph