Write scalar functions, aggregate functions, table functions, and window functions in your language of choice. Native performance with Python and JavaScript flexibility.
Complete UDF taxonomy for every analytical need
Process one row, return one value. The foundation of data transformation.
SELECT my_transform(column) FROM table
Combine multiple rows into a single result. Build custom aggregations.
SELECT my_agg(value) FROM table GROUP BY key
Compute over sliding windows of rows. Advanced analytics made simple.
SELECT my_window(value) OVER (PARTITION BY key ORDER BY ts)
Generate multiple rows from a single input. Explode, flatten, and transform.
SELECT * FROM table, LATERAL my_udtf(column)
Write UDFs in the language that best fits your use case
Leverage the Python ecosystem. NumPy, Pandas, scikit-learn, and more available in your UDFs.
@udf(return_type=Float64)
def predict_score(features: List[float]) -> float:
return model.predict([features])[0]
Fast iteration with JavaScript. Perfect for business logic and quick transformations.
function formatAddress(street, city, zip) {
return `${street}, ${city} ${zip}`;
}
Define functions using pure SQL expressions. Maximum portability and readability.
CREATE FUNCTION calculate_tax(amount DECIMAL)
RETURNS DECIMAL AS
amount * 0.0825
How scalar functions are compiled and executed
Build stateful aggregations with full parallelism support
Define the accumulator state structure
struct WeightedAvgState { sum: f64, weight_sum: f64 }
Create initial empty state
fn initialize() -> State { State { sum: 0.0, weight_sum: 0.0 } }
Accumulate a single row into state
fn update(state: &mut State, value: f64, weight: f64) { state.sum += value * weight; state.weight_sum += weight; }
Combine two partial states (for parallelism)
fn merge(s1: &mut State, s2: State) { s1.sum += s2.sum; s1.weight_sum += s2.weight_sum; }
Compute final result from accumulated state
fn finalize(state: State) -> f64 { state.sum / state.weight_sum }
use delta_forge_udf::{AggregateUDF, State};
#[derive(AggregateUDF)]
#[udf(name = "weighted_avg", return_type = "Float64")]
struct WeightedAverage;
#[derive(Default, Clone)]
struct WeightedAvgState {
sum: f64,
weight_sum: f64,
}
impl AggregateUDF for WeightedAverage {
type State = WeightedAvgState;
fn update(state: &mut Self::State, value: f64, weight: f64) {
if weight > 0.0 {
state.sum += value * weight;
state.weight_sum += weight;
}
}
fn merge(state: &mut Self::State, other: Self::State) {
state.sum += other.sum;
state.weight_sum += other.weight_sum;
}
fn finalize(state: Self::State) -> Option {
if state.weight_sum > 0.0 {
Some(state.sum / state.weight_sum)
} else {
None
}
}
}
// Usage in SQL:
// SELECT category, weighted_avg(price, quantity) FROM sales GROUP BY category
Generate multiple rows from custom logic
-- Table function that extracts key-value pairs from JSON
CREATE FUNCTION json_entries(json_col JSON)
RETURNS TABLE(key VARCHAR, value VARCHAR, type VARCHAR)
AS $$
for key, value in json_col.items():
yield (key, str(value), type(value).__name__)
$$;
-- Usage: Extract all JSON fields into rows
SELECT t.id, e.key, e.value
FROM events t
CROSS JOIN LATERAL json_entries(t.metadata) AS e
WHERE e.type = 'str';
Techniques for maximum UDF throughput
Process entire Arrow arrays instead of row-by-row. Amortize function call overhead across thousands of values.
fn eval_batch(input: &ArrayRef) -> ArrayRef
Allocate buffers once and reuse across batches. Avoid allocation in hot paths.
fn eval_with_state(&self, state: &mut EvalState, ...)
Skip null values efficiently using validity bitmaps. Propagate nulls without evaluation.
propagate_nulls: true
Write SIMD-optimized inner loops for numeric operations. 8x speedup with AVX-512.
#[target_feature(enable = "avx512f")]
Operate on dictionary indices for low-cardinality string columns. Avoid string comparisons.
fn eval_dict(keys: &UInt32Array, dict: &StringArray)
Short-circuit evaluation for conditional expressions. Skip unnecessary computation.
volatility: Volatile::Stable
Full type inference and coercion support
Arguments automatically coerced to expected types following SQL standard rules.
my_func(INT) called with SMALLINT → automatic upcast
Define functions that work with multiple input types using type variables.
fn first_non_null<T>(a: T, b: T) -> T
Accept variable number of arguments of the same type.
fn concat_all(strings: VARIADIC VARCHAR) -> VARCHAR
Return type computed from input types at planning time.
fn add(a: NUMERIC, b: NUMERIC) -> max_precision(a, b)
Native performance. Python flexibility. Infinite possibilities.