Get Started

Download TabbyDB. Free Trial.

Trial valid for approximately 3 months. Maximum 8 executors. No code changes required. Full rollback in 30 seconds.

Download TabbyDB as a complete Spark installation. 100% compatible with the corresponding Apache Spark version.

Iceberg Performance · TabbyDB 4.1.1

Apache Iceberg Runtime — Drop-in Replacement

The iceberg-tabbydb-runtime jar unlocks Broadcast Hash Join key pushdown at the scan level when querying Iceberg tables, which the standard iceberg-spark-runtime does not provide. In single-node testing on an M4 Mac with 50 GB of non-partitioned Iceberg data, runtime for queries sorted on the date column dropped from 1608s to 862s, about 46% less time (roughly a 1.9x speedup). Larger benchmarks at 1 TB and 2 TB are underway.

• Already included in the Fresh Install (.tgz) of TabbyDB 4.1.1. No action needed.
• For the Convert Existing Spark path: download the jar below and replace your existing iceberg-spark-runtime-4.1.1 jar with it.
• The default iceberg-spark-runtime jar remains 100% compatible with TabbyDB 4.1.1. You just won't get the Broadcast Hash Join scan-level pushdown gains without this drop-in replacement.
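
For the Convert Existing Spark path, the swap can be sketched as a small shell function that keeps a backup of the stock jar, so rolling back is just a move in the other direction. This is a minimal sketch under assumptions that are not stated on this page: that the runtime jar lives in `$SPARK_HOME/jars`, that the stock jar matches `iceberg-spark-runtime*.jar`, and that `iceberg-runtime-backup` is an acceptable (hypothetical) backup directory name.

```shell
#!/usr/bin/env bash
# Sketch: swap the stock Iceberg runtime jar for the TabbyDB one, keeping a backup.
# Assumptions (not from this page): jars live in "$spark_home/jars", the stock jar
# matches iceberg-spark-runtime*.jar, and the backup directory name is arbitrary.
swap_iceberg_jar() {
  local spark_home="$1" new_jar="$2"
  local jars_dir="$spark_home/jars"
  local backup_dir="$spark_home/iceberg-runtime-backup"
  local old

  [ -d "$jars_dir" ] || { echo "no jars directory at $jars_dir" >&2; return 1; }
  [ -f "$new_jar" ]  || { echo "replacement jar not found: $new_jar" >&2; return 1; }

  mkdir -p "$backup_dir"
  # Move every stock runtime jar out of the jars directory so the two never coexist.
  for old in "$jars_dir"/iceberg-spark-runtime*.jar; do
    [ -e "$old" ] || continue   # the glob matched nothing
    mv "$old" "$backup_dir/"
  done
  cp "$new_jar" "$jars_dir/"
}
```

A call would look like `swap_iceberg_jar "$SPARK_HOME" /path/to/downloaded-tabbydb-runtime.jar`, where the second argument is wherever you saved the download. To roll back, delete the TabbyDB jar from the jars directory and move the backed-up jar into place again.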

TPC-DS Benchmark · Toolkit

Run TPC-DS Benchmarks Locally

Everything needed to generate TPC-DS data, create tables, and run benchmarks on both stock Spark 4.1.1 and TabbyDB on a single machine. See the FAQ walkthrough for step-by-step instructions.

TPC-DS Kit Setup Patch

Apply this if make fails while building the databricks/tpcds-kit tool. Run the following from the tpcds-kit checkout directory (not the tools/ subdirectory):

patch -p1 < tpcds-toolkit-patch.txt

tpcds-toolkit-patch.txt ↓
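
If you would rather confirm that the patch applies cleanly before touching the checkout, one option is to do a dry run first. The function below is a sketch, not part of the toolkit: `apply_patch_checked` is a hypothetical helper name, and it assumes your `patch` supports `--dry-run` (GNU patch does, as do recent BSD versions).

```shell
#!/usr/bin/env bash
# Sketch: verify that a patch applies cleanly before modifying the checkout.
# Assumption: patch(1) on this machine supports --dry-run.
apply_patch_checked() {
  local checkout_dir="$1"
  # Resolve the patch path up front, because we cd into the checkout below.
  local patch_file
  patch_file="$(cd "$(dirname "$2")" && pwd)/$(basename "$2")" || return 1
  (
    cd "$checkout_dir" || exit 1
    # First pass changes nothing; it only reports whether every hunk would apply.
    patch -p1 --dry-run < "$patch_file" >/dev/null || {
      echo "patch does not apply cleanly in $checkout_dir" >&2
      exit 1
    }
    # Second pass applies the patch for real.
    patch -p1 < "$patch_file"
  )
}
```

For example, `apply_patch_checked ./tpcds-kit ./tpcds-toolkit-patch.txt` runs both passes from the checkout root, matching the `-p1` invocation above.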

spark-sql-perf Helper Jar

Pre-built for Spark 4.1.1 with non-partitioned, date-sorted table support built in. Use the jar directly, or build from the databricks/spark-sql-perf source by applying the patch and running the build from the checkout directory:

patch -p1 < non-partitioned-tables.patch
sbt clean package -J-Djava.security.manager=allow

The built jar will be at target/scala-2.13/spark-sql-perf_2.13-0.5.1-SNAPSHOT.jar. Pass this path to --jars when launching spark-shell.

non-partitioned-tables.patch ↓
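
As an illustration of the `--jars` step, the sketch below checks that the helper jar exists and prints the launch command so you can inspect it before running it. The function name `spark_shell_command` is hypothetical, and it assumes `spark-shell` lives at `$SPARK_HOME/bin/spark-shell`.

```shell
#!/usr/bin/env bash
# Sketch: build the spark-shell launch command for the helper jar.
# Assumption: spark-shell is at "$SPARK_HOME/bin/spark-shell".
# The function only prints the command; it does not start Spark.
spark_shell_command() {
  local jar="$1"
  [ -n "$SPARK_HOME" ] || { echo "SPARK_HOME is not set" >&2; return 1; }
  [ -f "$jar" ] || { echo "helper jar not found: $jar" >&2; return 1; }
  printf '%s --jars %s\n' "$SPARK_HOME/bin/spark-shell" "$jar"
}
```

Calling it with `target/scala-2.13/spark-sql-perf_2.13-0.5.1-SNAPSHOT.jar` from the spark-sql-perf checkout prints the full command line, which you can then run as-is.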

Spark Configuration

Tuned spark-defaults.conf and spark-env.sh for a single-node cluster. Drop into $SPARK_HOME/conf/ and adjust executor counts to match your hardware.

spark-defaults.conf ↓

spark-env.sh ↓
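
When adjusting executor counts, a rough starting point is to divide the machine's cores by the cores you give each executor, then cap the result at the trial limit of 8 executors. The sketch below does only that arithmetic. The function name and the default of 4 cores per executor are assumptions, and which setting in the provided files receives the number depends on how those files are written.

```shell
#!/usr/bin/env bash
# Sketch: pick an executor count for a single machine.
# Assumptions: 4 cores per executor is an arbitrary default; the cap of 8
# reflects the trial limit stated on this page.
suggest_executor_count() {
  local total_cores="$1"
  local cores_per_executor="${2:-4}"
  local max_executors=8
  local count=$(( total_cores / cores_per_executor ))

  # Always run at least one executor, and never exceed the trial cap.
  if [ "$count" -lt 1 ]; then count=1; fi
  if [ "$count" -gt "$max_executors" ]; then count="$max_executors"; fi
  echo "$count"
}
```

On Linux and macOS, `suggest_executor_count "$(getconf _NPROCESSORS_ONLN)"` feeds in the number of online cores. Leave headroom for the driver and the operating system if the machine is also running other work.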

Benchmark Scripts

Copy-paste directly into spark-shell. Set the path variables at the top of each script before running.

generate-data.scala ↓

run-benchmark.scala ↓

only-create-tables-from-existing-data.scala ↓

• Expensive optimizer rules are applied only once to complex repeated sub-expressions, regardless of how many batch iterations run (SPARK-36786)
• Improved Broadcast Hash Join key pushdown performance at scan level (SPARK-44662)
• New iceberg-tabbydb-runtime-4.1.1 jar: a drop-in replacement for the standard iceberg-spark-runtime that extends Broadcast Hash Join key pushdown to Iceberg table scans, enabling deep file-level pruning before data is read. Download it from the Apache Iceberg Runtime section of this page.

Compare Performance in Real Time

Run the same query on stock Spark vs TabbyDB in our hosted Zeppelin notebooks.

Demo: Constraints Impact →

Demo: Tree Size Impact →