Download TabbyDB. Free Trial.
Trial valid for approximately 3 months. Maximum 8 executors. No code changes required. Full rollback in 30 seconds.
Download TabbyDB as a complete Spark installation. 100% compatible with the corresponding Apache Spark version.
Apache Iceberg Runtime — Drop-in Replacement
The iceberg-tabbydb-runtime jar unlocks Broadcast Hash Join key pushdown at the scan level when querying Iceberg tables, a capability unavailable in the standard iceberg-spark-runtime. In testing on a single-node M4 Mac with 50 GB of non-partitioned Iceberg data, run time dropped by 46% (1608s → 862s) for queries sorted on the date column. Larger benchmarks at 1 TB and 2 TB are underway.
- Already included in the Fresh Install (.tgz) of TabbyDB 4.1.1; no action needed.
- For the Convert Existing Spark path: download the jar below and replace your existing iceberg-spark-runtime-4.1.1 jar with it.
- The default iceberg-spark-runtime jar remains 100% compatible with TabbyDB 4.1.1; you just won't get the Broadcast Hash Join scan-level pushdown gains without this drop-in replacement.
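For the Convert Existing Spark path, the jar swap could be scripted roughly as follows. This is a minimal sketch, not an official installer: it assumes the stock Iceberg runtime jar sits in your Spark installation's jars directory, and the function name, the backup directory, and the file names in the usage comment are hypothetical. If you load Iceberg another way (for example via --jars or --packages), change that reference instead.

```python
import shutil
from pathlib import Path


def swap_iceberg_runtime(jars_dir, new_jar, old_prefix="iceberg-spark-runtime",
                         backup_dir=None):
    """Move stock Iceberg runtime jars out of jars_dir and copy new_jar in.

    Assumption: the stock jar's file name starts with old_prefix and lives
    directly in jars_dir. Returns the backup paths of the jars that were moved,
    so the swap can be undone by moving them back and deleting the new jar.
    """
    jars_dir = Path(jars_dir)
    new_jar = Path(new_jar)
    # Hypothetical default: keep backups next to the jars directory.
    backup_dir = Path(backup_dir) if backup_dir else jars_dir.parent / "jars-backup"
    backup_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    for old in sorted(jars_dir.glob(f"{old_prefix}*.jar")):
        target = backup_dir / old.name
        shutil.move(str(old), str(target))  # keep the original for rollback
        moved.append(target)

    shutil.copy2(new_jar, jars_dir / new_jar.name)
    return moved


# Example call (paths are placeholders, adjust to your installation):
# swap_iceberg_runtime("/opt/spark/jars", "/tmp/iceberg-tabbydb-runtime.jar")
```

Keeping the original jar in a backup directory means a rollback is just moving it back and removing the replacement.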
Non-Partitioned, Date-Sorted Splits Patch
If you're running TPC-DS on TabbyDB and seeing numbers similar to stock Spark, the toolkit is generating partitioned tables by default, which hides TabbyDB's gains. Apply this patch to the Databricks TPC-DS toolkit source to generate non-partitioned, locally date-sorted splits. With the patch, data generation becomes 6–7× faster and benchmark results show ~13% better performance at the 1 TB to 2 TB scale.
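To illustrate what "non-partitioned, locally date-sorted" means for the files on disk, here is a small PySpark-style sketch. It is not the patch itself and does not touch the TPC-DS toolkit; the function name and the num_splits parameter are hypothetical, and Parquet is assumed as the output format. It relies only on the standard DataFrame methods repartition and sortWithinPartitions and on the DataFrameWriter.

```python
def write_date_sorted(df, path, date_col, num_splits=None):
    """Write df as a non-partitioned table whose files are each sorted by date_col.

    df is expected to be a pyspark.sql.DataFrame. No partitionBy() is used, so
    the output is one flat directory of files rather than one folder per date.
    """
    if num_splits is not None:
        # Optional: fix the number of in-memory partitions, which in turn
        # determines how many output files (splits) are written.
        df = df.repartition(num_splits)

    # sortWithinPartitions sorts rows inside each partition only (a "local"
    # sort), which avoids the shuffle that a global orderBy() would add.
    (df.sortWithinPartitions(date_col)
       .write
       .mode("overwrite")
       .parquet(path))


# Example call, assuming an existing DataFrame named store_sales:
# write_date_sorted(store_sales, "/data/tpcds/store_sales", "ss_sold_date_sk")
```

Because each file is sorted on the date column, its min/max statistics for that column cover a narrow range, which is the property scan-level filtering can exploit.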
- Once-only application of expensive optimizer rules to complex repeated sub-expressions regardless of batch iterations (SPARK-36786)
- Improved Broadcast Hash Join key pushdown performance at scan level (SPARK-44662)
Compare Performance in Real Time
Run the same query on stock Spark vs TabbyDB in our hosted Zeppelin notebooks.