About KwikQuery LLC
The Problem We're Solving
Apache Spark is a powerful engine, but it has well-documented limitations when query complexity scales up. A few of the issues we see repeatedly in real-world environments:
- Query compilation time is invisible in the Spark UI. Spark only registers a query after the plan has been submitted, which means hours spent in compilation don't appear in the metrics — leaving teams blind to a major bottleneck.
- Complex plans can spend more time planning than executing. Queries generated programmatically via DataFrame APIs, or involving many joins and nested logic, can trigger optimizer rules that grow the query tree exponentially, resulting in compilation times that run from one hour to eight hours or more — and sometimes end in OutOfMemory failures.
- Runtime performance of nested join queries is suboptimal. Stock Spark does not push down Broadcast Hash Join keys as dynamic filters for file pruning when joins are on non-partitioned columns, leaving significant runtime performance on the table.
The usual workaround, when one exists, is to disable optimizer rules, trading runtime performance for compile-time relief. We don't think that's an acceptable tradeoff, and TabbyDB doesn't require it.
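To make the compile-time blowup concrete, here is a minimal, self-contained sketch in plain Python (no Spark required; the function name, alias table, and numbers are invented for illustration, not TabbyDB code). It models one way constraint propagation can explode: when joins make columns interchangeable, a naive optimizer that materializes every equivalent rewriting of a filter predicate generates the full cross product of aliases.

```python
from itertools import product

def naive_constraint_variants(filter_columns, aliases):
    """Enumerate every rewriting of a filter predicate when each
    column it references has a set of join-equivalent aliases.
    A naive optimizer that tracks all variants materializes the
    full cross product of equivalent columns."""
    alias_sets = [aliases[c] for c in filter_columns]
    return [tuple(combo) for combo in product(*alias_sets)]

# A filter touching 3 columns, each equated (via joins/aliases)
# to 4 interchangeable columns: 4**3 = 64 constraint variants.
aliases = {
    "a": ["a", "a1", "a2", "a3"],
    "b": ["b", "b1", "b2", "b3"],
    "c": ["c", "c1", "c2", "c3"],
}
variants = naive_constraint_variants(["a", "b", "c"], aliases)
print(len(variants))  # 64, and the count grows as aliases**columns
```

With a handful of self-joins and filters over several columns, this multiplicative growth is what turns minutes of planning into hours. A standard remedy is to track a single canonical representative per equivalence class instead of every rewriting, which keeps growth linear.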
Asif Hussain Shahid
I've spent 27 years working on data systems — focused on query engine internals, low-latency execution, and the kind of performance work that determines whether a system holds up under real-world pressure. That journey led me to found KwikQuery LLC and build TabbyDB, a high-performance fork of Apache Spark that tackles query compilation and runtime performance problems at scale.
My background is in Chemical Engineering from IIT Bombay, but I moved into software early and never looked back. Much of my foundation in performance engineering was built at GemStone Systems working on GemFire, the distributed data platform later open-sourced as Apache Geode. GemFire was an environment where low latency wasn't a goal, it was the culture, and that shaped how I approach every system I've worked on since. During that time I made significant enhancements to GemFire's OQL engine: extending it to evaluate join queries using range indexes via high-performance nested-loop execution, and adding colocated join and nested query support, each of which required substantial rework of the engine's core. I also worked on disk-based persistence and the client-server topology.
From there I worked at VMware, Dell, Pivotal, and SnappyData (all through acquisitions of GemStone Systems, beginning with VMware), then TIBCO (after its acquisition of SnappyData), Workday, and Cloudera, building approximate query processors, off-heap memory storage layers, and Spark optimizer improvements along the way.
At Workday, a Constraint Propagation problem in Apache Spark's Catalyst optimizer had been identified as a serious bottleneck — complex plans were taking hours to compile. I devised a new algorithm that brought that down from 8 hours to under 10 minutes, work recognized with Workday's Tech Wizard award and co-presented at the Databricks Spark Summit in 2021. At Cloudera I continued at the engine level, resolving deep performance and correctness issues for customers including Bank of America, Amazon, and JPMC.
TabbyDB applies all of that to fix problems stock Spark leaves unsolved — compile-time blowup from constraint explosion and uncapped query trees, redundant optimizer rule application, excessive Hive Metastore calls, and runtime inefficiency from the absence of Broadcast Hash Join key pushdown for file pruning on non-partitioned columns. TPC-DS benchmarks show a 13% improvement on AWS nodes and a 17% improvement on Ampere M1 nodes. Combined with Apache Iceberg, early testing shows a 46% improvement, with larger benchmarks underway.
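The file-pruning idea behind Broadcast Hash Join key pushdown can be sketched in a few lines of plain Python. The function name, file-statistics layout, and data below are invented for illustration (this is not TabbyDB or Spark code): because the build side of a broadcast join is small, its join-key values are known at runtime, and probe-side files whose min/max range on the join column cannot contain any of those keys can be skipped entirely, even when that column is not a partition column.

```python
def prune_files(file_stats, build_side_keys):
    """Keep only files whose [min, max] range on the join column
    could contain at least one build-side join key."""
    lo, hi = min(build_side_keys), max(build_side_keys)
    kept = []
    for path, (fmin, fmax) in file_stats.items():
        # Skip a file if its value range cannot overlap the keys.
        if fmax < lo or fmin > hi:
            continue
        kept.append(path)
    return kept

# Hypothetical per-file min/max stats on the (non-partitioned)
# join column, as a columnar format like Parquet would expose.
file_stats = {
    "part-000.parquet": (1, 100),
    "part-001.parquet": (101, 200),
    "part-002.parquet": (201, 300),
}
keys = [150, 160, 199]                # small (broadcastable) build side
print(prune_files(file_stats, keys))  # ['part-001.parquet']
```

A production implementation would typically test membership against the actual key set or a Bloom filter rather than a single min/max range, but the payoff is the same: files that cannot match are never read.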
Outside of work, I've volunteered as a shelter assistant at the Cat Adoption Team in Beaverton for the past three years, going in every couple of weeks. I also mentor engineers on algorithm development through TAP Careers, an organization supporting students, job seekers, and professionals from Palestine — Gaza and the West Bank. I'm based in Beaverton, Oregon.
If you're dealing with Spark performance or compilation issues, I'd be happy to talk.
