Rising warehouse bills are forcing data leaders to rethink where workloads belong. One analytics leader recently described the frustration: “We’re spending thousands monthly on Snowflake, but half those credits go to engineers just testing queries.”
Even with careful query tuning, most teams eventually hit the same problem: optimisation stops paying off whilst credits keep climbing. Engineers burn budget on development iterations. Data scientists trigger expensive scans during exploration. BI tools repeatedly query the same datasets. Analytics workloads that could run locally hit expensive warehouse compute instead.
That’s where DuckDB enters the conversation. It’s fast, lightweight, and local, offering teams a way to eliminate unnecessary warehouse costs by running workloads closer to where they’re actually needed, without touching production systems.
At our October LinkedIn Live, we brought together three experts to unpack where DuckDB really delivers savings, where it doesn’t, and how to make those gains sustainable across your entire stack.
You’ll discover:
Why development and testing workloads offer the fastest warehouse savings
Where DuckDB excels, and where the warehouse still wins
How hybrid execution, cost attribution, and flexible tooling keep those savings sustainable
This article captures insights from our discussion featuring Kyle Cheung (Greybeam), Bill Wallis (Tasman Analytics), and Aaron Phethean (Matatika).
Kyle Cheung — Founder, Greybeam
Kyle helps teams adopt open-source analytical engines like DuckDB and connect them with enterprise infrastructure. He guides clients through practical integration challenges as they shift from monolithic warehouse dependency to modular hybrid systems.
Bill Wallis — Founder, Tasman Analytics
Bill advises analytics teams experimenting with local-first data approaches. His daily work with DuckDB provides a ground-truth perspective on what actually works when moving from side project to production workflow.
Aaron Phethean — Founder, Matatika
Aaron leads Matatika’s platform strategy, helping data teams eliminate lock-in and cut processing costs with performance-based pricing. His focus is on enabling seamless, low-risk transitions between tools using Matatika’s Mirror Mode validation approach.
Data teams are burning warehouse credits in ways that don’t show up as “bad queries”:
Development workflows hitting production compute – engineers testing transformations burn credits every iteration, turning simple debugging into expensive operations.
Ad-hoc analysis eating budget – data scientists exploring datasets trigger expensive scans that could run locally for free.
CI/CD pipelines duplicating costs – every pull request runs full warehouse refreshes when most changes affect only small portions of data.
No visibility into unit costs – finance sees the total bill but can’t connect warehouse spend to actual business value delivered.
Meanwhile, finance demands cost cuts whilst business stakeholders expect faster insights. Teams feel trapped between warehouse bills that scale with every query and the fear of disrupting production systems that already work.
The pressure mounts when traditional optimisation delivers diminishing returns. You’ve tuned queries, implemented incremental models, and optimised scheduling. Yet costs keep climbing because your workload patterns fundamentally conflict with warehouse pricing models.
The insight: The fastest ROI comes from moving development and testing workloads off the warehouse — not migrating production systems.
Bill Wallis shared his approach at Tasman Analytics: “The main way I use DuckDB is to enable my developer workflow. Where the data isn’t sensitive, dump it locally into a parquet file, do all my analytics and development locally with DuckDB.”
This eliminates the cost pattern that hits most data teams. As Bill explained: “I’m not spending money every single time I’m running one of my development queries in Snowflake.”
The productivity gains extend beyond just cost. Local execution means instant feedback loops — queries that took 30 seconds in the warehouse run in under 2 seconds locally.
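As a rough illustration of that workflow (the file names and columns below are ours, not from the discussion), pointing DuckDB at a local Parquet extract takes only a few lines of Python:

```python
import duckdb

# File-backed DuckDB database for local development (created on first use).
con = duckdb.connect("dev_workspace.duckdb")

# Expose a local Parquet extract of a non-sensitive warehouse table as a view.
con.execute("""
    CREATE OR REPLACE VIEW orders AS
    SELECT * FROM read_parquet('exports/orders_*.parquet')
""")

# Iterate on transformation logic locally; no warehouse credits are consumed.
daily_revenue = con.execute("""
    SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS revenue
    FROM orders
    GROUP BY order_date
    ORDER BY order_date
""").df()
print(daily_revenue.head())
```

Because the extract is dumped once and queried locally from then on, the per-iteration warehouse cost Bill describes goes away for that workflow.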
Kyle Cheung sees this pattern emerging across his client base: “Some of our customers are interested in running their CI or dev pipelines using DuckDB and not having to call Snowflake compute for that.”
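In a CI context the same idea applies: run the transformation against small DuckDB fixtures and only touch warehouse compute at deployment. A hedged sketch, with an invented transformation and fixture purely for illustration:

```python
import duckdb

TRANSFORMATION_SQL = """
    SELECT customer_id, SUM(amount) AS lifetime_value
    FROM orders
    GROUP BY customer_id
"""

def test_lifetime_value_transformation():
    # In-memory DuckDB seeded with a tiny fixture instead of warehouse tables.
    con = duckdb.connect()
    con.execute("CREATE TABLE orders (customer_id INTEGER, amount DOUBLE)")
    con.execute("INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 7.5)")

    result = dict(con.execute(TRANSFORMATION_SQL).fetchall())
    assert result == {1: 15.0, 2: 7.5}

if __name__ == "__main__":
    test_lifetime_value_transformation()
    print("transformation checks passed")
```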
How teams implement it:
Export non-sensitive tables to local Parquet files and point DuckDB at them for development work
Iterate locally until the logic is right, then promote the change to the warehouse
Run CI and dev pipelines against DuckDB so pull requests don’t call Snowflake compute
Expected outcome: Teams eliminate the majority of development-related warehouse costs whilst accelerating feedback loops. Engineers stop waiting for warehouse scheduling and can iterate freely without budget concerns.
The insight: DuckDB is powerful for specific use cases, but it’s not a warehouse replacement — it’s a cost-control companion.
As Bill Wallis put it: “Governance, scale, and collaboration are where the warehouse still wins.”
Kyle Cheung emphasised understanding DuckDB’s design constraints: “It’s incredible for what it does, but it’s designed for a single node. That’s where the limits appear.”
Teams achieve the best results by using DuckDB where it excels — for local analytics, validation, and caching — whilst keeping governed data and large-scale processing in cloud warehouses.
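A common shape for the caching side of this, sketched under the assumption that you already have a way to read from the warehouse (fetch_from_warehouse below is a hypothetical placeholder, as are the paths and table names):

```python
import os
import duckdb
import pandas as pd

CACHE_PATH = "cache/dim_customers.parquet"

def fetch_from_warehouse(sql: str) -> pd.DataFrame:
    """Hypothetical placeholder for the governed warehouse read."""
    raise NotImplementedError("wire this up to your warehouse client")

def load_customers(con: duckdb.DuckDBPyConnection) -> pd.DataFrame:
    """Serve repeated reads from a local Parquet cache; only hit the warehouse on a miss."""
    if not os.path.exists(CACHE_PATH):
        df = fetch_from_warehouse("SELECT * FROM analytics.dim_customers")
        os.makedirs("cache", exist_ok=True)
        con.register("customers_df", df)
        # Persist the extract locally so subsequent reads never touch the warehouse.
        con.execute(f"COPY (SELECT * FROM customers_df) TO '{CACHE_PATH}' (FORMAT PARQUET)")
    return con.execute(f"SELECT * FROM read_parquet('{CACHE_PATH}')").df()
```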
How teams implement it:
Use DuckDB for local analytics, validation, and caching of frequently queried extracts
Keep governed datasets, large-scale processing, and multi-user collaboration in the cloud warehouse
Treat DuckDB’s single-node design as the boundary: workloads that outgrow one machine belong in the warehouse
Expected outcome: Predictable governance, faster experimentation, and reduced risk of data drift. Teams gain cost savings through smarter workload placement without sacrificing the warehouse capabilities that matter for production systems.
The insight: The future isn’t “warehouse versus DuckDB” — it’s hybrid execution where you run small workloads locally and reserve cloud compute for where it matters most.
Aaron Phethean connected this to broader infrastructure trends: “We’re seeing the same pattern as DevOps — push more development closer to the engineer, automate what’s repeatable, and reserve the heavy lifting for where it matters most.”
This mirrors how modern software engineering works. Developers run tests locally, then promote to staging and production. Data teams can apply the same principles.
The challenge is maintaining consistency. Kyle noted: “You need your local environment to behave like production, or you’re just creating different problems.”
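One way to keep that behaviour aligned is to route the same SQL to either engine behind a single switch. A minimal sketch, assuming Snowflake as the production warehouse and illustrative environment-variable names:

```python
import os
import duckdb

def get_connection(target=None):
    """Return a connection for the requested execution target.

    'local' (the default) uses a file-backed DuckDB database; anything else
    falls through to the production warehouse.
    """
    target = target or os.getenv("DATA_TARGET", "local")
    if target == "local":
        return duckdb.connect("local_dev.duckdb")
    import snowflake.connector  # requires snowflake-connector-python
    return snowflake.connector.connect(
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
    )

# The same statement runs against either engine; only the target changes.
con = get_connection()
print(con.cursor().execute("SELECT 42 AS answer").fetchall())
```

Dialect differences between DuckDB and the warehouse still need care; the switch only guarantees that the same statements are exercised in both places.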
How teams implement it:
Push development and testing closer to the engineer, mirroring the local–staging–production flow of modern software teams
Automate the repeatable steps so promoting work from local DuckDB to the warehouse is consistent
Keep local schemas and logic aligned with production to avoid the drift Kyle warns about
Expected outcome: Stable hybrid pipelines that combine DuckDB’s speed and zero-cost iteration with cloud resilience and governance. Engineering velocity increases because local testing removes warehouse scheduling as a bottleneck.
The insight: Real efficiency isn’t about cutting tools; it’s about measuring cost per unit of value delivered and optimising from there.
Aaron Phethean highlighted that cost visibility is often the missing link: “We don’t need to rip out good systems. We just need to give teams the flexibility to run smarter.”
Most finance teams see total warehouse bills without understanding which workloads generate business value versus which burn credits unnecessarily. Without attribution, you can’t optimise effectively.
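Attribution doesn’t require new tooling to get started. If you can export per-query metadata from your warehouse, DuckDB itself can aggregate it into a cost-per-workload view; the file and column names below are hypothetical:

```python
import duckdb

# Hypothetical export of warehouse query metadata: one row per query, with a
# workload tag, the credits it consumed, and the rows it produced.
con = duckdb.connect()
summary = con.execute("""
    SELECT
        workload_tag,
        SUM(credits_used) AS total_credits,
        -- crude unit-cost proxy: credits per thousand rows delivered
        1000.0 * SUM(credits_used) / NULLIF(SUM(rows_produced), 0) AS credits_per_1k_rows
    FROM read_csv_auto('query_log_export.csv')
    GROUP BY workload_tag
    ORDER BY total_credits DESC
""").df()
print(summary)
```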
How teams implement it:
Attribute warehouse spend to individual workloads, teams, or pipelines rather than reviewing only the total bill
Track cost per unit of value delivered, such as credits per model run or per report refreshed
Use that attribution to decide which workloads stay in the warehouse and which move to local execution
Expected outcome: Data teams gain control of their budget narrative. You can show leadership exactly where money goes, prove ROI for infrastructure changes, and make confident decisions about workload placement based on actual cost data rather than assumptions.
The insight: Cost control isn’t a one-off exercise; it’s a mindset. Teams that stay flexible can adopt new approaches like DuckDB without painful migrations later.
Kyle Cheung shared how his clients avoid lock-in: “You don’t need to change everything at once. Start small, see what actually saves money, then scale that.”
Aaron Phethean emphasised long-term thinking: “If a new engine outperforms your current stack, you should be able to test it without disruption.”
How teams implement it:
Start small: move one workload, measure the actual savings, then scale what works
Validate new engines side-by-side, for example with Mirror Mode, instead of committing to a big-bang migration
Keep the stack modular so a better-performing engine can be adopted without disruption
Expected outcome: A modular, future-proof data stack that allows experimentation without downtime or double-payment. Leaders gain the freedom to choose the best performance per cost at any point in time rather than being locked into decisions made years ago.
Bill Wallis described the immediate productivity shift from his daily experience: “I’m not spending money every single time I’m running one of my development queries in Snowflake. The feedback loop is instant, queries that took 30 seconds now run in under 2.”
That speed advantage compounds over weeks and months. Engineers who previously waited for warehouse queries during development can now iterate freely, testing ideas without budget concerns or scheduling delays.
Kyle Cheung sees measurable results across his client implementations: “Some customers run their entire CI pipeline using DuckDB. They’re not hitting Snowflake compute at all until production deployment.”
The validation approach matters as much as the technology choice. Aaron emphasised: “Teams using Mirror Mode can prove DuckDB savings before changing production. You’re not asking leadership to trust you, you’re showing them side-by-side cost comparisons.”
This evidence-based approach removes the usual migration anxiety. Instead of big-bang changes that could disrupt production, teams validate incrementally and only commit once results are clear.
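Mirror Mode itself is Matatika’s product; as a generic illustration of the side-by-side idea (not Mirror Mode’s actual mechanism), a parity check between a warehouse result and its DuckDB counterpart can be as simple as:

```python
import pandas as pd

def results_match(warehouse_df: pd.DataFrame, duckdb_df: pd.DataFrame, keys: list) -> bool:
    """Return True when two result sets agree row-for-row after sorting on the given keys."""
    a = warehouse_df.sort_values(keys).reset_index(drop=True)
    b = duckdb_df.sort_values(keys).reset_index(drop=True)
    try:
        # Tolerate dtype and column-order differences between engines; surface value differences.
        pd.testing.assert_frame_equal(a, b, check_dtype=False, check_like=True)
        return True
    except AssertionError as diff:
        print(diff)
        return False
```

Running the same query through both engines and diffing the outputs is what turns “trust us” into the side-by-side evidence Aaron describes.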
Start with impact analysis rather than immediate implementation. Identify where your warehouse is being used for low-value workloads: development, testing, or ad-hoc analysis that doesn’t require governed production data.
Choose one workflow as a pilot. Move it to DuckDB and measure the cost and performance difference over two weeks. Track warehouse credit reduction, engineering productivity gains, and any friction points that emerge.
Then use Matatika’s Mirror Mode to replicate and validate production pipelines side-by-side, proving savings before making any irreversible changes. This parallel validation eliminates the traditional migration risk of “we won’t know if it works until we’ve fully switched.”
Key metrics to track:
Warehouse credit reduction on the piloted workflow
Query feedback-loop time, local versus warehouse
Engineering friction points that emerge during the pilot
Cost per workload before and after the change
The goal is sustainable efficiency through hybrid execution that scales with business demands rather than hitting artificial limits imposed by pure warehouse or pure local approaches.
The teams achieving sustainable cost control aren’t choosing between warehouse and DuckDB; they’re building hybrid infrastructure that uses each where it excels.
DuckDB eliminates unnecessary warehouse spend on development and testing. Warehouses continue handling governed data, large-scale processing, and multi-user collaboration. The combination delivers better economics than either approach alone.
What successful teams do differently: they start with impact analysis, validate new approaches with Mirror Mode before committing, and build optionality into their stack so they can adapt as better tools emerge.
The goal isn’t change for change’s sake. It’s sustainable growth through infrastructure that enables rather than constrains business opportunities whilst keeping costs aligned with actual value delivered.
Teams that master hybrid execution gain competitive advantage through faster engineering velocity and transparent cost attribution that proves ROI to leadership.
Ready to identify where your warehouse credits are going and whether hybrid execution makes sense for your stack?
We’ll help you assess your current warehouse usage patterns, identify cost optimisation opportunities through smarter workload placement, and show you how Mirror Mode validation eliminates traditional migration risks. If DuckDB-style hybrid execution makes sense for your situation, we’ll map out a clear path forward.
You’ll get a concrete benchmark of your current cost-per-workload and visibility into realistic improvements you can present to leadership with confidence.
Learn about Mirror Mode validation for risk-free infrastructure testing
Seen a strategy that resonates with you?