Most teams don’t stay with their ETL platform because they’re happy. They stay because switching feels risky, time-consuming, or impossible to get signed off. But the real risk is in doing nothing.
In the world of databases, we learned long ago that the choice between columnar and row storage affects performance and efficiency. So why haven’t we applied the same thinking to pricing models?
Row-based ETL pricing punishes scale, charging you more for every row, whether or not it delivers value. That’s like buying a sports car and paying per engine rev.
The smarter alternative? Performance-based pricing that rewards clean, efficient pipelines and keeps costs aligned to actual outcomes, not arbitrary row counts.
If you’re a data leader, you’ve likely heard this around your office: “Our data bill keeps rising, and we have no control over it.” Your business grows, you process more data, and before you know it your ETL costs spiral out of control. Not because you’re getting more value, but simply because you’re processing more rows.
Let’s take a deep dive into columnstore indexes – a revolution in database technology.
Columnstore indexes in databases such as SQL Server, Oracle, and PostgreSQL (with extensions), along with columnar databases such as Snowflake, offer several key advantages over traditional rowstore (B-tree) indexes, particularly for analytical workloads. Here are the primary benefits:
1. High Data Compression
Columns often contain repeating or similar values (e.g., status = ‘active’), which are highly compressible.
Benefit: Significantly reduced storage size, often by 5x–10x compared to row-based storage.
Impact: Reduces I/O and memory usage, speeding up queries.
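To see why repeated values compress so well, here is a minimal sketch of run-length encoding over a single column. It is an illustration only: real engines combine several encodings (dictionary, run-length and others), and the function name `run_length_encode` and the sample `status` data are assumptions made for this example.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


# Hypothetical 'status' column, stored contiguously as a column store would.
status = ["active"] * 6 + ["inactive"] * 2 + ["active"] * 4

encoded = run_length_encode(status)
print(encoded)  # [('active', 6), ('inactive', 2), ('active', 4)]
print(len(status), "values stored as", len(encoded), "runs")
```

In a row layout the same values are interleaved with every other field of each row, so long runs like these rarely occur and there is far less to collapse.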
2. Faster Analytical Queries
Analytical queries often scan large volumes of data but only a few columns.
Benefit: Columnstore indexes read only the relevant columns, skipping the rest.
Impact: Much faster performance for SELECT queries involving aggregations, filters, and joins on large datasets.
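The difference is easy to picture with a toy model. The sketch below holds the same hypothetical order data in a row layout and a column layout, then counts how many stored values each layout has to touch to total one column. The table, the column names and the "values read" counter are assumptions for illustration, standing in for real disk I/O.

```python
# Hypothetical order data in a row layout: one record per row.
rows = [
    {"order_id": 1, "customer": "acme", "region": "EU", "amount": 120.0},
    {"order_id": 2, "customer": "globex", "region": "US", "amount": 80.0},
    {"order_id": 3, "customer": "initech", "region": "EU", "amount": 50.0},
]

# The same data in a column layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_row_store(rows):
    """Row layout: whole rows are read, so every field is touched."""
    values_read = sum(len(row) for row in rows)
    return sum(row["amount"] for row in rows), values_read


def total_column_store(columns):
    """Column layout: only the 'amount' column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


print(total_row_store(rows))        # (250.0, 12)
print(total_column_store(columns))  # (250.0, 3)
```

Both layouts return the same total, but the column layout touches a quarter of the values here, and the gap widens as tables gain more columns.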
3. Batch Processing
Columnstore indexes use vectorised execution or batch mode processing, where operations are performed on a batch of rows simultaneously.
Benefit: Lower CPU usage and faster query execution.
4. Eliminates the Need for Many Secondary Indexes
Since columnstore indexes naturally support efficient filtering and aggregation, you may need fewer traditional indexes.
Benefit: Simplifies index maintenance and reduces storage overhead.
5. Great for Data Warehousing & OLAP
Ideal for workloads where:
Benefit: Designed specifically for OLAP (Online Analytical Processing) environments.
6. Improved I/O Performance
Because only the required columns are read and data is highly compressed, I/O operations are minimised.
Benefit: Especially important when working with massive datasets that can’t all fit in memory.
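7. Built-in Segment Elimination (Similar to Partition Pruning)
Data in columnstore indexes is internally organised into segments, and the engine keeps metadata for each one, such as the minimum and maximum values it contains.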
Benefit: The engine can skip entire segments during a query if they are not relevant, further improving performance.
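As a rough sketch of the idea, the code below splits a column into fixed-size segments, records the minimum and maximum of each, and skips any segment whose range cannot overlap the filter. The `Segment` class, the helper names and the segment size are hypothetical; real engines use far larger segments and richer metadata.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    values: list
    min_value: int
    max_value: int


def build_segments(values, segment_size):
    """Split a column into segments and record min/max for each."""
    segments = []
    for start in range(0, len(values), segment_size):
        chunk = values[start:start + segment_size]
        segments.append(Segment(chunk, min(chunk), max(chunk)))
    return segments


def count_matches(segments, low, high):
    """Count values in [low, high], skipping segments that cannot match."""
    matches = 0
    scanned = 0
    for seg in segments:
        if seg.max_value < low or seg.min_value > high:
            continue  # eliminated using metadata alone, values never read
        scanned += 1
        matches += sum(1 for v in seg.values if low <= v <= high)
    return matches, scanned


# Hypothetical order IDs 1..60, loaded in sorted order.
segments = build_segments(list(range(1, 61)), segment_size=10)
matches, scanned = count_matches(segments, 34, 38)
print(f"{matches} matches, scanned {scanned} of {len(segments)} segments")
# 5 matches, scanned 1 of 6 segments
```

Elimination works best when data arrives roughly ordered on the filtered column; if values are scattered at random, most segments' ranges overlap the filter and few can be skipped.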
Summary: When to Use Columnstore Indexes
Ideal Use Cases | Avoid If… |
Large data warehouses | Heavy transactional (OLTP) workloads |
Read-heavy analytical queries | Frequent row-level updates/deletes |
Aggregation-heavy reports | Low-latency insert/update requirements |
Similar technology leaps are occurring at every level – including ETL. So if platforms are becoming more efficient, why are you still paying by volume?
Most legacy ETL platforms still use volume-based pricing models that charge you for every row processed, regardless of whether that data is:
This outdated approach creates three critical problems:
1. Growth Gets Penalised, Not Rewarded
When your business succeeds and data volumes increase, your ETL costs skyrocket—even if the actual compute resources remain stable. This creates a perverse incentive where success drives financial pressure.
As one Head of Data told us: “We’ve reached the point where we actively discourage new analytics use cases because of the ETL cost implications. That’s not how data should work.”
2. Budgeting Becomes Impossible
Finance teams expect predictable costs, but with pricing tied to data volume rather than actual usage, a single new data source or traffic spike can blow your budget.
When the CFO asks “Why did our ETL bill jump 40% this quarter?” the answer shouldn’t be “Because our business is doing well.”
3. Efficiency Goes Unrewarded
Row-based pricing provides zero incentive for vendors to help you optimise. In fact, they make more money when your pipelines are inefficient and process redundant data.
From a technical perspective, this model is fundamentally misaligned with how modern data infrastructure actually works. With compression and incremental processing, storage and compute needs typically grow more slowly than data volume, so why should your bill scale linearly with row counts?
Forward-thinking data teams are breaking free from the row-based pricing trap.
They Benchmark Costs Before They Renew
Smart teams don’t renew ETL contracts without first conducting a thorough cost analysis. They examine:
One financial services client discovered that 72% of their ETL costs came from just 3 of their 37 data sources, all syncing full datasets hourly when daily incremental updates would suffice.
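The arithmetic behind a finding like this is worth spelling out. The sketch below compares rows processed per month for an hourly full sync against a daily incremental sync of the same table. The table size and daily change volume are made-up figures for illustration, not the client's actual numbers.

```python
DAYS_PER_MONTH = 30


def full_sync_rows(table_rows, syncs_per_day, days=DAYS_PER_MONTH):
    """Every sync reprocesses the whole table."""
    return table_rows * syncs_per_day * days


def incremental_sync_rows(changed_rows_per_day, days=DAYS_PER_MONTH):
    """A daily incremental sync processes only rows that changed that day."""
    return changed_rows_per_day * days


# Hypothetical source: 1,000,000 rows, of which 20,000 change each day.
hourly_full = full_sync_rows(table_rows=1_000_000, syncs_per_day=24)
daily_incremental = incremental_sync_rows(changed_rows_per_day=20_000)

print(f"Hourly full sync:       {hourly_full:,} rows/month")
print(f"Daily incremental sync: {daily_incremental:,} rows/month")
print(f"Ratio: {hourly_full // daily_incremental}x")
```

Under these assumed figures the full sync processes 720,000,000 rows a month against 600,000 for the incremental sync, a 1,200-fold difference for the same underlying data. On a per-row bill, that difference is charged in full.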
Quick Assessment Checklist:
If you answered yes to two or more, you’re likely caught in the row-pricing trap.
They Test New Platforms Before Committing
The most innovative teams run new and existing ETL systems side by side using solutions like Matatika’s Mirror Mode. This approach allows them to:
“We feel stuck, switching vendors seems too risky,” is a common sentiment we hear. But the real risk is in not testing alternatives while you still have leverage, ideally 3-6 months before your renewal deadline.
They Automate Workflows – Not Just Syncs
Leading data teams focus on end-to-end workflow automation:
By automating these workflows, teams reduce manual intervention, improve reliability, and focus on delivering insights rather than maintaining infrastructure.
Just as we’ve evolved beyond row-based storage for analytical databases, it’s time to evolve beyond row-based pricing for ETL.
We don’t build analytical databases on row storage anymore, so why do we still accept row-based billing for ETL?
Performance-based pricing represents a fundamental shift in how ETL services are monetised:
Row-Based Pricing (Legacy) | Performance-Based Pricing (Modern) |
Charges per row processed | Charges for actual infrastructure used |
Costs increase linearly with data volume | Costs align with actual compute resources |
Penalises scale and growth | Scales efficiently with your business |
Rewards vendor for inefficiency | Incentivises vendor to optimise performance |
For technical teams, this means your infrastructure costs finally align with actual resource consumption (CPU, memory, and I/O operations) rather than an arbitrary metric that bears little relation to actual system load.
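A minimal sketch of how the two models diverge as volume grows is shown below. Every number in it is hypothetical: the per-million-row rate, the compute-hour rate and the monthly usage figures are invented for illustration and are not Matatika's or any other vendor's actual prices.

```python
def row_based_cost(rows_processed, price_per_million_rows):
    """Bill grows in direct proportion to rows processed."""
    return rows_processed / 1_000_000 * price_per_million_rows


def performance_based_cost(compute_hours, price_per_compute_hour):
    """Bill follows the compute actually consumed."""
    return compute_hours * price_per_compute_hour


# Hypothetical months: (rows processed, compute hours actually used).
# Rows double each month; compute grows more slowly, as it might with
# incremental processing in place.
months = [(100_000_000, 40), (200_000_000, 50), (400_000_000, 65)]

for rows, hours in months:
    row_cost = row_based_cost(rows, price_per_million_rows=5.0)
    perf_cost = performance_based_cost(hours, price_per_compute_hour=10.0)
    print(f"{rows:>11,} rows  row-based: {row_cost:8.2f}  "
          f"performance-based: {perf_cost:8.2f}")
```

The point is the shape of the curves rather than the specific figures: the row-based bill doubles whenever rows double, while the performance-based bill moves only as far as compute usage does.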
Most teams that switch to performance-based pricing save between 30% and 90% on their ETL costs by only paying for what they actually use.
Recent industry research reveals the hidden toll of traditional ETL pricing models:
One online media platform we worked with was struggling with both cost and performance issues. Their data delivery was too slow, and their ETL costs were growing unsustainably.
By implementing a performance-based pricing model with optimised incremental processing, they achieved:
As their Data Team put it: “The reliability, speed, and cost savings have been remarkable, and the support is brilliant.”
Most teams fear ETL migration because traditional approaches force an impossible choice: either rebuild all pipelines from scratch while paying for two systems during transition (often taking 6-12 months), or stay trapped in an inefficient pricing model.
Matatika’s Mirror Mode offers a better path. This approach allows both systems to run concurrently, so teams can validate performance, compare outputs, and ensure data consistency before committing to any changes, all in about 3 months rather than 6-12 months. Mirror Mode eliminates the risk that typically makes ETL transformations stressful, providing proof before commitment without double payments or downtime risk.
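For a sense of what comparing outputs can look like in practice, here is a generic sketch that fingerprints the same table as produced by two pipelines and checks that they agree. This is not Matatika's Mirror Mode implementation, which the article does not describe; the function names and sample rows are hypothetical.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, digest) for a list of dict rows, ignoring row order."""
    row_hashes = sorted(
        hashlib.sha256(repr(sorted(row.items())).encode("utf-8")).hexdigest()
        for row in rows
    )
    digest = hashlib.sha256("".join(row_hashes).encode("utf-8")).hexdigest()
    return len(rows), digest


def outputs_match(existing_rows, candidate_rows):
    """True when both pipelines produced the same set of rows."""
    return table_fingerprint(existing_rows) == table_fingerprint(candidate_rows)


# Hypothetical output of the existing pipeline and of the one under test.
existing = [{"id": 1, "total": 10.5}, {"id": 2, "total": 7.0}]
candidate = [{"total": 7.0, "id": 2}, {"id": 1, "total": 10.5}]  # same data, new order
drifted = [{"id": 1, "total": 10.5}, {"id": 2, "total": 7.5}]    # one value differs

print(outputs_match(existing, candidate))  # True
print(outputs_match(existing, drifted))    # False
```

Sorting both the fields within each row and the row hashes makes the check independent of column and row order, which often differ between platforms even when the data is identical.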
How does performance-based pricing differ from consumption-based pricing?
Consumption-based pricing ties costs to data volume metrics like rows processed. Performance-based pricing focuses on actual computing resources used—regardless of how many rows move through pipelines. For finance teams, this means predictable costs that grow with infrastructure needs, not arbitrary data volumes.
Don’t costs increase as data volumes grow under any pricing model?
Yes, but at much slower rates with performance-based pricing. These models scale sub-linearly with data volume because efficient incremental processing means you only pay for changes, not full datasets. Most organisations see their cost curve flatten significantly over time, even as data volumes continue to grow.
When is the optimal time to evaluate ETL pricing models?
Ideally, begin exploring alternatives 3-6 months before your renewal date. This provides sufficient time to understand options, evaluate potential savings, and approach negotiations from a position of strength. Waiting until the last minute often results in reluctant renewals and missed savings opportunities.
How can data teams minimise risk when considering ETL changes?
Mirror Mode’s parallel approach eliminates the traditional risks of migration. By running both systems concurrently, you can compare outputs, validate performance, and verify data consistency before making any changes. This provides concrete evidence from your actual workloads, so decisions rest on facts rather than fears.
The shift from row-based to performance-based ETL pricing isn’t just a cost-cutting measure—it’s a fundamental rethinking of how data infrastructure should be built and billed.
Just as we abandoned row-oriented databases for analytical workloads, it’s time to leave behind row-based pricing that:
With Matatika’s performance-based pricing and Mirror Mode approach, you can:
Book your Renewal Ready Strategy Session
Ready to explore how performance-based pricing might benefit your organisation? Schedule a complimentary consultation to evaluate your current ETL environment.
Our experts will help you benchmark current costs, identify potential inefficiencies, and quantify possible savings, providing the clear data points needed for informed decision-making.