Posts Tagged ‘ETL Tools’

When your ETL vendor doesn’t support the connectors you need

Most ETL problems do not start with performance or reliability. They start with coverage. An ETL connector is missing. Or partially supported. Or technically “available,” but not usable in the way your business actually operates. At first, teams work around it. They build a small custom job. They add a webhook. They patch something together in Airflow. It feels temporary. Over time, those workarounds become permanent infrastructure. This is the position MVF found itself in.

The Economics of the Modern Data Stack

Who captures the value now?

The modern data stack still works. The problem is not capability. The problem is economics. On Friday, we hosted a LinkedIn Live to talk about something most teams feel but rarely articulate clearly: the incentives behind the modern data stack have shifted, and those shifts are starting to shape architecture, pricing, and leverage in ways that matter over the next 12–24 months. This was not a tools debate. It was a discussion about who captures value, who carries risk, and why “best-of-breed” no longer feels neutral. Joining the conversation were Maxime Beauchemin, creator of Apache Airflow and Apache Superset; Taylor Murphy, founder of Meltano and Arch, now at Astronomer; and Aaron Phethean, Founder and CEO of Matatika. What follows is not a summary. It’s what actually matters.

Baidu ETL Connector: How MVF Solved an Unsupported Data Source

Today, Matatika is the only ETL provider offering a fully supported, production-grade Baidu connector. While most ETL platforms support mainstream paid media sources like Google Ads and Facebook Ads, Baidu often sits outside standard connector catalogues. For teams running marketing activity in China, this creates a familiar problem: critical data exists, but there is no clean, supported way to ingest it. MVF ran into exactly this issue.

Analytics Engineering: Internal Risk vs. External Rigor

In this episode of the Data Mata's podcast, host Aaron Phethean is joined by Jack Doherty, VP of Analytics Engineering at Fresha. The conversation explores a tension many data teams now face: the difference between building analytics for internal use and delivering analytics as a customer-facing product. Jack’s team at Fresha operates in both worlds. On one side, they support internal stakeholders with classic analytics. On the other, they power analytics that sit directly inside the Fresha product and are used daily by thousands of customers running health and beauty businesses. That dual responsibility forces very different ways of thinking about risk, speed, and quality.

Why marketing data connectors quietly inflate your ETL costs

And how MVF consolidated paid media data without losing insight

Marketing data rarely breaks loudly. It degrades quietly. Pipelines keep running. Dashboards still load. Spend gets approved. But somewhere between your tenth and thirtieth connector, the economics stop making sense. Teams often assume marketing connectors are cheap because each one looks small in isolation. A Google Ads connector here. A LinkedIn Ads connector there. Another for TikTok. Another for reporting exports. Each one feels justified. Together, they quietly become the most expensive part of the data stack. MVF learned this the hard way.

Building Data Platforms That Actually Solve Business Problems

Meet Teddy Bernays

Teddy is a highly autonomous Freelance Data Engineer and Google Cloud Trainer who focuses on helping startups and mid-sized companies build efficient, scalable, and cost-effective data platforms. He started his career in the complex world of audio engineering before transitioning to IT, where he found a fascination with the mechanics "under the hood" of data systems. Today, he is a firm believer that the solution to data inconsistency isn't always more code, but more clarity. His approach is simple: “If it’s simple, do it simple. You don't need three different tools to solve a one-tool problem.”

MVF Makes ETL 7x Cheaper While Migrating 1B+ Rows Across 60+ Sources


Background:

MVF is a global customer generation platform that depends on fast, accurate data pipelines to drive reporting and marketing performance across multiple regions and channels.

Their ETL spend had nearly tripled under a new pricing model, creating unsustainable cost pressure. Costs were escalating faster than business value, engineering teams were stretched, and they needed a simpler, more sustainable approach.

As Andonis Pavlidis, Head of Data at MVF, explained: “Prices were always increasing, SLAs were dropping. It didn’t feel like a partnership anymore, but rather like a commodity.”


Challenge: Spiralling Costs and Deteriorating Service

MVF’s data team was under increasing strain from a fragile and inefficient ETL setup.

  • Engineering drag – half a dozen custom pipelines in Airflow consumed valuable analyst time.
  • Support bottlenecks – four-day ticket cycles left the team firefighting instead of focusing on growth. As Andonis Pavlidis described it: “When something went down with Fivetran, it was four days of back and forth tickets. If something goes down with Matatika, it’s a Slack message and it’s fixed by the next day.”
  • Sensitive data risk – HIPAA/GDPR disposition pipelines relied on fragile infrastructure outside MVF’s control.
  • Fragmented pipelines – multiple parallel connectors inflated costs and slowed data delivery.

Solution: Mirror Mode Migration + Custom Connector Enablement

MVF partnered with Matatika to deliver a risk-free migration strategy that cut costs and simplified infrastructure.

  • Mirror Mode migration – replicated all existing pipelines in parallel, ensuring zero disruption during validation and cutover.
  • Custom connector replacement – rebuilt and stabilised bespoke Airflow jobs, including Iterable, Five9, Survicate, Everflow, Injixo, Invoca, Airtable and more. New connectors such as Baidu were also developed.
  • Sensitive data improvements – re-engineered disposition pipelines from fragile webhooks to robust API calls, giving MVF safer and more controllable data flows.
  • Simplified operations – decommissioned Airflow and introduced a dedicated development workspace for safe testing and deployment.
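The parallel-run validation behind a mirrored migration can be sketched as a per-table comparison between the legacy pipeline's output and the mirrored pipeline's output. This is a minimal illustration under stated assumptions, not Matatika's Mirror Mode implementation: the table names, the row counts, and the `compare_row_counts` and `ready_for_cutover` helpers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TableCheck:
    """Row counts for one table, as reported by each pipeline."""
    table: str
    legacy_rows: Optional[int]   # None if the legacy pipeline has no such table
    mirror_rows: Optional[int]   # None if the mirrored pipeline has no such table

    @property
    def matches(self) -> bool:
        # A table missing on one side (None) never equals a real count.
        return self.legacy_rows == self.mirror_rows


def compare_row_counts(legacy: Dict[str, int], mirror: Dict[str, int]) -> List[TableCheck]:
    """Build one check per table seen in either pipeline."""
    tables = sorted(set(legacy) | set(mirror))
    return [TableCheck(t, legacy.get(t), mirror.get(t)) for t in tables]


def ready_for_cutover(checks: List[TableCheck]) -> bool:
    """Cut over only when at least one table was checked and all of them match."""
    return bool(checks) and all(c.matches for c in checks)


# Hypothetical counts collected while both pipelines run side by side.
legacy_counts = {"bing_ads": 1200, "iterable": 560, "five9": 90}
mirror_counts = {"bing_ads": 1200, "iterable": 558, "five9": 90}

checks = compare_row_counts(legacy_counts, mirror_counts)
mismatched = [c.table for c in checks if not c.matches]
```

In practice a row count is only the coarsest check; a real validation would likely also compare checksums or sampled values before the legacy pipeline is switched off.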

As Andonis Pavlidis noted: “We loved working with the Matatika team as they felt an extension of our team. They shared the same frustrations and reacted as we would react ourselves.”


Results: 7x Cheaper, Zero Risk, Superior Performance

The migration delivered more than savings. MVF gained faster pipelines, stronger reliability, and greater team productivity.

  • 7x cheaper than renewal (86% cost reduction) – achieved with Matatika’s performance-based pricing, while migrating 1B+ rows across 60+ sources.
  • Risk-free migration – completed with zero downtime or disruption.
  • Up to 4x faster pipelines – Bing Ads connector reduced from 1.8 hours to 30 minutes, enabling intraday refresh and reduced Snowflake cost.
  • Engineering capacity reclaimed – two days of monthly maintenance eliminated, freeing analysts for higher-value projects.
  • Same-day issue resolution – from “four-day ticket cycles” to direct Slack access with Matatika engineers.
  • Complete infrastructure modernisation – legacy Airflow retired, fragmented pipelines consolidated, and data architecture strengthened for scalability and compliance.
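The headline ratios above follow from simple arithmetic, sketched here as a quick check. The helper names are illustrative; the only inputs are the figures quoted in the list (7x cheaper, 1.8 hours, 30 minutes).

```python
def cost_reduction_pct(times_cheaper: float) -> float:
    """Percentage saved when the new cost is 1/times_cheaper of the old cost."""
    return (1 - 1 / times_cheaper) * 100


def speedup(before_minutes: float, after_minutes: float) -> float:
    """How many times faster the new runtime is than the old one."""
    return before_minutes / after_minutes


# 7x cheaper means paying one seventh of the renewal price:
# roughly 85.7% saved, which rounds to the quoted 86%.
reduction = cost_reduction_pct(7)

# Bing Ads: 1.8 hours (108 minutes) down to 30 minutes is about 3.6x,
# consistent with the "up to 4x" figure.
bing_speedup = speedup(1.8 * 60, 30)
```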

As Andonis Pavlidis concluded: “We went to Matatika for a migration. We ended up with an improvement of our stack.”


Appendix: Full Connector List

 

Fivetran connectors migrated: Google Adwords, GA4 Export, Iterable, MySQL (x12), Facebook Ads, Appwiki, Bing Ads, Webhooks, Outbrain, Taboola, Twitter Ads, TikTok Ads, LinkedIn Ads, Google Sheets, S3 and Dbt Cloud Reporting.

Custom Airflow connectors migrated to Matatika-supported connectors: Airtable, Five9, Survicate, Everflow, Injixo, Invoca, Custom S3.

New connectors developed: Baidu and CallMiner.


MVF’s migration spanned a wide range of connectors across marketing, customer interaction, compliance, and operational data sources, together accounting for more than 1B rows per year.

Category | Example Connectors | Estimated Rows / Year
Marketing & Paid Media | Google Adwords, GA4 Export, Facebook Ads, Bing Ads, TikTok Ads, LinkedIn Ads, Outbrain, Taboola | 600M
Customer Interaction | Iterable, Five9, Survicate, Everflow, Invoca (real-time + historical) | 250M
Sensitive / Compliance Data | Webhooks (disposition bronze), Airtable, Injixo | 120M
Operational / Other Sources | MySQL (x12), Google Sheets, S3 Export, Dbt Cloud Reporting, ExpertReview S3 | 30M
New Development | Baidu, CallMiner | Included above

Google Sheets for Analysts: How to Eliminate the Data Update Bottleneck

The Real Cost of the Bottleneck

Engineering time disappears into trivial updates. Every data change request is a context switch. Even five-minute tasks fragment your day. You never get into deep focus because you're constantly interrupted by "quick questions" that aren't actually quick.

The Ticket Queue That Never Ends

Your analyst sends a Slack message at 9am.

“Hey, can you update the product-to-GL mapping table? Finance added three new categories last week and our revenue reports are off.”

It’s a five-minute job. You know exactly what needs doing. But you’re in the middle of debugging a pipeline issue, and context switching now means you’ll lose an hour getting back to where you were.

“I’ll get to it this afternoon,” you reply.

By 2pm, you’ve forgotten. The analyst follows up at 4pm. You finally do it at 5:30pm, right before you leave. The analyst thanks you, clearly frustrated that a simple update took all day.

The next week, same analyst, different request. Update the territory assignments. Then the customer segment definitions. Then the exchange rates. Then back to the GL mappings because something changed again.

You’re not a data engineer anymore. You’re a ticket processor for data updates that shouldn’t require engineering at all.

Stop Manually Uploading Spreadsheets: 5 High-Impact Use Cases (and How Resident Advisor Fixed It)

If your week still includes exporting CSVs and uploading them into dashboards, you’re paying a “data tax” in delays, context switching, and stale numbers. Here are the top five use cases where teams should replace manual uploads with a real Google Sheets → warehouse pipeline, plus a quick look at how Resident Advisor (RA) made this work in production.
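As a rough picture of what a Google Sheets → warehouse pipeline does on each run, the sketch below reloads a small mapping table from a sheet's CSV export. It is an illustration only, not RA's or Matatika's setup: the CSV text stands in for the sheet export, an in-memory SQLite database stands in for the warehouse, and the `load_sheet` helper, table name, and column names are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for a sheet's CSV export; analysts would edit the sheet itself.
SHEET_CSV = """product,gl_category
Tickets,4000
Merch,4100
Ads,4200
"""


def load_sheet(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Full-refresh load: replace the table with the sheet's current rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0  # leave the existing table untouched if the sheet is empty
    columns = list(rows[0].keys())
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    with conn:  # commits on success, rolls back if an insert fails
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(f'CREATE TABLE "{table}" ({col_list})')
        conn.executemany(
            f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})',
            [tuple(r[c] for c in columns) for r in rows],
        )
    return len(rows)


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse connection
loaded = load_sheet(conn, "product_gl_mapping", SHEET_CSV)
```

Run on a schedule, a load like this means a mapping change made in the sheet reaches reporting on the next sync without anyone filing a ticket.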

Aftermath of the dbt + Fivetran Merger: What It Really Means for Data Teams

Two weeks on from the dbt + Fivetran merger, the dust hasn’t settled; it’s only just starting to reveal what comes next. The deal didn’t just merge two companies. It merged two very different philosophies about how modern data should move, transform, and deliver value.