MVF operates a global customer generation platform. Their data estate spans paid media, customer interaction systems, internal databases, and sensitive disposition data tied to regulated workflows.
Like many teams, they relied on an off-the-shelf ETL vendor for the majority of their ingestion. On paper, coverage looked strong. In practice, gaps kept appearing.
Some platforms were not supported at all. Others were only partially supported. Several critical pipelines required custom logic or non-standard patterns to function reliably.
To keep data flowing, MVF maintained a parallel layer of custom pipelines in Airflow.
This is the trap unsupported connectors create.
They do not usually break dashboards. They create operational drag. They introduce fragility. They quietly transfer responsibility for data engineering from the vendor back to the customer.
At a small scale, custom connectors feel manageable. A few scripts. A handful of DAGs. Occasional fixes.
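To make that concrete, a bespoke pipeline at this stage often looks something like the sketch below: a hand-rolled Airflow DAG wrapping an API call, with retry and scheduling logic the team owns end to end. This is illustrative only; the platform name, endpoint, and credential handling are hypothetical, and it assumes a recent Airflow 2.x install, not MVF's actual code.

```python
# A minimal sketch of the kind of bespoke ingestion DAG teams end up maintaining.
# The platform, endpoint, and destination handling are hypothetical illustrations.
from datetime import datetime, timedelta

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator


def pull_events(**context):
    """Pull one day's events from a hypothetical platform API."""
    resp = requests.get(
        "https://api.example-platform.com/v1/events",   # hypothetical endpoint
        params={"date": context["ds"]},                  # Airflow's execution date
        headers={"Authorization": "Bearer <token>"},     # placeholder credential
        timeout=60,
    )
    resp.raise_for_status()
    rows = resp.json()["results"]
    # loading into the warehouse is left out of the sketch
    return len(rows)


with DAG(
    dag_id="example_platform_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="0 3 * * *",                                # every failure mode is yours to own
    catchup=False,
    default_args={"retries": 3, "retry_delay": timedelta(minutes=10)},
) as dag:
    PythonOperator(task_id="pull_events", python_callable=pull_events)
```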
At MVF’s scale, that approach stopped working.
They were maintaining bespoke pipelines for platforms such as Iterable, Five9, Survicate, Everflow, Injixo, Invoca, and Airtable, and faced building a Baidu integration from the ground up. Each had its own failure modes, retry logic, scheduling quirks, and maintenance burden.
Some pipelines handled highly sensitive data, including HIPAA and GDPR-relevant disposition events. These relied on fragile webhook-based architectures that sat outside MVF’s direct control.
At the same time, their primary ETL vendor changed pricing models. Costs escalated rapidly. Service levels declined. Support cycles stretched into days.
As MVF’s Head of Data described it, it no longer felt like a partnership.
At that point, the question was no longer “how do we optimise this setup?”
It was “why are we carrying this risk at all?”
Although cost pressure forced a decision, connector coverage was the underlying constraint.
MVF needed three things at once:
- full coverage of the connectors their business actually relies on, including the ones vendors treat as edge cases
- predictable, sustainable costs at scale
- direct control over how sensitive, regulated data is handled
Most ETL platforms can do one of these. Very few can do all three together.
MVF partnered with Matatika to redesign their ingestion layer around a simple principle: if a connector matters to the business, it should be treated as first-class infrastructure, not a workaround.
The migration approach mattered.
All existing pipelines were mirrored in parallel before any cutover. Both systems ran side by side while MVF validated outputs. There was no forced leap and no double payment during migration.
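A parallel run like this is typically validated with simple reconciliation checks before cutover. The sketch below is one such check, not MVF's actual validation: it compares daily row counts between the table loaded by the legacy pipeline and the one loaded by the new connector. Table names are hypothetical, and SQLite stands in for the real warehouse driver.

```python
# Sketch of a parallel-run check: compare daily row counts between the table
# loaded by the legacy pipeline and the one loaded by the rebuilt connector.
# Connection details and table names are hypothetical.
import sqlite3  # stand-in for the real warehouse driver

QUERY = """
SELECT event_date, COUNT(*) AS row_count
FROM {table}
WHERE event_date >= DATE('now', '-7 day')
GROUP BY event_date
"""


def daily_counts(conn, table):
    """Return {event_date: row_count} for the last seven days of one table."""
    return dict(conn.execute(QUERY.format(table=table)).fetchall())


conn = sqlite3.connect("warehouse.db")
legacy = daily_counts(conn, "legacy_ads_spend")
rebuilt = daily_counts(conn, "rebuilt_ads_spend")

for day in sorted(set(legacy) | set(rebuilt)):
    if legacy.get(day) != rebuilt.get(day):
        print(f"{day}: legacy={legacy.get(day)} rebuilt={rebuilt.get(day)} MISMATCH")
    else:
        print(f"{day}: {legacy.get(day)} rows match")
```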
At the same time, custom pipelines were systematically replaced.
Unsupported and fragile jobs were rebuilt as stable connectors, including:
- Iterable
- Five9
- Survicate
- Everflow
- Injixo
- Invoca
- Airtable
- a new Baidu connector, built from scratch rather than in-house
This was not a like-for-like copy. Several pipelines were improved during the rebuild.
One of the most important changes was how sensitive disposition data was handled.
Previously, parts of this data flowed via webhooks into third-party infrastructure. That created both operational fragility and compliance risk.
During the migration, these pipelines were re-engineered to use direct API calls instead. This simplified the architecture, reduced dependencies, and gave MVF full control over what data was pulled and how it was processed.
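In rough terms, the pattern looks like the sketch below: rather than a third-party service pushing disposition events over a webhook, the pipeline polls the source API on a schedule, so sensitive records never transit infrastructure outside the team's control. The endpoint, auth, and field names are hypothetical; this is a sketch of the approach, not the production code.

```python
# Sketch of a pull-based replacement for a webhook feed of disposition events.
# The endpoint, auth scheme, cursor parameter, and field names are hypothetical.
import os

import requests


def pull_dispositions(since: str) -> list[dict]:
    """Fetch disposition events created after `since` with direct API calls."""
    events: list[dict] = []
    cursor = since
    while True:
        resp = requests.get(
            "https://api.example-dialer.com/v1/dispositions",  # hypothetical endpoint
            params={"after": cursor, "limit": 500},
            headers={"Authorization": f"Bearer {os.environ['DIALER_API_TOKEN']}"},
            timeout=60,
        )
        resp.raise_for_status()
        page = resp.json()
        # Keep only the fields the downstream workflow needs, rather than
        # whatever a third-party webhook happens to push.
        events += [
            {"call_id": e["call_id"], "disposition": e["disposition"], "ts": e["timestamp"]}
            for e in page["results"]
        ]
        cursor = page.get("next_cursor")
        if not cursor:
            return events
```

Pulling also makes it straightforward to filter fields at source, which matters when the data is HIPAA- or GDPR-relevant.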
Other improvements followed naturally.
Fragmented pipelines were consolidated. For example, where multiple parallel connectors existed for the same advertising platform, ingestion was redesigned so a single connector could pull multiple accounts and regions efficiently.
This reduced duplication, lowered runtime, and removed design decisions that had been driven by pricing constraints rather than technical sense.
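In sketch form, consolidation means one extraction run iterating over every account and region pair instead of one pipeline per account. The account IDs, regions, and fetch function below are illustrative placeholders, not MVF's configuration.

```python
# Sketch of a consolidated extraction: a single connector run iterates over all
# account/region pairs instead of running one pipeline per account.
# Accounts, regions, and the fetch function are hypothetical.
from itertools import product

ACCOUNTS = ["acct-uk-001", "acct-us-002", "acct-de-003"]
REGIONS = ["emea", "amer"]


def fetch_spend(account_id: str, region: str) -> list[dict]:
    """Placeholder for the API call that pulls spend data for one account/region."""
    return [{"account_id": account_id, "region": region, "spend": 0.0}]


def run_once() -> list[dict]:
    """Pull every account/region combination in a single scheduled run."""
    rows: list[dict] = []
    for account_id, region in product(ACCOUNTS, REGIONS):
        rows.extend(fetch_spend(account_id, region))
    return rows


if __name__ == "__main__":
    print(f"pulled {len(run_once())} rows in a single run")
```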
Once the connector layer was simplified, performance improved without explicit optimisation work.
A clear example was the Bing Ads pipeline. Runtime dropped from roughly 1.8 hours to around 30 minutes. That made intraday refresh feasible for the first time, improving marketing responsiveness while reducing downstream warehouse costs.
Across the estate, pipelines became faster, easier to reason about, and easier to schedule.
Just as importantly, Airflow was fully decommissioned.
With no custom DAGs to maintain, MVF eliminated two days of monthly engineering maintenance work. The data team could focus on analytics and commercial priorities rather than pipeline upkeep.
As one internal summary put it: they went to Matatika for a migration and ended up with an improved stack.
The migration covered more than 60 connectors and over one billion rows of data per year. It was delivered in two months, with zero downtime.
The outcomes were tangible:
- Airflow fully decommissioned, eliminating two days of monthly maintenance work
- sensitive disposition data moved off third-party webhook infrastructure and onto direct API calls
- faster pipelines, with Bing Ads dropping from roughly 1.8 hours to around 30 minutes
- intraday refresh available to marketing for the first time
Critically, MVF no longer has to design their data architecture around vendor limitations.
If a new platform becomes commercially important, they can request a connector and have it built, rather than absorbing the engineering burden internally.
Unsupported connectors are not an edge case. They are an inevitability.
Every organisation has systems that sit outside the “top 20” platforms vendors optimise for. The question is not whether gaps will appear, but who pays the cost when they do.
When teams accept coverage gaps, they also accept:
- the operational drag and fragility of maintaining custom pipelines
- engineering responsibility quietly transferred from vendor back to customer
- architecture decisions shaped by vendor limitations rather than by the business
MVF’s experience shows a different path.
Instead of working around gaps, they eliminated them.
If your ETL vendor cannot support the connectors your business actually relies on, the risk is not theoretical. It is already embedded in your stack.