Related posts for ‘#Blog’

Building High-Performance Data Teams Starts With People, Not Tools

In the world of data engineering, we often obsess over the "plumbing." We talk about ETL latencies, Snowflake clusters, and the latest vector databases. But according to Sam Wrench, Lead at Reality Mine and former GB Dodgeball coach, we’re often looking at the wrong part of the machine. The most efficient data pipeline in the world is useless if the people at either end of it (the engineers building it and the stakeholders consuming it) aren't in sync. To Sam, data engineering isn't just a technical challenge; it’s a high-performance team sport.

When your ETL vendor doesn’t support the connectors you need

Most ETL problems do not start with performance or reliability. They start with coverage. An ETL connector is missing. Or partially supported. Or technically “available,” but not usable in the way your business actually operates. At first, teams work around it. They build a small custom job. They add a webhook. They patch something together in Airflow. It feels temporary. Over time, those workarounds become permanent infrastructure. This is the position MVF found itself in.
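The "small custom job" almost always takes the same shape: a hand-rolled loop that pages through the vendor's API and hands records to a loader. A minimal sketch of that pattern (the `fetch_page` callback and the `records`/`has_more` payload shape are hypothetical stand-ins, not any real vendor's API):

```python
from typing import Callable, Iterator


def extract_all(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Page through an API until the source says there is nothing left.

    `fetch_page(page)` is assumed to return {"records": [...], "has_more": bool}.
    """
    page = 0
    while True:
        payload = fetch_page(page)
        yield from payload["records"]
        if not payload.get("has_more"):
            break
        page += 1


# Stand-in for the real HTTP call, so the pattern is runnable as-is.
def fake_fetch(page: int) -> dict:
    pages = [[{"id": 1}, {"id": 2}], [{"id": 3}]]
    return {"records": pages[page], "has_more": page < len(pages) - 1}


rows = list(extract_all(fake_fetch))
```

Scripts like this feel harmless on day one; the trouble described above starts when retries, schema drift, and scheduling get bolted on and the "temporary" job quietly becomes a connector you maintain forever.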

The Economics of the Modern Data Stack

Who captures the value now?

The modern data stack still works. The problem is not capability. The problem is economics. On Friday, we hosted a LinkedIn Live to talk about something most teams feel but rarely articulate clearly: the incentives behind the modern data stack have shifted, and those shifts are starting to shape architecture, pricing, and leverage in ways that matter over the next 12–24 months. This was not a tools debate. It was a discussion about who captures value, who carries risk, and why “best-of-breed” no longer feels neutral. Joining the conversation were Maxime Beauchemin, creator of Apache Airflow and Apache Superset; Taylor Murphy, founder of Meltano and Arch, now at Astronomer; and Aaron Phethean, Founder and CEO of Matatika. What follows is not a summary. It’s what actually matters.

Baidu ETL Connector: How MVF Solved an Unsupported Data Source

Today, Matatika is the only ETL provider offering a fully supported, production-grade Baidu connector. While most ETL platforms support mainstream paid media sources like Google Ads and Facebook Ads, Baidu often sits outside standard connector catalogues. For teams running marketing activity in China, this creates a familiar problem: critical data exists, but there is no clean, supported way to ingest it. MVF ran into exactly this issue.

Analytics Engineering: Internal Risk vs. External Rigor

In this episode of the Data Matas podcast, host Aaron Phethean is joined by Jack Doherty, VP of Analytics Engineering at Fresha. The conversation explores a tension many data teams now face: the difference between building analytics for internal use versus delivering analytics as a customer-facing product. Jack’s team at Fresha operates in both worlds. On one side, they support internal stakeholders with classic analytics. On the other, they power analytics that sit directly inside the Fresha product and are used daily by thousands of customers running health and beauty businesses. That dual responsibility forces very different ways of thinking about risk, speed, and quality.

Why marketing data connectors quietly inflate your ETL costs

And how MVF consolidated paid media data without losing insight

Marketing data rarely breaks loudly. It degrades quietly. Pipelines keep running. Dashboards still load. Spend gets approved. But somewhere between your tenth and thirtieth connector, the economics stop making sense. Teams often assume marketing connectors are cheap because each one looks small in isolation. A Google Ads connector here. A LinkedIn Ads connector there. Another for TikTok. Another for reporting exports. Each one feels justified. Together, they quietly become the most expensive part of the data stack. MVF learned this the hard way.
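The compounding is easy to miss because each connector is billed on its own line. A toy illustration of the arithmetic with hypothetical prices (none of these numbers come from any real vendor's pricing):

```python
# Hypothetical pricing: a flat per-connector fee plus a usage-based rows charge.
BASE_FEE = 100          # dollars per connector per month (illustrative)
PER_MILLION_ROWS = 20   # dollars per million rows synced (illustrative)


def monthly_cost(connectors: int, rows_millions_each: float) -> float:
    """Total monthly spend when every connector carries the same volume."""
    return connectors * (BASE_FEE + rows_millions_each * PER_MILLION_ROWS)


small = monthly_cost(3, 1.0)    # a few connectors look cheap in isolation
large = monthly_cost(30, 1.0)   # thirty "small" connectors become a real line item
```

Each individual connector still costs the same; it is the count, not any single source, that moves the bill from rounding error to budget conversation.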

🔔 Celebrating a Most Eventful Matatika Year! 🔔

The Matatika Year in Review: On the Twelfth Day of Data...

As the year wraps up, we’re taking a lighthearted, musical look back at the incredible journey we’ve shared! Thanks to the energy and support of our amazing community, 2025 has been an absolutely unforgettable year of growth, connection, and major milestones. Grab a hot drink, and join us as we sing the praises of the Matatika year that was!


Building Data Platforms That Actually Solve Business Problems

Meet Teddy Bernays

Teddy is a highly autonomous Freelance Data Engineer and Google Cloud Trainer who focuses on helping startups and mid-sized companies build efficient, scalable, and cost-effective data platforms. He started his career in the complex world of audio engineering before transitioning to IT, where he found a fascination with the mechanics "under the hood" of data systems. Today, he is a firm believer that the solution to data inconsistency isn't always more code, but more clarity. His approach is simple: “If it’s simple, do it simple. You don't need three different tools to solve a one-tool problem.”

Looking for Arch?

Arch has officially shut down, and the Arch platform is no longer operating. If you’re an Arch customer looking for what’s next, you’re in the right place. Matatika is now supporting former Arch customers and helping teams continue their analytics and data workflows without disruption.

In addition, Matatika has acquired the Meltano open source project, ensuring its continued development, maintenance, and long-term sustainability. Meltano remains open source, community-driven, and is now backed by a team focused on stability, extensibility, and enterprise-ready data tooling.

If you were using Arch or Meltano and need guidance, migration support, or want to learn how Matatika can help you move forward, we’re here to help.

Welcome to the next chapter.

Google Sheets for Analysts: How to Eliminate the Data Update Bottleneck

The Real Cost of the Bottleneck

Engineering time disappears into trivial updates. Every data change request is a context switch. Even five-minute tasks fragment your day. You never get into deep focus because you're constantly interrupted by "quick questions" that aren't actually quick.

The Ticket Queue That Never Ends

Your analyst sends a Slack message at 9am.

“Hey, can you update the product-to-GL mapping table? Finance added three new categories last week and our revenue reports are off.”

It’s a five-minute job. You know exactly what needs doing. But you’re in the middle of debugging a pipeline issue, and context switching now means you’ll lose an hour getting back to where you were.

“I’ll get to it this afternoon,” you reply.

By 2pm, you’ve forgotten. The analyst follows up at 4pm. You finally do it at 5:30pm, right as you’re about to leave. The analyst thanks you, clearly frustrated that a simple update took all day.

The next week, same analyst, different request. Update the territory assignments. Then the customer segment definitions. Then the exchange rates. Then back to the GL mappings because something changed again.

You’re not a data engineer anymore. You’re a ticket processor for data updates that shouldn’t require engineering at all.
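One way out of the loop is to make mapping tables like the product-to-GL one analyst-editable and sync them into the warehouse on a schedule, so the five-minute change never enters the engineering queue. A minimal sketch of that sync, using `sqlite3` as a stand-in for the warehouse (the table and column names are illustrative; in practice the rows would come from a Google Sheets export):

```python
import sqlite3


def sync_mapping(conn: sqlite3.Connection, rows: list[tuple[str, str]]) -> None:
    """Replace the product-to-GL mapping table with the analyst-maintained rows.

    A full refresh keeps the sync idempotent: rerunning it is always safe,
    so it can run on a schedule with no human in the loop.
    """
    with conn:  # one transaction: readers never see a half-updated table
        conn.execute(
            "CREATE TABLE IF NOT EXISTS product_gl_map "
            "(product TEXT PRIMARY KEY, gl_code TEXT)"
        )
        conn.execute("DELETE FROM product_gl_map")
        conn.executemany("INSERT INTO product_gl_map VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
sync_mapping(conn, [("Widget A", "4000"), ("Widget B", "4010")])
# Finance adds a category; the analyst edits the sheet and the next sync picks it up.
sync_mapping(
    conn,
    [("Widget A", "4000"), ("Widget B", "4010"), ("Widget C", "4020")],
)
count = conn.execute("SELECT COUNT(*) FROM product_gl_map").fetchone()[0]
```

The design choice that matters here is the full refresh: because the sheet is the source of truth and the warehouse table is disposable, there is no merge logic to debug when the 9am Slack message arrives.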