The Data Leaders Digest – Sixth Edition

Published on July 31, 2025

The best insights come from real conversations, with real leaders making tough calls, solving messy problems, and figuring out how to scale their data teams in the real world. That’s why I started The Data Leaders Digest.

Each month, we share what’s actually happening inside data teams, pulled from our podcasts, live events, meetups, and honest behind-the-scenes conversations.

No theory. Just practical takeaways you can use right now to make better decisions, avoid common pitfalls, and keep moving forward.

 

This Month’s Focus

July has been a month of unexpected headlines in the data world. From Astronomer’s PR crisis to the ripple effects of Andrej Karpathy’s June keynote still reshaping how teams think about AI-first infrastructure, it’s a good time to step back and ask: what kind of companies do we want to build our data infrastructure on?

In this edition:

  • Why Karpathy’s “Software 3.0” vision exposes the flaws in current ETL pricing models
  • How the Astronomer kiss cam incident reveals bigger questions about vendor culture
  • Why teams are moving beyond traditional data orchestration approaches
  • The hidden costs of forcing tools into AI workflows they weren’t designed for
  • New guidance on choosing ethical, future-ready data partners that won’t trap you in yesterday’s paradigms

Market Insights

When Kiss Cams Go Wrong: What Astronomer’s PR Crisis Says About Vendor Culture

You’ve probably seen the headlines. Astronomer’s CEO got caught in an awkward kiss cam moment at a Coldplay concert, leading to a social media storm and the company hiring Gwyneth Paltrow as “chief brand officer” to manage the fallout.

It’s corporate theatre that’ll blow over. But for data leaders choosing vendors for critical infrastructure, it raises a question: what does this say about judgment and company culture?

The Pattern Problem: The same judgment that led to that kiss cam situation, and then to hiring Gwyneth Paltrow as a PR fix, makes decisions about your data platform. When pipelines fail at 2am, do you want it handled by a team that thinks celebrity endorsements solve technical problems?

The Airflow Reality: While Astronomer managed their PR crisis, something more interesting happened with their core product. Apache Airflow, the open-source orchestrator Astronomer builds on, is increasingly questioned by data teams.

As one client put it: “Airflow should only ever orchestrate, no compute should be done whatsoever.” Yet you see teams push Airflow beyond scheduling tasks into heavy data loading.

What’s Actually Happening: Teams are moving away from force-fitting Airflow into every workflow. Two of our recent clients switched off their Airflow setups entirely. No more fragile DAGs failing at 2am.

The pattern: use the right tool for the right job. Teams are combining purpose-built tools (Meltano for extraction, dbt for transformation, Spring for scheduling) rather than forcing one tool to do everything.

The Ethics Question: When a vendor’s response to controversy is hiring a celebrity spokesperson rather than addressing underlying issues, what does that tell you about their priorities?

The Bottom Line: While Astronomer plays damage control with movie stars, the real question is: do you want critical infrastructure built by a company that thinks celebrity endorsements solve technical problems?

Most of us would rather focus on solving problems that actually cause sleepless nights in data ops.

What This Means for Data Leaders

The Astronomer situation highlights three critical questions every data leader should ask when choosing vendors:

1. Do their values align with yours?

Look beyond the marketing. How does the company handle controversy? Do they address root causes or just manage PR?

2. Are they using the right technical approach?

Don’t get trapped by vendor marketing about their “best-in-class” solution. Ask whether their tool is actually designed for your use case.

3. What happens when things go wrong?

Every platform has issues. The question is: how does the vendor respond? With genuine solutions or celebrity distractions?

Your Next Steps

If you’re currently using Airflow for data loading (rather than just orchestration), or if you’re questioning whether your current vendors align with your values, it might be time for a review.

Questions to ask your current setup:

  • Are we using orchestration tools for data processing?
  • Do our vendors’ business practices align with our company values?
  • Are we paying for celebrity endorsements instead of better engineering?

Consider alternatives that:

  • Use purpose-built tools for each job
  • Offer transparent, performance-based pricing
  • Focus on engineering excellence over PR stunts
  • Support sustainable open-source communities

Software’s Next Evolution

“Software is Changing (Again)”: What the AI Revolution Means for ETL

Andrej Karpathy (ex-Tesla AI lead) outlined software’s third major evolution at AI Startup School: from explicit code (1.0) to neural networks (2.0) to LLM-driven programming (3.0). But here’s what he didn’t mention: this shift exposes the fundamental problems with how we’ve been buying data tools.

The ETL pricing trap in an AI-first world:

  • Row-based pricing punishes AI experimentation: Training models requires iterating on datasets, but vendors charge you for every exploratory sync
  • Connector pricing breaks down with LLMs: AI applications need data from dozens of sources, but paying per-connector makes scaling prohibitively expensive
  • Vendor lock-in kills innovation: When LLMs can write integrations in natural language, why are we still trapped in proprietary platforms?

Karpathy’s “partial autonomy” insight matters for data teams: The future isn’t fully automated pipelines; it’s human-supervised systems that can adapt quickly. But current ETL vendors force you to choose between expensive flexibility and cheap rigidity.

The Matatika take: If everyone can now “vibe code” with LLMs, data infrastructure should be just as flexible. That’s why we built around performance-based pricing (pay for what you actually use) and open standards (no proprietary lock-in). When the next AI breakthrough arrives, you can adapt without renegotiating contracts or rewriting integrations.

Bottom line: Software 3.0 demands infrastructure that evolves with you, not against you. The teams that embrace this shift will build better AI applications – whilst those stuck in legacy pricing models will spend their budgets on vendor margins instead of innovation.

🔗 Watch the keynote here


London Analytics Engineering Meetup

What a night! Our 10 July meetup at Google London HQ was absolutely brilliant – possibly the best yet.

Joseph Lane from IAG Loyalty delivered my favourite talk of the evening. It was frank, hilarious, and packed with hard-earned insights from inside the real data project trenches. His migration lessons were golden: don’t rewrite legacy systems (even when they’re “correct but no one knows how”), and remember – as soon as your numbers are wrong, you’ve lost all credibility.

On team structures, Joseph’s take was spot on: centralised approaches create silos and slow everything down, whilst full autonomy with solid “town planning” works brilliantly at scale (think 40+ engineers, 1000+ models).

Holly Foster and Varun S Gangoor followed up with fascinating insights into data challenges in cyber security – a brilliant reminder that one team’s solution isn’t necessarily yours. Most of us aren’t dealing with 50TB of data per day!

Brilliant turnout, fantastic venue, and exactly the kind of honest, practical insights that make these meetups worth your evening.


LinkedIn Live Recap

Maximising Data ROI? Stack It With the Best

On 27th June, we hosted a session with Jack Colsy (Incident.io), Harry Gollop (Cognify Search), and Aaron Phethean (Matatika) exploring how to get the most out of your team, your tools, and your structure. The conversation highlighted a key insight: people, not platforms, are often your biggest cost – so how do you maximise what you’ve got?

If you’re dealing with budget pressure or wondering whether your current stack is holding your team back, our detailed recap breaks down the specific strategies each speaker shared for building high-performing data teams without breaking the bank.

Key takeaways from the session

  • Senior hires cut through ambiguity and reduce long-term tech debt
  • Tools should save time, not create more coordination work
  • If your team is constantly reacting, your setup isn’t working
  • Most teams don’t need more dashboards, they need faster answers

Missed the event?

📺 Watch the full replay here

🔗 Prefer to read instead? Get the full recap article here


Data Matas Podcast – Season 2: Your Summer Listening Sorted

We’re officially on our summer break until September, but before you switch off completely, make sure you’ve caught up on Season 2 of the Data Matas podcast – or if you missed it entirely, now’s the perfect time to dive in.

This season delivered exactly what we’re about: proper conversations with data leaders who’ve actually been there and done that.

  • Jon Hammant from AWS kicked us off with brilliant insights on scaling cloud infrastructure without the eye-watering costs.
  • Emily Loh from MoonPay and David Draper from IRIS Software shared how to build data teams that can actually handle constant change.
  • Nik Walker from Co-op opened up about tackling burnout and rebuilding trust when things go sideways.
  • Oleg Agapov from Hiive rounded things off with a masterclass on evolving from tool-focused engineer to systems thinker.
  • John Napoleon-Kuofie from Monzo wrapped up the season with his take on building data platforms at scale.

The common thread? The most successful data teams have stopped firefighting and started fundamentally rethinking how they work – simplifying stacks, ditching vendor lock-in, and making data genuinely reliable.

This is why Matatika is passionate about delivering real value to the data community through honest insights from people who’ve solved the problems you’re wrestling with right now.

Perfect summer entertainment

🎧 Listen to the full episode | 📺 Watch on YouTube

See you in September with fresh voices and more conversations that actually matter.


New Resources

Make the Right Call on Architecture and Cost

This month we’ve published four new guides to help you cut through architectural trade-offs, clarify storage decisions, and navigate ETL pricing with confidence:

  1. 7 Hours of Firefighting: What Google Cloud’s June Outage Really Cost Data Teams
     A major Google Cloud outage left data teams firefighting broken pipelines for hours. This article breaks down the ripple effects, where stacks failed, and what you can do to build resilience before the next incident hits. 🔗 Read the full article
  2. The Hidden Business Cost of Row-Based ETL Pricing: When Growth Becomes a Liability?
     Row-based pricing models might look simple, but they penalise teams for growth and force engineers into endless cost-saving workarounds. This guide explains the hidden costs and shows how to escape the cycle. 🔗 Read the full article
  3. The Processing Paradox: Why Row-Based ETL Pricing Ignores How Analytics Actually Works
     Row-based pricing models don’t match the way analytics pipelines actually process data. This article unpacks the disconnect and offers a better path forward. 🔗 Read the full article

Found this edition useful? Pass it on!

Help us reach more data leaders who need these insights:

  • Forward this email to a colleague who’d benefit
  • Share on LinkedIn with your network
  • Join our growing community of 180+ data leaders getting monthly insights

Follow Matatika on LinkedIn | Subscribe for Insights | Visit Our Website
