The Modern Data Stack is Dead… Long Live the Modern Data Stack - Part 1
Where we've come from, lest we forget
The first time I heard the term "The Modern Data Stack" (MDS) was in a job interview. Dylan Atlas-Baker was the interviewer - if you don't know Dylan, he has been part of the dbt community from early on. Dylan also used to look after the London dbt community in its early days. Thanks in part to Dylan, the London dbt community was the fastest-growing in the world, and it is most likely still the largest. After all, we now have at least three Meetups with roots in the original London dbt community:
London dbt Meetup, hosted by Datatonic (1400 members)
London Analytics Engineering Meetup, hosted by Dylan and host venues (2200 members)
London Analytics Meetup, hosted by me and host venues (600 members)
As the community has grown, we now even have niche London data Meetups taking place. For example, there is the London Climate Analytics Meetup, hosted by Nina Anderson, whose first Meetup I am travelling to as I write this. Even though it is the inaugural Meetup, it's oversubscribed!
Back to the interview: Dylan was interviewing me to be his replacement at a UK FinTech called Growth Street. This was around Christmas 2018. At that point in time, my goal was to lead data at an organisation where I would have more freedom with tech choices. Dylan was looking to move into MDS consulting, and hence was leaving Growth Street. I wonder if either of us, at the time, would have thought that we would become MDS founders...
One of the things Dylan asked me early in the interview was: "Have you heard of the Modern Data Stack - Snowflake, dbt and Looker?"
I said that I hadn't, but I'd heard of Snowflake and Looker. I asked: "What's dbt?"
Dylan described dbt and then the interview went on as you might expect. I passed Dylan's part, but then didn't make it past the final stage. This actually happened again soon after, when Dylan was consulting at Receipt Bank!
I look back and wonder if I would have had the opportunities I’ve had to speak, write and found, had I not heard about dbt and the Modern Data Stack as early as I did. It was inevitable that I would have learned about dbt eventually, and probably started to use it. However, it does feel like those of us who joined the community in 2019 or before have had a boost of some kind or another. A disproportionately high number of us are founders, early team members at MDS vendors or leading voices as practitioners.
Dylan went on to found Spectacles with Josh Temple, and, as you know, I went on to found Delphi with Michael Irvine. Perhaps Dylan's story of founding with Josh, after meeting him only once, helped us found whilst only knowing each other from the dbt Slack!
Catch Dylan, Michael and me on the Spectacles Podcast next week:
https://www.spectacles.dev/webinars/natural-language-questions-and-looker-with-delphi
The main takeaway from my interview with Dylan at Growth Street was that it sparked my interest in dbt. I went home and researched it, read the docs, saw the ref() function... it clicked. This was the tool that I needed to help me get away from MSSQL Server Stored Procedures, or from rolling my own transformations using string manipulation elsewhere!
Some have called dbt a "gateway drug to git", but it's not how I see things. Using dbt was the first time I had that zen flow that software engineers have in their IDEs: build, test, fix, push and repeat.
As I mentioned before, when I was interviewing at Growth Street, I was looking for a way out of legacy stacks. I could see that technology like Snowflake and Looker was the future, before I'd even heard the term MDS. dbt was the glue to hold it together in a warehouse-centred data world.
Using MDS tooling, I was able to build and maintain so much more than I ever could in legacy systems like MSSQL Server or Oracle. Hadoop took so long to be deployed at the company I worked at during the "Big Data" era that I managed to miss it entirely.
The first MDS I built and used was Snowflake/dbt/Looker. For the first time, I could actually make everything I needed work on my stack!
As a team, we could express all the transformation we needed with dbt
Snowflake could concurrently handle ELT workloads and BI workloads from Looker
With LookML in Looker, we were able to define metrics and dimensions once for our whole company to use
Not everything was perfect:
We had to do some janky things in dbt, as we progressed in complexity
Snowflake, whilst capable of nearly any workload, comes at a cost - which soon became something new to manage
Even though we had "self-serve" with Looker, not everyone in our org could use or access it. It had a high bar in skillset and limited seats
Whilst not everything was perfect, we were far more capable than we would have been on previous technology.
Our ability to iterate and deliver quickly, our ability to serve a large number of users, our ability to handle complex analytics... All of these abilities would have been diminished without the MDS.
Many are saying the Modern Data Stack is dead. This is partly because of the imperfections I mentioned above. You hear phrases like: "The Modern Data Stack exists to increase Snowflake credit spend". Yes, I have seen the sharp end of that idea. However, after my initial migration onto Snowflake, I managed to make it run cheaper than the previous data warehouse.
It was only after people realised that things would actually run on Snowflake (or Looker, as they knew it) that they started to use it more. They used it more and more and more. Engineers, Product Managers, Finance people, Marketing folks, CXOs... You name it, they were sipping on Snowflake Credits through a straw of their choice. This is actually a good thing! More people doing more things with data should result in ROI for a business - as long as it's not wasteful, which, for the most part, it wasn't.
Engineers began to use dbt and Snowflake for their data pipelines, as it avoided the horrors of AWS Glue. Every single person that touched data in our organisation became more productive with data, because of the MDS.
You might say the MDS is dead, but I'm not going back, I'm going forward. The MDS is dead, long live the MDS! Let's enhance it and focus it. We know, in terms of capability, that the tools we have are powerful. However, we need to focus on our customers' use cases and their ROI. Can the Modern Data Stack evolve in this way?
Can the Modern Data Stack become easier to use, so we spend less time and money engineering?
Can the Modern Data Stack be cheaper to use?
Can the Modern Data Stack get closer to value creation and customers?
In the next post in this series, I’ll revisit the diagram I made in this post and consider how it has changed and how it should look in the near future.
You're capturing an important nuance that is missing in the dialogue. Is the MDS better than past approaches? Absolutely. Can the MDS be better than it is today? Absolutely. Those things are not mutually exclusive.