Now is the time
If you’ve been reading my substack for a while, you might remember me writing about the two approaches to metrics layers that we’ve seen so far.
In my previous post, I described how the first approach involved creating wide datasets without defining how a data model of multiple tables and entities should be used together. Metrics are then attached to these datasets, which comes with the various pros and cons I described in that post.
The second approach does define how the data model fits together. This approach, whilst not devoid of cons, is more powerful and more likely to guarantee a single source of truth for defining metrics, dimensions and entities.
When dbt Labs originally released their metrics layer (an approach-1 type), I must admit I felt disappointed. The GitHub issue had such rich discourse, with practitioners and vendors describing what it should contain and enable - what was released felt like it tried to fulfil a lot of the secondary concerns without delivering the core functionality the data community at large already expected from a metrics layer.
Fundamentally, a lot of data folks’ idea of an MVP for a metrics layer is LookML; what we were expecting from the dbt semantic layer was LookML++ - detached from any BI tool, possibly with entities (though it’s OK for that to come in v2), with better metric governance, and with all transformations pushed left into existing dbt-core. I didn’t know anyone, outside of dbt Labs, who thought we shouldn’t be defining how the tables and entities in the data model are joined. Yet this is what was released.
It felt very similar to how traditional BI tools like Tableau would build metrics from a single query or table, causing metric duplication and an ungovernable mess - Looker, for all its faults, was an obvious step forward to most data folks.
I remember seeing a webinar with Nick Handel and Drew Banin, which, at points, was a debate between the two approaches; it felt like a representation of the internal struggle in the dbt community about the future of the semantic layer.
At the time, I felt the semantic layer was in a difficult place. I couldn’t recommend that people start using something like MetricFlow, as I didn’t think it would become the dominant standard with dbt’s own offering in existence. I couldn’t recommend the dbt semantic layer either, as I felt it was inferior and would be replaced in time. When asked what I would use, I usually leant towards Lightdash, which lets you use dbt YAML meta to define metrics, how tables join, and the format of dimensions. It was basic, but it was more or less the MVP we needed.
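To make that concrete, here is a rough sketch of the Lightdash-style approach: metrics, joins, and dimension formats declared in the `meta` of ordinary dbt model YAML, so the definitions live alongside your transformations rather than inside a BI tool. This is illustrative only - the field names and structure are assumptions from memory and may not match Lightdash’s actual schema.

```yaml
# Hypothetical sketch: metric/join/dimension definitions in dbt model YAML meta,
# in the style Lightdash reads. Names and structure are illustrative.
models:
  - name: orders
    meta:
      joins:
        # Declare once how orders joins to customers, instead of
        # re-deriving the join in every downstream query.
        - join: customers
          sql_on: ${orders.customer_id} = ${customers.customer_id}
    columns:
      - name: order_date
        meta:
          dimension:
            type: date          # format this column as a date dimension
      - name: amount
        meta:
          metrics:
            total_revenue:
              type: sum         # governed metric defined next to the model
```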
The Lightdash way prevented the logic in your metrics layer from being locked into a SaaS vendor’s platform. However, it was very unlikely that any other tool would support this format, even though it was open enough to be adopted by others.
So, this is the difficult position data folks found themselves in until last Wednesday…
The Landscape Changes
dbt Labs announced it is acquiring Transform! The announcement came with less fanfare than dbt’s Series D, but I feel it is probably more important. The raise provides the resources to go forward, but, with the addition of Transform, there is now a credible route to commercial success with the semantic layer and the serving of it.
It feels quite fitting; the first episode of the AE podcast had Robert Chang from Airbnb talking about Minerva. Given Transform’s roots at Airbnb, it feels like coming full circle.
To the innovators behind Minerva and Metric Repo at Airbnb, Lauren, Amit, Cristian, Robert, Jonathan, Aaron, and more—we are so grateful for all of the ideas that you contributed to these projects. Much of our inspiration is derived from you and your brilliant work. Thank you for being amazing colleagues and encouraging us to bring this product to a broader audience.
Transform’s MetricFlow is one of the best (if not the best) standalone semantic layers for analytics out there. Now that it will be the dbt semantic layer too, the choice is simple: this is it. MetricFlow is the LookML++ we wanted, and it will be widely adopted once integrated into dbt-core, with APIs and caching in dbt Cloud. It shares roots with Minerva, which has been extensively battle-tested at Airbnb, so practitioners can be confident that it meets a wide array of needs.
We have already seen a number of BI tools integrate with the existing dbt semantic layer and this will only encourage more to do so. Looker’s advantage with LookML over another BI vendor is nullified, should that vendor integrate with this new upgraded dbt semantic layer.
Now is the time - if you’re a vendor/builder in the data consumption area, work with dbt Labs to integrate.
Things we need to know from dbt Labs:

1. Timelines for integration into the dbt platforms, etc. (Will this even happen, or will Transform remain separate, tech-wise, indefinitely?)

2. What will this look like?

3. If someone were to build an integration using the MQL API, would it be likely to continue functioning, with little change, post-integration with dbt Cloud?

4. Will the caching features and other APIs in Transform’s Metrics APIs be ported into dbt Cloud, again with little change?

5. I would assume that the dbt-badged version of Transform will be offered through dbt Cloud; how would someone pay for this? As they do currently with developer seats (with only one required)? Would they pay a bolt-on fee or a per-request fee?

6. Transform had started to offer some BI features so that their customers could put their semantic layer to use more directly. dbt Labs has always said it won’t build a BI tool, so the expectation is that these features will be discontinued.
I know some of these things won’t necessarily be known immediately, especially around commercialisation, but I would imagine that points 2 and 4 would have been considered in detail during due diligence.
This is the first time we’ve had a high-quality semantic layer that isn’t tied to a BI tool and that could be used by tens of thousands of organisations by the end of this decade. If we as a community choose to adopt it - and there are now many reasons to do so and far fewer not to - we could end up with a semantic layer as universal as dbt-core is for transformation today.
Data folks, when you deal with your data consumption vendors this year, ask them about how soon they will be integrating with the new dbt semantic layer. Don’t ask them “if”, ask them “when” and how deep the integration will be.
This is as much an opportunity for us as it is for dbt Labs: we can have a ubiquitous semantic layer, independent of any cloud or BI tool, that will integrate with our tools.