The semantic layer came of age over the course of two weeks at the Moscone Center. Both Snowflake, with semantic views, and Databricks, with metric views, have essentially implemented a semantic layer with a compiler, which I predicted (around this time last year) they would do before the end of 2025.
Back then, both Snowflake and Databricks had implemented ways of codifying a knowledge graph, which looked very much like semantic layer YAML, except that this YAML was not compiled in the same way as a semantic layer's¹. It was provided as context in the prompt so that the text-to-SQL methods they were employing could perform better. They did indeed perform better, but, as I wrote about before, they still didn't come close to the text-to-semantic-layer method I had proposed, which has been implemented at Delphi and Cube. They never will come close, and as a result they will never have high enough accuracy to succeed. This is why I predicted that Snowflake and Databricks would change tack in 2025, and indeed why they have done so.
A Darker Truth
It was about a year ago that I started the pre-seed fundraising process for Delphi. All of the VCs I spoke to asked me why we would succeed vs all the other entrants in the space. The vast majority of these entrants were using a method called “text-to-SQL”,…
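For anyone who hasn't seen the two approaches side by side, here is a deliberately simplified, hypothetical sketch - every table and column name below is invented, and neither query is taken from any vendor's actual output. With text-to-SQL, the YAML goes into the prompt and the LLM still has to author the whole query itself; with text-to-SL, the LLM only picks measures and dimensions, and the compiler writes the SQL.

```sql
-- Text-to-SQL: the LLM writes this itself, with the YAML only as prompt context.
-- It looks plausible, but joining orders to order_items fans out the rows,
-- so shipping_cost is counted once per line item and revenue is inflated.
SELECT c.country,
       SUM(o.shipping_cost + i.amount) AS revenue
FROM orders o
JOIN order_items i ON i.order_id = o.id
JOIN customers c ON c.id = o.customer_id
GROUP BY c.country;

-- Text-to-SL: the LLM only asks for "revenue by customer country".
-- The compiler expands the one canonical definition of revenue,
-- pre-aggregating line items before the join so the fan-out never happens.
SELECT c.country,
       SUM(o.shipping_cost) + SUM(li.items_amount) AS revenue
FROM orders o
JOIN (
    SELECT order_id, SUM(amount) AS items_amount
    FROM order_items
    GROUP BY order_id
) li ON li.order_id = o.id
JOIN customers c ON c.id = o.customer_id
GROUP BY c.country;
```

The YAML in the prompt makes the first query more likely to be right, but nothing prevents the model from writing the wrong join; the compiler, by construction, can only emit the second.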
The driver for this change is agentic analytics. When the idea of headless BI, or a universal semantic layer, first made its way into the zeitgeist of data folks, it was meant for use by BI tools and data applications. It may have originated in natural language, as with Statsbot (Cube's precursor), Metabase, ThoughtSpot, Veezoo…, where strict-syntax requests were really API requests disguised as natural language. However, the destination was all of the other BI tools and data apps out there. A single source of truth, governance, performance… not specifically AI, as AI was still a dream in this pre-GPT-3 era.
As AI has captured more and more mindshare, to the point where it now sits at nearly 100%, any delay in delivering data agents can no longer be tolerated. It's not a case of will they or won't they have a semantic layer - any company that wants to offer agentic analytics must have one. Databricks, Snowflake, GCP and Microsoft all now have at least the rudiments of a semantic layer with a compiler. Text-to-SQL is dead - dead on arrival. No one is better placed than a data warehouse vendor to make a success of text-to-SQL, as I have explained before, and even they have all given up on the idea and moved to a semantic layer with a compiler. It's fundamentally not viable, because LLMs aren't magic.
SnowPilot ❄️
Earlier this week, I ran a Cube webinar where I explained the difference between text-to-SQL and text-to-SL (Semantic Layer) methods for self-serve analytics.
If a really smart person can't take a data model with only context and deliver consistent, accurate results, we won't get LLMs to do it either, even with GPT-5 levels of power. It's not a computational problem; it's a metadata and guardrails problem. You don't leave things to chance when you know how they should be. You force the system to conform.
However, one consequence of all this is that headless BI is also dead. Agentic analytics is seen as the new BI, the successor to BI, the BI we always wanted. If every headless BI offering also includes agentic analytics, and they will do so by the end of this year², then are they really headless? Surely it could be correctly argued that they now have a very smart head! Maybe it's only connected by a thin MCP sinew, and you can still flip it off to connect to other things like Tableau, PowerBI and Chart.js… but that still doesn't let it join the headless hunt.
In all honesty, BI tool vendors have been terrible partners to headless BI vendors. They haven't had much incentive to integrate well… it makes them more replaceable. So I guess it's the end of the headless BI period and the beginning of the era of agentic analytics, where the former headless BI vendors have a big product and technology advantage through their semantic layers, and the BI vendors have the advantage of mature commercialisation and existing relationships to leverage. Let's see who prevails… perhaps this increases the gravity of the data warehouse further, if they can offer agentic analytics directly and bypass the consumption layer entirely.
Gartner is even preparing the way for an agentic analytics quadrant to exist.
BI has never been more competitive. Every data tool really is becoming a BI tool. That includes data warehouses in this new agentic analytics era.
¹ Every semantic layer is a knowledge graph, but not every knowledge graph is a semantic layer. I know that the term 'semantic layer' is a bit loaded, with many meanings. However, for the purposes of this blog, I always mean a metrics-style semantic layer, where the entities of a data model are codified with their attributes and join paths to each other, allowing simple requests to be compiled into consistent, complex SQL. This has been called headless BI or a metrics layer in the past.
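To make that concrete, here is a minimal, hypothetical sketch of a compiled request (the entities, columns and request format are invented for illustration, and real generated SQL differs by vendor):

```sql
-- Request from the agent or user:
--   measures:   subscriptions.count, subscriptions.mrr
--   dimensions: plans.tier
--   filter:     subscriptions.status = 'active'
--
-- Compiled output: the join path, the MRR formula and the status filter all
-- come from the semantic layer's definitions, not from the caller, so every
-- request touching these entities produces the same shape of SQL.
SELECT p.tier,
       COUNT(*)                  AS subscriptions_count,
       SUM(s.monthly_amount_usd) AS subscriptions_mrr
FROM subscriptions s
JOIN plans p ON p.id = s.plan_id
WHERE s.status = 'active'
GROUP BY p.tier;
```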
² All of the ones I'm aware of offer this to a greater or lesser extent now. So it may be that they are just in the process of improving and enhancing their agentic offering going forward.
Great post. I view it as largely inevitable that more and more of the data stack gets absorbed into the DW. Look at Snowflake now - they announced a Fivetran alternative, dbt support, etc.
Will it be as good as specialized tools? No. But will it be good enough to capture the majority of customers who don't want to deal with multiple tools? Probably!
Still a market for other companies but a smaller share than before (although maybe the market is growing?)
Hi David,
Indeed, I have long thought that we will have nearly "headless" BI. Only problem was, no one believed me or took me seriously about that. People have asked for "more dashboards" for the last 30 years. In the mid 90s I predicted that "more dashboards" would not end well. I said people will develop dashboards merely because they think they want them rather than answer the questions that need to be answered to make more money. As per usual, I was right.
I have been in BI for 34 years now and I have still not seen a product that is able to make more money for commercial companies than Ralph Kimball's Data Interpretation System by Metaphor. Ralph was a co-founder there.
My prediction is that our whole BI industry will eventually admit its mistakes and go back to using what is now called Meta5. Meta5 can now be run as a "black box". So what you can do is have "agents" of some kind searching through large databases, looking for the conditions they have been programmed to report on, or think should be reported on. They can then send the data needed to create the report to a table; that data can be read by Meta5, sent to Excel, and the report delivered to OneDrive or email.
So what does this mean? It means you can write the most complex analysis possible in any product that accesses data and sends the results back to a table. And then Meta5 simply comes along as a "black box", picks up that data, and sends it to whoever should know about it.
Meta5 can also allow business people to directly query any data they have access to. No need to get in the IT queue. This was the number one feature of Meta5 in 1991 when I first used it. The business people did not have to ask the IT people for any help to answer their own questions. Every company I talked to who used Meta5 told me the number one feature of the product was cutting IT out of the process of answering a question. So you can see why I thought "more dashboards" was not such a great idea.
Now, there has been one more thing invented. And I think this is a game changer. An old pal of mine invented the idea of "pseudo dimensional models" over large operational systems. So take Business Central as an example. The link is below. It is possible to put views over a large operational system to present the data as a pseudo dimensional model.
Sure, that's not perfect. But it is cheap. So any large operational system, or a direct replica, can have pseudo dimensional models placed over the top, and then hand-coded, tool-coded, or LLM-coded SQL can read it with a low probability of incorrect data being returned. Not zero. But low.
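To illustrate the idea (the schemas and column names below are invented stand-ins, not Business Central's real tables; the linked post at the end shows the real thing), a pseudo dimensional model is really just a set of views that reshape the operational tables into facts and dimensions:

```sql
-- A hypothetical "fact" view over an operational sales table.
CREATE VIEW dim_model.fact_sales AS
SELECT
    sl.document_no,
    sl.posting_date,                      -- date "dimension" straight from the source
    sl.customer_no     AS customer_key,   -- joins to the customer master below
    sl.item_no         AS item_key,       -- joins to the item master
    sl.quantity,
    sl.amount_excl_vat AS sales_amount
FROM operational_db.sales_invoice_line sl;

-- A hypothetical "dimension" view over the customer master data.
CREATE VIEW dim_model.dim_customer AS
SELECT
    c.no                  AS customer_key,
    c.name                AS customer_name,
    c.city,
    c.country_region_code AS country
FROM operational_db.customer c;
```

No data is copied; the views are just a friendlier, dimensional-looking shape for hand-coded, tool-coded or LLM-coded SQL to query.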
My own opinion is that this idea of putting pseudo dimensional models over the top of large operational systems, or replicas, is a MUCH bigger market than the DWH marketplace, because smaller companies can afford it; it's pretty much free.
The other thing that has been invented is this. With Meta5, and only Meta5 as far as I know, you can store SQL in the cloud. You can read it into the Meta5 desktop/server, perform parameter substitution at run time, then send the SQL to the data sources to collect the data. And then you can send the results to the user. For large operational systems this means you have one place where you maintain all the SQL code, and that SQL code can be used by hundreds, if not thousands, of companies who use that large operational system.
Take for example Business Central used in retail, and take for example a region manager's suite of reports that he wants each morning. And consider there may be 10 companies with 20 region managers each, i.e. 200 region managers who all want their reports at 9am. With Meta5 you could have ONE VERSION OF THE SQL for those reports in the cloud. Meta5 could be installed on prem for each customer. It would read the ONE VERSION of the SQL, edit it for each company and each region manager, send it to the right BC, get the answers, and send each individual spreadsheet to each region manager's OneDrive. And in all that? The only piece of software that is not already owned by the companies is Meta5. And that is cheap now too. Of course "direct query" is not perfect and it's better to have a data warehouse. But did I mention it was cheap? LOL!
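For the run-time parameter substitution part, a generic sketch of what one centrally stored query might look like (the {{...}} placeholder style and every name here are illustrative only, not Meta5's actual format):

```sql
-- One stored copy of the SQL, maintained centrally in the cloud.
-- At run time the placeholders are substituted for each of the
-- 10 companies x 20 region managers (200 runs in total) before the query
-- is sent to that company's database and the results routed onward.
SELECT
    s.store_no,
    SUM(s.sales_amount) AS sales_yesterday
FROM {{company_schema}}.daily_store_sales s   -- e.g. company_a
WHERE s.region_code  = '{{region_code}}'      -- e.g. 'NORTH'
  AND s.posting_date = {{report_date}}        -- e.g. DATE '2025-06-30'
GROUP BY s.store_no
ORDER BY s.store_no;
```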
Here is an example of such views.
https://bida.ro/2024/06/23/bida0050-dimensional-models-over-business-central/