I’ve written about the semantic layer… a lot. I think only people who build and sell them write about them more. I’ve been saying for some time that, for LLMs to deliver real value from data, we need to pair them with semantic layers. And I am just one voice of many:
> For bots to be successful query writers—and even harder, for them to be proper analysts that can answer questions about a business—LLMs will likely only be a small part of the solution. There will also have to be semantic models, methods for mapping vague requests onto those semantic models, frameworks for governing access control, ways to test if it said the same answer today as it said yesterday, and more.1 -

> LLMs and Generative AI systems are going to need a standard interface to access our organizational data.2 - Jason Ganz

> We can’t feed LLM with database schema and expect it to generate the correct SQL. To operate correctly and execute trustworthy actions, it needs to have enough context and semantics about the data it consumes; it must understand the metrics, dimensions, entities, and relational aspects of the data by which it's powered. Basically—LLM needs a semantic layer.3 -
These are just a few voices; there are many more. That’s why it’s “we” told you so, not just “I.”
The two driving themes for why we need to use semantic layers with LLMs are context and constraint. More and more teams are moving towards providing context through prompt engineering, doing things like giving the LLM a knowledge graph to interface with. This is definitely a step in the right direction, but you also need to constrain the LLM’s output.
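The context half can be as simple as prepending the model of the world to the prompt. A minimal sketch, assuming a hypothetical metrics catalog; the names and descriptions here are illustrative, not any particular product’s API:

```python
# Sketch: injecting semantic-layer context into an LLM prompt.
# The catalog below is hypothetical; in practice it would come from
# a semantic layer's metadata API.
metrics = {
    "total_revenue": "Sum of order amounts, net of refunds (USD).",
    "active_users": "Distinct users with at least one session in the period.",
}
dimensions = {
    "order_date": "Date the order was placed.",
    "country": "Customer billing country (ISO 3166-1 alpha-2).",
}

def build_prompt(question: str) -> str:
    """Prepend the available metrics and dimensions to the user's question."""
    lines = ["You may only use these metrics and dimensions:"]
    lines += [f"- metric {name}: {desc}" for name, desc in metrics.items()]
    lines += [f"- dimension {name}: {desc}" for name, desc in dimensions.items()]
    lines.append(f"Question: {question}")
    return "\n".join(lines)

prompt = build_prompt("What was revenue by country last month?")
```

Note that this only provides context; nothing yet stops the model from answering with a measure that doesn’t exist, which is where constraint comes in.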
Semantic layers provide both a knowledge graph and a constrained interface for an LLM. As a special type of knowledge graph, a semantic layer gives the LLM a model of the world, composed of entities and their measures and dimensions. LLMs have been trained on language, and language is composed of entities (nouns) and their dimensions and attributes (adjectives). Language also contains mathematical terms such as ‘total’, ‘running total’, ‘average’… as a result, knowledge graphs are a natural fit for LLMs.
LLMs are also quite good at SQL because so many documents and articles (think Stack Overflow) contain SQL. Still, SQL is a minuscule fraction of what LLMs have been trained on. An interface closer to natural language is a better fit for them, and a constrained one, which reduces the chance of errors and hallucinations, is better still.
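The constraint can be mechanical: have the LLM emit a structured query against the semantic layer rather than free-form SQL, and reject anything outside the model before it reaches the warehouse. A hedged sketch, with illustrative field names, assuming the LLM returns a simple dict:

```python
# Sketch: validate a structured query from an LLM against the semantic
# layer instead of trusting free-form SQL. All names are illustrative.
ALLOWED_MEASURES = {"total_revenue", "active_users"}
ALLOWED_DIMENSIONS = {"order_date", "country"}

def validate_query(query: dict) -> list[str]:
    """Return a list of violations; an empty list means the query is safe to run."""
    errors = []
    for m in query.get("measures", []):
        if m not in ALLOWED_MEASURES:
            errors.append(f"unknown measure: {m}")
    for d in query.get("dimensions", []):
        if d not in ALLOWED_DIMENSIONS:
            errors.append(f"unknown dimension: {d}")
    return errors

# A hallucinated measure is caught before any SQL is generated or executed.
bad = validate_query({"measures": ["total_profit"], "dimensions": ["country"]})
good = validate_query({"measures": ["total_revenue"], "dimensions": ["country"]})
```

The design choice is that the LLM never writes SQL at all; the semantic layer compiles the validated query, so the model’s output surface is exactly the knowledge graph and nothing more.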
However, none of what I have been preaching had been conclusively proven in public. What we needed was a benchmark: that is how things are tested in engineering. Microprocessors, software, automotive components… engineered things are compared with benchmarks. We didn’t have one, though, until now.
In short, what we see from the benchmark and further replication is:
Semantic Layer > Knowledge Graph > Context with SQL > SQL alone
All of the engineering we have done with Delphi adds further constraint and context on top of the semantic layer, and that is why it outperforms using the semantic layer on its own with LLMs.
It’s great to have this benchmark, but it has shown us that we probably need a more challenging one: one with obstacles, gotchas, purposeful mistakes, and duplication. We need to test these systems on how they deal with bad conditions, not ideal conditions, because in production, semantic layers are like this and worse.
The inverse argument is that if data teams want to use LLMs with semantic layers, some of the uncleanliness that has been tolerated in the past (duplication, poor naming, missing descriptions…) needs to be improved upon. As the benchmark showed, on a clean semantic layer with no duplication and clear naming and descriptions, it’s actually possible for an LLM to answer questions perfectly.
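That hygiene can even be enforced mechanically before the semantic layer is ever handed to an LLM. A minimal sketch of a lint pass over measure definitions; the definition format is hypothetical:

```python
# Sketch: flag undescribed or duplicated measures in a semantic layer.
# The definition format here is illustrative, not any specific tool's.
measures = [
    {"name": "total_revenue", "sql": "SUM(amount)", "description": "Net revenue in USD."},
    {"name": "revenue_total", "sql": "SUM(amount)", "description": ""},  # duplicate, undescribed
]

def lint(measures: list[dict]) -> list[str]:
    """Report measures with missing descriptions or duplicated definitions."""
    problems = []
    seen_sql: dict[str, str] = {}
    for m in measures:
        if not m["description"]:
            problems.append(f"{m['name']}: missing description")
        if m["sql"] in seen_sql:
            problems.append(f"{m['name']}: duplicates {seen_sql[m['sql']]}")
        else:
            seen_sql[m["sql"]] = m["name"]
    return problems

issues = lint(measures)
```

Running a check like this in CI is one way to keep the “clean semantic layer” condition the benchmark relied on.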
I’m pretty sure that if we asked a human to answer all of the questions in the benchmark without a good deal of training on the data model, they would get some of the answers wrong. Where humans outperform LLMs on semantic layers, at least in the short term, is where the human has adapted to the flaws in the semantic layer, thus becoming part of it.
https://cube.dev/blog/semantic-layer-the-backbone-of-ai-powered-data-experiences