It was about a year ago that I started the pre-seed fundraising process for Delphi. All of the VCs I spoke to asked me why we would succeed vs all the other entrants in the space. The vast majority of these entrants were using a method called “text-to-SQL”, where an LLM is given the database schema and asked to generate a SQL query to answer a question. This has conclusively been proven to be inferior to using a semantic layer with an LLM for the same purpose. When I say inferior… it’s not even a close call. It’s kind of like Space Jam without LeBron or MJ to help the Looney Tunes.
This is the light reason why text-to-semantic layer systems can work in enterprise, but text-to-SQL ones won’t. The nice, fair, technical reason.
Some of the VCs would argue that, in time, with enough power, enough pre-training, specialised pre-training, a hundred trillion parameters, Neuralink, AGI… effectively infinite time and resources, it would be possible for text-to-SQL to perform well. I’m still not sure that’s true, as LLMs simply aren’t magical. If the metadata isn’t there to explain how to use the data… why would the model know how to use the data? If you put me in front of an awful schema with no useful metadata, I wouldn’t be able to write a query to answer a question. Put me in front of a perfect but unknown schema with good names and descriptions and I might get it right, but I would be guessing how to join the tables, aggregate fields, and so on. I know you can provide metadata to the LLM so it knows how to use the data, but the end conclusion of that path is to move to using semantic layers 🤷‍♂️.
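To make the contrast concrete, here’s a rough, hypothetical sketch of the two approaches - none of the names below (ask_llm, METRICS, DIMENSIONS) come from Delphi or any real product; the only point is where the meaning of the data gets decided.

```python
# A rough sketch only - ask_llm stands in for any LLM call, and the metric
# definitions are invented for illustration; nothing here is Delphi's code.
import json


def text_to_sql(question: str, schema_ddl: str, ask_llm) -> str:
    """Text-to-SQL: the model sees raw DDL and has to guess joins, grain and
    aggregations itself - it acts as semantic layer and query writer at once."""
    prompt = (
        "Here is the database schema:\n"
        f"{schema_ddl}\n"
        f"Write one SQL query that answers: {question}"
    )
    return ask_llm(prompt)  # free-form SQL; nobody has defined what the data means


# Text-to-semantic-layer: humans encode what the data means up front...
METRICS = {
    "revenue": {"expr": "SUM(order_total)", "table": "orders"},
    "active_users": {"expr": "COUNT(DISTINCT user_id)", "table": "events"},
}
DIMENSIONS = {
    "month": "DATE_TRUNC('month', created_at)",
    "region": "region",
}


def text_to_semantic_query(question: str, ask_llm) -> dict:
    """...so the model only picks from pre-defined metrics and dimensions, and
    the semantic layer (owned by an analytics engineer) compiles them to SQL."""
    prompt = (
        f"Available metrics: {sorted(METRICS)}\n"
        f"Available dimensions: {sorted(DIMENSIONS)}\n"
        f"Question: {question}\n"
        'Respond with JSON like {"metric": "...", "dimensions": ["..."]}'
    )
    return json.loads(ask_llm(prompt))  # constrained choice; deterministic compilation
```

In the first function, the model invents the semantics on the fly; in the second, the semantics already exist and are owned by a person - which matters a lot for what follows.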
However, if you really do extrapolate to infinite resources, then sure, it’s possible that you might create an AI system capable of answering questions on an unknown, unclean schema. Let’s assume, with infinite resources, that it would be able to do it perfectly every time, thus eliminating the need for a semantic layer.
After I got to this point in conversations with VCs a few times… a dark realisation occurred to me, one I haven’t really uttered outside of these conversations, until now.
I’ll share what it is next week… 😂
Joking aside, when I did share this with some of the VCs I was pitching to, I could see that some were shocked and appalled by it.
I realised that, even if you could create a perfect text-to-SQL system, it wouldn’t be good enough. Yes, even if it answered correctly each time, it wouldn’t be good enough. The VCs, especially the ones who hadn’t worked in data, were perplexed.
Why wouldn’t it be good enough?
If you’ve worked in data for long enough, especially at the interface of how senior stakeholders use data for decision making, you’ll realise that correctness isn’t all that important. When stakeholders see data, it matters more that it fits their view of the world; if it doesn’t, and it isn’t accompanied by a robust, deep explanation of why it doesn’t, they will reject it (and they might still reject it even with one).
We sometimes think of company leadership as almost like a rational collective human computer optimising company performance, but the vast majority of individual companies don’t really work like this. We sometimes conflate how the Market, and I mean capital “M” Market, works with how companies work. Yes, the Market can often function like that collective computer, self-optimising and so on… though just as often it can behave like a flock of frightened sheep running headlong into a ravine. Company leadership behaves much more like a group of primates. Yes, they have better manners and less violence, but all the mechanics and motivations are the same.
Startups do seem to be more able to escape this by making everyone an owner, thereby making the greater good visible to all. This makes them more like a pack of wolves than a group of primates, able to take down problems much bigger than themselves. This is mostly impossible, or only temporarily possible, at larger companies.
If you watch this documentary on macaques and see a similarity to how your company’s leadership operates, or to the OpenAI leadership debacle… that’s because it’s incredibly similar.
Back to humans - but keep the macaques in mind - there is a big issue with delivering any information, including data, to senior stakeholders: blame, if it’s wrong or perceived to be wrong. If we had an all-powerful text-to-SQL system and it came up with answers we didn’t like or agree with, it would be very difficult to understand why it had come to them, as no human was involved in how it arrived at its answer. It is acting as the semantic layer as well as the answering system. It’s deciding what the data means and how to use it, then generating answers based upon this - Judge, Jury and Executioner.
The problem with blame in this context is that, while you can assign it to an inanimate object like a text-to-SQL system, there are additional side-effects that can’t be resolved properly. With blame in primate - and human - society, humiliation follows. Whoever is wrong is required to be humiliated, to protect their associates from having their own social standing lowered. They need to be ostracised, have their hind leg proverbially gored, and be sent into social exile. This protects the group from relying on the wrong member again and makes everyone else feel safe - they have their scapemonkey.
Often, ostracised primates die, and this drives a fear/survival response in them to avoid that fate, through risk aversion or fighting. Humans have the same response, even though in modern society there is rarely even a risk of being fired, let alone of death. The most common real risk is being passed over for a pay rise or promotion, yet humans acting as primates in the office respond to it with the same fear/survival response. You see it all the time in big workplaces - people who are too afraid to take risks, or people who charge about the place shouting, beating their chests and bullying others.
This, again, is one area where startups can be different - as they look to “move fast and break things”, mistakes are seen as inevitable and things to learn from. Fail fast, early and often to maximise learning - if you hunt every day, it doesn’t matter if any one hunt goes wrong.
The blame and humiliation must happen. Even though the blame could be transferred to the text-to-SQL system, the humiliation will remain with the stakeholder who used it. There is no way they will ever sign up for that; they barely want to be responsible for using BI tools today. Asking them to be responsible for the SQL a text-to-SQL tool generates, and to bear the humiliation for it likely being wrong, is straight up worse than asking them to learn SQL and write their own queries.
In most companies today, if data presented in the boardroom is wrong, an exec sticks their head out the door and asks someone to come in and explain it. Have you ever been that person? I have. That person is the scapemonkey, unless they can pull it out of the bag (usually you can, because you point out that the exec has the wrong filters applied on the dashboard - blameshift). With LLM-on-semantic-layer systems, this is still possible, because the data person to blame and humiliate will be the analytics engineer who maintains the semantic layer (note - they may be acting as the analytics engineer under a different job title, so the company can save 20 to 40k).
This is the dark reason why text-to-semantic layer systems can work in enterprise, but text-to-SQL ones won’t.
If you feel icky and disgusted… good. It probably means you’re a decent person. I’m not advocating for any of the behaviours above - it’s just the way it is, and always will be while humans are in charge of things (this is one nice thing about the idea of AI running companies instead of people). I’m also not saying that all people behave like this in big companies; it’s just that a sufficiently large number of them do (especially at senior levels) to enforce the dynamics. Startups are not immune from this either - when it happens, they say: “it was a bad fit, we hired a big company person by accident.”
Thanks to Pete Soderling for emboldening me to share this 😅.
On that note, Data Council’s line-up this year looks as good as ever:
Wes McKinney of Posit (and Pandas fame)
Nick Schrock of Dagster
Benn Stancil of ThoughtSpot
George Fraser of Fivetran
Lindsay Murphy of Secoda
Abhi Sivasailam of Levers
Monica Miller of Starburst
Jake Thomas of Okta
Caitlin Colgrove of Hex
Tony Kau and Paco Valdez of Cube
… plus many more!
It’s really a great line-up of friends and contributors to the data space. You can get tickets here.
If I’ve traumatised or offended you with this post, feel free to DM me!
This reminds me a bit of the self-driving debate. AI might be safer than humans, but because it’s a robot we still prefer the human approach, even if it’s more dangerous overall.
I do think there’s something to the human element, and that no one wants to take the fall. But as companies find the value, they will invest more in data quality, and I suspect more and more will be handled upstream, which should simplify the NL2SQL world. I think the semantic layer still needs to exist, but I’m not sure where in the stack it will end up living.