A child once had a conversation with a parent.
Child: “What’s going on with the cows on the news?”
Parent: “They’re having to cull many cows because of a disease they have.”
Child: “Why?”
Parent: “Well, people can get the disease if they eat meat from the cows.”
Child: “How did the cows get the disease?”
Parent: “They think that cattle feed makers had been putting bits of cows into their feed.”
Child: “They were making them cannibals! What’s going to happen next, are we going to get ill?”
Parent: “There haven’t been that many cases of people getting the disease, so it’s very unlikely.”
Child: “Will more cattle get ill? Will there be any cattle left?”
Parent: “Probably more will get ill, but more will be culled as a preventative measure to stop the spread, sadly.”
Child: “Can I still have McDonalds?”
Parent: “I think we should avoid it for the next year or so, until we’re sure this meat isn’t around. We’ll eat more chicken and lamb, instead.”
Child: “Sigh, OK then.”
We have to remember that, when we work in data, the whole point of our profession is to answer a question, explain the answer, say what might happen next and recommend what we should do about it. That’s it. If you don’t believe me, listen to Abhi:
For most companies, most of the value of data is simply in answering 4 questions:
What is happening in our business?
Why is it happening?
What’s going to happen next?
What should we do about it?
That’s it.
Companies get value out of data when they are asking those questions over and over and over again and when they are constantly organizing themselves and their work around the answers. Doing so is what it means to be a “data-driven” company.
Data teams exist to make this possible. Their mandate is to make these questions easy enough to answer that companies ask them more often; and the answers clear enough that companies act in response.
If your company looks like this, your data team is probably creating real value. If your company does not look like this, your data team is probably failing.
We could probably do with getting the four points as a poster and putting it over our desks or team areas. If we lose sight of this, we’re truly lost.
A typical question flow could be: What happened to revenue last week? Why did it drop? Do we think it will stay that low and should we therefore adjust forecast? Do we need to invest, optimise or increase marketing in order to get back on track?
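To make that flow concrete, here is a minimal Python sketch of the four questions applied to revenue. Everything in it is an assumption for illustration: the numbers are made up, the channel names and function names are hypothetical, and the "forecast" is deliberately naive (next week looks like last week). It is not a real forecasting method, just the shape of the conversation.

```python
# Hypothetical weekly revenue by channel (made-up numbers for illustration).
revenue = {
    "previous_week": {"organic": 500, "paid": 400, "email": 100},
    "last_week": {"organic": 490, "paid": 250, "email": 110},
}


def what_happened(rev):
    """Question 1: how did total revenue change week on week?"""
    previous = sum(rev["previous_week"].values())
    latest = sum(rev["last_week"].values())
    return latest - previous


def why(rev):
    """Question 2: which channel contributed the biggest drop?"""
    changes = {
        channel: rev["last_week"][channel] - rev["previous_week"][channel]
        for channel in rev["last_week"]
    }
    worst = min(changes, key=changes.get)
    return worst, changes[worst]


def what_next(rev):
    """Question 3: naive assumption that next week matches last week."""
    return sum(rev["last_week"].values())


def what_to_do(forecast, target):
    """Question 4: a simple rule of thumb, not a real decision model."""
    return "review marketing spend" if forecast < target else "hold course"


change = what_happened(revenue)
channel, channel_change = why(revenue)
forecast = what_next(revenue)
action = what_to_do(forecast, target=1000)

print(f"Revenue changed by {change}; biggest mover: {channel} ({channel_change}).")
print(f"Naive forecast: {forecast}. Suggested action: {action}.")
```

Each function maps to one question, which is the point: the stakeholder only ever sees the four answers, not the plumbing behind them.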
Part of why ChatGPT and LLMs have been so successful (and search, prior to that) is that they allow someone to ask a question and get an answer. This is how humans interface to get information from one another - it’s as old as our species and it’s innate to us to interface this way. If a machine can replicate this interface, it has made an integration with HumanOS. Then it’s a case of making it available everywhere, in the same way that you’re never far from a Google search bar.
Where the Modern Data Stack has failed at times is in losing sight of this fact - that its primary focus is to answer questions. Data engineering is needed for sure - as you can see from the history of this Substack - but we mustn’t lose sight of the goal. Our whole ecosystem of data folks, founders, VCs and consultants has to remember that our customer, our true customer, the data user or stakeholder, wants to ask questions and have them answered.
It wasn’t clear to me what the whole Web3/crypto movement solved, apart from a few use cases which probably didn’t really need solving. However, it’s very clear that being able to ask questions and get answers is a use case as old as our species. People have been so desperate for answers to questions in the past that they’ve turned to witchdoctors, philosophers, tarot card readers, psychics, lawyers and accountants. They’ve been willing to pay big, even for poor results!
Even within tech today, APIs allow you to make requests and get responses. It’s so much a part of how we think, we’ve built our technology in this way. The ‘I’ in IT stands for information, which is an answer to a question. LLMs aren’t a weird departure or tangent, they’re helping us get back to business.
I had a great time hosting London Analytics Meetup #3 with Liron Albag of Depop at their headquarters in Farringdon!
We had some great talks.
Raoul Dias - ML Engineer at Depop on The Power of Feature Stores, including coverage of Tecton
Nina Anderson - Head of Pre-sales Engineering at Infinite Lambda on Semantic Layers for Carbon Emissions Analytics
I spoke on the ‘what’, ‘why’ and ‘where’ of Semantic Layers, plus how they could and should interact with LLMs.
Whereas my talk on Semantic Layers was pretty theoretical, Nina’s covered a very applied, real-world use case for doing real good in the world!
As much as I’ve thought about it before, Raoul’s talk brought home to me how much feature stores and semantic layers have in common, and yet the key differences mean that neither is well suited to the other’s intended use case. It does feel like a product could or should exist that covers both sides of the Venn diagram, to prevent things being defined twice in an organisation. An alternative could be that a feature store inherits definitions from a semantic layer, and vice versa.
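As a rough sketch of what that inheritance might look like, here is a toy Python example. To be clear, this is not the API of Tecton or of any real semantic layer: the `Metric` and `Feature` classes, the `SEMANTIC_LAYER` registry and the `feature_from_metric` helper are all hypothetical names I've made up to show the idea of defining the calculation once and deriving the feature from it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """Hypothetical semantic-layer metric: the single source of the definition."""
    name: str
    expression: str  # the calculation, e.g. a SQL aggregate
    entity: str      # the key the metric is grouped by


@dataclass(frozen=True)
class Feature:
    """Hypothetical feature-store feature derived from a metric."""
    name: str
    expression: str
    entity: str
    window_days: int    # ML features usually need an explicit time window
    source_metric: str  # lineage back to the semantic layer


# Made-up registry standing in for a semantic layer.
SEMANTIC_LAYER = {
    "order_value": Metric("order_value", "SUM(order_amount)", "customer_id"),
}


def feature_from_metric(metric_name: str, window_days: int) -> Feature:
    """Build a feature that reuses the metric's expression instead of redefining it."""
    metric = SEMANTIC_LAYER[metric_name]
    return Feature(
        name=f"{metric.name}_{window_days}d",
        expression=metric.expression,
        entity=metric.entity,
        window_days=window_days,
        source_metric=metric.name,
    )


feature = feature_from_metric("order_value", window_days=30)
print(feature)
```

The feature only adds what is specific to its use case (the window), while the expression and entity come from the metric, so the business logic lives in one place.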
Sign up to the Meetup group here. We’ll be having our next event in early June, and an event with details will follow.