A few weeks ago, I shared some survey results about BI tools and semantic layers that I got from surveying the London Analytics Meetup. I have received more responses in the meantime, as people have signed up to the group and the subsequent Meetup.
The splits haven’t changed a great deal. Tableau and Power BI are very dominant, and I think this probably reflects the market. London Analytics Meetup attendees are fairly skewed towards the Modern Data Stack, so I would expect fewer of them to use Power BI, and more to use Looker/Lightdash/Metabase/Preset/Hex, than the market as a whole. Therefore, I think the Looker share shown above is very inflated and also likely contains a great deal of Looker Studio users, given how few said they used LookML below.
Even in this Meetup group, which is skewed towards using a semantic layer because of the popularity of Looker, the vast majority don’t use a semantic layer. I expect that the “None” segment in the market would be over 90%.
One thing I’ve noticed when talking to teams who use Power BI, and from the survey results, is that even though it has a semantic layer in the form of SSAS/MDX… very few actually use it. It tends to be used more like Tableau, with SQL queries written for each workbook - so the semantic layer isn’t widely adopted, even where it has already been bought and could integrate well. There could be many reasons for this:
Lack of skill to use the semantic layer - many people who can use SSAS/MDX tend to have passed a qualification of some kind.
Lack of time - the data team may be firefighting and not have the time to invest in this kind of infrastructure.
Lack of awareness - I’ve spoken to multiple data team leaders who didn’t know what a semantic layer was.
Concern about lock-in - the semantic layers attached to the BI tools above all lock you into those tools.
The fourth point above is mitigated by using a standalone semantic layer like the dbt Semantic Layer, Cube or AtScale. However, one big problem faced by standalone semantic layers like these is that the big BI vendors like Tableau and Power BI make little to no effort to integrate with them.
Part of the reason why standalone semantic layers are not supported by the big BI tools - and why the likes of AtScale and Cube either pretend to be SSAS/MDX or a database - is that big BI tools don’t want logic leaving their platform and eroding their moat. They already have as big a market share as they can reasonably have; allowing organisations to move their logic to a semantic layer, and perhaps even to dashboards-as-code there, can only reduce that share. They’ll never do it unless the semantic layer in question gets too big to ignore (incidentally, this is why Tableau does support MDX), and by declining to support standalone semantic layers in general, they make this scenario less likely.
So, all in all, I don’t hold out much hope for the big BI vendors integrating standalone semantic layers: there is no incentive for them to do so. As the graphs above show, they have achieved a great deal of commercial success without much adoption of their own semantic layers, let alone anyone else’s.
There are some large enterprises who see a standalone semantic layer as the solution to problems like single source of truth and place of definition/access - this is where AtScale have succeeded, by meeting them where they are and being SSAS/MDX in disguise, while also allowing other types of consumption. This was the right strategy before ChatGPT took the world by storm and turned our idea of what was possible upside down.
LLMs change everything in the BI market
18 months ago, no one thought natural language would definitely be the interface for analytics in the future. ThoughtSpot is not dominant in the BI market; they didn’t take the BI world by storm with their NLP interface. Nor are they new to the market - even if the era of ChatGPT and LLMs had arrived five years later than November 2022, they still wouldn’t have displaced Tableau or Power BI from the first and second spots on the BI leaderboard. They were founded three years before Power BI’s initial release.
This is not to take anything away from their success, but they were competing with the likes of Looker, not the big boys. My point is that, in all the years ThoughtSpot has existed, at no point did Tableau or Power BI feel threatened enough by NLP to seriously invest in an interface of their own. This is even with the big two making acquisitions in the space and then not bothering to release anything from them. NLP on pre-LLM tech wasn’t a threat to existing BI tools.
[Before the current LLM era, Sigma was looking much more like a threat to Tableau and Power BI. Sigma isn’t particularly exciting, but they have adopted an enterprise product and sales approach that would likely have allowed them to claw their way upwards to compete for a large market share. They built all of the features that a Tableau or Power BI org would need in order to be willing to switch - that, together with strong enterprise sales, makes a very sound commercial strategy.]
Compare this adoption to the impact of LLMs: all major BI tools have either released or confirmed they will release an LLM-enabled interface. Tableau has already released Tableau GPT; AWS released Amazon Q, which works with QuickSight, at re:Invent last week; Microsoft have invested over $10 billion into OpenAI and have already embedded it into all of their main products, including Power BI. NLP on LLM tech is the future - it’s too big for anyone to ignore, even if you already dominate the market. It’s adopt or die.
The big BI vendors can’t continue to own the market as they do today while other tools use AI to disrupt them. They can’t avoid bringing AI into their products, even if it means some cannibalisation of current business. All of the paradigms of BI - dashboards, scheduling, saving, workflow, interaction with the data team - are now up for debate. Adopting LLMs will most likely preserve their market share in the short term, but they should expect some disruption in the medium term and even greater disruption in the long term.
Let’s conservatively assume that this disruption will lead to 10% of BI market share going to other players (some of which don’t even exist today) over the next 10 years. I say conservatively because, 10 years ago, the BI market was tiny compared to today: Tableau was only just making strides towards replacing Excel as the way to present data. Power BI didn’t even exist 10 years ago and now seems to have about half of all market share. “There are only two ways to make money in business: bundling and unbundling” - Jim Barksdale, ex-CEO of Netscape.
I found some BI market growth projection numbers from 2021, before anything around LLMs would have been baked in:
The graph above shows a model using the projection numbers and my assumption from earlier: that in 10 years’ time, AI-first experiences will have 10% of the market share. The effect of this cannibalisation on the value of legacy BI is profound: from a reasonably healthy 8.7% CAGR, to slowing growth. The advent of new technology can commoditise the previous generation.
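The post doesn’t include the underlying spreadsheet, so here is a minimal sketch of how a model like this could be put together. Only the 8.7% CAGR and the 10%-in-10-years assumption come from the text; the $25bn starting market size and the linear ramp of AI-first adoption are placeholders assumed purely for illustration.

```python
# Minimal sketch of the market model described above (illustrative only).
# Only the 8.7% CAGR and the "10% AI-first share in 10 years" assumption
# come from the post; the starting market size and the linear adoption
# ramp are placeholders.

CAGR = 0.087             # projected growth rate of the total BI market
START_MARKET_BN = 25.0   # assumed total BI market size in year 0, in $bn
AI_SHARE_YEAR_10 = 0.10  # AI-first share of the market in year 10

for year in range(11):
    total = START_MARKET_BN * (1 + CAGR) ** year
    ai_share = AI_SHARE_YEAR_10 * year / 10   # linear ramp up to 10%
    ai_value = total * ai_share
    legacy_value = total - ai_value
    print(f"year {year:>2}: total ${total:5.1f}bn | "
          f"AI-first ${ai_value:4.1f}bn | legacy ${legacy_value:5.1f}bn")
```

On these placeholder numbers, the legacy segment’s effective CAGR drops by roughly a percentage point over the decade; a steeper adoption curve or a larger AI-first share makes the slowdown far more pronounced.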
As I mentioned in my last post, it has been proven that a semantic layer is essential for a great AI-first BI product. These new vendors could be much better partners for standalone semantic layers than the legacy BI vendors - who have been very commercially successful without much semantic layer adoption - ever were. AI-first products won’t be successful without the semantic layer. These products then face a choice: build their own semantic layer, made to work with their interface as well as they can manage, or partner with existing semantic layer providers.
If standalone semantic layers gear their product development towards AI-first products rather than legacy BI tools, they can sail with the tailwind of being “picks and shovels” for the AI gold rush. AI-first BI companies may favour buying over building - after all, building a semantic layer is not easy. If you could partner with an existing semantic layer vendor that releases the features you need faster than you could build them, because they’re doing it for many organisations and with more engineers, it would be a very reasonable choice to make.
In this way, standalone semantic layers would take a more equitable share of the value created by AI-first products than they ever did with legacy BI tools. AI-first tools would also still be growing fast, rather than already sitting at their maximum market share, as legacy BI tools are.
The projected value of standalone semantic layers in this scenario is much higher, as both the future growth and the share of revenue are much greater. Today, the vast majority of the value of semantic layers is captured by BI tools, which bundle them with their products.
Standalone semantic layers capture a tiny share of the value of the market today - let’s be generous and say $0.5bn in total. Let’s also say that, try as they might with product and sales to penetrate further into the legacy BI market and take a larger share of value, this is worth $2bn in total a decade from now… again, very generous - they would be doubling their share of value, given that legacy BI roughly doubles in value in both graphs above. This is hard graft too: the legacy BI tools will provide them with no help at all, as mentioned earlier. We could also assume that this focus on legacy BI forces AI-first vendors to build their own semantic layers as they rush to improve their experiences and go to market. Total value: $2bn, all from legacy BI, which has slowing growth.
Now let’s imagine that product development is geared towards AI-first vendors instead, and a good share of them opt to use standalone semantic layers rather than building their own. Depending on the graph above, standalone semantic layers could take a $1-2bn share of the value of the AI segment of the market by the end of the decade. Let’s also say this results in a $1.5bn share of the legacy BI segment - a reduction of $0.5bn. They can still grow their share of legacy BI value with enterprise sales alone; it’s a powerful way to grind out results, as I’ve seen in a few businesses. Total value: $2.5-3.5bn, of which $1-2bn is in a high-growth segment.
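To make the comparison concrete, here is the same back-of-the-envelope arithmetic as a tiny sketch; the figures are the illustrative ones from the two scenarios above, not forecasts.

```python
# Back-of-the-envelope comparison of the two scenarios above.
# All figures ($bn, roughly a decade out) are the illustrative numbers
# from the text, not forecasts.

TODAY_BN = 0.5  # rough value captured by standalone semantic layers today

# Scenario A: product development geared towards legacy BI only.
a_legacy, a_ai = 2.0, (0.0, 0.0)
# Scenario B: product development geared towards AI-first vendors.
b_legacy, b_ai = 1.5, (1.0, 2.0)

for name, legacy, ai in [("A (legacy-focused)", a_legacy, a_ai),
                         ("B (AI-first-focused)", b_legacy, b_ai)]:
    low, high = legacy + ai[0], legacy + ai[1]
    print(f"Scenario {name}: ${low:.1f}-{high:.1f}bn total, "
          f"of which ${ai[0]:.1f}-{ai[1]:.1f}bn sits in the high-growth "
          f"AI-first segment (vs ~${TODAY_BN}bn today)")
```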
Even conservatively, the second scenario plays out well. VCs want to invest in exponential growth segments - the legacy BI segment grows in a way that is more commonly associated with private equity returns than VC.
This still assumes the size of the market remains the same as in previous projections. As I and others have written before, this assumption is most likely wrong, and it can only be wrong in one direction… AI-first experiences will lead to higher consumption, higher value, higher revenues and a larger future market than expected. This could all happen while cannibalising not just the market share of legacy BI, but its dollar value, too. It could also happen a lot faster than over a decade.
I have also assumed that legacy BI tools won’t be very successful at harnessing LLMs and AI in order to modernise. Usually, when vendors reach maturity and have an existing business to maintain - a cash cow providing returns for investors - they really struggle to launch new products well. It’s very difficult to launch new features without “upsetting the apple cart”; this is why the legacy BI vendors have looked more or less the same for years, with minor incremental improvements. They can’t sell their seats for much less than they do now without entering a death spiral. The story for them is that AI will allow them to sell their seats for even more.
If prices increase for legacy BI vendors who adopt AI, this leaves the door open for new entrants who, by going AI-first and saving hugely on development time and cost, can disrupt the market.
The mobile phone is an example of a “leapfrog” technology: it has enabled developing countries to skip the fixed-line technology of the 20th century and move straight to the mobile technology of the 21st.[1]
LLM interfaces could replace legacy BI tools as the default way of interacting with data within the next 10 years. New businesses of the near future may not even experience legacy BI tools. Existing businesses that haven’t modernised yet may skip current-generation technology and leapfrog straight to AI.[2] I can’t remember the last time I touched a landline receiver - I don’t even have one connected in my house.
[1] https://en.wikipedia.org/wiki/Leapfrogging
[2] https://redmondmag.com/articles/2023/07/14/ai-from-the-trenches-part-1.aspx