“Dashboards are Dead,” said Taylor.
Using LLMs to create more data content could generate a whole army of dead dashboards, an army worthy of the Night King… but will it?
Part of the reason there are so many dashboards is that they are the equivalent of the bookmarks in your browser from 1998. When it was hard to find stuff on the Internet, anything useful was dutifully bookmarked in case it couldn’t be found again. Some of you may still use bookmarks heavily, or Pocket, or a million Chrome tabs… but that probably isn’t because you’re worried about finding stuff again; it’s more a reminder that you want to consume the content at some point. If you want to find something you’ve seen before, you’ll probably just re-Google it.
Finding good answers with data is hard. It’s harder than finding stuff on the Internet pre-Google, especially where an organisation doesn’t have the consistency and constraint afforded by a semantic layer. Some dashboards are genuinely operational: people look at the same metrics every week and decide whether or not to act. Most dashboards, though, are just analytical bookmark folders. This is why, at my last company, we regularly had dashboards with 50+ tiles. They would have had even more tiles if adding them hadn’t caused OOM issues.
Dashboards being bookmark folders is also why there are so many unused dashboards held onto tightly by stakeholders. They are afraid of losing anything, because it was all so hard to get in the first place.
I recently helped family clear out an elderly relative’s house when they were moved to a care home. This relative was from the generation that grew up through the war and the culture of “waste not, want not”. All resources, both during the war and in the years after it, were scarce, so people often held onto things we would now consider waste… “just in case”. We found cupboards of old VHS and cassette tapes. We opened drawers full of takeaway menus for restaurants long closed. We found many bottles of decades-old wine in the garden shed. There were duplicates of every conceivable item, stuffed into every possible corner.
Our stakeholders are operating in the same way as this elderly relative. Data output is hard to get hold of and too difficult for them to create themselves… so our BI folders have become a house packed full of useless old crap. In retrospect, it feels like our stakeholders have a kind of data PTSD. They need data to operate, but struggle to get what they need (unless they’re a CXO with a priority pass). They are caught between a rock and a hard place.
Going back to the Googling analogy… Once Google became the way people searched for things on the Internet and its capabilities stabilised, we got used to finding things this way, learning to “speak to Google”, learning tricks to find things fast. It became quicker and easier to re-Google something than to find it in a bookmark folder.
We have to aim for this outcome for analytical questions. Part of Google’s success is that it became the standard and the Internet has moulded to fit it: SEO, PPC, display, Chromium, Android, integrations… all drive this. Part of why this hasn’t happened with data is that we don’t have overarching standards.
Fivetran connectors and their output are as close to a standard as we have for data pipelines. dbt, meanwhile, has actually made data models less standardised than they were before. In the past, DBAs and data architects would design a data warehouse using a methodology like Kimball’s. They would design raw/staging/mart schemas and build stored procedures to generate them. Those warehouse projects were more similar across industries and businesses than dbt projects are today. By enabling agile, organic data model development by a much greater range of practitioners, dbt has led to less standardisation in data modelling.
If we have:
A standard data model per industry, that companies conform to as much as possible
A standard semantic layer format - we have called for one
A semantic layer built on top of the industry standard data model, well-suited to LLM querying
Then we will definitely be able to provide a Google-like experience for many analytical questions. We need the data stack to mould to LLMs. Even without the standard data model it’s still possible to provide this experience, but it requires extra work up front (whereas with an industry-standard semantic layer, the LLM could be pre-trained and work well out of the box).
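To make that concrete, here is a minimal sketch of the kind of constrained querying a semantic layer enables. Everything in it is hypothetical: the Metric class, the compile_query function and the CAC definition are illustrative, not any vendor’s actual API. The point is that the LLM only ever emits a small structured request against a validated vocabulary, and the semantic layer, not the model, is responsible for producing correct SQL over the standard data model.

```python
from dataclasses import dataclass, field

@dataclass
class Metric:
    name: str
    sql: str  # aggregation expression over the industry-standard data model
    dimensions: list[str] = field(default_factory=list)  # legal group-bys

# One hypothetical semantic layer entry. The LLM never sees raw tables,
# only this constrained, documented vocabulary.
CAC = Metric(
    name="cac",
    sql="SUM(spend) / COUNT(DISTINCT new_customer_id)",
    dimensions=["date", "channel", "region"],
)

def compile_query(metric: Metric, group_by: list[str]) -> str:
    """Compile a structured request (which an LLM can emit reliably) into SQL.

    Rejecting unknown dimensions is what stops the model from hallucinating
    joins or columns."""
    unknown = [d for d in group_by if d not in metric.dimensions]
    if unknown:
        raise ValueError(f"{metric.name} cannot be grouped by {unknown}")
    cols = ", ".join(group_by + [f"{metric.sql} AS {metric.name}"])
    return f"SELECT {cols} FROM standard_model GROUP BY {', '.join(group_by)}"

# "What is our CAC by channel?" -> the LLM emits {"metric": "cac", "group_by": ["channel"]}
print(compile_query(CAC, ["channel"]))
```

With an industry-standard data model and semantic layer, that vocabulary would be stable across companies, which is exactly what would let an LLM be pre-trained on it and work out of the box.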
If we achieve this, we shouldn’t see BI folders balloon with duplication and useless old crap. We may instead see deeper pieces of analysis, created with care and thought by a human analyst: Count canvases and Hex notebooks, for example, which tell a longer story and need to be saved in folders for later consumption, or stored for posterity to explain past decisions. For anything less complex, just LLM it. Don’t save it. If you want it again, just re-LLM it.
On the question of value, we also need to reach a place we never got to with Google. Instead of asking “What is our CAC today?”, we need to be able to ask: “How is our CAC doing compared to recent days? How do we think it will do over the coming days if we change nothing?” Then, over time, we need to be able to ask: “What could we do to improve our CAC?” and, finally: “Optimise our CAC by managing our PPC spend. Notify me about any significant re-allocation of spend before executing.”
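That last step implies an agent with a human-in-the-loop gate. Here is a rough sketch of the shape; the threshold, the callbacks and all the names are assumptions for illustration, not any real product’s API.

```python
SIGNIFICANT_SHARE = 0.10  # re-allocations moving >10% of the budget need sign-off

def execute_with_approval(current: dict[str, float],
                          proposed: dict[str, float],
                          notify, apply_spend) -> None:
    """Apply a proposed per-channel PPC budget, gating large moves on a human."""
    total = sum(current.values())
    # Total budget moved between channels (each move counted once, hence / 2).
    moved = sum(abs(proposed[ch] - current[ch]) for ch in current) / 2
    if moved / total > SIGNIFICANT_SHARE:
        # Significant re-allocation: notify before executing, as requested above.
        if not notify(f"Proposing to move {moved:.0f} of {total:.0f} across channels"):
            return  # the human declined; change nothing
    apply_spend(proposed)

# Stubbed usage: a 15% shift triggers the notification before anything executes.
execute_with_approval(
    current={"search": 600.0, "social": 400.0},
    proposed={"search": 450.0, "social": 550.0},
    notify=lambda msg: print("NOTIFY:", msg) or True,   # auto-approving stub
    apply_spend=lambda budget: print("APPLIED:", budget),
)
```

The division of labour matters here: the LLM proposes, the semantic layer measures, and a human approves anything significant.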