Really interesting!
Not sure I agree with this though.
> On another note, I do believe LLMs spell the end of manual documentation and cataloging work. Relying on humans to maintain this manually, when they have time (which is never), has always failed. It’s relatively low risk, high effort content creation - perfect for LLMs.
If you're expecting to use the semantic layer as the "thing" to ask questions of, then I think clear documentation is incredibly important. For example, the answer to "what does activated mean?" is really different at Lightdash than at Monzo. We have an `activated` event, but it's triggered by a backend event in a very specific situation: at Lightdash, a user has verified their email account; at Monzo, they've made their first transaction. The LLM doesn't know this, because it doesn't have access to that context.
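As a minimal sketch of the context that's missing (the definitions below are made up for illustration, not Lightdash's or Monzo's real ones), the same event name can carry completely different meanings, and only the human-written description disambiguates them:

```python
# Hypothetical semantic-layer metadata. The event name "activated" is
# identical at both companies; only the curated description tells an
# LLM which definition applies.
SEMANTIC_LAYER_DOCS = {
    ("lightdash", "activated"): (
        "Backend event fired when a user verifies their email account. "
        "Implies nothing about product usage."
    ),
    ("monzo", "activated"): (
        "Fired when a customer completes their first transaction."
    ),
}

def describe_event(company: str, event: str) -> str:
    """Look up the human-written definition an LLM needs as context."""
    return SEMANTIC_LAYER_DOCS[(company, event)]

print(describe_event("lightdash", "activated"))
print(describe_event("monzo", "activated"))
```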
Right now, the incentives for writing docs are so low that no one does it - as a business user, it's really unclear which models/metrics/dimensions I should be using in the first place, so I never even get around to reading the docs! But if an LLM is effectively answering the majority of data questions for your business users, then suddenly the data team has a really important task: building and maintaining a dataset to train an ML model on (your semantic layer + docs). The better that training set, the more effective the model will be at answering those questions.
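To make that concrete, here's a hedged sketch of that "dataset" in use, assuming the curated definitions simply get injected into the model's context at question time (the function name and prompt wording are illustrative, not a real Lightdash or Delphi API):

```python
# Sketch: ground the LLM in the data team's curated definitions before
# it sees a business user's question.
def build_prompt(definitions: dict[str, str], question: str) -> str:
    context = "\n".join(f"- {name}: {doc}" for name, doc in definitions.items())
    return (
        "Metric and event definitions:\n"
        f"{context}\n\n"
        f"Using only the definitions above, answer: {question}"
    )

prompt = build_prompt(
    {"activated": "Backend event fired when a user verifies their email account."},
    "How many users activated last week?",
)
print(prompt)
```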
Context is key for data questions! Either as a semantic layer, or as docs describing that semantic layer. Agreed that MOST of the docs can be written by LLMs, but they'll definitely still need human input :)
Yeah, I'd agree with that. As these docs are going to be in a codebase somewhere, I kind of assumed there'd be a PR that someone scanned and merged.
I just don't think anyone will look after them manually from the start anymore. I'm really impressed by how GPT-4 reads and summarises code in natural language - it's something we do in Delphi - so based on that experience, and given the semantic layer is expressed as code, I'd expect it to do really well with it.
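For a sense of what that workflow might look like, here's a minimal sketch using the OpenAI Python SDK: GPT-4 drafts plain-English docs from the semantic-layer code, and a human reviews the result in a PR. The file path and prompt wording are assumptions for illustration, and this isn't Delphi's actual implementation.

```python
# Sketch: have GPT-4 draft documentation from the semantic-layer code,
# then hand the output to a human to review in a PR before merging.
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical dbt-style semantic-layer file; swap in your own path.
semantic_layer_code = Path("models/schema.yml").read_text()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": (
                "Summarise each model, metric, and dimension in this "
                "semantic layer as plain-English documentation a business "
                "user could read:\n\n" + semantic_layer_code
            ),
        }
    ],
)

draft_docs = response.choices[0].message.content
print(draft_docs)  # a human scans this in a PR, then merges
```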