4 Comments
Sarah Krasnik Bedell

I appreciate the premise, but I think the title is a bit misleading. From what I'm gathering, I'd agree that we should over-create data even if we don't know whether we'll need it because, as you said, you can't rewind and get data for past events. However, over-engineering from a tooling perspective comes at a cost: more time to get the product shipped, then more time to maintain the implementation, with less value in return as complexity grows. While I do think we should over-generate data, I'm not sure we should over-engineer the implementation.

David Jayatillake

Thanks Sarah,

Do you not think creating a data model more capable than you initially need counts as over-engineering? Or implementing a CDP when you only have a handful of events? The former is in AE and the latter in DE.

That's what I'm arguing for.

Sarah Krasnik Bedell

I do think it's over-engineering. Maybe an alternative could be creating a flexible data model that can easily be extended for potential use cases that aren't totally clear yet.

I should also note that different companies change at different speeds. Startups change quickly; larger companies take longer to change. In the startup context, I've encountered so many situations where I overgeneralized, only to have requirements change so much that the generalization I implemented just didn't make sense anymore.

Bart-Vee

It's probably a bit less relevant for this discussion, but I would think that from a privacy and security perspective, the Kimball model is preferable to wide tables, as it more closely resembles your business processes and lends itself better to data management. This makes purpose limitation and data minimisation easier, which are pillars of privacy and security. However, in practice you'll find more organisations using wide tables, and this will only increase with the growing popularity of AE. As Sarah rightly puts it, by the time you're ready with your data product it's outdated.

Again, probably not that relevant, but I find Uber's approach to privacy engineering in the new reality of wide tables quite fascinating: https://medium.datadriveninvestor.com/why-data-classification-is-necessary-b9b8e60bfe59
