Discussion about this post

Tristan Handy

Love the emphasis on compounding at the end. All of this stuff can be a virtuous cycle or a doom loop. Sometimes it's hard to even realize how much better things could be when you're in a bad state because the whole industry is so new that most people don't have the experience of "good" in the first place.

Tom Baeyens

Totally agree that observing != data quality assurance and that data quality is underrated.

Observability is great for getting coverage quickly because it is unsupervised. But it can only learn what normal data looks like. The risk is alert fatigue, as unsupervised approaches tend to produce too many false positives or lead to too much time spent analyzing unimportant issues.

In software engineering, it's common knowledge that test suites take time to build and maintain. So I think you make a great point that data quality assurance also requires time. Data producers and consumers should participate in building up and maintaining the picture of what good data looks like.

At Soda, we believe that both approaches need to be combined: observability to get coverage quickly, extended with data quality assurance. To make it easier to express what good data looks like, we created a dedicated language, SodaCL: https://docs.soda.io/soda-cl/soda-cl-overview.html
