3 Comments

Love the emphasis on compounding at the end. All of this stuff can be a virtuous cycle or a doom loop. Sometimes it's hard to even realize how much better things could be when you're in a bad state because the whole industry is so new that most people don't have the experience of "good" in the first place.

Totally agree that observing != data quality assurance and that data quality is underrated.

Observability is great for getting coverage quickly because it is unsupervised. But it can only learn what normal data looks like. The risk is alert fatigue: unsupervised monitoring tends to produce too many false positives, or too much time gets spent analyzing unimportant issues.

In software engineering it's common knowledge that test suites take time to build and maintain. So I think you make a great point that data quality assurance also requires time. Data producers and consumers should both participate in building and maintaining the picture of what good data looks like.

At Soda, we believe that both approaches need to be combined: observability to get coverage quickly, extended with explicit data quality checks. To make it easier to express what good data looks like, we created a dedicated language, SodaCL: https://docs.soda.io/soda-cl/soda-cl-overview.html

David, great perspective! I fully embrace and prescribe the "shift left" philosophy: catch and fix data problems early in the journey. Great multidimensional data observability solutions are here to help.
