While I'm a huge fan of CDC for copying internal production databases to the warehouse, I have a lot of skepticism around CDC from 3rd parties. In particular, there are a few questions I don't know the answer to:
Are we sure the vendor source data is actually structured how it's written to the warehouse, or are there transformations that happen in batch? If the latter, CDC itself won't be an end-to-end solution.
If providing CDC as a solution, how will vendors deal with heavier demand and the stress of supporting real-time systems? First, vendors will have to keep an eye on source databases and everything that comes with scaling CDC. This also has multi- vs single-tenancy implications. Second, anytime you say "real-time" there's an SLA around it. With CDC, if the latency SLA doesn't decrease, then there's no point in doing CDC as opposed to batch. If it does decrease, this will put a lot of stress on vendors to actually deliver, which has potential financial ramifications.
Hi Sarah, thanks for the response and I hope you're well!
"Are we sure the vendor source data is actually structured how it's written to the warehouse, or are there transformations that happen in batch? If the latter, CDC itself won't be an end-to-end solution."
This is why I proposed that there could be a SaaS solution that many vendors could use to prevent inconsistency in data treatment. Striim and Arcion are along these lines. Provided the vendor implements it directly, without doing something in the middle, it should work...
I could imagine the SLAs decreasing as they monetise providing this data.
If they're using Aurora, Cockroach, Alloy, etc., will they have to worry about scaling so much? I assume you mean from the perspective of additional load on the source database. Why is the multi-tenancy consideration any worse than if they provided a batch API for customers to get their data? Is it because they need a way to filter the transaction log before the customer receives it? That sounds like a required feature of the SaaS CDC solution I mentioned above.
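To make the filtering idea a bit more concrete, here's a rough sketch in Python. It assumes the vendor's shared transaction log tags every change with a tenant identifier and a monotonically increasing log position; the `ChangeEvent` type, the `tenant_id` and `lsn` fields, and the `filter_for_tenant` helper are all hypothetical, not the API of Striim, Arcion, or any other product.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class ChangeEvent:
    lsn: int        # hypothetical position in the vendor's shared transaction log
    table: str
    op: str         # "insert", "update" or "delete"
    tenant_id: str  # hypothetical column identifying the owning customer
    row: dict


def filter_for_tenant(
    log: Iterable[ChangeEvent], tenant_id: str, after_lsn: int = 0
) -> Iterator[ChangeEvent]:
    """Yield only one tenant's changes, in log order, resuming after a checkpoint.

    after_lsn is the last position this customer has already consumed, so a
    restarted consumer doesn't receive the same changes twice.
    """
    for event in log:
        if event.lsn > after_lsn and event.tenant_id == tenant_id:
            yield event


log = [
    ChangeEvent(1, "invoices", "insert", "acme", {"id": 1, "amount": 100}),
    ChangeEvent(2, "invoices", "insert", "globex", {"id": 2, "amount": 250}),
    ChangeEvent(3, "invoices", "update", "acme", {"id": 1, "amount": 120}),
]

acme_events = list(filter_for_tenant(log, "acme"))
print([e.lsn for e in acme_events])  # [1, 3]
```

The point is that this filter has to run on the vendor's side before anything leaves their infrastructure, since the raw log interleaves every customer's rows. With a batch API the same isolation is usually a `WHERE tenant_id = ...` on a query, which is why it seems like a core feature a shared CDC service would need to provide.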
Thanks again for your questions!