It’s been a bit of a crazy week - I’ve been out the last three evenings in a row, had my kids home for half-term, and had an eventful week at work - so I’ll keep it brief!
Yesterday evening, I attended London Analytics Engineering Meetup #11, organised by Dylan Baker of Spectacles and Niall Woodward of Select.dev. GoCardless (the spiritual home of data contracts) kindly hosted the event. It was great to be back again after hosting a London Analytics Meetup there last year.
There was definitely a streaming theme, with two of the talks geared heavily towards it.
As I have mentioned before, I am an investor in Streamkap, founded by Paul Dudley and Ricky Thomas. Streamkap offers real-time, cost-effective streaming of data from a database (commonly MySQL, Postgres and Mongo) to a data warehouse like Snowflake or BigQuery.
One of the reasons I invested, apart from knowing the founders, is that I believed streaming ETL should be easy and affordable for data folks. I had heard anecdotally that Fivetran charges a princely sum for CDC at scale, and still writes micro-batches to the data warehouse, causing lag. The promise of Streamkap, in theory, was that it would be faster and cheaper than Fivetran for CDC ETL.
The third talk of the evening was from Marcin Migda, of SpotOn’s data engineering team.
I had met SpotOn’s Data Engineering lead, Ben Cohen, at a Metaplane dinner at dbt Coalesce 2022 in New Orleans. We got talking data stacks, as was inevitable, and when I heard about their usage of Fivetran, I suggested trying Streamkap - it’s amazing to see the impact of one introduction over a year later!
Marcin went through the data architecture they used to have before Streamkap. That architecture and configuration resulted in a 20-minute data update lag, with real customer detriment: SpotOn provide payments and point-of-sale services to tens of thousands of restaurants, and restaurant management and staff would have to wait up to 20 minutes after the end of service for tip payouts and the like. That lag is a large part of what made Streamkap, alongside Snowflake dynamic tables, attractive for the future architecture.
Marcin gave a great explanation of Snowflake dynamic tables - their strengths, their limitations and how to use them in dbt. Streamkap can pipe data in via Snowpipe, and Snowflake dynamic tables can then keep downstream models up to date on a set interval.
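To make that concrete, here’s a minimal sketch of what a dynamic table looks like as a dbt model, using the dynamic_table materialization in dbt-snowflake (1.6+). The model, source and warehouse names are hypothetical, not SpotOn’s actual setup:

```sql
-- models/orders_enriched.sql
-- Hypothetical dbt model materialized as a Snowflake dynamic table.
-- target_lag tells Snowflake how stale the table is allowed to get;
-- Snowflake schedules incremental refreshes to stay within that bound.
{{ config(
    materialized='dynamic_table',
    target_lag='5 minutes',
    snowflake_warehouse='TRANSFORM_WH'
) }}

select
    order_id,
    restaurant_id,
    tip_amount,
    updated_at
-- raw_orders stands in for a table Streamkap lands via Snowpipe
from {{ source('streamkap', 'raw_orders') }}
```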
Finally, the new architecture with Streamkap in place reduces the update lag to under 5 minutes, almost all of which is the refresh cadence on the Snowflake dynamic tables (Streamkap delivers data from the source database to Snowpipe with sub-second latency). The 5-minute cadence could be reduced further, at the cost of additional Snowflake credits. This is Streamkap delivering on the vision I invested in, for a great customer - saving them money, time and complexity.
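For a sense of that trade-off, tightening the cadence on a dynamic table is a one-line change (table name hypothetical; a lower target lag means more frequent refreshes, and so more warehouse credits consumed):

```sql
-- Illustrative only: tighten the staleness bound on an existing
-- dynamic table; Snowflake refreshes more often to stay within it.
alter dynamic table orders_enriched set target_lag = '1 minute';
```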
A real-world, real-time analytics use case that drives value for customers, using cutting-edge data tech. What more could you want from a Meetup talk?