I’ve spent a lot of time and had a lot of fun writing this series while learning sqlmesh, as well as creating a new CLI command and package!
The final week of this series wasn’t something I had planned. I thought I would be learning about sqlmesh features for the entire three weeks, but using Windsurf accelerated my learning and building process and inspired me to explore the limits of what is possible.
Considering that what I’ve built this week is somewhat of a tangent, I want to return to the main focus of this series, which was to learn about sqlmesh, compare it to dbt, and draw conclusions.
In short, if you use dbt-core and run it yourself in Airflow, in a container on cron, or similar, there’s no reason not to switch to sqlmesh, and there are many reasons to do so.
The fact that it’s backwards compatible with dbt means it can’t be ignored. I know that learning a new framework is a big deal for most data folks; you’re under pressure to deliver, not to try new tools. But you can run an existing dbt project using sqlmesh and continue building dbt models if you don’t want to learn the sqlmesh way. You still get the benefits of virtual data environments and the sqlmesh plan/apply workflow, which are substantial for very little lift.
Then, when you have time, you can try the sqlmesh model kinds and see that they are not so difficult or different to use.
After using it extensively over the past few weeks, I find that the transition from dbt is quite easy. The leap from dbt to sqlmesh is much smaller than moving from stored procedures to dbt.
The thing to understand is that it does a lot more than dbt; it looks after environments and the analytics engineering toolchain more comprehensively.
There are only two real considerations when using it: how to run it in production, by which I mean on a schedule[1], and where to store state.
Where to run it on a schedule is up to you. There is a native Airflow integration, but I personally would not use Airflow. You can also use GitHub Actions, which is what we’re doing at Cube for the moment. It has the nice benefit of being where code changes happen, so you can use the same method for production runs and CI/CD, which are obviously linked. I’m sure you could also use your cloud provider’s container service or any of the many other ways to run Python on a cron schedule. Tobiko Cloud will also offer this feature soon. This concern is not specific to sqlmesh; everyone who runs dbt in production has to deal with it too.
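To make the GitHub Actions option concrete, here’s a minimal sketch of a scheduled workflow. Everything specific in it (the file name, the cron expression, the Snowflake extra, and the secret name) is an assumption for illustration, not our actual setup at Cube:

```yaml
# .github/workflows/sqlmesh-run.yml (hypothetical path)
name: sqlmesh production run

on:
  schedule:
    - cron: "0 * * * *"  # hourly; align with the tightest @cron in your project

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # install sqlmesh with the extra for your warehouse
      - run: pip install "sqlmesh[snowflake]"
      - name: Run models whose intervals are due
        env:
          # hypothetical secret name; assumes your config reads it via env_var()
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
        run: sqlmesh run
```

Because `sqlmesh run` only executes models whose intervals are due, triggering it more often than strictly necessary is fairly harmless: runs where nothing is due finish quickly.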
Where sqlmesh stores its state is a new consideration: it can live in your data warehouse, in a separate database of your choice, or in Tobiko Cloud, where this is managed for you. Depending on your data warehouse, storing state there might not be ideal, as state updates need consistent transactions. Some data warehouses can support this; however, there is the additional concern that a cluster must be active for every read or write of that state, which can be costly.
Imagine you’re running `sqlmesh plan` frequently during development; every plan reads from state. With a data warehouse like Snowflake, that means keeping a warehouse active for a few minutes of every hour of development time. This isn’t a significant cost for a small warehouse, but it is still not ideal. Utilising Postgres or another OLTP database elsewhere would be more economical, but it also adds complexity[2].
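For what that split looks like in practice, here’s a sketch of a sqlmesh `config.yaml` with models built in Snowflake while state lives in Postgres. All account, host, and credential values are placeholders:

```yaml
# config.yaml (all values are placeholders)
gateways:
  snowflake:
    connection:            # models are built in Snowflake
      type: snowflake
      account: my_account
      user: my_user
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      warehouse: transforming
    state_connection:      # ...but sqlmesh state is tracked in Postgres
      type: postgres
      host: state-db.internal
      port: 5432
      database: sqlmesh_state
      user: sqlmesh
      password: "{{ env_var('STATE_DB_PASSWORD') }}"

default_gateway: snowflake
```

With state in Postgres, a `sqlmesh plan` during development reads and writes state without waking the Snowflake warehouse; Snowflake is only needed when models actually have to be built.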
I plan to revisit Tobiko Cloud in a few months. They are shipping new features, some in response to my feedback, with a significant amount more due early in the new year. After that, I’ll evaluate the pricing and share my opinion.
I think that’ll do for 2024 👨‍💻. Have a Merry Christmas and a Happy New Year! 🎄🥳
[1] This is not to be confused with sqlmesh’s scheduler, which uses the state of the time intervals already processed for a model, together with its @cron, to determine what data needs to be processed and on which run of the day.
[2] Before you stand up a database and use it for production state, remember that it’s very difficult to reverse this decision, and this database, however small, needs monitoring and maintenance.