I’ve been meaning to take a look at Tobiko Data’s SQLMesh for a while. As someone who has been part of the dbt community, I must admit I feel kind of guilty doing so - almost like I’m cheating on them! However, this is not a reasonable way to feel about software. We all should be trying to “sharpen the axe” and improve our toolchain if possible.
I’m most likely going to do a series of short posts on SQLMesh, each looking at one atomic idea, published at whatever cadence I’m learning at.
There are a few reasons why I chose to look at this now:
We’re working more intensely on our data stack at Cube - I’ve hired our first dedicated analytics engineer to work on our internal data.
We’ve hit issues where we have inadvertently broken our dbt DAG, because it wasn’t clear in development that anything was wrong. SQLMesh supposedly improves this with some of its inbuilt features, but I’ll look into these in depth in later posts in this series.
Tobiko recently announced backwards compatibility with dbt projects! This makes SQLMesh massively more relevant for my readers than ever before. Rather than having to do a huge migration at once, teams can migrate their project as is and gradually change to SQLMesh format and features over time.
Getting started, the first thing I did was install SQLMesh locally and use it with our dbt project at Cube. I didn’t expect this to necessarily be smooth sailing, much like when I tried to import my existing Airflow DAG into Dagster.
This is my command line history…
python -m venv .env
source .env/bin/activate
pip install sqlmesh
sqlmesh init -t dbt
sqlmesh -v
pip install sqlmesh
pip install --upgrade pip
pip install sqlmesh
sqlmesh init -t dbt
pip install "sqlmesh[web]"
sqlmesh ui
pip install agate
sqlmesh init -t dbt
open config.py
sqlmesh ui
pip install dbt
What was very confusing was that this installs the dbt Cloud CLI and not dbt-core… it’s been a few years since I’ve had to do this and I’ve often used brew instead.
sqlmesh ui
sqlmesh init -t dbt
sqlmesh init -t dbt --force
This was something Cursor suggested that doesn’t actually work, but it would be nice to be able to recreate the config.py this way without having to delete it first. I later realised that this was unnecessary, though.
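For reference, the config.py that sqlmesh init -t dbt creates for a dbt project should be tiny - something like this (a sketch, exact contents may vary by SQLMesh version):

from pathlib import Path

from sqlmesh.dbt.loader import sqlmesh_config

# Point SQLMesh at the directory containing dbt_project.yml
config = sqlmesh_config(Path(__file__).parent)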
sqlmesh ui
dbt --version
pip install dbt-core
sqlmesh ui
pip install google.cloud
pip install google-cloud-vision
sqlmesh ui
pip install google-cloud-bigquery
sqlmesh ui
pip install dbt-bigquery
sqlmesh ui
I later also realised that you don’t need to relaunch the UI every time you change or install Python packages in the venv…
This all took about an hour of banging my head against a wall, but then it worked! So all in all not too bad. It mostly seems like annoying Python package stuff and not really SQLMesh’s fault.
The DAG ran. So, proof you can port your dbt project if you want to. I am terrible at devops and environment stuff, so if I can do it - you can do it.
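If you’d rather not use the UI at all, I believe the CLI equivalent is just:

sqlmesh plan

run from the project directory - it works out what needs building and prompts you to apply and backfill.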
I tried changing a model to native SQLMesh format to see if a mixed project works, and it does. This proves that you can do gradual migrations, which is essential for larger-scale projects.
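For the curious, a native SQLMesh model is just a SQL file with a MODEL block at the top, and dependencies are inferred from the SQL itself rather than via ref(). A rough sketch (the model and table names here are made up, not from our project):

MODEL (
  name staging.stg_orders, -- hypothetical model name
  kind FULL                -- rebuild the whole table on each run
);

SELECT
  id AS order_id,
  customer_id,
  created_at
FROM raw.orders -- hypothetical source table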
Without this strong backwards compatibility with dbt projects, I can’t see how SQLMesh can really make strides in the market. Too many teams have invested heavily in their dbt projects and in learning how to use dbt. If you told them all that they had to learn to use something else and do a big migration, they would almost all say no.
With the backwards compatibility allowing for gradual migration, trying out SQLMesh with an existing dbt project is easily something an engineer could do in an afternoon or during an “innovation sprint”¹.
From here on in this series, I’m going to start with a fresh project, so I can experience SQLMesh’s features cleanly. My intention is to use DuckDB on Bluesky data, so that anyone can follow along with the exact data. I’ll also use a public repo.
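For connection config, DuckDB should be about as simple as it gets - something like this in config.yaml (the gateway name and database path are just placeholders):

gateways:
  local:
    connection:
      type: duckdb
      database: bluesky.duckdb  # a local file, so anyone can reproduce it

default_gateway: local

model_defaults:
  dialect: duckdb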
¹ I’m not sure these exist in the post-ZIRP era.
So Toby Mao pinged me and said I could have banged my head against the wall much less if I had run this command:
pip install "sqlmesh[web,dbt,bigquery]"