6 Comments
User's avatar
gatb27's avatar

First of all thank you for sharing with us your journey with SQL Mesh, it's helping a lot on-top of the official documentation thanks to many examples and comparisons.

I am still struggling about one point when thinking about virtual environment: wouldn't be a risk to have many physical snapshots of the same model ? I am thinking about working with big tables stored on cloud warehouse which requires to spend a lot of time to create a new physical copy of the table for dev.

Is there any smart thing done behind the scene by SQL Mesh to avoid n-plicating storage costs and compute costs to create the snapshot ?

Thanks!

Expand full comment
David Jayatillake's avatar

To clarify, making a virtual environment makes views that point to the prod environment. This is very specific to avoid making many physical snapshots of the same model. Therefore, it would take no time to make the virtual environment and views, and SQLMesh keeps track of these pointers/views for you. This implementation specifically avoids n-plicating storage cost and compute cost to create the snapshot.

It's only once you're happy with the development and want to promote your changes to prod that any data may be run if needed.

Expand full comment
Pankaj Taneja's avatar

Exactly my concern. For me, virtual data environment is more like a git branches working directly on prod data. What is missing for me is a clean physical layer which final snapshots much like the main branch in git repo ( meaning I know which tables are the final tables for my BI developers, data scientist to look for) .

Additionally, too big an ask to explain to everyone in a enterprise where people have evolved from traditional oracle tech stack.

Expand full comment
David Jayatillake's avatar

I'm not sure if you're talking about something like a prod environment or gold-type layer for final snapshots. The prod environment is the one that your data team should use for querying, and it's clear which one this is in SQLMesh. The virtual data environments are spun up instantaneously during development and then removed once dev changes are promoted to prod. No one other than the engineer making the changes needs to query the dev virtual data environment, which will be in a completely separate schema.

If you are thinking about the consumption layer of your data warehouse, this is something you could build with SQLMesh; it would just be your final layer of models.

It might be a bit of a shift for people who are used to Oracle stored procedures, etc, but ultimately, if they want to stay on that stack, then this kind of tool is probably irrelevant for you. It becomes relevant if you're looking to move from Oracle to Snowflake and to modernise how you work, including adopting a framework like dbt/SQLMesh. Honestly, I'm not the most technically capable person, and I managed to learn SQLMesh in less than a month while doing other things. The Tobiko team are really helpful, too, and their Slack is a great place for someone who wants help, AI assisted IDEs also seem perfectly able to make it easy to develop in SQLMesh, too.

Expand full comment
Pankaj Taneja's avatar

Thanks for the detailed response.

I was trying out sqlmesh and the final physical schema looked like table snapshots with fingerprints. That is what confused me. Of it is possible to create physical table with clean names in production, then it could fit my use case.

My tech stack is currently s3, EMR, Redshift and airflow. I am looking to introduce dbt/sqlmesh for simplified workflow. The other folks in my org pretty old school while we have migrated to aws recently. Hence, I want to ensure I understand the production implementation before even i Decide to do a poc on sqlmesh or dbt.

Expand full comment
Sung Won Chung's avatar

Those are tables in the physical layer. Totally understand the confusion there. It will create view pointers that look clean in production. Video explainer: https://www.loom.com/share/216835d64b3a4d56b2e061fa4bd9ee76

We work with your tech stack. If you want to learn more:  https://www.loom.com/share/adbedbe48a6547a5b0837c9393209604?sid=dd2ea8dd-3056-4f6d-926b-ea1a7699280e

Airflow integration: https://www.loom.com/share/64202790139b43afb6e92ddc31174f8b

Expand full comment