6 Comments

Really interesting article!

One thing though: it would be great if you could include the names of the tools/companies you've featured, not just the logos. Or maybe put a link on each logo.

Most of them are big/common enough to recognize, but there were a few I had to look up via Google's reverse image search. :D

author

Thanks Matthias! Because I stitched the logos together into a single image elsewhere so they could sit in a row (Substack doesn't allow this with individual images), I don't think it's possible to make the links work. However, here are the names/links of the companies, in order:

-Airbyte

-Rivery

-Fivetran

-gravitydata.co

-BigQuery

-Firebolt

-Snowflake

-Rockset

-Clickhouse

-Metaplane.dev

-Monte Carlo

-BigEye

-Continual.ai

-DataRobot

-Lightdash.com

-Looker

-Glean.io

-hex.tech

-deepnote.com

-hyperquery.ai

-popsql.com

-Mode


Interesting article.

What stood out to me is how you position dbt Cloud (a proprietary SaaS product) as the centre of gravity (Metadata API, etc.). That isn't necessary: as long as the format is standardized, any tool could host a metadata layer, or the metadata could be stored directly in the orchestration engine as artifacts (so dbt runs can be operated by any agent).

Furthermore, a unified metadata serving layer could be the job of data governance/(meta)data exploration tools such as Marquez, Amundsen, etc.

author

I agree it's possible for another tool to host the metadata, but a big part of it (dbt manifests, docs/schema/sources YAML, etc.) already lives within the dbt space anyhow. In a way, dbt has also become a place to define metadata as code. It's therefore hard for another tool to define these elements as easily, but it could easily provide a nice UX for consuming them... In a way, some of the discoverability tools are already doing this by consuming from the Metadata API and enriching it with things like query history parsing.

Feb 1, 2022 · Liked by David Jayatillake

Each dbt run agent generates metadata and artifacts. Those can be stored anywhere, not only in dbt Cloud; that was my point.

So what I am saying is that any orchestration tool with enough support for ingesting standardized artifacts and metadata can match the dbt Cloud offering, plus it has a view of the whole data pipeline's logging and metadata, not only the transformation layer. Additionally, it can feed

The article points everything towards dbt Cloud as a centre of gravity into which each tool should plug. I would argue that the dbt framework and metadata+artifact standardization are the common glue for all tools, not dbt Cloud (which I still see as a nice way to execute dbt runs and a nice UI over artifacts and metadata).
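As a rough illustration of artifacts being consumable by any tool, here is a minimal sketch in Python. It assumes a dbt `manifest.json` whose top-level `nodes` mapping holds entries with `resource_type` and `depends_on.nodes` fields. The function name `model_dependencies`, the `shop` project, and the trimmed sample manifest are hypothetical and only there for illustration; a real manifest carries many more fields.

```python
import json


def model_dependencies(manifest: dict) -> dict:
    """Map each model's unique_id to the sorted list of nodes it depends on."""
    deps = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        # Skip tests, seeds, snapshots, etc.; keep only models.
        if node.get("resource_type") != "model":
            continue
        deps[unique_id] = sorted(node.get("depends_on", {}).get("nodes", []))
    return deps


# Hypothetical, heavily trimmed stand-in for a real manifest.json.
# In practice a tool would load the file a dbt run wrote, wherever it is stored.
SAMPLE_MANIFEST = json.loads("""
{
  "nodes": {
    "model.shop.stg_orders": {
      "resource_type": "model",
      "depends_on": {"nodes": ["source.shop.raw.orders"]}
    },
    "model.shop.orders": {
      "resource_type": "model",
      "depends_on": {"nodes": ["model.shop.stg_orders"]}
    },
    "test.shop.not_null_orders_id": {
      "resource_type": "test",
      "depends_on": {"nodes": ["model.shop.orders"]}
    }
  }
}
""")

lineage = model_dependencies(SAMPLE_MANIFEST)
```

Nothing here depends on where the artifact is hosted: an orchestrator, a catalog, or an object store could all hand the same JSON to a function like this.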

author
Feb 1, 2022 · edited Feb 1, 2022

Ah, I see what you mean. I agree with your last sentence that the dbt framework and metadata+artifact standardisation is the common glue for all tools, not dbt Cloud. Exactly: dbt Cloud should make using dbt really nice, and that's why you pay for it... it shouldn't be necessary in order to use the framework. I mention the Metadata API a lot above because I don't know of another one to use. You're right that another vendor could make a pure metadata API which consumes manifests and artifacts plus metadata from other tools too. Again, I would say that some of the discoverability/observability tools on the market are doing this.

I guess it feels more convenient for it to be in dbt Cloud than in another tool, given I want the metadata ingested via the API to link to dbt entities. If, as a dbt user, I'm in all likelihood already paying for dbt Cloud by choice, because it's nice and saves me some time and hassle, then why would I want to pay for another separate tool, with an integration that can go wrong and lag behind new dbt features? Any other tool would probably not be free, as it has to store state and provide a stable, performant API.
