Recently, there was some discussion about data products on data-folks.masto.host. This has turned into a lovely data community of its own, with, unintentionally, little overlap with twitterx. Come join us!
What is a data product? A data product is any way that data can be consumed. A data product always needs an interface to allow consumption. It can include one, or use other applications that can act as an interface.
This is why something as raw as a CSV can be a data product. In fact, CSVs may well be the most common data products of all time. How many CSVs are downloaded, emailed or scheduled to be so, across the world every day? A CSV doesn’t have an interface of its own, but any computer in the world can open one using many applications. CSVs are also picked up and processed by many services, which are then the consumers.
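As a rough illustration of a service consuming a CSV, here is a minimal Python sketch. The file contents, column names and the `total_amount` function are all made up for the example; the only real interface involved is Python's standard `csv` module.

```python
import csv
import io

# Hypothetical CSV export; in practice this would be a downloaded,
# emailed or scheduled file rather than an inline string.
CSV_TEXT = "order_id,amount\n1,10.50\n2,4.25\n"


def total_amount(csv_file):
    """Consume the CSV the way a downstream service might: parse and aggregate."""
    reader = csv.DictReader(csv_file)  # the header row supplies the column names
    return sum(float(row["amount"]) for row in reader)


total = total_amount(io.StringIO(CSV_TEXT))
print(total)
```

The CSV itself carries no interface; the `csv` module plays the role of the other application that acts as one.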
This is also why database tables, streams, JSON files… can all be data products, too. Database tables (and in fact streams and JSON files) usually have a querying language interface, and are also consumed by services.
dbt projects and models provide refinement to raw data. The further to the right of the DAG, the more refined the product should be.
Micro-services (usually centred around a data store) are also data products. In CRUD, Read is to consume and C_UD are product maintenance actions.
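To make the CRUD split concrete, here is a small Python sketch of a service wrapped around an in-memory data store. The `CustomerStore` class and its method names are hypothetical and stand in for any real micro-service; the point is only that `read` is the consumption path while the other three methods maintain the product.

```python
class CustomerStore:
    """Toy micro-service centred on a data store (here, a plain dict)."""

    def __init__(self):
        self._rows = {}

    # C, U and D: product maintenance actions.
    def create(self, customer_id, name):
        self._rows[customer_id] = {"name": name}

    def update(self, customer_id, name):
        self._rows[customer_id]["name"] = name

    def delete(self, customer_id):
        del self._rows[customer_id]

    # R: consumption of the data product.
    def read(self, customer_id):
        # Return a copy so consumers cannot change the product by accident.
        return dict(self._rows[customer_id])


store = CustomerStore()
store.create(1, "Ada")
store.update(1, "Ada Lovelace")
print(store.read(1))
```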
Sometimes database tables are abstracted by ORMs and semantic layers, which provide more refined data products. Rather than dealing with the raw structure of the data, it is abstracted into entities with associated attributes, measures and metrics - allowing consumers to simply request these, rather than having to know how to use the underlying data structures.
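A toy version of that abstraction might look like the Python sketch below. It is not how any particular semantic layer works; the table, the column names and the `query` function are invented for illustration. The consumer asks for a measure by a dimension and never touches the raw column names.

```python
from collections import defaultdict

# Hypothetical raw table with awkward physical column names.
ORDERS = [
    {"ord_id": 1, "cust_ctry": "BE", "amt_cents": 1200},
    {"ord_id": 2, "cust_ctry": "BE", "amt_cents": 800},
    {"ord_id": 3, "cust_ctry": "NL", "amt_cents": 500},
]

# The "semantic layer": friendly names mapped onto the raw structure.
DIMENSIONS = {"country": "cust_ctry"}
MEASURES = {
    "revenue": lambda rows: sum(r["amt_cents"] for r in rows) / 100,
    "order_count": lambda rows: len(rows),
}


def query(measure, by):
    """Return the requested measure grouped by the requested dimension."""
    column = DIMENSIONS[by]
    groups = defaultdict(list)
    for row in ORDERS:
        groups[row[column]].append(row)
    return {key: MEASURES[measure](rows) for key, rows in groups.items()}


print(query("revenue", by="country"))
```

Because each measure is defined once in `MEASURES`, every consumer that asks for `revenue` gets the same calculation.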
ORMs and Semantic Layers can have graphical user interfaces like Django Admin and Cube Workspace. These data products are for technical administrators of these systems, rather than the real intended end user.
The real end user data application for an ORM is often a more generic application. For example, a shopping site enriched with data features, like price comparison and ranking.
The equivalent for a semantic layer is often in a BI tool. This allows users to explore the data in the semantic layer visually, making graphs and dashboards etc.
The BI tool itself is not a data product. No-one ever wanted to consume a BI tool (apart from GCP, Salesforce and Databricks, but that’s infrastructure consuming infrastructure). The data products in the BI tool are the dashboards, graphs and explores. The BI tool is infrastructure to make and host data products, kind of like a taproom which has a bar and a brewery. The brewery here uses known inputs to produce many kinds of beer according to recipes, which between them cater for most consumer demand.
AI in BI is not a data product, but a very cheap and easy way to produce a custom data product, on demand, by the consumer. Where AI consumes from the semantic layer to produce the data product for the consumer, the taproom analogy above holds. It just has a robot keeping the bar that can serve infinite customers in parallel within a minute.
Most BI tools don’t come with a semantic layer or integrate with them 😭. In this case, the BI tool is like a taproom with a granary attached. If this sounds like you may end up with vermin in a beer vat and occasionally on the bar (metrics and entities defined three ways from Sunday, boardroom data arguments)… I’d agree with you. More refinement steps for the data product are handled in one place by this kind of BI tool, with more risk and less standardisation of product. This product comes with health code violations.
This analogy also holds where AI consumes directly from the data source - except sped up. It's a fully automated grain storing, grain processing, beer brewing and serving establishment. No humans involved, no awareness of vermin in the grain store, but able to attempt to produce any beer in the world. That's not Mort Subite, that's rat-flavoured IPA - a real sudden death.
Increasingly, semantic layers are being used to power many other kinds of data products than BI tools. HR, FinTech, Travel… you name the industry, at least ten companies are making data products for it. Perhaps this is the fulfilment of the "data is the new oil" promise. Oil is processed into products (plastics, diesel, gasoline, bitumen, kerosene) which are then consumed by specific industries (food & drink, automotive, logistics, construction, aviation).
These products are much more consumable than the raw product - consumers of many types can consume them. Consumers with specialised needs will pay more to have these met.
Many of the apps on your phone are really data products. If you think about your banking app, most of the time you go on there to look at your balance or to check a transaction - this can easily be considered a data product. Not an OLAP data product, but still a data product. It is now common for a banking app to forecast your balance ahead of payday. This kind of feature is much more of an OLAP data product feature.
Bloomberg terminals, dating from circa 1982, are one of the earliest examples of refined user-facing data products. It turns out data products have been around for a while.
My favourite data product of all time was a weather application called Dark Sky. It had many metrics as well as individual data points, so it was a mixed OLAP/OLTP data application. Apple bought Dark Sky a few years ago and butchered it into the disaster of Apple Weather.
Reverse ETL is there to ship, monetise, “activate”, “operationalise”… a data product which is often a database table or a semantic layer.
To summarise: what a data product can be is very broad. Data products can be very raw or very refined. More refined data products provide interfaces and abstraction from the data structure. They can be obvious data products. They can be data products in disguise. They can be completely invisible, with production and consumption by services in cyclical chains.