5 Comments
Dan Goldin:

Great post. I view it as largely inevitable that more and more of the data stack gets absorbed into the DW. Look at Snowflake now - they announced a Fivetran alternative, dbt support, etc.

Will it be as good as specialized tools? No. But will it be good enough to capture the majority of customers who don't want to deal with multiple tools? Probably!

There's still a market for other companies, but a smaller share than before (although maybe the overall market is growing?).

David Jayatillake:

I mean, back before big data, we had four monolithic stacks that had everything. I think DBRX, ❄️, GCP and Microsoft are now at that point again.

Dan Goldin:

What's old is new again. History (+ data history) is cyclical.

Peter Andrew Nolan:

Hi Dan,

"Will it be as good as specialized tools? "

I have published my SeETL software for free now. It generates SQL as the ETL language. The SQL is ANSI SQL, so it runs on pretty much anything. That means anyone who wants an ETL tool that can map 6-8K fields per work month of development, and that produces ANSI SQL, can have one for free, including all the training that goes with it.

SeETL has a lot of free utilities to get data from files into the landing area as well. Simply put? If a company wants a "cheap and cheerful" ETL tool they can have mine for free.

And the funniest part of all is that the user interface is Excel. That's how we get the productivity of 6-8K fields mapped per work month.

I have been in this business 34 years. No one else can map 6-8K fields per work month. Our internal, not-free version is at 12-15K fields mapped per work month. I even had a (very long) day where I mapped 1K fields. The industry standard was 1K fields per 200-hour work month from 1997 to 2017, so mapping 1K fields in one day is something most people consider unbelievable.

In fact, I did a demo for the head of AT&T's data warehouse integration project back in 2009. He called me a liar and a fraud and told me the demo must somehow be faked, because what I showed him was not possible. This was despite the InfoSys account manager telling him that his own people had confirmed it was real! LOL!

The bottom line is this: ETL is now written in ANSI SQL (though we still use our old C++ programs for dimension tables). You can write that SQL by hand or you can generate it somehow. And we are at the level of 12-15K fields mapped per 200-hour work month.
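
Just to give a feel for what that generated SQL looks like, here is a simplified, made-up example (the table and column names are invented for illustration, they are not actual SeETL output): one mapped target table becomes one plain ANSI SQL insert-select from a landing table, with dimension key lookups done as joins.

    -- Illustrative only: hypothetical landing, dimension and fact tables.
    -- One mapped target table becomes one generated insert-select statement.
    INSERT INTO fact_sales (date_key, customer_key, product_code, sale_amount)
    SELECT d.date_key,
           c.customer_key,
           l.product_code,
           l.sale_amount
    FROM landing_sales l
    JOIN dim_date d ON d.calendar_date = l.sale_date
    JOIN dim_customer c ON c.customer_source_id = l.customer_id;

Whether you write that by hand or generate hundreds of statements like it from a mapping spreadsheet, the output is the same kind of plain SQL, which is why it runs on pretty much anything.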

For free, anyone can map 6-8K fields per work month. So why not do that? Sure, big companies want to buy their ETL from big vendors. Informatica just sold for $8B and it does not do much more than what we do. LOL!

I am waiting for the penny to drop. All these people have been mapping 500-1000 fields per work month in their "data pipelines" because they want to use a GUI to define the mappings that generate the code, rather than just using Excel. It's actually pretty funny when you think about it.

Peter Andrew Nolan:

Hi David,

Indeed, I have long thought that we would end up with nearly "headless" BI. The only problem was, no one believed me or took me seriously about that. People have asked for "more dashboards" for the last 30 years. In the mid-90s I predicted that "more dashboards" would not end well. I said people would develop dashboards merely because they think they want them, rather than answering the questions that need to be answered to make more money. As per usual, I was right.

I have been in BI for 34 years now and I have still not seen a product that is able to make more money for commercial companies than Ralph Kimball's Data Interpretation System by Metaphor. Ralph was a co-founder there.

My prediction is that our whole BI industry will eventually admit its mistakes and go back to using what is now called Meta5. You can now run Meta5 as a "black box". So you can have "agents" of some kind searching through large databases for the conditions they have been programmed to report on, or think should be reported on. An agent can then send the data needed to create the report to a table; that data can be read by Meta5, sent to Excel, and the report delivered to OneDrive or email.

So what does this mean? It means you can write the most complex analysis possible in any product that accesses data and sends the results back to a table. Then Meta5 simply comes along as a "black box", picks up that data, and sends it to whoever should know about it.
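
As a toy sketch of that first step, with made-up table names (nothing here is Meta5 itself), the "agent" part can be as simple as a scheduled query that detects a programmed condition and stages the rows for pickup:

    -- Hypothetical example: a scheduled job looks for a programmed condition
    -- (stores whose weekly sales fell below their expected minimum) and writes
    -- the rows to a staging table that the reporting "black box" later picks up.
    INSERT INTO report_staging (report_name, store_id, week_start, sales_amount)
    SELECT 'below_minimum_sales',
           s.store_id,
           s.week_start,
           s.sales_amount
    FROM weekly_store_sales s
    JOIN sales_thresholds t ON t.store_id = s.store_id
    WHERE s.sales_amount < t.minimum_expected;

Everything after that (reading the staged rows, pushing them into Excel, delivering the report to OneDrive or email) is the part the black box does.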

Meta5 also allows business people to directly query any data they have access to. No need to get in the IT queue. This was the number one feature of Meta5 in 1991 when I first used it: business people did not have to ask IT for any help to answer their own questions. Every company I talked to that used Meta5 told me the number one feature of the product was cutting IT out of the process of answering a question. So you can see why I thought "more dashboards" was not such a great idea.

Now, there has been one more thing invented, and I think this is a game changer. An old pal of mine came up with the idea of "pseudo dimensional models" over large operational systems. Take Business Central as an example (the link is below). It is possible to put views over a large operational system to present the data as a pseudo dimensional model.

Sure, that's not perfect. But it is cheap. Any large operational system, or a direct replica, can have pseudo dimensional models placed over the top, and then hand-coded, tool-generated, or LLM-generated SQL can read them with a low probability of incorrect data being returned. Not zero, but low.
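
To make the idea concrete, here is a rough sketch of what such views can look like. The operational table and column names are invented for illustration, not the real Business Central schema; the linked article at the end has the real thing.

    -- Hypothetical sketch: views that present operational tables as a
    -- pseudo dimensional model, so hand-written, tool-generated or
    -- LLM-generated SQL sees a simple star-like shape.
    CREATE VIEW dim_customer AS
    SELECT c.customer_id   AS customer_key,
           c.customer_name,
           c.city,
           r.region_name
    FROM operational_customer c
    LEFT JOIN operational_region r ON r.region_id = c.region_id;

    CREATE VIEW fact_sales_line AS
    SELECT s.line_id       AS sales_line_key,
           s.customer_id   AS customer_key,
           s.posting_date,
           s.quantity,
           s.line_amount
    FROM operational_sales_line s;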

My own opinion is that putting pseudo dimensional models over the top of large operational systems, or replicas, is a MUCH bigger market than the DWH marketplace, because smaller companies can afford it, since it's pretty much free.

The other thing that has been invented is this. With Meta5, and only Meta5 as far as I know, you can store SQL in the cloud. You can read it into the Meta5 desktop/server, perform parameter substitution at run time, send the SQL to the data sources to collect the data, and then send the results to the user. For large operational systems this means you maintain all the SQL code in one place, and that SQL code can be used by hundreds, if not thousands, of companies who use that operational system.
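
As a simple illustration of what one of those shared SQL templates could look like (the placeholder syntax and table names are made up for the example; the point is only that one centrally stored query gets edited per company and per user at run time):

    -- Hypothetical template stored once, centrally. At run time the
    -- :region_code placeholder is replaced with the right value for each
    -- company and user before the query is sent to that company's database.
    SELECT r.region_code,
           s.posting_date,
           SUM(s.line_amount) AS daily_sales
    FROM fact_sales_line s
    JOIN dim_region r ON r.region_key = s.region_key
    WHERE r.region_code = :region_code
    GROUP BY r.region_code, s.posting_date;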

Take, for example, Business Central used in retail, and a region manager's suite of reports that he wants each morning. Say there are 10 companies with 20 region managers each, so 200 region managers who all want their reports at 9am. With Meta5 you could have ONE VERSION OF THE SQL for those reports in the cloud. Meta5 could be installed on prem for each customer. It would read the ONE VERSION of SQL, edit it for each company and each region manager, send it to the right BC, get the answers, and send each individual spreadsheet to each region manager's OneDrive. And in all that, the only piece of software not already owned by the companies is Meta5. And that is cheap now too. Of course "direct query" is not perfect and it's better to have a data warehouse. But did I mention it was cheap? LOL!

Here is an example of such views.

https://bida.ro/2024/06/23/bida0050-dimensional-models-over-business-central/
