As I have mentioned before in this Substack or on one blue social network or another: I'm an investor in Lightdash.
I recently posted on LinkedIn about moving away from Looker and towards Lightdash, which prompted some interesting discourse (including some dissent). I felt a longer-form explanation of my position could be helpful, as I think I wasn't clear about certain points.
What is Lightdash?
Lightdash is an OSS forever BI tool with a new cloud offering, which I've been beta testing for some time. They recently announced their mango seed round! I'm really excited about what this round means for Lightdash and how far they can go with it.
Why did I invest?
You might think that it's a bit biased for me to be promoting this tool which I invested in, but, philosophically, I have good reason to feel unbiased. As I mentioned before, Lightdash is OSS forever. Yes, the reason they have been funded is to make a commercial success of their cloud offering, but no acquirer will be able to make Lightdash closed source. Even if somehow this could happen by other means, the progress Lightdash will have made in the meantime will result in a rich open-source BI tool, fully integrated with dbt, that we can use how we want and fork if forced to.
I invested in Lightdash because, as someone who has used Looker in the last few years, I'm deeply concerned. Even before Looker was acquired by GCP, it had become the fentanyl in the MDS. Once you got on Looker and built in LookML, you couldn't get off. The hundreds of lines of simple LookML you started with gave way to thousands and then tens of thousands of lines of complexity that is indecipherable to anyone in its entirety. At Lyst, where I first encountered Looker, I realised it wasn't actually possible for Lyst to get off Looker. At least, not without a carefully managed rehab program (I left before I could try to implement one):
- Stopping new things being built in LookML
- Stopping new dashboards being created in Looker
- Removing developer credentials
- Pushing all used PDTs into dbt models
- Simplifying use cases served
- Serving all new use cases in another new tool, such as Lightdash
Yes, I am describing Looker as being too powerful. You can do amazingly complex things in it with relative ease. I am also saying that you shouldn't. I'm just as guilty as the next person of abusing Liquid and doing things like making views which change grain based on the selections of the user in the explore. These choices are not sustainable: they don't provide clarity in your data model, they damage the ability of your org to have a single source of truth, and they cause massive and exponential sprawl of your data estate. You end up with an unmanageable mess.
I don't propose Lightdash as an OSS drop-in replacement for Looker - it's not capable of being that... but that's a good thing. Building very complex things in LookML is just like building that crazy spreadsheet with 2m array formulas that crashes most people’s machines. It's generating unmanageable complexity that good Analytics Engineers avoid.
Even if you think this complexity is fine, and that there is nothing wrong with building a lot in LookML, there is another big problem. Yes, your LookML is nicely version controlled and in your company GitHub. It may feel like this means it's yours, but it's not. You need the Looker platform to get any value out of this LookML repo. You are not independent of Looker... ever! Once you've built the monolith, and you can't get clean, you have to pay what the dealer charges. This is an awful position to be in as a data practitioner. I strongly recommend that no team that hasn't used Looker to date should use it now.
With regard to Metaplane, you may say that this is a bit rich given Looker is one of our supported BI tools. However, I would argue the contrary - the fact that Looker use results in this complex monolith is why customers really need their observability tool to look into Looker and connect it with their lineage. To date, we aren't able to provide lineage where views in Looker are based on liquid, which may make it unclear what the source of the view is.
This level of complexity in Looker, while powerful, is harmful to the long-term sustainability of a data stack. Given that a view can change source depending on a user action, it may never be possible for any tool to understand full lineage in Looker. In Data and Analytics Engineering, lineage is king. If something makes lineage impossible, remove it. It's that simple.
What about our partnership with Looker? There isn't really one to speak of - it's been a ghost ship of a company since it was bought. We've never spoken to anyone there: we built our integration by learning their APIs.
At this point, I think it should be clear to you why I invested in Lightdash. Independence. As a data practitioner, I don't want to be dependent on any one vendor to succeed. It's OK to be dependent on a choice of a few vendors, who you can swap out relatively easily. The competition prevents you from being put in a bad position. Looker, being the only self-serve BI tool that isn't NLP-based like ThoughtSpot/Veezoo/Metabase, was in a quadrant of its own until Lightdash came out. Lightdash allows us to serve our stakeholders in the way Looker did, but with independence from any vendor, including Lightdash. It's easy to leave Lightdash if you want to, and that's a feature in itself - a non-negotiable one for me.
So, when I invested in Lightdash, sure I'd like to make money some day over the rainbow, but I felt like I was giving to some cause against oppression.
This was why, as soon as Oliver showed me what Lightdash was in principle, I wanted to give to the cause... sorry, invest. It was far from being a polished BI tool at that point, and far even from where it is today.
It gets worse and the reasons become more compelling
GCP's acquisition of Looker has made things even worse. Sure, Looker Account Execs could happily increase your fees each year before this acquisition, knowing that you wouldn't (couldn't) leave. But this adds a few more dimensions. I'll preface this section by saying these are my thoughts based on logic rather than any hard facts, but in my experience, if you follow the money, you find the truth. So you can call this FUD if you like, but like I've said, this is opinion and conjecture. Even so, I'd bet you good money that this will play out.
GCP wants you to spend money on BigQuery. Yes, partly to make money from this alone, but also because there is a multiplier. For every $1 spent on BigQuery by a company, on average $6 is spent on GCP as a whole. It's a beachhead for cloud spend for any org. As I have mentioned before, I'm impressed by BigQuery - it is by far the best DWH provided by a cloud vendor. It's the only major DWH giving Snowflake real concern. There is a whole suite of data IaaS, PaaS and SaaS that plays nicely with BigQuery on GCP: Bigtable, Dataflow, Pub/Sub, GCS… the list goes on and keeps expanding.
Looker's pre-acquisition revenue was probably around $100m. Let's assume that one third of this was from BigQuery customers. Hold that thought. GCP have just announced that LookML is now available as an API for Data Studio, which they have rebranded as Looker Data Studio (if you don't see this as a pretext for it replacing the existing Looker front-end…). Imagine if they decided to rescind access to the LookML API for all orgs that weren't on BigQuery, losing $67m in revenue. Let's say some of the data teams on Snowflake/Redshift or whatever else then moved to BigQuery to keep being able to use Looker, as after all they're addicted and would rather rob their mother than get off it.
Let's assume 10% of them do. This reduces the loss by $6m (I'm rounding down). However, now they are spending money on BigQuery, too. In my experience, a team is, on average, more likely to spend more on its DWH than on its BI tool in an MDS. After all, everything then starts to run from there - reverse ETL, CDPs land events there, and SWEs who have access to Snowflake or BigQuery often realise their data jobs will run reliably and start to run them there, as opposed to using something more difficult like Glue or Spark. Let's say that the Looker revenue retained is matched equally by BigQuery spend... so another $6m net saved. Then take into account the multiplier from other workloads ending up in GCP: $6m of BigQuery spend implies $36m of GCP spend as a whole... another $30m net saved.
I would argue the numbers above are pessimistic for GCP, and they already show that the loss in Looker revenue isn't exactly what meets the eye. What looked like a $67m loss in revenue is really more like a $25m loss.
Let's change the assumptions a bit. We'll stick with 10% moving to the GCP stack to preserve Looker use, but this time, let's assume that BigQuery spend is on average 50% more than Looker spend for these orgs. $6m + ($9m * 6) = $60m made from losing $67m in revenue. GCP get to do this while at the same time undermining their competitors in the cloud. Sounds like good business to me. If I were in strategy for GCP, I'd be remiss not to push for this option. Long-term, Looker being unavailable to use with other cloud vendors' DWHs is bad for them = good for GCP. This is a GCP play at making a complete data stack.
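For anyone who wants to play with these assumptions, here is a rough sketch of the back-of-envelope sums above in Python. The class, function and field names are my own inventions for illustration, and every default is one of the guesses from this post rather than a real figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    # All defaults are the post's assumptions, not real data.
    lost_looker_revenue_m: int = 67   # non-BigQuery Looker revenue at risk ($m)
    migrating_pct: int = 10           # % of that revenue whose teams move to BigQuery
    bq_to_looker_ratio: float = 1.0   # BigQuery spend per $1 of retained Looker spend
    gcp_multiplier: int = 6           # GCP spend as a whole per $1 of BigQuery spend


def net_position_m(s: Scenario) -> dict:
    """Return the rough $m figures for one scenario."""
    # Looker revenue kept from teams that migrate, rounded down as in the text.
    retained = s.lost_looker_revenue_m * s.migrating_pct // 100
    bigquery = retained * s.bq_to_looker_ratio
    # The multiplier gives GCP spend as a whole, so this already includes BigQuery.
    gcp_total = bigquery * s.gcp_multiplier
    recovered = retained + gcp_total
    return {
        "retained": retained,
        "bigquery": bigquery,
        "gcp_total": gcp_total,
        "recovered": recovered,
        "net": recovered - s.lost_looker_revenue_m,
    }


equal_spend = net_position_m(Scenario())
higher_bq_spend = net_position_m(Scenario(bq_to_looker_ratio=1.5))
print(equal_spend["net"], higher_bq_spend["net"])
```

With the defaults, the first scenario recovers $42m and nets out at roughly a $25m loss; with BigQuery spend at 1.5x Looker spend, it recovers $60m against the $67m lost. Change `migrating_pct` or `bq_to_looker_ratio` to see how quickly the picture flips in GCP's favour.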
Amongst other possibilities, there are also the savings to be made from gutting Looker and getting rid of their account executives, support staff and front-end engineers. Huge savings, which then change the balance of the equation. Fundamentally, the other thing to consider is that even losing all of the $100m revenue from Looker is pocket change for GCP - they could offer the LookML API and Looker Data Studio for free in their stack (free, but BigQuery isn't) without particularly caring about it. Alphabet has a long track record of buying companies and scrapping them for parts, or for strategic advantage, or just as an experiment gone wrong. They're big enough not to care very much. This would be true even if Looker's revenue were significantly higher.
Do you really want the BI tool you so depend on to be in this equation? Or do you want it to be the life and soul of the org looking after it, their core competency and source of pride? (This is why I would be very happy to use BigQuery and depend on it. Look at the release notes… that's true commitment. It's the big show in GCP.) Oliver set me up on Lightdash for the first time on a Sunday morning (I had expected a reply the next day, but he's that committed).
As data practitioners, we need not only to avoid being at the mercy of any one cloud vendor, but also to be independent of any one vendor of any kind. We need to be encouraged and constrained not to make complex monoliths - we struggle enough not to do it with dbt.
Vote for Pedro... sorry, Lightdash. If it's not polished enough or doesn't have a feature you need, change it. I've put my money where my mouth is on that one and I'm a very poor SWE, who doesn't actually know TypeScript. Even if you don't feel able to make the change yourself, make an issue and one of us will probably deliver it soon (within reason). It's ours in the way dbt-core is. The data industry has been accused of being very vendor-focused, with the inherent dangers described above. We didn't have the open-source proliferation that SWE has - this is an opportunity for us to head this way as a community and an industry, and it makes sense.
I think a looker2dbt tool that converts LookML measures into dbt semantic layer metrics would make cutting our fentanyl habit a lot easier. There's still other stuff you'd have to clean up (all the cool stuff you can do in Liquid but probably don't need to), but it would make a Looker switch seem possible!