A few months ago, I bought a split ortholinear mechanical keyboard. I had noticed that my typing position on my Mac keyboard was not ideal: I kind of twisted my wrists into place, and did little twists during typing to hit specific keys. Long story short, my new keyboard keeps my arms and wrists in a good position, and my wrists can stay stationary during typing1 🙌.
I learned during this process that I have relatively short little fingers compared to my ring fingers, which actually means I need a different type of keyboard layout, where the columns of keys for my little fingers sit much further down than those for my ring and middle fingers. Rule of thumb: if the tips of your little fingers don’t extend past the top knuckle of your ring fingers, then you are probably in the same boat as me2.
These kinds of keyboards are customisable using a firmware called QMK, which has a CLI that you use to manage your keyboard layouts and flash the keyboard with new ones. The CLI has become reasonably complicated over time, and there is additional complexity from the sheer number of possible keyboards, modifications and chipsets you can have. I could have spent some time figuring out how to adjust my keymap files manually by Googling and asking the QMK community3, but the year isn’t 2021.
I used Windsurf to create and edit the files I needed to change my keymap; it did it in one shot with no issues. The repo comes with a bunch of default keymaps, which act as easy examples for Cascade to learn from. However, in order to actually flash your keyboard, you need to both install the CLI successfully, which has a lot of dependencies, and use the correct commands to compile and then flash the firmware.
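For context, a QMK keymap is just a C file listing which keycode goes in each physical position, one array per layer. Here's a minimal sketch of the general shape of a keymap.c; the layer names, the LAYOUT arguments and the key count here are hypothetical, since the real LAYOUT macro is defined by your specific keyboard:

```c
// Minimal sketch of a QMK keymap.c for an imaginary 34-key split board.
// The LAYOUT macro and key positions are keyboard-specific; this is not
// any particular board's real layout.
#include QMK_KEYBOARD_H

enum layers { _BASE, _SYM };

const uint16_t PROGMEM keymaps[][MATRIX_ROWS][MATRIX_COLS] = {
    // Base layer: ordinary letters, plus a key that momentarily
    // switches to the symbol layer while held (MO).
    [_BASE] = LAYOUT(
        KC_Q, KC_W, KC_E, KC_R, KC_T,    KC_Y, KC_U, KC_I,    KC_O,   KC_P,
        KC_A, KC_S, KC_D, KC_F, KC_G,    KC_H, KC_J, KC_K,    KC_L,   KC_SCLN,
        KC_Z, KC_X, KC_C, KC_V, KC_B,    KC_N, KC_M, KC_COMM, KC_DOT, KC_SLSH,
                    MO(_SYM), KC_SPC,    KC_ENT, KC_BSPC
    ),
    // Symbol layer: keys fall through to the base layer (_______, i.e.
    // KC_TRNS) except the ones overridden here.
    [_SYM] = LAYOUT(
        KC_1,    KC_2,    KC_3,    KC_4,    KC_5,       KC_6,    KC_7,    KC_8,  KC_9,    KC_0,
        _______, _______, _______, _______, _______,    KC_LEFT, KC_DOWN, KC_UP, KC_RGHT, _______,
        _______, _______, _______, _______, _______,    _______, _______, _______, _______, _______,
                           _______, _______,            _______, _______
    )
};
```

From there, building and flashing usually boils down to something like `qmk compile -kb <your_keyboard> -km <your_keymap>` followed by `qmk flash -kb <your_keyboard> -km <your_keymap>`, although the exact flashing step depends on your chipset.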
I initially tried using the CLI manually, as I’m fairly proficient with shells and CLIs. However, I came across some issues. I probably could have figured them out, but after a few tries and a few new Python environments, I gave up and decided to try Warp, which is an AI-first terminal. Warp was able to read the error output, which mostly came down to incorrect or missing dependencies4, and resolve the issues in a couple of tries. Staying on a roll, I got it to generate the QMK commands to compile and then flash my keyboard, and it got those right first time. My chipset is a bit new and undocumented, and has an unusual flashing process, which makes things more interesting for someone new to this kind of thing!
Feeling somewhat happy to have solved this problem, I thought I would document it and share it in one of the subreddits. Oh wow, I wasn’t expecting how hostile people would be to the idea of using AI to solve this problem. People said I could have just read the docs, Googled or asked the community, but I had done all of that and not succeeded. Their main objection was how much energy5 I must have consumed by using AI to solve this supposedly basic problem. At the time, I thought… sure, pre-training LLMs is expensive, but inference doesn’t use that much energy.
I recently read this post by Juho Snellman that explores the cost of using LLMs vs web search, and it not only confirmed what I thought about inference being cheap, but also showed that it is cheaper on average than using web search!
Juho’s post has a fun calculator for testing your own assumptions. I used Google Search as the baseline, as I never really use any other web search provider. I picked a relatively expensive LLM, Claude 3.7 Sonnet, as this is fairly representative of an LLM I would use for a coding problem. I assumed some level of margin and pre-training amortisation so that the comparison is fair. I also used a very generous number of tokens: as you can imagine, generating a CLI command would likely only produce 20 to 40 output tokens. My keymap files would be more, but very unlikely to be more than 1,000.
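For a rough sense of scale (this isn't the calculator's exact model, just the token-cost side of it), here is a back-of-envelope sketch assuming Anthropic's list prices for Claude 3.7 Sonnet of roughly $3 per million input tokens and $15 per million output tokens, and a deliberately generous 1,000 tokens in each direction:

$$
\text{cost per request} \approx \frac{1{,}000}{10^{6}} \times \$3 \;+\; \frac{1{,}000}{10^{6}} \times \$15 = \$0.003 + \$0.015 \approx \$0.02
$$

Roughly two cents for the whole exchange, before you even ask whether all 1,000 tokens were really needed.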
On this basis, it would have been cheaper to have gone straight to using an LLM and not wasted the energy on the initial web searches I did.
Web search returns results 20x-100x faster than an LLM finishes the query, how could it be more expensive? - Search latency can be improved by parallelizing the problem, while LLM inference is (for now) serial in nature. The task of predicting a single token can be parallelized, but you can't predict all the output tokens at once.
I thought this was a very interesting point. Many people would assume that the longer wait for LLM results means more compute, and therefore more energy, is being used. However, your web search could trigger thousands of parallel subprocesses that consume more energy in total, while running much faster and looking cheaper to the eye.
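To make that concrete, here's a hedged sketch of why output generation is serial (not any particular inference engine's code; predict_next_token is a hypothetical stand-in for a full model forward pass). Each new token depends on every token generated before it, so the loop can't be fanned out across machines the way a search index shard can:

```c
#include <stddef.h>

/* Hypothetical stand-in for a full model forward pass over tokens[0..count-1]. */
int predict_next_token(const int *tokens, size_t count);

/* Autoregressive decoding: strictly one token at a time. */
size_t generate(int *tokens, size_t prompt_len, size_t max_new_tokens) {
    size_t len = prompt_len;
    for (size_t i = 0; i < max_new_tokens; i++) {
        /* The next token depends on every token 0..len-1, so iterations of
           this loop cannot run in parallel with each other. */
        tokens[len] = predict_next_token(tokens, len);
        len++;
    }
    return len;
}
```

The work inside each call is heavily parallelised on the hardware, but the calls themselves form a chain, which is why the response trickles out token by token while a search index can fan the query out to many shards at once.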
Maybe we should actually be talking about the energy costs of using “traditional” web search and not LLM inference. It seems as though we now have a cheaper, more energy-efficient6, more effective alternative.
Especially if you think about how many web searches we chain together to find something out. We often don’t know, when we begin, which API or source will best surface the specific information we need. In the future, LLM systems/agents will have catalogues of well-described APIs for collecting context and answering a request. Using an LLM system to find something out in one shot, rather than through multiple web searches, could be one or two orders of magnitude more energy-efficient.
Take that, RSI and carpal tunnel syndrome of future David 👊
The number of times I’ve held my hand up during a Zoom call to compare my fingers to people I’ve been speaking to!
There are a number of communities dedicated to these kinds of keyboards, where people are always crafting new keyboard designs and showing them off. It’s honestly not that dissimilar to communities that modify cars, etc. There are multiple subreddits dedicated to this, e.g. for QMK, for ergo keyboards, for mech keyboards, for selling mech keyboards… they are usually much less belligerent than r/dataengineering.
I’m used to fixing or installing missing dependencies, but the way the CLI errors conveyed this was kind of weird, and I thought it might have been a mapping issue: I had the packages, but they weren’t usable without pointing the environment at the correct versions, which can happen when multiple versions of a package are installed. I might be a bit of a noob, but it wasn’t straightforward!
“You didn’t need to burn down a forest to solve this!” is an example of the kind of response I got.
Energy costs are a significant component of both training and inference systems, as well as of other large-scale technology systems like search. They also tend to scale with other costs, such as hardware: more expensive, more powerful hardware generally consumes more energy.
I think all of this “you burnt a tree to configure your keyboard” kind of stuff is just plain old virtue signalling.
We should be celebrating that you got something done quickly that might have otherwise taken a very long time!