I’ve seen a few articles out there now like this one, which talks about how the writer is fed up with AI-generated content and would rather see the prompt that was used to generate it. When a user simply relies on ChatGPT or some other LLM and asks it to do something, I can totally understand this perspective.
Imagine if I wrote a prompt like this: “Make me a 1500 word whitepaper that explains the benefits of using a semantic layer, when allowing natural language querying with corporate analytical data. Talk about how, without the semantic layer, the meaning of the data is mostly lost and derived by the LLM even when provided with a knowledge graph and other documentation, and that it needs an easy interface to simply request things in as close to linguistic style as possible, whilst previously being given the most relevant things to select from.”
The LLM will probably just embellish the prompt and maybe define some of its terms, directly or indirectly, in the output. It would be much easier to see the prompt and understand what the user was trying to say in the first place. Sometimes I wish I could see the intent behind any post I’m reading, and skip all the preamble that I mostly don’t need to hear. I often find myself using AI to simply summarise articles whose titles look interesting, and then only read the full article when the AI summary is so full of interesting things that I get curious.
Seeing the prompt is like seeing the deeper thinking of the human, as I described: seeing their intent. Humans often don’t show their intent; in some cultures it is downright frowned upon to be so direct. With the prompt, you can see the bias the human wanted to subtly implant into the longer wording they actually used. You see the things they wanted to say without saying them directly.
Nonetheless, I understand why people would rather save their time and see the prompt. However, this is only true when they solely use an LLM to generate their content. This is when it can end up very generic and boring. Great for necessary evils like marketing copy, but not ideal for engaging with others or sharing something interesting.
Just like RAG is important for making AI systems more powerful, it’s also important for making AI-generated content more relevant, whether in content, style, format or otherwise. NotebookLM is a great example of a tool that lets you do this and actively discourages you from simply relying on the information pre-trained into an LLM. It requires you to choose sources. Some people use a huge body of sources every time; I have found greater precision and success by carefully curating the sources I use each time. Sometimes I use my own blog posts1, sometimes I use the search function to find other relevant content I want to include, sometimes I paste in PDFs of things I want to emulate in style or structure.
The output is night and day compared with just using an LLM and a prompt to generate your content. You can tell it how to use each source in the prompt, too. You can tell it to use the style of the writer in one of the sources, or to be biased towards a certain perspective. Your output is now pretty tailored to what you want to do.
From a professional point of view, writing content using this method is superior to writing your own content from scratch. Professional content doesn’t need your personal touch, and if it does, you can refer to your own previous content. You can build upon content you have already made this way, kind of like importing a software package. Remember, most professional content is there to simply get someone to hand over their email or to improve SEO; it’s not really for a person to read, it’s for a machine to match against other content or against a request.2
There’s skin in the game. This last cause is least common among individuals, but probably accounts for the overwhelming majority of language pollution on the Internet. Examples of skin-in-the-game writing include astroturfing, customer service chatbots, and the rambling prologues found in online baking recipes. This writing is never meant to be read by a human and does not carry any authorial intent at all. For this essay, I’m primarily interested in the motivations for private individuals, so I’ll avoid discussing this much; however, I have included it for sake of completeness.
There is no way anyone could simply read your prompt and then understand the content generated using the RAG method I described above. They would need to read all your source material with the context of how you instructed the system to use it in generating your content; it would be easier to read your content directly. I have written professional content this way, and I’ve written it from scratch. This way is 10x faster, and I can’t see a discernible quality difference. It doesn’t need editing the way a person’s writing does, as it doesn’t make typos or grammatical errors. It feels like the ideal marketer is now an engineer or product manager who knows the product and the industry inside out, such that they can curate the best sources and ask for exactly what they want.3
Clayton Ramsey, who wrote the post above, explains why we write, on a personal basis, whether academically or otherwise:
I believe that the main reason a human should write is to communicate original thoughts. To be clear, I don’t believe that these thoughts need to be special or academic. Your vacation, your dog, and your favorite color are all fair game. However, these thoughts should be yours: there’s no point in wasting ink to communicate someone else’s thoughts.
In that sense, using a language model to write is worse than plagiarism. When copying another person’s words, one doesn’t communicate their own original thoughts, but at least they are communicating a human’s thoughts. A language model, by construction, has no original thoughts of its own; publishing its output is a pointless exercise.
I agree with this for personal content, like this post. I’m trying to share my own thoughts with you. There is no point at all in using an LLM to generate this content. It would defeat the object of writing it in the first place.
However, if I’m making content that is really to score leads or for a machine to read… well, a machine may as well write it under my curation. This kind of content genuinely isn’t to communicate my thoughts and actually is supposed to achieve the aims of someone or something else. The curation of sources, and my direction about how the post should sound or be structured, what tone it should have, what topic it should have, what bias it should have… it starts to resemble writing a plan before writing an essay and finding your relevant sources. Except that instead of this process taking hours or days, it takes minutes, and your plan and sources, if you are an industry expert, are enough.
Of course, you’ll still want to check the output and edit a sentence here and there, tell it to use fewer bullets, tell it to use more bullets, remove anything factually incorrect, get rid of yucky phrases… but for a 1000ish word piece of content, this can take 10 minutes. You need to know your industry and product inside and out; you need to be opinionated; you need to have a clear understanding of your thoughts. However, if you meet these criteria, you can generate excellent content with AI.
There are many jobs that generate this kind of content in one form or another. These will be the first to be disrupted by AI in the coming years. It’s so capable today that I can’t see myself ever hiring someone to do this work again. If the design part could be automated to include the correct headers, branding, and images, then this kind of content could be generated and published by one person instead of a team.
Everything in this Substack is written without AI, unless I’m quoting an LLM, as you have seen in the past. I do find it helpful that I now have a large body of original content in my voice, with my thoughts, to lean on when I use AI to write professional content on similar topics. It really does feel like importing some Python package I made earlier.
It could actually be really helpful to know how similar a piece of content is to another in real time. Imagine if you were writing some content intended to address some points that a competitor had made. Ideally, you would want your new content to be very similar to that content. Vector similarity will become the new SEO.
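To make “vector similarity will become the new SEO” concrete, here is a minimal sketch of how two pieces of content could be compared once each has been turned into an embedding vector. The tiny three-dimensional vectors below are made up for illustration; real embeddings come from an embedding model and have hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors:
    # dot(a, b) / (|a| * |b|). 1.0 means the vectors point the
    # same way (very similar content); 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" of my content and a competitor's content.
mine = [0.9, 0.1, 0.3]
competitor = [0.8, 0.2, 0.4]
print(cosine_similarity(mine, competitor))
```

A real-time writing tool could run this comparison as you type, scoring your draft against the competitor piece you are trying to address.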
Better yet, one who writes their own content from scratch, in their own voice and style, every week, with a back catalog of about 200 posts (roughly 300,000 words, or 600,000 tokens). 😁
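The arithmetic above can be sketched as a back-of-envelope estimator. The tokens-per-word ratio is an assumption: the figure above implies roughly two tokens per word, while English prose is often estimated closer to 1.3; an exact count would need a real tokenizer such as tiktoken:

```python
def estimate_tokens(posts, words_per_post, tokens_per_word=2.0):
    # Rough size of a writing back catalog: total words, and an
    # estimated token count at the given tokens-per-word ratio.
    words = posts * words_per_post
    return words, int(words * tokens_per_word)

words, tokens = estimate_tokens(200, 1500)
print(words, tokens)  # 300000 600000
```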