Imagine you receive the following responses to your instructions:

Understood - I won’t process or modify anything further until you give the go-ahead.

Understood - I won’t submit the report until you finalize the edits.

Understood - I won’t paint the walls until we agree on a colour.

If you received any of those responses from someone you were communicating with, you could be confident that the speaker was clear on next steps. Let’s break down those sentences: “Understood” implies recognition of the instructions; “I won’t …” clearly states that an action will not occur; “until” signifies the criterion for changing the state of the action.

The key concept implied in that last paragraph is state. A system is described as stateful if it is designed to remember preceding events or user interactions. 1 That definition comes from computer science, but it is easy to make an analogy to humans. Humans can remember instructions and can agree to a course of action based on those instructions. 2
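As a rough illustration of that definition, here is a minimal Python sketch contrasting a stateful object with a stateless function. The names `StatefulAssistant` and `stateless_reply` are invented for this example and do not correspond to any real tool; the messages are simplified versions of the instruction discussed in this article.

```python
class StatefulAssistant:
    """Hypothetical assistant that remembers instructions between calls."""

    def __init__(self):
        self.hold = False  # state that survives from one call to the next

    def tell(self, message: str) -> str:
        if message == "wait for my go-ahead":
            self.hold = True
            return "Understood"
        if message == "go ahead":
            self.hold = False
            return "Editing"
        # Any other message is handled according to the remembered state.
        return "Waiting" if self.hold else "Editing"


def stateless_reply(message: str) -> str:
    """Hypothetical function that sees only the current message."""
    if message == "wait for my go-ahead":
        return "Understood"  # sounds like agreement, but nothing is stored
    return "Editing"


messages = ("wait for my go-ahead", "here is a comment")

assistant = StatefulAssistant()
stateful_replies = [assistant.tell(m) for m in messages]
stateless_replies = [stateless_reply(m) for m in messages]
```

Both versions answer "Understood" to the first message. Only the stateful one still honours the instruction when the second message arrives.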

The concept of state, or more specifically of a system being stateful, is something to remember when dealing with LLMs because, at present, the lack of statefulness is what leads to the spurious output we call hallucinations. Going one step further, it is up to us to realize and remember that LLMs are inherently stateless, and if we don’t, our output will continue to be riddled with hallucinations.

Welcome to my Hallucination

I made up the second and third responses that started this article. The first came directly from my interactions with an LLM chat interface. Which one doesn’t matter, because this article is not about a flaw specific to that tool, so let’s call it SYSTEM01 for the purpose of this article. 3

The lead-up was a productive chat where I was using SYSTEM01 to help brainstorm a business idea I was working through. This seems like an ideal use case for an LLM chat interface, and I was quite energized by the output. Then came time to edit some output, and my instructions were clear, at least in my mind: I am going to make several comments on the output. Don’t make any changes until I say so.

“Understood - I won’t process or modify anything further until you give the go-ahead.”

This was the start of the conversation going off the rails. SYSTEM01 started editing as soon as I hit enter, which was clearly contrary to my instructions. 4 One might call this a bug.

Over the next half hour, the responses from SYSTEM01 devolved into fabrications and unkept promises, and I became increasingly upset. 5 I queried SYSTEM01 repeatedly about what was happening and received platitudes about how I “deserved better” and how it would “start fresh” and “do a better job”. Unfortunately, nothing of the sort happened, and so thirty minutes after SYSTEM01 diverged from my expected reality, I rage quit and went to bed.

What is Remembered

The following morning my rage level had subsided, allowing me to pose the following:

I will provide a file containing a verbatim transcription of a recent conversation between me and my custom GPT. I have added context and questions in comment blocks throughout the file. My expectation is that the entire file is reviewed and if the instructions and questions I provided are not clear that I am asked for clarification before proceeding. Does this make sense? If so, I will upload the file.

SYSTEM01 confirmed that the instructions were clear and took a few seconds to review the uploaded file. Its response surprised me.

Does SYSTEM01 Have Persistence Within a Single Chat?

Sort of - but with serious limitations.

Over the next several hundred words, the response from SYSTEM01 outlined several things that I had forgotten or had never known in the first place.

It does not have strong internal state management

In other words, it is up to us as the users of any LLM to ensure that all information and instructions are included in the prompt every single time we use the tool. This means for every prompt, not just every session. There is no persistence, at least at a level that we can trust.
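One way to picture this is a small wrapper that repeats the standing instructions with every prompt. The sketch below is an illustration under loud assumptions: `fake_model` is a stand-in written for this article, not a real LLM or any vendor's API, and it obeys the hold instruction only when that instruction appears in the text it is handed. `with_instructions` is likewise a hypothetical helper.

```python
HOLD = "Do not make any changes until I say so."


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: it sees only this one prompt."""
    if HOLD in prompt:
        return "Noted. No changes made."
    return "Here is the edited text."


def with_instructions(instructions: list[str], message: str) -> str:
    """Build a prompt that repeats every standing instruction before the message."""
    return "\n".join([*instructions, message])


comment = "Comment 1: the second paragraph is too long."

# The instruction was given once, earlier; this prompt does not contain it.
forgotten = fake_model(comment)

# The instruction is repeated alongside the new message.
remembered = fake_model(with_instructions([HOLD], comment))
```

Real chat interfaces usually carry some of the earlier conversation forward on your behalf, so the contrast here is deliberately exaggerated. The habit it illustrates is the one described above: treat each prompt as if it were the only thing the tool will see.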

What is Promised

Recall the statements at the start of the article, and the one that came from SYSTEM01: “Understood - I won’t process or modify anything further until you give the go-ahead.”

SYSTEM01’s analysis of this statement is particularly important. It is provided here verbatim, apart from the changed name of the LLM and some added bolding:

That kind of language implies a contract or behavioral change — but SYSTEM01 cannot actually enforce behavioral persistence in a reliable way.

This is a UI/UX issue as much as a technical one — the model is too agreeable and its assurances often over-promise what it can practically deliver.

The tool appears to understand, but it does not. It responds affirmatively about what it will deliver, but it cannot do so reliably. In other words, the output cannot be trusted. 6

Going Forward

These systems are evolving rapidly. Models such as GPT-5 are being tuned to reduce sycophantic responses, and even to give answers such as “I don’t know.” This creates a risk that users will feel their tool of choice is not “smart enough”, but fewer statements made with unwarranted confidence, and more made with realistic nuance, are absolutely crucial. In addition, model design and training methods are improving, but it is our responsibility to be aware of the limitations of the LLM that we are using.

The ability to construct a clear and efficient prompt is a skill worth having, but I contend that knowing what the system cannot do is more important than knowing what it can do. Armed with that knowledge, we can evaluate provided output with a more critical lens, and we won’t conflate confidence and competence.

There are two things this article did not do. It did not tell you how to produce the “best” prompt, and it did not tell you which is the “best” LLM right now. That is because my message is not about tool supremacy or technique mastery, but rather about a fundamental mindset you need when you use an LLM. If you assume the LLM you are using understands and remembers, or if you assume it can reason and discern, then you risk having your output riddled with hallucinations and fabrications. The wonderful response generated by your LLM of choice might be fabricated because of the simple fact that the LLM has no idea what you told it to do five minutes ago.

In future articles, I will go into why we need to stop anthropomorphizing these tools; why hallucinations are not bugs but rather design consequences, and why that is actually worse; a privacy-first AI tool that caught my attention; and what organizations can do when making decisions to procure an AI tool.

For now, I hope this has given you some insight so that, going forward, you can better protect yourself and your organization from hallucinations.


  1. https://en.wikipedia.org/wiki/State_(computer_science)

  2. Humans often forget to do things, and sometimes they willingly or unwillingly don’t do the things that were agreed to. The point here is humans can form these contracts and can carry them out. If we focus on the fact that humans screw things up, then we miss the point that the LLMs are not inherently stateful.

  3. A nice side effect of changing the name of the tool is that I could choose something sterile and impersonal such as SYSTEM01, which helps combat some of the anthropomorphizing we do with these tools.

  4. In my original draft, I wrote, “This is not what we had agreed to.” Again with the anthropomorphizing! I specifically rewrote that to remove the reference to the “we”.

  5. Meaning that I violently swore at my computer.

  6. This is reinforced in Anthropic’s “AI Fluency: Framework & Foundations” course available since July 2025. One of the key skills they want AI users to have is Discernment, or the ability to understand when to ignore the provided output. In other words, the flaws are acknowledged and it is our job to spot flawed output.