The token casino: A practitioner's guide to risk, reward, and AI

A practical guide for everyday AI users to navigate the “token casino”: how to pick the right model, size your prompts, and reduce hallucinations, bugs, and tech debt.

It has been incredible to see the sheer breadth and scope of GenAI tools unfold. Since they first appeared, there have been multiple moments that went from "surely they won't be able to do that" to "ah, so it can", ranging from useful mathematics to moving into new types of languages, including code.

If you're a developer, PM, designer, analyst, or any other type of knowledge worker who uses tools like Copilot or ChatGPT on a daily basis, this probably feels both exciting and slightly unreal. These systems bring real value for you—faster drafts, working code, new ideas—but they also sometimes create bugs, hallucinations, and tech debt you only discover later.

With the persistence of those chasing AGI, and the investment of billions of dollars in its development, the public discourse has become greatly skewed. The way these tools are marketed and discussed often makes them sound like some kind of digital awakening approaching sentience. Framed like this, it is hard not to believe that these technologies are nothing short of magic.

Most people using ChatGPT today, including many of us who are integrating it into our work, do not think twice about how any of these things work. The maths and engineering under the hood are genuinely complex, and concepts like backpropagation or modern reinforcement learning research sit far outside what you need to know to build features, write docs or analyse data. The good news is that you don't need extensive academic rigour to use these tools effectively, but you do need a shared vocabulary for thinking about them. Following Feynman's approach, this post is an attempt to offer exactly that: a practical way to talk and reason about these systems, so you can make better use of them in your day-to-day work.

Large Language Models are just that: models. What they model is a bit abstract, but you can think of it as a loose approximation of a brain. An astronomical amount of information is thrown at that approximate brain, which is then 'tuned' to respond in the 'right way' for a specific set of uses, based on the context and parameters. The quality of recent models means they are useful across a very broad array of tasks.

Models work with tokens. What a token represents depends on the type of model being used, its training method, its purpose, and the data fed into it. There are many tokenisers out there, each with its own subtle differences. A token could be a whole word, part of a word, punctuation, a symbol (&), or even part of an image or video.
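To make the idea of a token a little more concrete, here is a minimal sketch of a toy tokeniser. It is not how any real model's tokeniser works: the function name `toy_tokenise` and the fixed chop length `max_piece` are made up for illustration. It only shows that one piece of text can end up as whole words, word fragments, and symbols.

```python
import re


def toy_tokenise(text, max_piece=4):
    """Toy illustration only: real tokenisers learn their pieces from data."""
    tokens = []
    # First pass: runs of word characters, or single punctuation/symbol characters.
    for chunk in re.findall(r"\w+|[^\w\s]", text):
        if len(chunk) > max_piece:
            # Chop long words into fixed-size fragments (a hypothetical rule).
            for start in range(0, len(chunk), max_piece):
                tokens.append(chunk[start:start + max_piece])
        else:
            tokens.append(chunk)
    return tokens


print(toy_tokenise("Tokenisers vary & surprise!"))
# ['Toke', 'nise', 'rs', 'vary', '&', 'surp', 'rise', '!']
```

Four words and two symbols go in, eight tokens come out. A real tokeniser would split the same text differently, which is exactly the point about their subtle differences.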

Despite the varied forms they can take, their function is generally the same. You supply tokens at the front, and the model predicts what the likely next token(s) should be. The cool thing about these models is their variability. You can put way more than one token in and get a wide range of outputs. You can even tune how the output comes out through settings on the model itself!
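The "predict the next token" step, and the idea of tuning the output, can be sketched in a few lines. This is a toy under stated assumptions: the scores are invented, the helper names `softmax` and `sample_next` are mine, and I am assuming the tuning in question is the common setting usually called temperature. A real model produces scores for a vocabulary of many thousands of tokens, not three.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities. temperature must be greater than 0."""
    scaled = [score / temperature for score in scores.values()]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return {token: e / total for token, e in zip(scores, exps)}


def sample_next(scores, temperature=1.0, rng=None):
    """Pick one next token at random, weighted by its probability."""
    rng = rng or random.Random()
    probs = softmax(scores, temperature)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


# Made-up scores for what might follow "The cat sat on the".
scores = {"mat": 3.0, "sofa": 2.0, "moon": 0.5}

for temperature in (0.2, 1.0, 2.0):
    rng = random.Random(42)
    picks = [sample_next(scores, temperature, rng) for _ in range(10)]
    print(temperature, picks)
```

At a low temperature the most likely token dominates and the output is predictable; at a higher one the unlikely tokens get picked more often. Same machine, same bet, different spread of outcomes.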

The vast array of training data and model tuning strategies available means that the models can perform quite a wide range of tasks. Moving in and out of languages is one such example, and the languages these models are adept in extend beyond general communication into programming languages. In effect, this is what vibe coding is: taking words and ideas and asking a model to return functioning code.

The casino

The best metaphor I have found for expressing these concepts and their application is the Token Casino. You walk through the ornate lobby and onto the casino floor, and in front of you are endless rows of slot machines. Unlike most casinos you might come across, this one deals in tokens, not chips or money. You pay with tokens, you get tokens in return. Each machine is a model, and each will let you try your luck. Sometimes you place your token bet and receive something useful in return. However, these models will also hallucinate, bringing back something nonsensical or, worse, something that seems sensible without a shred of accuracy to it.

The payout

Payouts in this casino are derived from converting tokens into something of value. Often, this means something monetary, but education or entertainment would certainly suffice. Because so much of our lives is digital these days, from the mobile devices in our pockets to the 'knowledge work' undertaken by so many, these tokens almost always have an immediate use.

The gamble is real. Play the slot well, and you get back something valuable: time, working code, drafts, and ideas. Play it badly, and you incur a kind of working debt: time spent debugging, rewriting, or validating what the model produces. In software, we refer to this as “tech debt”: shortcuts that seem fast in the moment but cost more in the long run. AI can create the same effect at scale if you trust its outputs too quickly. As in any casino, anyone can play, but if you want to beat the house (here, the odds of a hallucination), it is best to know a little about how to play.

The rules

Pick the right machine

Not all models operate in the same way; each has its own subtle but distinct qualities. Trial and error works well for matching the right machine to the right job, but keeping an open mind while you choose is a must.

Place your token bet

Unlike a typical slot machine, this one lets you put in different amounts of tokens and ask for a larger or smaller set of tokens in response. A good rule of thumb is that the more tokens, or context, you provide, the higher the probability that you get paid out with something useful rather than a hallucination.
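As a rough illustration of sizing a bet, here is a small sketch that builds the same request twice: once bare, once with context attached. The helper names `build_prompt` and `rough_size` are hypothetical, the example context is invented, and the word count is only a crude stand-in for a real token count, which depends on the tokeniser your machine uses.

```python
def build_prompt(task, context=()):
    """Assemble a prompt from a task plus optional lines of context."""
    parts = []
    if context:
        parts.append("Context:")
        parts.extend(f"- {item}" for item in context)
    parts.append(f"Task: {task}")
    return "\n".join(parts)


def rough_size(prompt):
    """Crude proxy for bet size: whitespace-separated words, not real tokens."""
    return len(prompt.split())


task = "Write a function that parses our log lines."

small_bet = build_prompt(task)
bigger_bet = build_prompt(
    task,
    context=[
        "Language: Python 3, standard library only.",
        "Each line looks like: 2024-01-31 ERROR disk full",
        "Return a dict with keys date, level, and message.",
    ],
)

print(rough_size(small_bet), rough_size(bigger_bet))
print(bigger_bet)
```

The second prompt costs more tokens to send, but it leaves the machine far less to guess about, which is where a lot of hallucinations come from.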

Ask the right questions

Whether or not the tokens brought back by the machine are useful is completely up to you. When asked for only a small set of tokens in return, these systems can function like autocorrect or tab completion. At the other end, vibe coders are 'one-shotting' with a small token bet and shooting for the jackpot: functioning software projects with thousands of lines of code. Be mindful and intentional about what you're asking these systems to do, and you can get more payouts.

Gambling isn't for everyone, nor is it necessary to create or be creative. These games of chance and their ability to pay out in digital 'currency' can be very powerful. It is exactly their combination with human creativity and ingenuity that excites me the most.

Enjoy the game!