The Mystery Box
We know we need a machine that understands context. But what does that machine look like?
The Interface Contract
Before we grab any wrenches, we treat the transformer like a service. Text goes in, probabilities spill out. What does that interface look like from the outside?
Crucially, the model never emits a single word. It produces a distribution over the entire vocabulary, then we decide how to sample or decode it. That means every prompt becomes a ranking of possibilities.
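To make that concrete, here is a minimal sketch in plain Python. No real model is involved: the vocabulary and logits are made up, but they show the contract, where the network's output is a score per vocabulary item and greedy picking versus sampling are choices we layer on top.

```python
# A minimal sketch (not any specific model's API): the model's job ends at a
# score per vocabulary item; turning those scores into one word is our decision.
import math
import random

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical tiny vocabulary and the raw scores the network might emit
vocab = ["the", "cat", "sat", "mat", "ran"]
logits = [2.1, 0.3, 1.7, 0.9, -0.5]

probs = softmax(logits)
ranking = sorted(zip(vocab, probs), key=lambda p: p[1], reverse=True)
print(ranking)                                          # every candidate, ranked

# Two different decoding choices over the same distribution:
greedy = ranking[0][0]                                  # always the top word
sampled = random.choices(vocab, weights=probs, k=1)[0]  # occasionally a surprise
print(greedy, sampled)
```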
[Diagram: the LLM as a single neural-network box]
"But 'It just works' isn't good enough. Let's turn on the X-Ray."
The Internal Architecture
Pop the lid and the single “brain” disappears. In its place is an assembly line: tokenization, embeddings, attention, a feed-forward block, and finally the un-embedding head that turns vectors back into logits.
Each station reshapes the data before sliding it to the next. The X-ray view slows that handoff down so you can read the job description for each component.
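A rough skeleton of that assembly line makes the order of the handoffs explicit. Every station below is a stand-in function: the names and bodies are illustrative placeholders, not a real transformer or a real library's API.

```python
# Schematic of the assembly line, not a working transformer: each "station"
# is a placeholder so the shape of the data handoffs is visible.

def tokenize(text):            # text -> integer IDs
    return [ord(c) for c in text]               # placeholder: character codes

def embed(ids):                # IDs -> one vector per token
    return [[float(i)] * 4 for i in ids]        # placeholder 4-dim vectors

def attention(vectors):        # tokens exchange information with each other
    avg = [sum(col) / len(vectors) for col in zip(*vectors)]
    return [[v + a for v, a in zip(vec, avg)] for vec in vectors]

def feed_forward(vectors):     # each token is transformed independently
    return [[max(0.0, v) for v in vec] for vec in vectors]

def unembed(vectors):          # vectors -> raw scores (a stand-in for logits)
    return [sum(vec) for vec in vectors]        # placeholder: one score per token

logits = unembed(feed_forward(attention(embed(tokenize("hi")))))
print(logits)
```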
Tokenization Logic
The model cannot read letters. First, we must chop the text into chunks and convert them into integer IDs.
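As a toy illustration (this is not the model's actual tokenizer, and the chunk vocabulary is invented), a greedy longest-match lookup against a small chunk-to-ID table shows the shape of the job:

```python
# Made-up vocabulary mapping chunks to integer IDs, just for illustration.
vocab = {"un": 0, "insta": 1, "gram": 2, "mable": 3}

def to_ids(text, vocab):
    """Greedily match the longest known chunk at each position."""
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):       # try the longest chunk first
            chunk = text[i:j]
            if chunk in vocab:
                ids.append(vocab[chunk])
                i = j
                break
        else:
            raise ValueError(f"no chunk matches {text[i:]!r}")
    return ids

print(to_ids("uninstagrammable", vocab))        # [0, 1, 2, 3]
```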
"Now, let's look at the goal. What are we actually trying to achieve with this machine?"
Invent It Yourself
The rest of the course turns this intuition into a guided mental model. By the end, every part of this transformer—from tokenization to attention—will make intuitive sense, and the supporting code snippets will feel like confirmation instead of mystery.
This course is structured as an adventure.
In each module, we will solve a specific problem.
Module 1: The Alien Dictionary
Problem: Computers can't read text. How do we turn "uninstagrammable" into numbers without a dictionary of infinite size?
You will invent: A Byte-Pair Encoding (BPE) Tokenizer
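For a first taste of what you will build, here is a hedged sketch of the core BPE idea on a made-up corpus: start from single characters and repeatedly merge the most frequent adjacent pair, growing reusable chunks without needing an infinite dictionary. The corpus and merge count are invented; the module develops the real thing step by step.

```python
# A sketch of BPE merge learning on a toy corpus (illustrative, not Module 1's
# final implementation): count adjacent pairs, merge the most frequent, repeat.
from collections import Counter

def learn_merges(corpus, num_merges):
    words = [list(w) for w in corpus.split()]        # start from single characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in words:
            pairs.update(zip(w, w[1:]))              # count adjacent pairs
        if not pairs:
            break
        best = max(pairs, key=pairs.get)             # most frequent pair wins
        merges.append(best)
        merged = best[0] + best[1]
        new_words = []
        for w in words:                              # apply the merge everywhere
            out, i = [], 0
            while i < len(w):
                if i + 1 < len(w) and (w[i], w[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(w[i])
                    i += 1
            new_words.append(out)
        words = new_words
    return merges, words

corpus = "instagram gram grammar program programmable"
merges, words = learn_merges(corpus, num_merges=6)
print(merges)   # e.g. learns pieces like ('g', 'r'), ('gr', 'a'), ('gra', 'm')
print(words)
```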