The Mystery Box
We know we need a machine that understands context. But what does that machine look like?
The Interface Contract
Before we grab any wrenches, we treat the transformer like a service. Text goes in, probabilities spill out. What does that interface look like from the outside?
Crucially, the model never emits a single word. It produces a distribution over the entire vocabulary, then we decide how to sample or decode it. That means every prompt becomes a ranking of possibilities.
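To make that concrete, here is a minimal sketch in plain Python. No real model is involved: the vocabulary and logits are made up, but they show the contract, where the network's output is a score per vocabulary item and greedy picking versus sampling are choices we layer on top.

```python
# A minimal sketch (not any specific model's API): the model's job ends at a
# score per vocabulary item; turning those scores into one word is our decision.
import math
import random

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical tiny vocabulary and the raw scores the network might emit
vocab = ["the", "cat", "sat", "mat", "ran"]
logits = [2.1, 0.3, 1.7, 0.9, -0.5]

probs = softmax(logits)
ranking = sorted(zip(vocab, probs), key=lambda p: p[1], reverse=True)
print(ranking)                                          # every candidate, ranked

# Two different decoding choices over the same distribution:
greedy = ranking[0][0]                                  # always the top word
sampled = random.choices(vocab, weights=probs, k=1)[0]  # occasionally a surprise
print(greedy, sampled)
```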
[Diagram: the LLM as a single neural-network box]
"But 'It just works' isn't good enough. Let's turn on the X-Ray."
The Internal Architecture
Pop the lid and the single “brain” disappears. In its place is an assembly line: tokenization, embeddings, attention, a feed-forward block, and finally the un-embedding head that turns vectors back into logits.
Each station reshapes the data before sliding it to the next. The X-ray view slows that handoff down so you can read the job description for each component.
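A rough skeleton of that assembly line makes the order of the handoffs explicit. Every station below is a stand-in function: the names and bodies are illustrative placeholders, not a real transformer or a real library's API.

```python
# Schematic of the assembly line, not a working transformer: each "station"
# is a placeholder so the shape of the data handoffs is visible.

def tokenize(text):            # text -> integer IDs
    return [ord(c) for c in text]               # placeholder: character codes

def embed(ids):                # IDs -> one vector per token
    return [[float(i)] * 4 for i in ids]        # placeholder 4-dim vectors

def attention(vectors):        # tokens exchange information with each other
    avg = [sum(col) / len(vectors) for col in zip(*vectors)]
    return [[v + a for v, a in zip(vec, avg)] for vec in vectors]

def feed_forward(vectors):     # each token is transformed independently
    return [[max(0.0, v) for v in vec] for vec in vectors]

def unembed(vectors):          # vectors -> raw scores (a stand-in for logits)
    return [sum(vec) for vec in vectors]        # placeholder: one score per token

logits = unembed(feed_forward(attention(embed(tokenize("hi")))))
print(logits)
```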
Tokenization Logic
The model cannot read letters. First, we must chop the text into chunks and convert them into integer IDs.
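As a toy illustration (this is not the model's actual tokenizer, and the chunk vocabulary is invented), a greedy longest-match lookup against a small chunk-to-ID table shows the shape of the job:

```python
# Made-up vocabulary mapping chunks to integer IDs, just for illustration.
vocab = {"un": 0, "insta": 1, "gram": 2, "mable": 3}

def to_ids(text, vocab):
    """Greedily match the longest known chunk at each position."""
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):       # try the longest chunk first
            chunk = text[i:j]
            if chunk in vocab:
                ids.append(vocab[chunk])
                i = j
                break
        else:
            raise ValueError(f"no chunk matches {text[i:]!r}")
    return ids

print(to_ids("uninstagrammable", vocab))        # [0, 1, 2, 3]
```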
"Now, let's look at the goal. What are we actually trying to achieve with this machine?"
Invent It Yourself
The rest of the course turns this intuition into a guided mental model. By the end, every part of this transformer—from tokenization to attention—will make intuitive sense, and the supporting code snippets will feel like confirmation instead of mystery.
This course is structured as an adventure.
In each module, we will solve a specific problem.
Module 1: The Alien Dictionary
Problem: Computers can't read text. How do we turn "uninstagrammable" into numbers without a dictionary of infinite size?
You will invent: A Byte-Pair Encoding (BPE) Tokenizer
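For a first taste of what you will build, here is a hedged sketch of the core BPE idea on a made-up corpus: start from single characters and repeatedly merge the most frequent adjacent pair, growing reusable chunks without needing an infinite dictionary. The corpus and merge count are invented; the module develops the real thing step by step.

```python
# A sketch of BPE merge learning on a toy corpus (illustrative, not Module 1's
# final implementation): count adjacent pairs, merge the most frequent, repeat.
from collections import Counter

def learn_merges(corpus, num_merges):
    words = [list(w) for w in corpus.split()]        # start from single characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in words:
            pairs.update(zip(w, w[1:]))              # count adjacent pairs
        if not pairs:
            break
        best = max(pairs, key=pairs.get)             # most frequent pair wins
        merges.append(best)
        merged = best[0] + best[1]
        new_words = []
        for w in words:                              # apply the merge everywhere
            out, i = [], 0
            while i < len(w):
                if i + 1 < len(w) and (w[i], w[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(w[i])
                    i += 1
            new_words.append(out)
        words = new_words
    return merges, words

corpus = "instagram gram grammar program programmable"
merges, words = learn_merges(corpus, num_merges=6)
print(merges)   # e.g. learns pieces like ('g', 'r'), ('gr', 'a'), ('gra', 'm')
print(words)
```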