What It Is
TL;DR:
- Format: fully remote
- Duration: 3 weeks, from Monday May 4 to Friday May 22
- Live content: 3 lectures + 3 office hours + 3 group calls + 1:1s with Fabian
- Schedule: lectures on Mondays, office hours on Tuesdays, group calls on Fridays (all at 9 am PT / 5 pm BST), ad hoc 1:1s throughout the week
- Async content: homework, group discussion
- Time commitment: ~5 hours per week, ~15-20h total
- Prerequisites: basic Python and terminal knowledge, but no math or CS background needed
- Group size: ~10 students
- Price: $2,500
Core idea:
The idea is to make the actual LLM training process the central axis of the course and to teach the conceptual material (attention, transformers, RL, etc.) as we progress through it, rather than the usual approach, where you listen to lectures on how transformers or attention work and then maybe do the exercises.
Main outcome: you’ll have trained a GPT-2-level LLM from scratch and acquired an intuitive, first-hand understanding of how these models are built and trained, what their limitations are, and the conceptual vocabulary around the topic.
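To give a taste of that conceptual vocabulary, here’s a minimal, self-contained sketch of causal scaled dot-product attention in PyTorch. It isn’t the course’s actual material; shapes, names, and sizes are invented for illustration:

```python
# A minimal sketch of causal scaled dot-product attention in PyTorch.
# Purely illustrative: shapes, names, and sizes are made up.
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # q, k, v: (batch, seq_len, head_dim)
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # pairwise similarities
    # Causal mask: each position may attend only to itself and the past.
    seq = q.size(-2)
    mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)           # each row sums to 1
    return weights @ v                            # weighted mix of values

q = k = v = torch.randn(1, 5, 16)
print(attention(q, k, v).shape)  # torch.Size([1, 5, 16])
```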
Who It's For
Technical founders and operators who want to:
- Understand the foundations of LLMs in practice, not just in theory
- Get a taste of the core machine learning concepts needed to train a model
- Hack with peers on Andrej Karpathy’s capstone project
- Commit ~5 hrs/week for 3 weeks
What You'll Walk Away With
By the end of the sprint, participants will:
- Deepen their understanding of LLM foundations by building one from scratch
- Explain and rerun every stage of a modern LLM training stack: data preparation → tokenization → pretraining → fine-tuning → inference (a toy sketch of these stages follows this list)
- Understand the real constraints: data, hardware, cost, model parameters, and optimization tradeoffs
- Understand how training works in detail
- Make informed decisions about when to fine-tune, when to use RAG, and when to combine both
- Deploy a working chat UI connected to the weights you trained
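For a flavor of what “every stage” means, here’s a deliberately tiny, self-contained sketch of the pipeline in PyTorch. A character-level bigram model stands in for a real transformer, and the corpus and hyperparameters are invented for illustration; nanochat’s actual stack is far more involved:

```python
# Toy end-to-end sketch of the stages above: data prep -> tokenization ->
# pretraining -> inference. A character-level bigram model stands in for a
# real transformer; corpus and hyperparameters are made up.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# 1) Data preparation: a tiny "corpus".
text = "hello world, hello llm. " * 200

# 2) Tokenization: character-level vocabulary (real stacks train a BPE tokenizer).
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}
data = torch.tensor([stoi[c] for c in text])

# 3) Pretraining: next-token prediction over a table of bigram logits.
V = len(chars)
logits_table = torch.zeros(V, V, requires_grad=True)
opt = torch.optim.Adam([logits_table], lr=0.1)
xs, ys = data[:-1], data[1:]          # each character predicts the next one
for step in range(200):
    loss = F.cross_entropy(logits_table[xs], ys)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final loss: {loss.item():.3f}")

# (Fine-tuning would continue training on curated data; omitted here.)

# 4) Inference: sample from the trained model, one token at a time.
idx = stoi["h"]
out = ["h"]
for _ in range(20):
    probs = F.softmax(logits_table[idx].detach(), dim=-1)
    idx = torch.multinomial(probs, 1).item()
    out.append(itos[idx])
print("".join(out))
```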
Instructor
(From a Data Founders interview with Fabian, 2020)
Fabian Blaicher-Brown
- Background in Computer Science, Robotics, Information Systems, and Computer Vision Research.
- Co-founder and CTO of AI startup Shipamax (YC W17, acquired by WiseTech Global).
- Most recently Global Head of Data Science, AI, and ML at WiseTech, leading a team of 70+.
Why Nanochat?
We chose nanochat as the artifact to build because:
- Who does not love Andrej Karpathy’s content?
- It provides a complete pipeline for training a small, functional ChatGPT-style model, covering tokenization, pretraining, fine-tuning, and evaluation, on cloud GPUs (~$100 in compute); a one-step tokenization sketch follows below
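As a taste of that first stage, here’s one merge step of byte-pair encoding (BPE), the tokenization scheme used in this lineage of models. The toy corpus is made up; real BPE training repeats this merge many thousands of times over a large corpus:

```python
# One merge step of byte-pair encoding (BPE) on a made-up toy corpus.
# Illustrative only; this is not nanochat's tokenizer code.
from collections import Counter

tokens = list("low lower lowest")
# Count adjacent token pairs and pick the most frequent one to merge.
pairs = Counter(zip(tokens, tokens[1:]))
(a, b), count = pairs.most_common(1)[0]
merged, i = [], 0
while i < len(tokens):
    if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (a, b):
        merged.append(a + b)   # the pair becomes a single new token
        i += 2
    else:
        merged.append(tokens[i])
        i += 1
print((a, b), count, merged)
```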
Key Aspects of Nanochat:
- Purpose: Acts as a "recipe" for building and training your own LLM, offering a hands-on, educational approach to understanding the full stack, including architecture and cost.
- Scope: It covers the entire training lifecycle: tokenization, pretraining on data, supervised fine-tuning (SFT), evaluation on benchmarks, and an inference engine with a UI.
- Capabilities: It enables training a GPT-2-style model (around 26 layers) in roughly 3 hours on an 8xH100 GPU node, making it an affordable way to understand LLM training (a back-of-envelope cost check follows this list).
- Repo
- Docs
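A quick back-of-envelope check on that compute figure. The per-GPU-hour rate below is our assumption (H100 rental prices vary by provider); the 8-GPU node and ~3-hour runtime come from the description above:

```python
# Back-of-envelope check on the compute cost. The per-GPU-hour rate is an
# assumption (H100 rental prices vary by provider); the 8-GPU node and
# ~3-hour runtime come from the description above.
gpus = 8
hours = 3
usd_per_gpu_hour = 4.0               # assumed on-demand H100 rate
total = gpus * hours * usd_per_gpu_hour
print(f"~${total:.0f} in compute")   # ~$96, consistent with the ~$100 quoted
```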
Format
- 3 weeks
- 3 live lectures (1 hour each)
- 3 office hours (1 hour each)
- Cohort size: 10 participants
- Async homework between calls: assignments and optional readings
- WhatsApp chat for Q&A and troubleshooting
Syllabus
Week 1: Getting Started with Nanochat – Data Preparation and Hardware (~5 hrs)
Week 2: The Meat – Let’s Train Our Model (~5 hrs)
Week 3: Finalizing – Inference (~5 hrs)
Example (raw) lecture slides to give you a taste of the material.
Pricing
$2,500
How to Sign Up
Fill in a quick form and we’ll be in touch in ~24h.