Train an LLM From Scratch

A 3-week sprint to deeply understand LLMs by replicating Karpathy’s Nanochat.

Format
Fully remote
Duration
3 weeks
Time Commitment
~5 hours per week
Breakdown
3 live lectures (1hr each) + 3 office hours + async homework
Schedule
Monday lectures, Friday office hours
Cohort Size
10 technical founders
Next Cohort
Cohort 1: May 4 - May 22
What It Is

TLDR:

  • Format: fully remote
  • Duration: 3 weeks, from Monday May 4 to Friday May 22
  • Live content: 3 live lectures + 3 live office hours + 3 live group calls + 1-1s with Fabian
  • Schedule: lectures on Mondays, office hours on Tuesdays, group calls on Fridays (all at 9 am PT / 5 pm BST), ad hoc 1-1s throughout the week
  • Async content: homework, group discussion
  • Time commitment: ~5 hours per week, ~15-20h total
  • Prerequisites: basic Python and terminal knowledge, but no math or CS background needed
  • Group size: ~10 students
  • Price: $2,500

Core idea:

The idea is to make the actual LLM training process the central axis of the course, and to teach the conceptual material (attention, transformers, RL, etc.) as we progress through it; rather than the usual approach, where you listen to lectures on how transformers or attention work, and then maybe do the exercises.

Main outcome: you'll have trained a GPT-2-level LLM from scratch and acquired an intuitive, first-hand understanding of how these models are built and trained, what their limitations are, and the conceptual vocabulary around the topic.
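To give a taste of the conceptual material, here is a minimal sketch of scaled dot-product attention, the core operation inside a transformer. This is an illustrative single-head version in plain NumPy; the names and toy shapes are ours, not taken from nanochat:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d)) @ V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (seq, seq): how much each query attends to each key
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax: each row sums to 1
    return weights @ V                             # each output is a weighted mix of the values

# Toy example: 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

In the course you'll meet this operation in context, batched, multi-headed, and running on GPUs, rather than as an isolated formula.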

Who It's For

Technical founders and operators who want to:

  • Understand the foundations of LLMs hands-on, not just in theory
  • Get a taste of the core machine learning concepts needed to train a model
  • Hack with peers on Andrej Karpathy’s capstone project
  • Commit ~5 hrs/week for 3 weeks

What You'll Walk Away With

By the end of the sprint, participants will:

  • Deepen their understanding of LLM foundations by building one from scratch
  • Explain and rerun every stage of a modern LLM training stack: data preparation → tokenization → pretraining → fine-tuning → inference
  • Understand the real constraints: data, hardware, cost, model parameters, and optimization tradeoffs
  • Understand how training works in detail
  • Make informed decisions about when to fine-tune, when to use RAG, or both
  • Deploy a working chat UI connected to weights you trained
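To make one of those stages concrete: tokenization, the first step after data preparation, is usually byte-pair encoding. Here is a heavily simplified sketch of the BPE idea (repeatedly merge the most frequent adjacent pair into a new token); this is illustrative code of ours, not nanochat's actual tokenizer:

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(tokens, pair, new_token):
    """Replace every occurrence of `pair` with `new_token`."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

# Start from single characters and perform a few merges, growing the vocabulary.
tokens = list("aaabdaaabac")
for step in range(3):
    pair = most_frequent_pair(tokens)
    tokens = merge_pair(tokens, pair, pair[0] + pair[1])
print(tokens)  # ['aaab', 'd', 'aaab', 'a', 'c']
```

Real tokenizers work on bytes, train the merge table on a large corpus, and store token IDs rather than strings, but the merge loop above is the core mechanic you'll rerun in the sprint.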

Instructor

From a Data Founders interview with Fabian in 2020

Fabian Blaicher-Brown

  • Background in Computer Science, Robotics, Information Systems, and Computer Vision Research.
  • Co-founder and CTO of AI startup Shipamax (YC W17, acquired by WiseTech Global).
  • Most recently Global Head of Data Science, AI, and ML at WiseTech, leading a team of 70+.
  • LinkedIn

Why Nanochat?

We chose nanochat as the artifact to build because:

  • Who does not love Andrej Karpathy’s content?
  • It provides a complete pipeline to train a small, functional ChatGPT-style model – covering tokenization, pretraining, fine-tuning, and evaluation – on GPUs in the cloud (~$100 in compute)

Key Aspects of Nanochat:

  • Purpose: Acts as a "recipe" for building and training your own LLM, offering a hands-on, educational approach to understanding the full stack, including architecture and cost.
  • Scope: It covers the entire training lifecycle: tokenization, pretraining on data, supervised fine-tuning (SFT), evaluation on benchmarks, and an inference engine with a UI.
  • Capabilities: It enables training a GPT-2 style model (around 26 layers) in roughly 3 hours on an 8xH100 GPU node, making it an affordable way to understand LLM training.
  • Repo
  • Docs
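The compute budget is easy to sanity-check with back-of-the-envelope arithmetic. Assuming an illustrative on-demand rate of about $4 per H100 GPU-hour (our assumption; actual cloud prices vary by provider), the run described above costs roughly:

```python
# Back-of-the-envelope cost for the training run described above.
gpus = 8                 # one 8xH100 node
hours = 3                # approximate wall-clock training time
rate_per_gpu_hour = 4.0  # USD; assumed on-demand H100 price, varies by provider

cost = gpus * hours * rate_per_gpu_hour
print(f"~${cost:.0f}")  # ~$96, in line with the ~$100 figure quoted above
```

The same arithmetic explains why small changes to model depth or training time move the bill: cost scales linearly with GPU-hours.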

Format

  • 3 weeks
  • 3 live lectures (1 hour each)
  • 3 office hours (1 hour each)
  • Cohort size: 10 participants
  • Async homework between calls: assignments and optional reading
  • WhatsApp chat for Q&A and troubleshooting

Syllabus

Week 1: Getting Started with Nanochat – Data Preparation and Hardware (~5hrs)

Week 2: The Meat – Let's Train Our Model (~5hrs)

Week 3: Finalizing – Inference (~5hrs)

Example (raw) lecture slides to give you a taste of the material.

Pricing

$2,500

How to Sign Up

Fill in a quick form and we’ll be in touch in ~24h.

Apply Now