Lesson Plan: A Brief History of Coding Agents

Module: Coding Agents
Segment: Historical Foundations
Duration: ~10 minutes
Format: Lecture (no exercise)


Learning Goals

By the end of this segment, students should be able to:


1. The Statistical Origins: Code as a Corpus (2 min)

Key idea: Code is text, and text has statistical regularities — so models trained on code can exploit those regularities to predict what comes next.
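
For the board, a minimal sketch of this idea: a bigram model that counts which token follows which and predicts the most frequent successor. The three-line toy corpus and the whitespace tokenization are assumptions made for illustration; real systems of this era used longer n-grams, smoothing, and far larger corpora.

```python
from collections import Counter, defaultdict

def train_bigram(token_sequences):
    """Count, for each token, which tokens follow it in the corpus."""
    follows = defaultdict(Counter)
    for tokens in token_sequences:
        for prev, nxt in zip(tokens, tokens[1:]):
            follows[prev][nxt] += 1
    return follows

def predict_next(follows, prev_token):
    """Return the most frequent successor of prev_token, or None if unseen."""
    counts = follows.get(prev_token)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Toy corpus (hypothetical): three whitespace-tokenized lines of Python.
corpus = [
    "for i in range ( n ) :".split(),
    "for x in range ( 10 ) :".split(),
    "for item in items :".split(),
]
model = train_bigram(corpus)

print(predict_next(model, "in"))     # "range" follows "in" twice, "items" once
print(predict_next(model, "while"))  # never seen in the corpus, so None
```

Point out that the model only ever looks one token back, which is the limitation the next section addresses.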

On the Naturalness of Software (Hindle et al., 2012)

Early Autocompletion Systems


2. The Transformer Arrives: Code Generation at the Function Level (3 min)

Key idea: The transformer's attention mechanism can, in principle, relate any token to any other token in its context window, which removes the short, fixed-window limitation of n-gram models.
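
A minimal sketch of why distance stops mattering, assuming a single attention head with no learned projections and hand-picked 2-d vectors (all values below are hypothetical). The weight a position receives depends on how well its key matches the query, not on how far away it is.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product of one query against every key, then softmax."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Hypothetical key vectors for a 6-token context. Positions 0 and 5 are
# deliberately similar, standing in for a variable's definition and a later use.
keys = [
    [1.0, 0.0],  # position 0: definition of the variable
    [0.0, 1.0],
    [0.0, 1.0],
    [0.0, 1.0],
    [0.0, 1.0],  # position 4: the immediate neighbour of position 5
    [1.0, 0.0],  # position 5: later use of the same variable
]
query = [4.0, 0.0]  # query produced at position 5

weights = attention_weights(query, keys)
# Position 0 is five tokens back but gets more weight than the neighbour at 4.
print(weights[0] > weights[4])
```

A bigram model at position 5 would see only position 4. Real transformers add learned query, key, and value projections, multiple heads, and many layers on top of this core computation.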

GPT-2 and the First Hints (2019)

CodeBERT (Feng et al., 2020)

Codex and the Function-from-Header Paradigm (Chen et al., 2021)
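
To make the paradigm concrete, a hedged sketch of the task shape: the prompt is a signature plus docstring, the model supplies the body, and the result is judged by running unit tests. The prompt, the candidate body, the tests, and the helper `passes_tests` are all hypothetical and are not taken from any benchmark.

```python
# Prompt in the function-from-header style: signature plus docstring only.
PROMPT = '''def running_max(values):
    """Return a list whose i-th entry is the largest of values[0..i]."""
'''

# Hypothetical model output: the function body that continues the prompt.
CANDIDATE_BODY = '''    result, best = [], None
    for v in values:
        best = v if best is None or v > best else best
        result.append(best)
    return result
'''

def passes_tests(prompt, body, entry_point, tests):
    """Run prompt + body in a fresh namespace and check (args, expected) pairs."""
    namespace = {}
    try:
        exec(prompt + body, namespace)  # acceptable only for trusted toy code
        func = namespace[entry_point]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        # Syntax errors and runtime errors both count as a failed sample.
        return False

TESTS = [(([1, 3, 2, 5],), [1, 3, 3, 5]), (([],), [])]

print(passes_tests(PROMPT, CANDIDATE_BODY, "running_max", TESTS))
print(passes_tests(PROMPT, "    return values\n", "running_max", TESTS))
```

The first call accepts the candidate; the second rejects a plausible-looking but wrong body. A real harness would run untrusted model output in a sandbox rather than calling exec directly.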

Why the Function-Header Task Was the Right Initial Benchmark


3. Expanding the Scope: From Autocomplete to Agent (3 min)

Key idea: Once single-function synthesis was demonstrated, researchers and practitioners immediately asked: what happens when the task grows beyond a single function?

AlphaCode (Li et al., DeepMind, 2022)

InstructGPT and the Role of RLHF (2022)

The Shift Toward Agency: Repair Loops and Tool Use


4. Where We Are Now (2 min)

Key idea: Modern coding agents are not just better code completers — they are autonomous problem-solving systems that can read files, run commands, browse documentation, and iterate.

The Benchmarking Evolution

Benchmark           Year  Task                                       Difficulty proxy
HumanEval           2021  Single function from docstring             ~20–50 lines
MBPP                2021  Simple Python programming problems         ~5–15 lines
DS-1000             2022  Data science tasks (NumPy, Pandas, etc.)   Domain-specific APIs
SWE-bench           2023  Fix real GitHub issues in real repos       Multi-file, multi-step
SWE-bench Verified  2024  Human-validated subset of SWE-bench tasks  Multi-file, multi-step

What "Coding Agent" Means Today

A modern coding agent is a system that:

  1. Receives a task specification in natural language

  2. Uses tools: file read/write, terminal/shell, test runner, web search, documentation lookup

  3. Iterates: plans, acts, observes output, revises

  4. Manages context: decides what to read, what to keep in the window, what to ignore

This is exactly the ReAct loop you studied in the reasoning module, applied to software engineering. The history we traced today is the story of how the underlying model capability grew to the point where this loop became useful rather than frustrating.
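
The loop can be sketched in a few lines, under heavy assumptions: a scripted policy stands in for the language model, the "repository" is an in-memory dict, and the tool names (`read_file`, `write_file`, `run_tests`) are hypothetical. The sketch keeps the full history and does no context management.

```python
def run_tests(files):
    """Toy test runner: execute calc.py and check that add(2, 3) == 5."""
    namespace = {}
    try:
        exec(files["calc.py"], namespace)
        ok = namespace["add"](2, 3) == 5
    except Exception as exc:
        return f"ERROR: {exc}"
    return "PASS" if ok else "FAIL: add(2, 3) != 5"

def scripted_policy(task, history):
    """Stand-in for a model: pick the next action from the last observation."""
    if not history:
        return ("read_file", "calc.py")
    last_action, last_observation = history[-1]
    if last_action[0] == "read_file":
        fixed = last_observation.replace("a - b", "a + b")
        return ("write_file", "calc.py", fixed)
    if last_action[0] == "write_file":
        return ("run_tests",)
    if last_observation == "PASS":
        return ("finish",)
    return ("read_file", "calc.py")  # tests failed, so look again

def run_agent(task, files, policy, max_steps=8):
    """Act, observe, record, and repeat until finish or the step budget ends."""
    history = []
    for _ in range(max_steps):
        action = policy(task, history)
        if action[0] == "finish":
            break
        if action[0] == "read_file":
            observation = files[action[1]]
        elif action[0] == "write_file":
            files[action[1]] = action[2]
            observation = "written"
        elif action[0] == "run_tests":
            observation = run_tests(files)
        else:
            observation = f"unknown action {action[0]}"
        history.append((action, observation))
    return history

files = {"calc.py": "def add(a, b):\n    return a - b\n"}
history = run_agent("Fix the failing add() test", files, scripted_policy)
for action, observation in history:
    print(action[0], "->", observation.splitlines()[0])
```

Swapping `scripted_policy` for a call to a real model is the only conceptual change needed to turn this into an agent; everything else is the same control flow.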


Key Takeaways


Citations