CS 6501 Workshop on Building AI Agents

Prof. Henry Kautz, henry.kautz@virginia.edu Spring 2026, Tuesdays & Thursdays 2:00pm - 3:15pm Mechanical Engineering Building Room 339

TA: Wenqian Ye, pvc7hs@virginia.edu

Office Hours

In-person office hours: Tuesdays & Thursdays 3:30-4:30pm, Rice 511. Sign up required: if no one is signed up by the start of office hours, I will not be in my office.

Zoom office hours: Wednesdays 12:30pm-2:00pm. Sign up required.

Questions about course enrollment, logistics, or absences: email both the instructor and TA.

Emergency Remote Class

Class is normally held in person. It will be held by Zoom only when meeting in person is impossible due to weather or other emergencies. In such cases, join by this link with passcode 152379. Students are required to attend the session live.

Illustration from Small Language Models are the Future of Agentic AI by Belcak et al. 2025

Class Calendar

Class Roster

Syllabus

In this hands-on workshop, we will learn how to build AI agents: systems powered by large-language models that autonomously interact with services, tools, and other agents.
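To make the idea concrete, here is a minimal sketch of the agent loop pattern we will build on: a model proposes an action, the harness executes the matching tool, and the observation is fed back until the model answers. A deterministic stub stands in for the LLM, and the tool name and loop structure are illustrative only, not a required design.

```python
# Minimal agent loop: model proposes an action, harness runs the tool,
# observation goes back to the model. A stub replaces the real LLM here.

def calculator(expression: str) -> str:
    """Toy tool: evaluate a simple arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def stub_model(history: list[str]) -> dict:
    """Stand-in for an LLM: call the tool once, then answer."""
    observations = [line for line in history if line.startswith("observation:")]
    if observations:
        return {"type": "answer", "text": observations[-1].split(": ", 1)[1]}
    return {"type": "tool", "name": "calculator", "args": "6 * 7"}

def run_agent(task: str, max_steps: int = 5) -> str:
    history = [f"task: {task}"]
    for _ in range(max_steps):
        step = stub_model(history)
        if step["type"] == "answer":
            return step["text"]
        result = TOOLS[step["name"]](step["args"])
        history.append(f"observation: {result}")
    return "gave up"

print(run_agent("What is 6 * 7?"))  # → 42
```

In class we will replace the stub with a real model and grow the tool set, but the control flow stays essentially this simple.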

Much of the programming work will be completed during class. Because this is a workshop, class attendance is mandatory. Attendance will be taken on paper at each class. Attendance can be excused only for illness or career events (including athletics); in such cases, notify both the instructor and TA. Any student who misses more than two classes without a valid reason will receive 0 for class participation for the semester. Chronic absence will result in failing the class. Students are responsible for signing up for office hours for their mid-term portfolio review and for discussion of final project ideas on the dates noted in the class calendar.

You will need to create accounts on the following platforms. Please create your accounts before the first day of class.

I have found Visual Studio Code with the Claude Code plugin to be a superb IDE.

For most of our tutorial-style programming we will use small open-source models and run them locally. The free tier of Google Colab includes use of T4 GPUs, so you can run models more efficiently than on a laptop. The maximum session runtime is 12 hours, with no more than 90 minutes of inactivity. Heavy usage can trigger "resource exhausted" messages and/or throttle your jobs. Colab Pro enables sessions of up to 24 hours and provides better GPU availability and more RAM.
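A quick way to see which accelerator your notebook session actually got is a small device check. This sketch falls back to CPU when PyTorch or a GPU is unavailable, so the same cell runs on Colab's T4 tier and on your laptop; the helper name is just for illustration.

```python
# Report the accelerator available to this session, falling back to CPU
# when PyTorch is missing or no GPU was allocated.

def pick_device() -> str:
    try:
        import torch  # preinstalled on Colab; may be absent locally
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return f"cuda ({torch.cuda.get_device_name(0)})"
    return "cpu"

print(pick_device())
```

Run this at the top of a Colab notebook; if it prints "cpu" even though you requested a GPU runtime, change the runtime type before doing any heavy work.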

For later projects, you might choose to use API access to a state-of-the-art model from OpenAI, Anthropic, or Google running on their own servers. Although you can buy access to all of these models from a single aggregation service (such as AWS Bedrock, Google Vertex AI, or Microsoft Foundry), these services do not let you put a hard limit on your potential charges. This can be dangerous for your credit card if your code runs wild! I recommend instead signing up for the Claude API directly from Anthropic, because there you can set a credit card charge limit. The SOTA model Claude Opus 4.5 costs $5 per million tokens processed, while Claude Haiku 4.5 is nearly as good and costs only $0.45 per million tokens.
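Before committing to a paid API, it is worth doing a back-of-the-envelope cost estimate. The sketch below uses the per-million-token prices quoted above (prices change, so check the provider's pricing page, and note that real pricing distinguishes input and output tokens); the model names and the 20M-token example are assumptions for illustration.

```python
# Rough API cost estimate from per-million-token prices.
# Prices below are the ones quoted in this syllabus and may be out of date.

PRICE_PER_M_TOKENS = {"claude-opus-4.5": 5.00, "claude-haiku-4.5": 0.45}

def estimate_cost(model: str, tokens: int) -> float:
    """Dollar cost of processing `tokens` tokens with `model`."""
    return PRICE_PER_M_TOKENS[model] * tokens / 1_000_000

# Example: a semester project that processes 20 million tokens.
for model in PRICE_PER_M_TOKENS:
    print(f"{model}: ${estimate_cost(model, 20_000_000):.2f}")
```

At the quoted prices, 20M tokens is about $100 with Opus but only about $9 with Haiku, which is why starting with the cheaper model is usually the right call.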

Students comfortable using the department's research GPU cluster may optionally do so, but be sure to submit jobs through SLURM and be careful not to tie up the few GPU servers for more than a few minutes at a time.

In addition to programming, we will read about one paper a week and discuss it in class.

Topics

Grading

Your work from each class and your final project should be stored in a GitHub repository. The final project, including a polished 10-minute video presentation, is due by 12 noon on Thursday, April 16. This is a hard deadline and extensions will not be granted.

Final Project Ideas

Here are some ideas for final projects just to get you started thinking. Please talk to me via email or office hours about your choice of project before spring break. You may work alone or in a team of two students (not more). You will need to build a working system, give a presentation about it, and write a report that describes the problem it solves, the design of the system, and the results of running the agent.

Resources

User Guides, Websites, and Blogs

Unsloth, an open-source framework for LLM fine-tuning and reinforcement learning. https://unsloth.ai/

Tinker: a training API for researchers and developers. https://tinker-docs.thinkingmachines.ai/.

Training language models to follow instructions with human feedback. Ouyang, Long, et al. Advances in Neural Information Processing Systems (NeurIPS) 2022. https://arxiv.org/abs/2203.02155

The Model Context Protocol (MCP) Course, sponsored by Hugging Face and Anthropic. https://huggingface.co/learn/mcp-course/en/unit0/introduction

Domain-specific small language models, 2026, Guglielmo Iozzia, Manning Publications. Contact instructor for a pre-print. Textbook with code for many of the concepts in this course. Source code here: https://www.manning.com/books/domain-specific-small-language-models

Olmo 3: Charting a path through the model flow to lead open-source AI. AI2. https://allenai.org/blog/olmo3

Weaviate Claude Skills: A comprehensive set of Claude Skills for working with local Weaviate vector databases. These skills enable you to connect, manage, ingest data, and query Weaviate running in Docker directly through Claude.ai or Claude Desktop. https://github.com/saskinosie/weaviate-claude-skills

OlmoEarth Platform: Powerful open infrastructure for planetary insights. https://allenai.org/blog/olmoearth

OpenAI ChatGPT Atlas https://openai.com/index/introducing-chatgpt-atlas/

Agentic browser: an open-source, privacy-first alternative to ChatGPT Atlas, Perplexity Comet, and Dia. Forked from Chromium and 100% open source. https://github.com/browseros-ai/BrowserOS

IBM Granite 4.0 Nano-Models https://www.ibm.com/granite/docs/models/granite

Ai2 AI for Science Home Page: CodeScientist, DiscoveryWorld, DiscoveryBench, olmOCR, Ai2 ScholarQA, scientific datasets S2ORC & S2AG. https://allenai.org/ai-for-science

Gemini for Google Workspace Prompting Guide 101. https://workspace.google.com/learning/content/gemini-prompt-guide

GPT-5 Prompting Guide. https://cookbook.openai.com/examples/gpt-5/gpt-5_prompting_guide

Claude Prompt Engineering Overview. https://docs.claude.com/en/docs/build-with-claude/prompt-engineering/overview

Effective context engineering for AI Agents. Anthropic Blog. https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents

"Building LLM applications for production", Chip Huyen's Blog, 2023. https://huyenchip.com/2023/04/11/llm-engineering.html

LEANN RAG Vector Database https://github.com/yichuan-w/LEANN

Fine-tune a pretrained model, Hugging Face Documentation. https://huggingface.co/docs/transformers/training

Large Language Model, Stanford Course, by Percy Liang. https://stanford-cs324.github.io/winter2022/

Agent Design Patterns: A Hands-on Guide to Building Intelligent Systems. Antonio Gulli. Preview of e-book. 424 pages. https://docs.google.com/document/d/1rsaK53T3Lg5KoGwvf8ukOUvbELRtH-V0LnOIFDxBryE/preview

Introducing Gauss, an agent for autoformalization, 2025, MathAI.inc. https://www.math.inc/gauss

Summary and Bibliography of Lean Mathematical Breakthroughs, Jan 2025-Jan 2026. Kevin Sullivan.

Papers

Masterman, Tula, Sandi Besen, Mason Sawtell, and Alex Chao. "The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey." arXiv preprint arXiv:2404.11584 (2024). https://arxiv.org/abs/2404.11584.

Training AI Co-Scientists Using Rubric Rewards. Shashwat Goel, Rishi Hazra, Dulhan Jayalath, Timon Willi, Parag Jain, William F. Shen, Ilias Leontiadis, Francesco Barbieri, Yoram Bachrach, Jonas Geiping, Chenxi Whitehouse. https://arxiv.org/abs/2512.23707

QLoRA: Efficient Finetuning of Quantized LLMs, Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer, 2023. https://arxiv.org/abs/2305.14314

A generative model of memory construction and consolidation, Eleanor Spens & Neil Burgess, 2023. https://www.nature.com/articles/s41562-023-01799-z.pdf

Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks. Microsoft Research AI Frontiers. 2024. https://arxiv.org/html/2411.04468v1

Generative Agents: Interactive Simulacra of Human Behavior. Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein, 2023 https://arxiv.org/abs/2304.03442

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Wei, J., et al. (2022). https://arxiv.org/abs/2201.11903

Internet-Augmented Dialogue Generation, Komeili et al., 2021. from Meta AI Research. This was one of the first papers to systematically explore augmenting conversational AI with real-time web search. https://arxiv.org/abs/2107.07566

Toolformer: Language Models Can Teach Themselves to Use Tools. Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom, 2023. https://arxiv.org/abs/2302.04761

Language Models are Few-Shot Learners, Brown et al., 2020. https://arxiv.org/abs/2005.14165

A Survey on In-Context Learning, Dong, Q. et al. (2024). https://arxiv.org/abs/2301.00234

Formalizes ICL, relates it to meta-learning and prompting, and surveys techniques, analyses, and applications specifically for LLMs.

Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning, 2024, Hao Zhao, Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion. https://arxiv.org/abs/2402.04833

Mathematical exploration and discovery at scale, Bogdan Georgiev, Javier Gómez-Serrano, Terence Tao, Adam Zsolt Wagner, 2025. https://arxiv.org/abs/2511.02864

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration, 2025. https://arxiv.org/abs/2511.21689

From Code Foundation Models to Agents and Applications: A Comprehensive Survey and Practical Guide to Code Intelligence, 2025. https://arxiv.org/abs/2511.18538

Small Language Models are the Future of Agentic AI. Peter Belcak, Greg Heinrich, Shizhe Diao, Yonggan Fu, Xin Dong, Saurav Muralidharan, Yingyan Celine Lin, Pavlo Molchanov. https://arxiv.org/pdf/2506.02153

AgentFold: Long-Horizon Web Agents with Proactive Context Management. Rui Ye, Zhongwang Zhang, Kuan Li, Huifeng Yin, Zhengwei Tao, Yida Zhao, Liangcai Su, Liwen Zhang, Zile Qiao, Xinyu Wang, Pengjun Xie, Fei Huang, Siheng Chen, Jingren Zhou, Yong Jiang. https://arxiv.org/abs/2510.24699

ReAct: Synergizing Reasoning and Acting in Language Models. Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao. https://arxiv.org/abs/2210.03629

Lean Copilot: Large Language Models as Copilots for Theorem Proving in Lean. Peiyang Song, Kaiyu Yang, Anima Anandkumar. https://arxiv.org/abs/2404.12534

Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan. 2023. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. arXiv preprint arXiv:2305.10601. https://arxiv.org/abs/2305.10601

Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, and Peter Clark. 2023. Self-Refine: Iterative Refinement with Self-Feedback. arXiv preprint arXiv:2303.17651. https://arxiv.org/abs/2303.17651

Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, and Enhong Chen. 2024. A survey on multimodal large language models. Natl. Sci. Rev. 11, 12 (November 2024), nwae403. DOI:https://doi.org/10.1093/nsr/nwae403

Hanjia Lyu, Jinfa Huang, Daoan Zhang, Yongsheng Yu, Xinyi Mou, Jinsheng Pan, Zhengyuan Yang, Zhongyu Wei, Jiebo Luo. GPT-4V(ision) as A Social Media Analysis Engine. https://arxiv.org/abs/2311.07547

Jiacheng Miao, Joe R. Davis, Jonathan K. Pritchard, and James Zou. 2025. Paper2Agent: Reimagining Research Papers As Interactive and Reliable AI Agents. arXiv preprint arXiv:2509.06917. https://arxiv.org/abs/2509.06917

Model-First Reasoning LLM Agents: Reducing Hallucinations through Explicit Problem Modeling, Annu Rana, Gaurav Kumar, 2025. https://arxiv.org/abs/2512.14474