Running Llama MMLU Evaluation on Google Colab

Overview

This guide walks you through running llama_mmlu_eval.py on Google Colab to evaluate Llama 3.2-1B on the MMLU benchmark using free GPU resources.

Prerequisites

  1. Google Account - For accessing Google Colab

  2. Hugging Face Account - For model access

  3. Hugging Face Token - With Llama license accepted

Step-by-Step Setup

Step 1: Access Google Colab (first time)

  1. Go to https://colab.research.google.com/

  2. Sign in with your Google account

Step 2: Create Project Folder in Google Drive

  1. Open Google Drive

  2. Open or create a folder named Colab_Projects.

  3. Open the folder and create a folder named RunningLLM (or whatever name matches your project).

  4. Open the folder and click New (+ sign) -> More -> Google Colaboratory

  5. This creates a Python notebook. Name it "llama_mmlu_eval.ipynb".

  6. Open the notebook - this will return you to Colab.

Step 3: Connect Input/Output to Google Drive

Copy the following code into the first cell and run it. This code should appear in the first cell of every notebook, replacing "RunningLLM" with the name of your project if it is different. This will make sure all files your program creates are safely saved on Google Drive instead of disappearing when your session ends.
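For example, a first cell along these lines mounts Drive and switches into the project folder (the path assumes the Colab_Projects/RunningLLM layout from Step 2; adjust it to match your own folder names):

```python
# Mount Google Drive so files persist after the Colab session ends
from google.colab import drive
drive.mount('/content/drive')

# Work inside the project folder created in Step 2
import os
project_dir = '/content/drive/MyDrive/Colab_Projects/RunningLLM'
os.makedirs(project_dir, exist_ok=True)  # create it if it does not exist yet
os.chdir(project_dir)
print("Working directory:", os.getcwd())
```

Any file your program writes with a relative path now lands in the Drive folder rather than in Colab's temporary `/content/` storage.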

Step 4: Enable GPU Runtime

IMPORTANT: You must enable a GPU; running the evaluation on CPU is impractically slow!

  1. Click Runtime in the menu bar

  2. Select Change runtime type

  3. In the dialog:

    • Hardware accelerator: Select GPU (T4 is free tier)

    • GPU type: T4, V100, or A100 (if available)

  4. Click Save

Verify GPU is enabled:
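One quick check, assuming PyTorch is available in the runtime (it is preinstalled on Colab):

```python
# Confirm that PyTorch can see the GPU assigned to this runtime
import torch

if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("No GPU detected - go back and change the runtime type")
```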

You should see GPU information (Tesla T4, V100, etc.)

Step 5: Enable Gemini (first time)

  1. Click on Settings (gear icon)

  2. Click on AI Assistance and check the box "Consented to use generative AI features"

Step 6: Install Claude Code (Optional)

This is optional because you could also run Claude in a separate window and copy / paste back and forth or just use Gemini built into Colab. Run this in a Colab code cell:
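A sketch of the install cell, assuming Claude Code's npm package name `@anthropic-ai/claude-code` (the `!` prefix tells Colab to run a shell command; Colab images ship with Node.js):

```python
# Install the Claude Code CLI globally via npm
!npm install -g @anthropic-ai/claude-code
```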

Then you can run Claude Code by running this in a cell:
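Assuming the global install above put the `claude` binary on the PATH:

```python
# Launch the Claude Code CLI (interactive) inside the notebook
!claude
```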

Step 7: Install Dependencies

Copy this into a cell of the notebook and run it:
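The exact package list depends on your script; a typical set for a Hugging Face MMLU evaluation might look like this (these package names are an assumption, not taken from the script itself):

```python
# Install the libraries the evaluation script needs
!pip install -q transformers accelerate datasets torch
```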

The -q flag makes output quieter. Remove it if you want to see installation progress. Modify this to add whatever other libraries you may need.

Step 8: Authenticate with Hugging Face

Option A: Interactive Login
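Run this in a cell; it opens Hugging Face's login widget inside the notebook:

```python
# Opens an interactive login widget; paste your Hugging Face token when prompted
from huggingface_hub import notebook_login
notebook_login()
```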

When prompted:

  1. Paste your Hugging Face token

  2. Press Enter

  3. Type y when asked to save as git credential

Option B: Set Token Directly (Faster for repeated runs)

Copy this into a cell of your notebook and run it:
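A minimal sketch; the token string below is a placeholder you must replace with your own (`huggingface_hub` picks up the `HF_TOKEN` environment variable automatically):

```python
import os

# Placeholder token - replace with your own from https://huggingface.co/settings/tokens
os.environ["HF_TOKEN"] = "hf_xxxxxxxxxxxxxxxxxxxx"
```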

⚠️ Security Warning: If using Option B, do NOT share your notebook publicly with the token visible!

Step 9: Upload Your Program

Option A: Copy-Paste Code Directly into notebook (recommended)

Create a new cell and paste in your code. This is recommended because it will make it easy for Gemini to debug and modify your code.

Option B: Upload from Your Computer

  1. Click the 📁 Files icon in the left sidebar

  2. Click 📤 Upload button

  3. Select llama_mmlu_eval.py

  4. File will appear in /content/

Step 10: Run Your Program

If you pasted the code: Just run the cell with the code.

If you uploaded the file:

Run this in a cell of the notebook:
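Assuming the file was uploaded as `llama_mmlu_eval.py` into the working directory:

```python
# Run the uploaded script; its output streams into the notebook cell
!python llama_mmlu_eval.py
```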

Step 11: Monitor Progress

You'll see progress output in the cell as the evaluation works through the MMLU subjects.

Expected runtime depends on the GPU tier you were assigned; the free T4 is the slowest, so leave the browser tab open until the run completes.

Step 12: Download Results

Your results files should appear in the Google Drive project folder you created.

You can also view and download your files directly in Colab by clicking on Files on the left sidebar, mounting Google Drive by clicking on the little Drive icon (if it is not already mounted), and then browsing to your project folder.