Running Llama MMLU Evaluation on Google Colab

Overview

This guide walks you through running llama_mmlu_eval.py on Google Colab to evaluate Llama 3.2-1B on the MMLU benchmark using free GPU resources.

Prerequisites

  1. Google Account - For accessing Google Colab

  2. Hugging Face Account - For model access

  3. Hugging Face Token - With Llama license accepted

Step-by-Step Setup

Step 1: Access Google Colab (first time)

  1. Go to https://colab.research.google.com/

  2. Sign in with your Google account

Step 2: Create Project Folder in Google Drive

  1. Open Google Drive

  2. Open or create a folder named Colab_Projects.

  3. Open the folder and create a folder named RunningLLM (or whatever name matches your project).

  4. Open the folder and click New (+ sign) -> More -> Google Colaboratory

  5. This creates a Python notebook. Name it "llama_mmlu_eval.ipynb".

  6. Open the notebook - this will return you to Colab.

Step 3: Connect Input/Output to Google Drive

Copy the following code into the first cell and run it. This code should appear in the first cell of every notebook, replacing "RunningLLM" with the name of your project if it is different. This will make sure all files your program creates are safely saved on Google Drive instead of disappearing when your session ends.
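For example, a first cell along these lines mounts Drive and switches into the project folder (the path assumes the Colab_Projects/RunningLLM layout from Step 2; adjust it to match your own folder names):

```python
# Mount Google Drive so files persist after the Colab session ends
from google.colab import drive
drive.mount('/content/drive')

# Work inside the project folder created in Step 2
import os
project_dir = '/content/drive/MyDrive/Colab_Projects/RunningLLM'
os.makedirs(project_dir, exist_ok=True)  # create it if it does not exist yet
os.chdir(project_dir)
print("Working directory:", os.getcwd())
```

Any file your program writes with a relative path now lands in the Drive folder rather than in Colab's temporary `/content/` storage.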

Step 4: Enable GPU Runtime

IMPORTANT: You must enable a GPU; running the evaluation on CPU is impractically slow!

  1. Click Runtime in the menu bar

  2. Select Change runtime type

  3. In the dialog:

    • Hardware accelerator: Select GPU (T4 is free tier)

    • GPU type: T4, V100, or A100 (if available)

  4. Click Save

Verify GPU is enabled:
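One quick check, assuming PyTorch is available in the runtime (it is preinstalled on Colab):

```python
# Confirm that PyTorch can see the GPU assigned to this runtime
import torch

if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("No GPU detected - go back and change the runtime type")
```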

You should see GPU information (Tesla T4, V100, etc.)

Step 5: Enable Gemini (first time)

  1. Click on Settings (gear icon)

  2. Click on AI Assistance and check the box "Consented to use generative AI features"

Step 6: Install Claude Code (Optional)

This is optional because you could also run Claude in a separate window and copy / paste back and forth or just use Gemini built into Colab. Run this in a Colab code cell:
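A sketch of the install cell, assuming Claude Code's npm package name `@anthropic-ai/claude-code` (the `!` prefix tells Colab to run a shell command; Colab images ship with Node.js):

```python
# Install the Claude Code CLI globally via npm
!npm install -g @anthropic-ai/claude-code
```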

Then you can run Claude Code by running this in a cell:
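Assuming the global install above put the `claude` binary on the PATH:

```python
# Launch the Claude Code CLI (interactive) inside the notebook
!claude
```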

Step 7: Install Dependencies

Copy this into a cell of the notebook and run it:
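The exact package list depends on your script; a typical set for a Hugging Face MMLU evaluation might look like this (these package names are an assumption, not taken from the script itself):

```python
# Install the libraries the evaluation script needs
!pip install -q transformers accelerate datasets torch
```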

The -q flag makes output quieter. Remove it if you want to see installation progress. Modify this to add whatever other libraries you may need.

Step 8: Authenticate with Hugging Face

Option A: Interactive Login
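Run this in a cell; it opens Hugging Face's login widget inside the notebook:

```python
# Opens an interactive login widget; paste your Hugging Face token when prompted
from huggingface_hub import notebook_login
notebook_login()
```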

When prompted:

  1. Paste your Hugging Face token

  2. Press Enter

  3. Type y when asked to save as git credential

Option B: Set Token Directly (Faster for repeated runs)

Copy this into a cell of your notebook and run it:
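A minimal sketch; the token string below is a placeholder you must replace with your own (`huggingface_hub` picks up the `HF_TOKEN` environment variable automatically):

```python
import os

# Placeholder token - replace with your own from https://huggingface.co/settings/tokens
os.environ["HF_TOKEN"] = "hf_xxxxxxxxxxxxxxxxxxxx"
```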

⚠️ Security Warning: If using Option B, do NOT share your notebook publicly with the token visible!

Step 9: Upload Your Program

Option A: Copy-Paste Code Directly into notebook (recommended)

Create a new cell and paste in your code. This is recommended because it will make it easy for Gemini to debug and modify your code.

Option B: Upload from Your Computer

  1. Click the 📁 Files icon in the left sidebar

  2. Click 📤 Upload button

  3. Select llama_mmlu_eval.py

  4. File will appear in /content/

Step 10: Run Your Program

If you pasted the code: Just run the cell with the code.

If you uploaded the file:

Run this in a cell of the notebook:
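Assuming the file was uploaded as `llama_mmlu_eval.py` into the working directory:

```python
# Run the uploaded script; its output streams into the notebook cell
!python llama_mmlu_eval.py
```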

Step 11: Monitor Progress

You'll see progress output in the cell as the evaluation works through the MMLU subjects.

Expected runtime depends on the GPU tier you were assigned; the free T4 is the slowest, so leave the browser tab open until the run completes.

Step 12: Download Results

Your results files should appear in the Google Drive project folder you created.

You can also view and download your files directly in Colab by clicking on Files on the left sidebar, mounting Google Drive by clicking on the little Drive icon (if it is not already mounted), and then browsing to your project folder.