In this lab we will continue working with the CIFAR-10 dataset, but we will go deeper by stacking linear layers and non-linear activation functions on top of each other. First, I will present a re-implementation of what we had last time.

This is similar to the loss_softmax and loss_softmax_backward implementations in the previous lab. Here we also make sure this works for a batch of vectors instead of a single vector. This means the input here will be a tensor of size batchSize x inputSize:

In [ ]:

```
import json, random, string
import torch, lab_utils
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.models as models
import torchvision.transforms as transforms
from torch.autograd import Variable
from torchvision.datasets import CIFAR10
import matplotlib.pyplot as plt
from PIL import Image
%matplotlib inline
```

In [ ]:

```
# This class combines Softmax + Cross Entropy Loss.
# Similar to the previous lab, but this implementation works for batches of inputs and
# not just individual input vectors. Here the input is batchSize x inputSize.
class nn_CrossEntropyLoss(object):
    # Forward pass: -log softmax(input_{label}), averaged over the batch.
    def forward(self, inputs, labels):
        max_val = inputs.max()  # Subtracting the max avoids overflow in exp().
        exp_inputs = (inputs - max_val).exp()
        # This is different than in the previous lab. Avoiding for loops here.
        denominators = exp_inputs.sum(1).repeat(inputs.size(1), 1).t()
        self.predictions = torch.mul(exp_inputs, 1 / denominators)
        # Check what gather does. Just avoiding another for loop.
        return -self.predictions.log().gather(1, labels.view(-1, 1)).mean()

    # Backward pass: softmax(inputs) - onehot(labels) for each row.
    # Note that this is the gradient of the loss summed over the batch,
    # while forward() reports the mean over the batch.
    def backward(self, inputs, labels):
        grad_inputs = self.predictions.clone()
        # Ok, Here we will use a for loop (but it is avoidable too).
        for i in range(0, inputs.size(0)):
            grad_inputs[i][labels[i]] = grad_inputs[i][labels[i]] - 1
        return grad_inputs

# Input: 4 vectors of size 10.
testInput = torch.Tensor(4, 10).normal_(0, 0.1)
# labels: 4 labels indicating the correct class for each input.
labels = torch.LongTensor([3, 4, 4, 8])
# Forward and Backward passes:
loss_softmax = nn_CrossEntropyLoss()
loss = loss_softmax.forward(testInput, labels)
gradInputs = loss_softmax.backward(testInput, labels)
```

Before continuing, make sure you understand every line of code in the above implementation by looking at previous lecture notes.
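One way to check your understanding of the backward pass is a finite-difference test. The sketch below is not part of the lab code: it uses plain Python lists instead of tensors, a single example instead of a batch, and made-up scores, and it compares the analytic gradient (softmax minus one-hot) against a numerical estimate.

```python
import math

def softmax(scores):
    """Numerically stable softmax of a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, label):
    """-log softmax(scores)[label] for a single example."""
    return -math.log(softmax(scores)[label])

def analytic_grad(scores, label):
    """Gradient of the loss w.r.t. the scores: softmax minus one-hot."""
    probs = softmax(scores)
    probs[label] -= 1.0
    return probs

def numeric_grad(scores, label, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = []
    for k in range(len(scores)):
        plus = list(scores)
        plus[k] += eps
        minus = list(scores)
        minus[k] -= eps
        grad.append((cross_entropy(plus, label) - cross_entropy(minus, label)) / (2 * eps))
    return grad

scores = [0.2, -0.1, 0.05, 0.3]  # made-up scores for 4 classes
label = 2
max_diff = max(abs(a - n) for a, n in zip(analytic_grad(scores, label),
                                          numeric_grad(scores, label)))
print('largest difference between the two gradients: %.2e' % max_diff)
```

If the two gradients disagree by more than a tiny amount, the backward pass (or the forward pass) has a bug. The same idea applies to any of the layers in this lab.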

Next we provide an implementation of a linear layer that is also meant to work on batches of vectors. Notice that in addition to computing gradWeight and gradBias, we also need gradInput here, since this is the gradient that gets passed back to earlier layers during backpropagation. Making a batched implementation of this layer is easy because the only change is that we now have matrix-matrix multiplications as opposed to vector-matrix multiplications.

In [ ]:

```
class nn_Linear(object):
    def __init__(self, inputSize, outputSize):
        self.weight = torch.Tensor(inputSize, outputSize).normal_(0, 0.01)
        self.gradWeight = torch.Tensor(inputSize, outputSize)
        self.bias = torch.Tensor(outputSize).zero_()
        self.gradBias = torch.Tensor(outputSize)

    # Forward pass, inputs is a matrix of size batchSize x inputSize
    def forward(self, inputs):
        # This one needs no change, it just becomes matrix x matrix multiplication
        # as opposed to just vector x matrix multiplication as we had before.
        return torch.matmul(inputs, self.weight) + self.bias

    # Backward pass: in addition to computing gradients for the weight and bias,
    # it has to compute gradients with respect to the inputs.
    def backward(self, inputs, gradOutput):
        self.gradWeight = torch.matmul(inputs.t(), gradOutput)
        self.gradBias = gradOutput.sum(0)
        return torch.matmul(gradOutput, self.weight.t())

# Input: 4 vectors of size 3072.
testInput = torch.Tensor(4, 3 * 32 * 32).normal_(0, 0.1)
dummyGradOutputs = torch.Tensor(4, 10).normal_(0, 0.1)
# Forward and Backward passes:
linear = nn_Linear(3 * 32 * 32, 10)
output = linear.forward(testInput)
gradInput = linear.backward(testInput, dummyGradOutputs)
```
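To make the three matrix products in the backward pass concrete, here is a small sketch that repeats them on tiny hand-written matrices in plain Python (lists of rows, no tensors). The helper names and the numbers are made up for illustration. It also checks one entry of gradWeight by finite differences, using the scalar L = sum(gradOutput * output) so that dL/d(output) is exactly gradOutput.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def linear_forward(x, w, b):
    y = matmul(x, w)
    return [[y[i][j] + b[j] for j in range(len(b))] for i in range(len(y))]

def linear_backward(x, w, grad_out):
    grad_w = matmul(transpose(x), grad_out)        # inputSize x outputSize
    grad_b = [sum(col) for col in zip(*grad_out)]  # outputSize
    grad_x = matmul(grad_out, transpose(w))        # batchSize x inputSize
    return grad_w, grad_b, grad_x

def weighted_sum(y, g):
    """Scalar L = sum_ij g_ij * y_ij, so that dL/dy is exactly g."""
    return sum(g[i][j] * y[i][j] for i in range(len(y)) for j in range(len(y[0])))

x = [[1.0, 2.0, -1.0], [0.5, 0.0, 3.0]]     # batchSize = 2, inputSize = 3
w = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]  # inputSize = 3, outputSize = 2
b = [0.01, -0.02]
g = [[1.0, -1.0], [0.5, 2.0]]               # stand-in for gradOutput

grad_w, grad_b, grad_x = linear_backward(x, w, g)

# Finite-difference check of the weight entry in row 2, column 1.
eps = 1e-6
w_plus = [row[:] for row in w]
w_plus[2][1] += eps
w_minus = [row[:] for row in w]
w_minus[2][1] -= eps
numeric = (weighted_sum(linear_forward(x, w_plus, b), g)
           - weighted_sum(linear_forward(x, w_minus, b), g)) / (2 * eps)
print(grad_w[2][1], numeric)
```

Note how the shapes work out: gradWeight has the shape of the weight, gradBias the shape of the bias, and gradInput the shape of the input batch.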

Finally, we need to implement a non-linear activation function. Here we will implement ReLU, which is the simplest activation function but also one of the most important, as we discussed in class.

In [ ]:

```
class nn_ReLU(object):
    # pytorch has an element-wise max function.
    def forward(self, inputs):
        outputs = inputs.clone()
        outputs[outputs < 0] = 0
        return outputs

    # Make sure the backward pass is absolutely clear.
    def backward(self, inputs, gradOutput):
        gradInputs = gradOutput.clone()
        gradInputs[inputs < 0] = 0
        return gradInputs
```
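If the backward pass is not yet clear, the following plain-Python sketch may help. It applies the same rule as the class above to a short list of made-up numbers: the incoming gradient is passed through wherever the input was not negative, and blocked (set to zero) wherever the input was negative.

```python
def relu_forward(xs):
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in xs]

def relu_backward(xs, grad_out):
    """Zero the gradient where the input was negative, as nn_ReLU does."""
    return [g if x >= 0 else 0.0 for x, g in zip(xs, grad_out)]

inputs = [-2.0, -0.5, 0.0, 1.5, 3.0]    # made-up inputs
grad_out = [0.1, 0.2, 0.3, 0.4, 0.5]    # made-up incoming gradients

print(relu_forward(inputs))             # [0.0, 0.0, 0.0, 1.5, 3.0]
print(relu_backward(inputs, grad_out))  # [0.0, 0.0, 0.3, 0.4, 0.5]
```

At exactly zero the derivative of ReLU is not defined; passing the gradient through there, as done here and in the class above, is one common convention.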

Ok, now we are ready to use our three layers to build a neural network. We will use it to classify CIFAR-10 images as in our previous lab, but this time we will also use pytorch's DataLoaders, which build batches and shuffle the data for us automatically.

In [ ]:

```
# In addition to transforming the image into a tensor, we also normalize the values in the image:
# we subtract the per-channel mean and divide by the per-channel standard deviation.
# The last step flattens each 3 x 32 x 32 image into a vector of size 3072.
imgTransform = transforms.Compose([transforms.ToTensor(),
                                   transforms.Normalize((0.4914, 0.4822, 0.4465),
                                                        (0.2023, 0.1994, 0.2010)),
                                   transforms.Lambda(lambda inputs: inputs.view(3 * 32 * 32))])
trainset = CIFAR10(root='./data', train = True, transform = imgTransform, download = True)
valset = CIFAR10(root='./data', train = False, transform = imgTransform, download = True)
trainLoader = torch.utils.data.DataLoader(trainset, batch_size = 128,
                                          shuffle = True, num_workers = 0)
valLoader = torch.utils.data.DataLoader(valset, batch_size = 128,
                                        shuffle = False, num_workers = 0)
```
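As a quick sanity check of what the transform and the loaders do, here is a small plain-Python sketch of the arithmetic. The helper names are made up. It assumes the standard CIFAR-10 split of 50,000 training and 10,000 test images, and that the loader keeps the last, smaller batch (which I believe is the DataLoader default).

```python
import math

MEAN = (0.4914, 0.4822, 0.4465)  # per-channel means used in the cell above
STD = (0.2023, 0.1994, 0.2010)   # per-channel standard deviations

def normalize_pixel(rgb):
    """Apply (value - mean) / std to one RGB pixel with values in [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, MEAN, STD))

def num_batches(num_examples, batch_size):
    """Batches per epoch when the last, smaller batch is kept."""
    return int(math.ceil(num_examples / float(batch_size)))

flat_size = 3 * 32 * 32  # length of each flattened image vector

print(normalize_pixel(MEAN))             # a pixel equal to the mean maps to zeros
print(normalize_pixel((1.0, 1.0, 1.0)))  # a white pixel maps to positive values
print(flat_size)
print(num_batches(50000, 128), num_batches(10000, 128))
```

So each training epoch in the cells that follow goes through 391 batches, and each validation pass through 79.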

Now that the training and validation splits of the dataset are loaded, let's train.

In [ ]:

```
from tqdm import tqdm as tqdm
# Try this if the above gives trouble: from tqdm import tqdm_notebook as tqdm
learningRate = 1e-4  # Single learning rate for this lab.

# Definition of our network.
linear1 = nn_Linear(3 * 32 * 32, 1024)
relu = nn_ReLU()
linear2 = nn_Linear(1024, 10)
criterion = nn_CrossEntropyLoss()

# Training loop.
for epoch in range(0, 10):
    correct = 0.0
    cum_loss = 0.0
    counter = 0

    # Make a pass over the training data.
    t = tqdm(trainLoader, desc = 'Training epoch %d' % epoch)
    for (i, (inputs, labels)) in enumerate(t):
        # Forward pass:
        a = linear1.forward(inputs)
        b = relu.forward(a)
        c = linear2.forward(b)
        # .item() turns a one-element tensor into a Python number (PyTorch >= 0.4).
        cum_loss += criterion.forward(c, labels).item()
        max_scores, max_labels = c.max(1)
        correct += (max_labels == labels).sum().item()

        # Backward pass:
        grads_c = criterion.backward(c, labels)
        grads_b = linear2.backward(b, grads_c)
        grads_a = relu.backward(a, grads_b)
        linear1.backward(inputs, grads_a)

        # Weight and bias updates.
        linear1.weight = linear1.weight - learningRate * linear1.gradWeight
        linear1.bias = linear1.bias - learningRate * linear1.gradBias
        linear2.weight = linear2.weight - learningRate * linear2.gradWeight
        linear2.bias = linear2.bias - learningRate * linear2.gradBias

        # logging information.
        counter += inputs.size(0)
        t.set_postfix(loss = cum_loss / (1 + i), accuracy = 100 * correct / counter)

    # Make a pass over the validation data.
    correct = 0.0
    cum_loss = 0.0
    counter = 0
    t = tqdm(valLoader, desc = 'Validation epoch %d' % epoch)
    for (i, (inputs, labels)) in enumerate(t):
        # Forward pass:
        a = linear1.forward(inputs)
        b = relu.forward(a)
        c = linear2.forward(b)
        cum_loss += criterion.forward(c, labels).item()
        max_scores, max_labels = c.max(1)
        correct += (max_labels == labels).sum().item()

        # logging information.
        counter += inputs.size(0)
        t.set_postfix(loss = cum_loss / (1 + i), accuracy = 100 * correct / counter)
```
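The four update lines in the training loop above all apply the same rule: move each parameter a small step against its gradient. As a minimal illustration, here is that rule on a made-up one-parameter problem in plain Python, minimizing f(w) = (w - 3)^2, whose gradient is 2(w - 3). The learning rate and number of steps are arbitrary choices for this toy problem, not the values used in the lab.

```python
def sgd_step(w, grad, lr):
    """One gradient descent update: w <- w - lr * grad."""
    return w - lr * grad

w = 0.0   # initial guess
lr = 0.1  # learning rate for this toy problem only
for step in range(100):
    grad = 2.0 * (w - 3.0)  # derivative of (w - 3)^2
    w = sgd_step(w, grad, lr)

print('w after 100 steps: %.6f' % w)
```

With this learning rate the distance to the minimum shrinks by a constant factor at every step. In the network above the same update is simply applied to whole weight matrices and bias vectors at once.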

Pytorch already comes with an impressive number of operations used to implement deep neural networks. Here we will use the same ones that we have already implemented, and show how similar and easy it is to use pytorch's implementations. One difference is that we will wrap the tensors that go through the network in torch.autograd.Variable objects.

In [ ]:

```
from tqdm import tqdm as tqdm
# Try this if the above gives trouble: from tqdm import tqdm_notebook as tqdm
learningRate = 1e-2  # Single learning rate for this lab.

# Definition of our network.
network = nn.Sequential(
    nn.Linear(3072, 1024),
    nn.ReLU(),
    nn.Linear(1024, 10),
)

# Definition of our loss.
criterion = nn.CrossEntropyLoss()

# Definition of optimization strategy.
optimizer = optim.SGD(network.parameters(), lr = learningRate)

def train_model(network, criterion, optimizer, trainLoader, valLoader, n_epochs = 10):
    # Training loop.
    for epoch in range(0, n_epochs):
        correct = 0.0
        cum_loss = 0.0
        counter = 0

        # Make a pass over the training data.
        t = tqdm(trainLoader, desc = 'Training epoch %d' % epoch)
        network.train()  # This is important to call before training!
        for (i, (inputs, labels)) in enumerate(t):
            # Wrap inputs, and targets into torch.autograd.Variable types.
            inputs = Variable(inputs)
            labels = Variable(labels)

            # Forward pass:
            outputs = network(inputs)
            loss = criterion(outputs, labels)

            # Backward pass:
            optimizer.zero_grad()
            # Loss is a variable, and calling backward on a Variable will
            # compute all the gradients that lead to that Variable taking on its
            # current value.
            loss.backward()

            # Weight and bias updates.
            optimizer.step()

            # logging information.
            # .item() replaces the older loss.data[0] (PyTorch >= 0.4).
            cum_loss += loss.item()
            max_scores, max_labels = outputs.data.max(1)
            correct += (max_labels == labels.data).sum().item()
            counter += inputs.size(0)
            t.set_postfix(loss = cum_loss / (1 + i), accuracy = 100 * correct / counter)

        # Make a pass over the validation data.
        correct = 0.0
        cum_loss = 0.0
        counter = 0
        t = tqdm(valLoader, desc = 'Validation epoch %d' % epoch)
        network.eval()  # This is important to call before evaluating!
        for (i, (inputs, labels)) in enumerate(t):
            # Wrap inputs, and targets into torch.autograd.Variable types.
            inputs = Variable(inputs)
            labels = Variable(labels)

            # Forward pass:
            outputs = network(inputs)
            loss = criterion(outputs, labels)

            # logging information.
            cum_loss += loss.item()
            max_scores, max_labels = outputs.data.max(1)
            correct += (max_labels == labels.data).sum().item()
            counter += inputs.size(0)
            t.set_postfix(loss = cum_loss / (1 + i), accuracy = 100 * correct / counter)

# Train the previously defined model.
train_model(network, criterion, optimizer, trainLoader, valLoader, n_epochs = 10)
```

Objects of type torch.autograd.Variable have two attributes, .data and .grad. The first one, .data, contains the value of the variable at any given point, and .grad contains the gradient with respect to this variable once a backward call involving it has been invoked. In the previous code, keep in mind that most torch operations that can be applied to tensors can also be applied to tensors wrapped in torch.autograd.Variables. The output of a torch operation involving variables will also be a torch.autograd.Variable (as opposed to just a tensor). Another difference is that pytorch records the operations on each torch.autograd.Variable in a graph structure, so that gradients can be computed when backward() is called on any variable in the graph. This very powerful technique is often called "automatic differentiation". It means that as long as we wrap tensors in variables and use pytorch operators, we do not really need to implement backward passes. (In PyTorch 0.4 and later, Variable has been merged into Tensor, so the wrapping step is no longer required, although code that uses it still runs.)
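To give a feeling for what "recording operations in a graph" means, here is a toy scalar version written in plain Python. This is only a sketch of the idea and not how pytorch is implemented: the class name Var and everything in it are made up for illustration. Each value remembers which inputs it was computed from and the local derivative with respect to each of them, and backward() walks that record in reverse.

```python
class Var(object):
    """A scalar that remembers how it was computed (hypothetical toy class)."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents  # pairs of (input Var, local derivative)

    def __add__(self, other):
        return Var(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.data * other.data, ((self, other.data), (other, self.data)))

    def relu(self):
        return Var(max(0.0, self.data), ((self, 1.0 if self.data > 0 else 0.0),))

    def backward(self):
        # Visit nodes in reverse topological order so that a node's gradient
        # is complete before it is passed on to its inputs.
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad

w, x, b = Var(2.0), Var(3.0), Var(-1.0)
y = (w * x + b).relu()  # y = relu(2 * 3 - 1) = 5
y.backward()
print(y.data, w.grad, x.grad, b.grad)
```

Notice that nobody wrote a backward function for the expression relu(w * x + b) as a whole; the gradients come from chaining the local derivatives of its pieces, which is the chain rule.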

In [ ]:

```
from tqdm import tqdm as tqdm
# Try this if the above gives trouble: from tqdm import tqdm_notebook as tqdm
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
learningRate = 1e-2  # Single learning rate for this lab.

class MyAutogradModel(nn.Module):
    def __init__(self):
        super(MyAutogradModel, self).__init__()
        # See documentation for nn.Parameter here:
        # https://github.com/pytorch/pytorch/blob/master/torch/nn/parameter.py
        self.weight1 = nn.Parameter(torch.Tensor(3072, 1024).normal_(0, 0.01))
        self.bias1 = nn.Parameter(torch.Tensor(1024).zero_())
        self.weight2 = nn.Parameter(torch.Tensor(1024, 10).normal_(0, 0.01))
        self.bias2 = nn.Parameter(torch.Tensor(10).zero_())

    # No need to implement backward when using torch.autograd.Variable and pytorch functions.
    # Think of the possibilities!
    def forward(self, inputs):
        x = F.relu(torch.matmul(inputs, self.weight1) + self.bias1)
        x = torch.matmul(x, self.weight2) + self.bias2
        return x

# Definition of our network.
network = MyAutogradModel()

# Definition of our loss.
criterion = nn.CrossEntropyLoss()

# Definition of optimization strategy.
optimizer = optim.SGD(network.parameters(), lr = learningRate)

# Train the previously defined model.
train_model(network, criterion, optimizer, trainLoader, valLoader, n_epochs = 10)
```

In this section we will use convolutional layers in addition to linear layers. Convolutional layers operate on images with their spatial layout intact (channels x height x width), so we will modify our data loaders so that they return the images in that shape instead of the flattened vectors we have been using thus far.

In [ ]:

```
# Same transformations as before but we do not vectorize the images.
imgTransform = transforms.Compose([transforms.ToTensor(),
                                   transforms.Normalize((0.4914, 0.4822, 0.4465),
                                                        (0.2023, 0.1994, 0.2010))])
trainset = CIFAR10(root='./data', train = True, transform = imgTransform)
valset = CIFAR10(root='./data', train = False, transform = imgTransform)
trainLoader = torch.utils.data.DataLoader(trainset, batch_size = 128,
                                          shuffle = True, num_workers = 0)
valLoader = torch.utils.data.DataLoader(valset, batch_size = 128,
                                        shuffle = False, num_workers = 0)
```

In [ ]:

```
from tqdm import tqdm as tqdm
# Try this if the above gives trouble: from tqdm import tqdm_notebook as tqdm
learningRate = 1e-2  # Single learning rate for this lab.

# LeNet (a pun: "le" is French for "the") is taken from Yann LeCun's 1998 paper
# on digit classification http://yann.lecun.com/exdb/lenet/
# This was also a network with just two convolutional layers.
class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        # Convolutional layers.
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # Linear layers.
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        # This flattens the output of the previous layer into a vector.
        out = out.view(out.size(0), -1)
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        return out

# Definition of our network.
network = LeNet()

# Definition of our loss.
criterion = nn.CrossEntropyLoss()

# Definition of optimization strategy.
optimizer = optim.SGD(network.parameters(), lr = learningRate)

# Train the previously defined model.
train_model(network, criterion, optimizer, trainLoader, valLoader, n_epochs = 20)
```
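You may wonder where the 16*5*5 in the first linear layer of LeNet comes from. The sketch below traces the spatial size of a 32x32 CIFAR-10 image through the two convolutions and the two poolings using the usual output-size formula. The helper name conv_out is made up, and the sketch assumes that max_pool2d(out, 2) uses a stride equal to its kernel size, which I believe is the default.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                           # CIFAR-10 images are 32x32
size = conv_out(size, kernel=5)     # conv1 (5x5, no padding): 32 -> 28
size = conv_out(size, 2, stride=2)  # max_pool2d(2):           28 -> 14
size = conv_out(size, kernel=5)     # conv2 (5x5, no padding): 14 -> 10
size = conv_out(size, 2, stride=2)  # max_pool2d(2):           10 -> 5

flat_features = 16 * size * size    # conv2 has 16 output channels
print(size, flat_features)          # 5 400
```

If you change the kernel sizes, add padding, or add layers, redo this calculation to find the input size of the first linear layer.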

In [ ]:

```
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']
un_normalize = lab_utils.UnNormalize((0.4914, 0.4822, 0.4465),
                                     (0.2023, 0.1994, 0.2010))
network.eval()  # Important!

# Now predict the category using this trained classifier.
for i in range(0, 5):
    # randint includes both end points, so stop at the last valid index.
    img_id = random.randint(0, len(valset) - 1)
    print('Image %d' % img_id)
    img, _ = valset[img_id]
    predictions = F.softmax(network(Variable(img.unsqueeze(0))), dim = 1)
    predictions = predictions.data

    # Show the results of the classifier.
    lab_utils.show_image(lab_utils.tensor2pil(un_normalize(img)).resize((128, 128)))
    max_score, max_label = predictions.max(1)
    print('Image predicted as %s with confidence %.2f' % (classes[max_label.item()], max_score.item()))

    # Print out detailed predictions.
    for (j, pred) in enumerate(predictions.squeeze().tolist()):
        print('y_hat[%s] = %.2f' % (classes[j], pred))
```

Pytorch provides several Convnet models pretrained on the Imagenet Large Scale Visual Recognition Challenge (ILSVRC) dataset. The ILSVRC training set contains more than 1 million images, and the number of labels is 1000. Training a Convnet on this dataset often takes weeks on arrays of GPUs. Let's load one such network, with 18 layers of depth, and try it on some images. Look below at how impressive this neural network is, with so many layers and groups of layers. Still, most layers are ReLU, Conv2d, and BatchNorm2d, with one MaxPool2d near the start, and one average-pooling layer and one Linear layer at the end. There are also Resnet versions of depth 34, 50, 101, and 152.

In [ ]:

```
resnet = models.resnet18(pretrained = True)
print(resnet)
```

In [ ]:

```
# 1. Define the appropriate image pre-processing function.
# (transforms.Resize is the current name of the older transforms.Scale.)
preprocessFn = transforms.Compose([transforms.Resize(256),
                                   transforms.CenterCrop(224),
                                   transforms.ToTensor(),
                                   transforms.Normalize(mean = [0.485, 0.456, 0.406],
                                                        std = [0.229, 0.224, 0.225])])

# 2. Load the imagenet class names.
with open('imagenet_class_index.json') as f:
    imagenetClasses = {int(idx): entry[1] for (idx, entry) in json.load(f).items()}

# 3. Forward a test image of the toaster.
# Never forget to set evaluation mode so that layers such as Dropout and BatchNorm
# behave deterministically.
resnet.eval()
# Try your own image here. This is a picture of my toaster at home.
image = Image.open('test_image.jpg').convert('RGB')
# unsqueeze(0) adds a dummy batch dimension which is required for all models in pytorch.
inputVar = Variable(preprocessFn(image).unsqueeze(0))
predictions = resnet(inputVar)

# 4. Decode the top 10 classes predicted for this image.
# We need to apply softmax because the model outputs the last linear layer activations and not softmax scores.
# Sorting the negated scores in ascending order gives a descending order of probabilities.
probs, indices = (-F.softmax(predictions, dim = 1)).data.sort()
probs = (-probs).numpy()[0][:10]; indices = indices.numpy()[0][:10]
preds = [imagenetClasses[idx] + ': ' + str(prob) for (prob, idx) in zip(probs, indices)]

# 5. Show image and predictions
plt.title('\n'.join(preds))
plt.imshow(image);
```
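Step 4 above (softmax followed by a sort) can be hard to read in its compact tensor form. Here is the same decoding written out in plain Python on four made-up scores with made-up class names. None of these numbers come from the real network, and top_k is a hypothetical helper name.

```python
import math

def softmax(scores):
    """Numerically stable softmax of a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(scores, names, k):
    """Return the k most probable (name, probability) pairs, best first."""
    probs = softmax(scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(names[i], probs[i]) for i in ranked[:k]]

names = ['toaster', 'microwave', 'espresso_maker', 'waffle_iron']  # made-up labels
scores = [4.0, 1.0, 0.5, 2.5]  # made-up outputs of a last linear layer

best = top_k(scores, names, 2)
for name, prob in best:
    print('%s: %.3f' % (name, prob))
```

Softmax does not change the ranking of the classes, it only turns the raw scores into values between 0 and 1 that sum to one, which makes them easier to read as confidences.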

We will now use a pretrained network known as Alexnet on CIFAR-10 data. However, there is a problem: Alexnet takes images at 224x224 resolution, and CIFAR-10 images are 32x32. So we will scale up the CIFAR-10 images so that they work with Alexnet.

In [ ]:

```
# Same transformations as before but we do not vectorize the images.
# Additionally we are scaling up images to 224x224 in order to use Alexnet.
# (transforms.Resize is the current name of the older transforms.Scale.)
imgTransform = transforms.Compose([transforms.Resize((224, 224)),
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.4914, 0.4822, 0.4465),
                                                        (0.2023, 0.1994, 0.2010))])
trainset = CIFAR10(root='./data', train = True, transform = imgTransform)
valset = CIFAR10(root='./data', train = False, transform = imgTransform)
trainLoader = torch.utils.data.DataLoader(trainset, batch_size = 64,
                                          shuffle = True, num_workers = 0)
valLoader = torch.utils.data.DataLoader(valset, batch_size = 64,
                                        shuffle = False, num_workers = 0)
```

In [ ]:

```
from tqdm import tqdm as tqdm
# Try this if the above gives trouble: from tqdm import tqdm_notebook as tqdm
learningRate = 1e-3  # Single learning rate for this lab.

# Definition of our network.
network = models.alexnet(pretrained = True)
# Also notice I'm replacing the classifier, which originally has 3 linear layers,
# with a classifier that is just a single layer.
network.classifier = nn.Linear(9216, 10)  # CIFAR-10 has 10 classes not 1000.

# Definition of our loss.
criterion = nn.CrossEntropyLoss()

# Definition of optimization strategy.
optimizer = optim.SGD(network.parameters(), lr = learningRate)

def train_model(network, criterion, optimizer, trainLoader, valLoader, n_epochs = 10, use_gpu = False):
    if use_gpu:
        network = network.cuda()
        criterion = criterion.cuda()

    # Training loop.
    for epoch in range(0, n_epochs):
        correct = 0.0
        cum_loss = 0.0
        counter = 0

        # Make a pass over the training data.
        t = tqdm(trainLoader, desc = 'Training epoch %d' % epoch)
        network.train()  # This is important to call before training!
        for (i, (inputs, labels)) in enumerate(t):
            # Wrap inputs, and targets into torch.autograd.Variable types.
            inputs = Variable(inputs)
            labels = Variable(labels)
            if use_gpu:
                inputs = inputs.cuda()
                labels = labels.cuda()

            # Forward pass:
            outputs = network(inputs)
            loss = criterion(outputs, labels)

            # Backward pass:
            optimizer.zero_grad()
            # Loss is a variable, and calling backward on a Variable will
            # compute all the gradients that lead to that Variable taking on its
            # current value.
            loss.backward()

            # Weight and bias updates.
            optimizer.step()

            # logging information.
            # .item() replaces the older loss.data[0] (PyTorch >= 0.4).
            cum_loss += loss.item()
            max_scores, max_labels = outputs.data.max(1)
            correct += (max_labels == labels.data).sum().item()
            counter += inputs.size(0)
            t.set_postfix(loss = cum_loss / (1 + i), accuracy = 100 * correct / counter)

        # Make a pass over the validation data.
        correct = 0.0
        cum_loss = 0.0
        counter = 0
        t = tqdm(valLoader, desc = 'Validation epoch %d' % epoch)
        network.eval()  # This is important to call before evaluating!
        for (i, (inputs, labels)) in enumerate(t):
            # Wrap inputs, and targets into torch.autograd.Variable types.
            inputs = Variable(inputs)
            labels = Variable(labels)
            if use_gpu:
                inputs = inputs.cuda()
                labels = labels.cuda()

            # Forward pass:
            outputs = network(inputs)
            loss = criterion(outputs, labels)

            # logging information.
            cum_loss += loss.item()
            max_scores, max_labels = outputs.data.max(1)
            correct += (max_labels == labels.data).sum().item()
            counter += inputs.size(0)
            t.set_postfix(loss = cum_loss / (1 + i), accuracy = 100 * correct / counter)

# Train the previously defined model. This uses the GPU when one is available.
train_model(network, criterion, optimizer, trainLoader, valLoader, n_epochs = 5,
            use_gpu = torch.cuda.is_available())
```
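The number 9216 in the new classifier is the size of the flattened output of Alexnet's convolutional part for a 224x224 input. The sketch below reproduces that number in plain Python. The (kernel, stride, padding) values are my recollection of the layers in torchvision's Alexnet and should be treated as an assumption; running print(network.features) will show the actual values. The helper name out_size is made up.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

# Assumed (kernel, stride, padding) of the conv and max-pool layers, in order.
layers = [
    (11, 4, 2),  # conv1
    (3, 2, 0),   # max pool
    (5, 1, 2),   # conv2
    (3, 2, 0),   # max pool
    (3, 1, 1),   # conv3
    (3, 1, 1),   # conv4
    (3, 1, 1),   # conv5
    (3, 2, 0),   # max pool
]

size = 224
sizes = [size]
for kernel, stride, padding in layers:
    size = out_size(size, kernel, stride, padding)
    sizes.append(size)

features = 256 * size * size              # assuming 256 channels after conv5
new_classifier_params = features * 10 + 10  # weights plus biases of nn.Linear(9216, 10)
print(sizes)
print(features, new_classifier_params)
```

Under these assumptions the spatial size goes from 224 down to 6, and 256 * 6 * 6 = 9216. The single linear layer we attach has far fewer parameters than the rest of the network, whose pretrained weights are still being updated by the optimizer above.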

1) [2pts] In section 3 of this lab we implemented the ReLU activation function and used it to train a two-layer neural network. Here, please implement Sigmoid and Tanh:

$$\text{Sigmoid}(x) = \frac{1}{1 + e^{-x}} = \frac{e^x}{e^x + 1}$$

$$\text{Tanh}(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

In [ ]:

```
# Sigmoid of x.
class nn_Sigmoid:
    def forward(self, x):
        # Forward pass.
        pass

    def backward(self, x, gradOutput):
        # Backward pass
        pass

# Hyperbolic tangent.
class nn_Tanh:
    def forward(self, x):
        # Forward pass.
        pass

    def backward(self, x, gradOutput):
        # Backward pass
        pass
```
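A practical remark about the two equivalent forms of the sigmoid given above: on a computer they do not behave the same for large inputs, because e^x overflows for large positive x. The scalar sketch below (plain Python, not a solution to the exercise, which asks for tensor versions with a backward pass) picks whichever form is safe for the sign of x, and also checks numerically that Tanh(x) = 2 Sigmoid(2x) - 1. The function names are made up.

```python
import math

def sigmoid(x):
    """Scalar sigmoid that uses the form that cannot overflow for this x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))  # exp(-x) <= 1 here
    z = math.exp(x)                        # exp(x) < 1 here
    return z / (z + 1.0)

def tanh_from_formula(x):
    """Direct use of the definition; only safe for moderate x."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

for x in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
    print(x, sigmoid(x))

x = 0.7
print(tanh_from_formula(x), 2.0 * sigmoid(2.0 * x) - 1.0)
```

Your tensor implementations can be compared against these scalar values on a few inputs as a sanity check.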

2) [1pts] Our ReLU function sets values to zero when they are less than zero. This is still the most widely used activation function today, but a variant called LeakyReLU has been proposed, where negative inputs are multiplied by a small slope instead of being set to zero. Here is the definition:

$$ \text{LeakyReLU}(x) = \begin{cases} \beta x & x < 0 \\ x & x \geq 0 \end{cases}$$

where $\beta$ is usually a small value, e.g. $\beta = 0.3$.

In [ ]:

```
# Leaky ReLU of x.
class nn_LeakyReLU:
    def __init__(self, beta = 0.3):
        self.beta = beta

    def forward(self, x):
        # Forward pass.
        pass

    def backward(self, x, gradOutput):
        # Backward pass
        pass
```

In [ ]:

```
from tqdm import tqdm as tqdm
# Try this if the above gives trouble: from tqdm import tqdm_notebook as tqdm
learningRate = 1e-2  # Feel free to change this.

# You can use LeNet as the starting point.
# You can do things such as adding more layers,
# adding more filters to the existing layers,
# adding things such as BatchNormalization, Dropout, etc.
# anything you want, but add references if you consult something online.
class MyNetwork(nn.Module):
    def __init__(self):
        super(MyNetwork, self).__init__()
        # Convolutional layers.
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # Linear layers. 16*5*5 assumes 32x32 input images, as in LeNet above.
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        # This flattens the output of the previous layer into a vector.
        out = out.view(out.size(0), -1)
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        return out

# Definition of our network.
network = MyNetwork()

# Feel free to use a different loss here.
criterion = nn.CrossEntropyLoss()

# Feel free to change the optimizer, or the optimizer parameters. e.g. momentum, weightDecay, etc.
optimizer = optim.SGD(network.parameters(), lr = learningRate)

# Train the previously defined model.
train_model(network, criterion, optimizer, trainLoader, valLoader, n_epochs = 20)
```

In [ ]:

```
```

1) [3pts] For Q4 you get extra points if you use Resnet as in Section 7, but replace the fc layer at the end so that the model only predicts two classes (cat and dog). You will then have to re-train Resnet on this dataset. The idea is to take a model that has already been pre-trained on a large task (ILSVRC) and re-train it on a smaller dataset, which is often called fine-tuning. Present your code for the model, training output, plots, and example classifications on a few validation set images. Note: If you provide a model that does this in Q4, you are directly awarded 7pts in Q4, but for clarity provide the solution here instead if you plan to do this. Keep in mind that re-training Resnet on 20,000 images will probably still require GPU computing and significant computing time, so start this early.

In [ ]:

```
```

If you find any errors or omissions in this material please contact me at vicente@virginia.edu