CS 3330: Lab 9, Homeworks 9 and 10: Performance

This page does not represent the most current semester of this course; it is present merely as an archive.

Labs 9 is a working lab; there is not separate lab and homework assignments.

Homeworks 9 and 10 are both the same file and are submitted as the same file. However, they have distinct due dates. I'll take a snapshot of the submissions at homework 9's due date and use it to compute homework 9's score.

You may work with a single partner for homeworks 8 and 9, or may work alone.

The following kinds of conversations are permitted with people other than your partner or course staff:

Set Up

  1. Download perflab-handout.tar on linux.

  2. run tar xvf perflab-handout.tar to extract the tarball, creating the directory perflab_handout.

  3. Edit kernels.c; on lines 12 through 20 you'll see a team_t structure, which you should initialize with your name and email (and your partner's, if you have one), as well as what you'd like your team to be called in the scoreboard.

  4. kernels.c is the only file you should modify.

Overview

This assignment deals with optimizing memory-intensive code. Image processing offers many examples of functions that can benefit from optimization. In this lab, we will consider two image processing operations: rotate, which rotates an image counter-clockwise by 90◦, and smooth, which “smooths” or “blurs” an image.

For this lab, we will consider an image to be represented as a two-dimensional matrix M, where Mi,j denotes the value of (i, j)th pixel of M. Pixel values are triples of red, green, and blue (RGB) values. We will only consider square images. Let N denote the number of rows (or columns) of an image. Rows and columns are numbered, in C-style, from 0 to N − 1.

Rotate

Given this representation, the rotate operation can be implemented quite simply as the combination of the following two matrix operations:

This combination is illustrated in the following figure:

image of image rotation

Smooth

The smooth operation is implemented by replacing every pixel value with the average of all the pixels around it (in a maximum of 3 × 3 window centered at that pixel).

Consider the following image:

image of image smoothing

In that image, the values of pixels M2[1][1] is

22
M1[i][j]
i = 0j = 0
9

In that image, the value of M2[N-1][N-1] is

N − 1N − 1
M1[i][j]
i = N − 2j = N − 2
4

Code

Structures we give you

A pixel is defined in defs.h as

typedef struct {
    unsigned short red;
    unsigned short green;
    unsigned short blue;
} pixel;

Images are provided in flattened arrays and can be accessed by RIDX, defined as

#define RIDX(i,j,n) ((i)*(n)+(j))

by the code nameOfImage[RIDX(index1, index2, dimensionOfImage)].

All images will be square and have a size that is a multiple of 32.

What you should change

In kernel.c you will see several rotate and several smooth functions.

Driver

The source code you will write will be linked with object code that we supply into a driver binary. To create this binary, you will need to execute the command

unix> make driver

You will need to re-make driver each time you change the code in kernels.c. To test your implementations, you can then run the command:

unix> ./driver

The driver can be run in four different modes:

If run without any arguments, driver will run all of your versions (default mode). Other modes and options can be specified by command-line arguments to driver, as listed below:

Grading

Rules

Violations of the following rules will be seen as cheating, subject to penalties that extend beyond the scope of this assignment:

Additionally, the following rules will result in grade penalties within this assignment if violated:

Grading

Speedups vary wildly by the host hardware. I have scaled the grade based on my timing server's hardware so that particular strategies will get 75% and 100% scores.

Smooth and Rotate will each be weighted as a full homework assignment in gradebook.

Rotate will get 0 points for 1.0× speedups on my computer, 75% for 1.3×, and 100% for 1.8× speedups, as expressed in the following pseudocode:

if (speedup < 1.0) return MAX_SCORE * 0;
if (speedup < 1.3) return MAX_SCORE * 0.75 * (speedup - 1.0) / (1.3 - 1.0);
if (speedup < 1.8) return MAX_SCORE * (0.75 + 0.25 * (speedup - 1.3) / (1.8 - 1.3));
return MAX_SCORE;

Smooth will get 0 points for 1.0× speedups on my computer, 75% for 1.1×, and 100% for 2.0× speedups, as expressed in the following pseudocode:

if (speedup < 1.0) return MAX_SCORE * 0;
if (speedup < 1.1) return MAX_SCORE * 0.75 * (speedup - 1.0) / (1.1 - 1.0);
if (speedup < 2.0) return MAX_SCORE * (0.75 + 0.25 * (speedup - 1.1) / (2.0 - 1.1));
return MAX_SCORE;

Exact speedups may vary by computer; see the submission results page for what speedup I found from your most recent submission. Experience has shown that timing does not change by more than .03× upon resubmission.

Submission

You will submit only kernels.c, and need submit it only once (as the homework) though you probably want to submit it often to see what my server's timing results are.

Submit at the usual place. One of my machines will attempt to time your code and post the speedups I find here. Since running all of the timing for everyone's code can take a while, expect 30-60 minute delays before your results are posted. I time all of your code and report only the fastest run of each method, so it is in your interest to have several optimizations in your submission; note, however, that if your code runs for more than 5 minutes I will stop running it and post no results.

If you are in a team, it does not matter to me if one or both partners submit so long as at least one does.

Copyright © 2015 by Luther Tychonievich. All rights reserved.
Last updated 2015-03-25 10:20 -0400