Lecture Notes: Basic Image Processing

Before jumping into building powerful, intelligent models for visual recognition, it is always important to look at some pixels. Looking at images and pixels, and transforming them in various ways, often gives us valuable intuitions about how to extract information from images.

In [175]:
torch.setdefaulttensortype('torch.FloatTensor') -- use floats as the default data type.
image = require 'image' -- load the torch image library.

-- Download the image with this line, or manually from the URL below; any other image works too.
os.execute('wget http://www.cs.virginia.edu/~vicente/images/google_android.jpg');
rgb_image = image.load('google_android.jpg')
itorch.image(rgb_image)

The rgb_image variable holds a FloatTensor of size channels x height x width, matching the dimensions of the image. Each entry is a pixel intensity between 0 and 1.

In [56]:
print('Number of channels: ' .. rgb_image:size(1))
print('Image height: ' .. rgb_image:size(2))
print('Image width: ' .. rgb_image:size(3))
print('Tensor type: ' .. torch.type(rgb_image))
print('Max value: ' .. rgb_image:max())
print('Min value: ' .. rgb_image:min())
Out[56]:
Number of channels: 3	
Image height: 416	
Image width: 600	
Tensor type: torch.FloatTensor	
Max value: 1	
Min value: 0	
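
To make the channels x height x width layout concrete, here is a minimal sketch that reads a single pixel and rescales the values to the 0-255 byte range. It uses a tiny synthetic tensor, toy_image, as a hypothetical stand-in for rgb_image, so the names and sizes below are assumptions for illustration and are not part of the original notebook.

```lua
require 'torch'

-- Hypothetical 3 x 2 x 4 stand-in for rgb_image (channels x height x width).
local toy_image = torch.FloatTensor(3, 2, 4):zero()

-- Set the red channel of the pixel at row 2, column 3.
-- The index order is {channel, row, column}.
toy_image[{1, 2, 3}] = 1.0

-- Read all three channel values of that pixel: a 1D tensor of size 3.
local pixel = toy_image[{{}, 2, 3}]
print(pixel)

-- Rescale the [0, 1] floats to the 0-255 range and convert to bytes.
local byte_image = toy_image:clone():mul(255):byte()
print(torch.type(byte_image), byte_image:max())
```

Cloning before the multiplication leaves toy_image itself in the [0, 1] range, which is what the rest of the notebook works with.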

1. Image channels

We can slice the image into its R, G, and B channels and display each one separately:

In [57]:
local red_image = rgb_image[{{1}, {}, {}}]
local green_image = rgb_image[{{2}, {}, {}}]
local blue_image = rgb_image[{{3}, {}, {}}]

itorch.image({red_image, green_image, blue_image})
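
Note that indexing with a range such as {1} keeps the channel dimension, so each slice above is a 1 x height x width tensor, whereas indexing with a plain number drops that dimension. The sketch below illustrates the difference on a small random tensor; toy_image is a hypothetical stand-in for rgb_image and is not part of the original notebook.

```lua
require 'torch'

-- Hypothetical 3 x 4 x 5 stand-in for rgb_image.
local toy_image = torch.rand(3, 4, 5)

local red_3d = toy_image[{{1}, {}, {}}]  -- range index: channel dimension kept
local red_2d = toy_image[1]              -- number index: channel dimension dropped

print(red_3d:dim(), red_2d:dim())

-- Both results are views that share storage with toy_image,
-- so writing through one of them changes the original tensor.
red_2d:zero()
print(toy_image[1]:sum())
```

Because the slices are views rather than copies, call :clone() on a channel first if it should be modified without touching the original image.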