The Heart Wall Tracking code presented here is the final stage of the Heart Wall application. The application tracks the movement of a mouse heart over a sequence of 100 609x590 ultrasound images to record its response to a stimulus. In its initial stage, the program performs image processing operations on the first image to detect initial, partial shapes of the inner and outer heart walls. These operations include edge detection, SRAD despeckling (also part of the Rodinia suite), morphological transformation, and dilation. To reconstruct approximate full shapes of the heart walls, the program generates ellipses that are superimposed over the image and sampled to mark points on the heart walls (Hough Search). In its final stage (the Heart Wall Tracking presented here), the program tracks the movement of the wall surfaces by detecting the movement of the image areas under the sample points as the shapes of the heart walls change throughout the sequence of images.
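The idea of following "the image area under a sample point" from one frame to the next can be sketched as a small template search. This is only an illustration under stated assumptions, not the benchmark's actual code: the function names (ssd, track_point), the search_radius parameter, and the use of a sum-of-squared-differences score are all hypothetical choices made for this example.

```python
def ssd(frame, template, top, left):
    """Sum of squared differences between template and the frame patch at (top, left)."""
    total = 0
    for r, row in enumerate(template):
        for c, value in enumerate(row):
            diff = frame[top + r][left + c] - value
            total += diff * diff
    return total


def track_point(frame, template, prev_row, prev_col, search_radius):
    """Return the top-left (row, col) near the previous position that best matches template.

    Assumption: the match score is SSD (lower is better); the real application
    may use a different similarity measure.
    """
    t_rows, t_cols = len(template), len(template[0])
    best_pos, best_score = (prev_row, prev_col), None
    for row in range(prev_row - search_radius, prev_row + search_radius + 1):
        for col in range(prev_col - search_radius, prev_col + search_radius + 1):
            # Skip candidates whose patch would fall outside the frame.
            if row < 0 or col < 0:
                continue
            if row + t_rows > len(frame) or col + t_cols > len(frame[0]):
                continue
            score = ssd(frame, template, row, col)
            if best_score is None or score < best_score:
                best_pos, best_score = (row, col), score
    return best_pos


# Toy data: an 8x8 blank frame with the 2x2 template pasted at row 3, column 4.
frame = [[0] * 8 for _ in range(8)]
template = [[9, 8], [7, 6]]
for r in range(2):
    for c in range(2):
        frame[3 + r][4 + c] = template[r][c]

# The point was at (2, 3) in the previous frame; search 2 pixels in every direction.
new_pos = track_point(frame, template, 2, 3, 2)
print(new_pos)  # (3, 4)
```

Note how each call touches only a small window of the frame, which matches the description of small computations performed on a subset of the image.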
The tracking part of the application consists of multiple nested loops that process batches of sample points from the image. There is a sequential dependency between processed frames. The workload consists of a large number of small serial steps with interleaved control statements. Each step involves a small amount of computation performed on only a subset of the entire image. The multi-threaded version of the code running on a quad-core processor achieves over 4x speedup compared to the single-threaded version. Partitioning the working set between caches and avoiding cache thrashing contribute to this performance. When the code runs on a GPU, the hardware is underutilized because of the limited amount of computation at each step. The GPU overheads (data transfer and kernel launch) are also significant. Achieving a significant speedup (15x) required more drastic GPU optimization techniques that sacrificed modularity in order to fit the code into a single kernel call. These techniques also combined unrelated functions and data transfers in single kernels.
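The loop structure described above (frames processed strictly in order, sample points handled independently within a frame) might be sketched as follows. This is a structural illustration only: track_sequence, drift_step, and the workers parameter are hypothetical names, and the toy "frames" are plain displacements standing in for real image data. In standard CPython the thread pool shows where the parallelism lies but is not expected to reproduce the speedups quoted above.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def track_sequence(frames, initial_points, step, workers=4):
    """Track all points through all frames.

    Returns a list with one entry per time step (including the initial
    positions), each entry being the list of point positions at that step.
    """
    points = list(initial_points)
    history = [points]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Outer loop is sequential: frame k needs the positions found in frame k-1.
        for frame in frames:
            # Inner work is independent per point, so points are mapped concurrently.
            points = list(pool.map(partial(step, frame), points))
            history.append(points)
    return history


def drift_step(frame, point):
    """Toy per-point step (assumption): the 'frame' is just a (dy, dx) displacement."""
    dy, dx = frame
    return (point[0] + dy, point[1] + dx)


frames = [(1, 0), (0, 2), (-1, 1)]
initial_points = [(10, 10), (20, 5)]
history = track_sequence(frames, initial_points, drift_step)
print(history[-1])  # [(10, 13), (20, 8)]
```

Because each per-point step is tiny, the cost of dispatching work can rival the work itself, which is the same effect that makes kernel launch and data transfer overheads significant on the GPU.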
For more information, see: L.G. Szafaryn, K. Skadron and J. Saucerman. "Experiences Accelerating MATLAB Systems Biology Applications." In Workshop on Biomedicine in Computing (BiC) at the International Symposium on Computer Architecture (ISCA), June 2009. http://www.cs.virginia.edu/~lgs9a/publications/isca_bic_09.pdf