Upcoming release:
- reference outputs for validating benchmark results
- option to choose the OpenCL platform and device
- configurable number of iterations for the main computation, to adjust execution time
- improvements and fixes for certain OpenMP codes
- removal of unnecessary __syncthreads() calls in CUDA codes and the corresponding barriers in OpenCL codes