I think I did everything required, except that my acceleration structure actually makes the program slower than the built-in triangle mesh. This is definitely because I test two triangles for intersection in every voxel the ray passes through. A quick min/max height test per voxel should skip most of those tests, but my implementation of it didn't work.
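Here is a minimal sketch of a per-voxel min/max test. It assumes a hypothetical layout that is not lrt's actual class: `nx*ny` vertex heights, vertex (i, j) at (x, y) = (i, j), so cell (i, j) covers one unit square and holds two triangles. `HeightfieldBounds` and `mayHitCell` are made-up names for illustration. The ray's entry and exit parameters `tEnter`/`tExit` are assumed to come from whatever grid traversal is already in place. The test would be called for each cell before the two triangle intersections.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical heightfield layout (an assumption, not lrt's real class):
// nx*ny vertex heights, vertex (i, j) sits at (x, y) = (i, j), so cell (i, j)
// covers [i, i+1] x [j, j+1] and contains two triangles.
struct HeightfieldBounds {
    int nx, ny;                  // vertices per side
    std::vector<float> cellMin;  // per-cell minimum corner height
    std::vector<float> cellMax;  // per-cell maximum corner height

    HeightfieldBounds(int nx_, int ny_, const std::vector<float>& z)
        : nx(nx_), ny(ny_),
          cellMin((nx_ - 1) * (ny_ - 1)), cellMax((nx_ - 1) * (ny_ - 1)) {
        assert(static_cast<int>(z.size()) == nx_ * ny_);
        for (int j = 0; j < ny - 1; ++j)
            for (int i = 0; i < nx - 1; ++i) {
                float a = z[i + j * nx], b = z[i + 1 + j * nx];
                float c = z[i + (j + 1) * nx], d = z[i + 1 + (j + 1) * nx];
                int k = i + j * (nx - 1);
                cellMin[k] = std::min({a, b, c, d});
                cellMax[k] = std::max({a, b, c, d});
            }
    }

    // The ray height oz + t*dz is linear in t, so over [tEnter, tExit] it stays
    // between its values at the two ends. Both triangles of the cell lie within
    // [cellMin, cellMax], so if the two height ranges don't overlap, the ray
    // cannot hit either triangle and the cell can be skipped.
    bool mayHitCell(int i, int j, float oz, float dz,
                    float tEnter, float tExit) const {
        int k = i + j * (nx - 1);
        float z0 = oz + tEnter * dz, z1 = oz + tExit * dz;
        return std::max(z0, z1) >= cellMin[k] && std::min(z0, z1) <= cellMax[k];
    }
};
```

Since the bounds are precomputed once per heightfield, the per-cell cost drops to two multiply-adds and two comparisons, which should be much cheaper than two ray/triangle tests when most cells are rejected.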
I think I should get 6 stars (at least 5.5) :)
This is the 1000x1000 sea heightfield
This is the 64x64 sea heightfield
I used barycentric coordinates to interpolate normals and texture coordinates. The algorithm is fairly straightforward, and most of the code is adapted from elsewhere in lrt.
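A minimal sketch of barycentric interpolation, assuming a Möller–Trumbore style ray/triangle test that returns the weights `b1`, `b2` of the second and third vertices (with `b0 = 1 - b1 - b2`). The types and function names here (`Vec3`, `UV`, `Hit`, `intersect`, `shadingNormal`, `texCoord`) are hypothetical stand-ins, not lrt's API.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

struct Vec3 { float x, y, z; };
struct UV { float u, v; };
struct Hit { float t, b1, b2; };  // b0 = 1 - b1 - b2

inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline Vec3 normalize(Vec3 v) {
    float len = std::sqrt(dot(v, v));
    assert(len > 0.f);  // interpolated normal must not be degenerate
    return (1.f / len) * v;
}

// Möller–Trumbore ray/triangle test: returns hit distance t and the
// barycentric weights b1 (of p1) and b2 (of p2), or nothing on a miss.
std::optional<Hit> intersect(Vec3 o, Vec3 d, Vec3 p0, Vec3 p1, Vec3 p2) {
    Vec3 e1 = p1 - p0, e2 = p2 - p0;
    Vec3 s1 = cross(d, e2);
    float det = dot(s1, e1);
    if (std::fabs(det) < 1e-8f) return std::nullopt;  // ray parallel to triangle
    float inv = 1.f / det;
    Vec3 s = o - p0;
    float b1 = dot(s, s1) * inv;
    if (b1 < 0.f || b1 > 1.f) return std::nullopt;
    Vec3 s2 = cross(s, e1);
    float b2 = dot(d, s2) * inv;
    if (b2 < 0.f || b1 + b2 > 1.f) return std::nullopt;
    float t = dot(e2, s2) * inv;
    if (t <= 0.f) return std::nullopt;  // hit is behind the ray origin
    return Hit{t, b1, b2};
}

// Blend the per-vertex normals with the barycentric weights, then renormalize.
Vec3 shadingNormal(const Hit& h, Vec3 n0, Vec3 n1, Vec3 n2) {
    float b0 = 1.f - h.b1 - h.b2;
    return normalize(b0 * n0 + h.b1 * n1 + h.b2 * n2);
}

// Texture coordinates use the same weights; no renormalization needed.
UV texCoord(const Hit& h, UV t0, UV t1, UV t2) {
    float b0 = 1.f - h.b1 - h.b2;
    return {b0 * t0.u + h.b1 * t1.u + h.b2 * t2.u,
            b0 * t0.v + h.b1 * t1.v + h.b2 * t2.v};
}
```

The same `(b1, b2)` pair found during intersection is reused for both normals and texture coordinates, so no extra geometry work is needed after the hit.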
Here are some timing results. For now the heightfield is about 50% slower than the built-in triangle mesh; the numbers should improve considerably once the min/max test works.
Here are the RIB files and input TIFFs.