Known Issues:

1) Backprop

The number of hidden units is hard-coded as 16, so the weight matrix between the input and hidden layers is a layer_size x 16 matrix. The thread-block width (x dimension) matches the 16 hidden units, and the thread blocks are partitioned along the y dimension. If you want to change the number of hidden units to 32, i.e. a layer_size x 32 weight matrix, you also have to change the width of the thread block to 32. The block size in the y dimension (height) can be changed freely.
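The constraint above can be sketched as a small host-side helper in C. This is only an illustration, not the benchmark's actual code: the names launch_shape and make_launch_shape are hypothetical, and the single block column (grid_x = 1) and the ceiling division for the grid height are assumptions made for the sketch.

```c
#include <assert.h>

/* Hypothetical container for a 2D launch configuration (not from the benchmark). */
typedef struct {
    unsigned block_x; /* thread-block width  */
    unsigned block_y; /* thread-block height */
    unsigned grid_x;  /* blocks along x      */
    unsigned grid_y;  /* blocks along y      */
} launch_shape;

/*
 * Derive a launch configuration for a layer_size x hidden_units weight matrix.
 * The block width is tied to the number of hidden units; the block height is
 * a free tuning parameter.
 */
static launch_shape make_launch_shape(unsigned layer_size,
                                      unsigned hidden_units,
                                      unsigned block_height)
{
    launch_shape s;

    assert(layer_size > 0 && hidden_units > 0 && block_height > 0);

    s.block_x = hidden_units;  /* width must equal the hidden layer size */
    s.block_y = block_height;  /* height is flexible */
    s.grid_x  = 1;             /* assumption: one block column covers all hidden units */
    /* assumption: round up so every input row is covered by some block */
    s.grid_y  = (layer_size + block_height - 1) / block_height;
    return s;
}
```

For example, make_launch_shape(65536, 16, 16) gives a 16 x 16 block and 4096 blocks along y. Moving to 32 hidden units means passing 32 as hidden_units, which widens the block to 32, while block_height can stay at 16 or be changed independently.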