Training deep neural networks requires computing the full gradient of the loss function with respect to the model parameters, and the large number of parameters makes back-propagation computationally costly. MeProp is a recently introduced method that addresses this issue by sparsifying the gradient vectors involved in the matrix-matrix multiplications of the chain rule: only a small number k of the gradient entries are kept, and the rest are set to zero. The authors found that not only does this require no additional training iterations, but the accuracy of the results is also slightly improved, so MeProp can be viewed as a regularization technique that improves generalization.
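As a rough illustration of the idea, the following PyTorch-style sketch shows how a top-k-sparsified output gradient could be used in the backward pass of a linear layer. The names `MePropLinear` and `sparsify_top_k` are hypothetical; this is a minimal sketch of the general technique, not the authors' reference implementation.

```python
import torch


def sparsify_top_k(grad, k):
    """Keep only the k largest-magnitude entries per row of grad; zero the rest."""
    if k >= grad.size(-1):
        return grad
    _, idx = grad.abs().topk(k, dim=-1)
    mask = torch.zeros_like(grad)
    mask.scatter_(-1, idx, 1.0)
    return grad * mask


class MePropLinear(torch.autograd.Function):
    """Linear map whose backward pass uses a sparsified output gradient."""

    @staticmethod
    def forward(ctx, x, weight, k):
        ctx.save_for_backward(x, weight)
        ctx.k = k
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        x, weight = ctx.saved_tensors
        # Only the k largest entries of grad_out take part in the two matrix products.
        sparse_grad = sparsify_top_k(grad_out, ctx.k)
        grad_x = sparse_grad @ weight       # gradient w.r.t. the input
        grad_w = sparse_grad.t() @ x        # gradient w.r.t. the weight matrix
        return grad_x, grad_w, None


# Usage example: forward and sparse backward through one layer.
x = torch.randn(8, 32, requires_grad=True)
w = torch.randn(16, 32, requires_grad=True)
y = MePropLinear.apply(x, w, 4)
y.sum().backward()
```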
First, compare the method with other regularization techniques such as Dropout. Second, experiment with improvements or extensions of MeProp (for instance, a non-constant k over the training epochs, top-k versus random-k selection, applying MeProp to CNNs, etc.); a sketch of two such variants follows.
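The snippet below sketches two of the possible variants mentioned above: a random-k baseline to compare against top-k selection, and a simple non-constant k schedule. The function names, the batch-wise random selection, and the linear decay from `k_start` to `k_end` are assumptions chosen for illustration, not part of MeProp itself.

```python
import torch


def sparsify_random_k(grad, k):
    """Keep k randomly chosen entries per row of grad (baseline for top-k selection)."""
    n = grad.size(-1)
    if k >= n:
        return grad
    idx = torch.stack([torch.randperm(n, device=grad.device)[:k]
                       for _ in range(grad.size(0))])
    mask = torch.zeros_like(grad)
    mask.scatter_(-1, idx, 1.0)
    return grad * mask


def k_schedule(epoch, num_epochs, k_start=64, k_end=8):
    """Hypothetical non-constant k: linear decay from k_start to k_end over training."""
    frac = epoch / max(num_epochs - 1, 1)
    return int(round(k_start + frac * (k_end - k_start)))
```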