Sunday, July 31, 2016

Faster Than SGD 1: Variance Reduction

SGD (stochastic gradient descent) is the standard workhorse for large-scale optimization. In my mind, there have so far been two fundamentally different improvements since its original introduction: (1) variance reduction and (2) acceleration. In this post I'd like to conduct a survey of (1), and I'd especially like to thank those ICML'16 participants who pushed me to write it.
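
To make "variance reduction" concrete before the survey, here is a minimal sketch using the SVRG-style gradient estimator (one canonical example of the idea, not necessarily the exact formulation covered below). For a finite-sum objective $f(x) = \frac{1}{n}\sum_{i=1}^n f_i(x)$, plain SGD samples an index $i$ and updates

$$x_{k+1} = x_k - \eta\, \nabla f_i(x_k),$$

where the estimator $\nabla f_i(x_k)$ is unbiased but its variance need not shrink even as $x_k$ approaches the minimizer, which forces decaying step sizes. A variance-reduced method keeps a snapshot point $\tilde{x}$, whose full gradient $\nabla f(\tilde{x})$ is recomputed only occasionally, and instead updates

$$x_{k+1} = x_k - \eta\, \bigl(\nabla f_i(x_k) - \nabla f_i(\tilde{x}) + \nabla f(\tilde{x})\bigr).$$

This estimator is still unbiased, and its variance vanishes as $x_k$ and $\tilde{x}$ approach the minimizer, which is what allows constant step sizes and, on strongly convex finite sums, linear convergence.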

Wednesday, July 6, 2016