## Thursday, March 1, 2018

### unsupervised learning III: worst-case compression-based metrics that generalize

This post is a long-overdue sequel to parts I and II of previous posts on unsupervised learning.  The long time it took me to publish this one is not by chance - I am still ambivalent on the merit, or lack of, for the notion of a worst-case PAC/statistical learnability (or, god forbid, regret minimization) without supervision.

And yet,
1. I owe a followup to the stubborn few readers of the blog who have made it thus far.
2. The algorithms we'll discuss are mathematically correct, and even if completely useless in practice, can serve as amusement to us, fans of regret minimization in convex games.
3. I really want to get on to more exciting recent developments in control that have kept me busy at night...

But first, a new feature of our blog: we'll accompany each post with a piece of music, mostly Israeli. This week, a tribute to my favorite humus restaurant in Israel:

## Thursday, June 8, 2017

### Hyperparameter optimization and the analysis of boolean functions

I've always admired the beautiful theory on the analysis of boolean functions (see the excellent book of Ryan O'Donnell, as well Gil Kalai's lectures). Heck, it used to be my main area of investigation back in the early grad-school days we were studying hardness of approximation, the PCP theorem, and the "new" (back at the time) technology of Fourier analysis for boolean functions.  This technology also gave rise to seminal results on learnability with respect to the uniform distribution. Uniform distribution learning has been widely criticized as unrealistic, and the focus of the theoretical machine learning community has shifted to (mostly) convex, real-valued, agnostic learning in recent years.

It is thus particularly exciting to me that some of the amazing work on boolean learning can be used for the very practical problem of Hyperparameter Optimization (HO), which has on the face of it nothing to do with the uniform distribution. Here is the draft, joint work with Adam Klivans and Yang Yuan.

## Saturday, October 29, 2016

### Unsupervised Learning II: the power of improper convex relaxations

In this continuation post I'd like to bring out a critical component of learning theory that is essentially missing in today's approaches for unsupervised learning: improper learning by convex relaxations.

Consider the task of learning a sparse linear separator. The sparsity, or $\ell_0$, constraint is non-convex and computationally hard. Here comes the Lasso - replace the $\ell_0$ constraint with $\ell_1$ convex relaxation --- et voila --- you've enabled polynomial-time solvable convex optimization.

## Thursday, October 6, 2016

### A mathematical definition of unsupervised learning?

Extracting structure from data is considered by many to be the frontier of machine learning. Yet even defining "structure", or the very goal of learning structure from unlabelled data, is not well understood or rigorously defined.

In this post we'll give a little background to unsupervised learning and argue that, compared to supervised learning, we lack a well-founded theory to tackle even the most basic questions. In the next post, we'll introduce a new candidate framework.

## Sunday, July 31, 2016

### Faster Than SGD 1: Variance Reduction

SGD is well-known for large-scale optimization. In my mind, there are so-far two fundamentally different improvements since the original introduction of SGD: (1) variance reduction, and (2) acceleration. In this post I'd love to conduct a survey regarding (1), and I'd like to especially thank those ICML'16 participants who pushed me to write this post.

## Wednesday, July 6, 2016

### More than a decade of online convex optimization

This nostalgic post is written after a tutorial in ICML 2016 as a recollection of a few memories with my friend Satyen Kale.

## Saturday, June 4, 2016

### How to solve classification and regression fast, faster, and fastest

(post by Zeyuan Allen-Zhu)

I am often asked what is the best algorithm to solve SVM, to solve Lasso Regression, to solve Logistic Regression, etc. At the same time, a growing number of first-order methods have been recently proposed, making it hard to track down the state-of-the-arts. I feel it perhaps a good idea to have a blog post to answer all these questions properly and simultaneously.