Thursday, October 6, 2016

A mathematical definition of unsupervised learning?

Extracting structure from data is considered by many to be the frontier of machine learning. Yet even defining "structure", or the very goal of learning structure from unlabelled data, is not well understood or rigorously defined.

In this post we'll give a little background to unsupervised learning and argue that, compared to supervised learning, we lack a well-founded theory to tackle even the most basic questions. In the next post, we'll introduce a new candidate framework.

Unsupervised learning is a commonplace component of many undergraduate machine learning courses and books. At its core, the treatment of unsupervised learning is a collection of methods for analyzing data and extracting hidden structure from it.  Take for example John Duchi's "Machine Learning" course at Stanford. Under "Unsupervised learning", we have the topics: Clustering, K-means, EM, Mixture of Gaussians, Factor analysis, PCA (Principal components analysis), ICA (Independent components analysis). This is an excellent representation of the current state-of-the-art:

  • Clustering and K-means:  by grouping "similar" elements together, usually according to some underlying metric, we can create artificial "labels" for data elements in hope that this rough labelling will be of future use, either in a supervised learning task on the same dataset, or in "understanding" the data. K-means is a widely used algorithm for finding k representative centers of the data, allowing for k "labels".
  • Mixture of Gaussians: an exemplary and widely used method stemming from the generative approach to unsupervised learning. This approach stipulates that the data is generated from a distribution, in this case a mixture of normal random variables. There are numerous algorithms for finding the centers of these Gaussians, thereby learning the structure of the data.
  • Factor analysis, PCA and ICA: the spectral approach to unsupervised learning is perhaps the most widely used, in which the data (or its covariance matrix) is approximated by its top few singular vectors. This type of method is widely successful in text (word embeddings) and many other forms of data. 
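
The first bullet can be made concrete in a few lines. Below is a minimal NumPy sketch of Lloyd's algorithm for k-means; the deterministic farthest-first initialization is my own choice for reproducibility (practical implementations typically use random restarts or k-means++), and the code assumes no cluster goes empty:

```python
import numpy as np

def farthest_first(X, k):
    """Deterministic "maximin" initialization: start from X[0], then
    repeatedly add the point farthest from all chosen centers."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(X[:, None] - np.array(centers)[None], axis=2), axis=1)
        centers.append(X[d.argmax()])
    return np.array(centers)

def kmeans(X, k, n_iters=100):
    """Lloyd's algorithm: alternate nearest-center assignment and
    center recomputation until the centers stop moving."""
    centers = farthest_first(X, k)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # distance of every point to every center (Euclidean)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # each center becomes the mean of the points assigned to it
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# two well-separated blobs; k = 2 should recover their means
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
               rng.normal(5.0, 0.1, (50, 2))])
centers, labels = kmeans(X, k=2)
```

The returned `labels` are exactly the artificial "labels" mentioned above: a rough partition that can feed a downstream supervised task.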

The above techniques are widely used and successful in practice, and under suitable conditions polynomial-time algorithms are known. The common thread amongst them is finding structure in the data, which turns out for many practical applications to be useful in a variety of supervised learning tasks.
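
The spectral approach in particular reduces to a single SVD. Here is a minimal sketch, assuming plain PCA (no whitening or factor-analysis noise model): center the data and read off the top right singular vectors, which are the top eigenvectors of the sample covariance:

```python
import numpy as np

def pca(X, n_components):
    """Center the data and project onto the top right singular vectors
    (equivalently, the top eigenvectors of the sample covariance)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]        # principal directions, one per row
    variances = (S ** 2) / (len(X) - 1)   # variance explained by each direction
    return Xc @ components.T, components, variances[:n_components]

# synthetic data whose variance is concentrated along the direction (1, 2)
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, 2.0 * t + 0.01 * rng.normal(size=(200, 1))])
Z, comps, var = pca(X, n_components=1)
```

On this toy input the single recovered component aligns (up to sign) with the direction (1, 2) along which the data varies.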

Notable unsupervised learning techniques that are missing from the above list are Restricted Boltzmann machines (RBMs), Autoencoders and Dictionary Learning (which was conceived in the context of deep neural networks). The latter techniques attempt to find a succinct representation of the data. Various algorithms and heuristics for RBMs, Autoencoders and Dictionary Learning exist, but these satisfy at least one of the following:

1) come without rigorous performance guarantees.

2) run in exponential time in the worst case.

3) make strong generative assumptions about the input.
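
To make the "succinct representation" idea concrete, here is a toy sketch: a single-hidden-layer *linear* autoencoder trained by full-batch gradient descent on squared reconstruction error. This is a deliberately stripped-down stand-in; real RBMs and autoencoders use nonlinearities, stochastic training, and regularization, and it is exactly their training that suffers from the caveats (1)-(3) above:

```python
import numpy as np

def train_autoencoder(X, hidden_dim, lr=0.02, epochs=1000, seed=0):
    """Linear autoencoder: learn W_enc, W_dec minimizing
    ||X W_enc W_dec - X||^2 by full-batch gradient descent.
    The hidden layer H = X W_enc is the succinct representation."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = 0.1 * rng.normal(size=(d, hidden_dim))  # encoder weights
    W_dec = 0.1 * rng.normal(size=(hidden_dim, d))  # decoder weights
    for _ in range(epochs):
        H = X @ W_enc                    # code (compressed representation)
        X_hat = H @ W_dec                # reconstruction
        G = 2.0 * (X_hat - X) / n        # gradient of the reconstruction loss
        grad_dec = H.T @ G               # backprop through the decoder
        grad_enc = X.T @ (G @ W_dec.T)   # backprop through the encoder
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec

# rank-one data: a 1-dimensional code suffices for perfect reconstruction
rng = np.random.default_rng(1)
t = rng.normal(size=(100, 1))
X = t @ np.array([[1.0, 2.0]])
W_enc, W_dec = train_autoencoder(X, hidden_dim=1)
mse = np.mean((X @ W_enc @ W_dec - X) ** 2)
```

In the linear case the learned subspace coincides with the PCA subspace; the interesting (and theoretically murky) behavior only appears once nonlinearities are added.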

Recent breakthroughs in polynomial-time unsupervised learning due to Sanjeev and his group address points (1) & (2) above and require only (3). Independently, the same is achievable by the method of moments, see e.g. this paper, originally from Sham's group @ MSR New England, and many more recent advances. What's the downside of this approach? The Arora-Hazan debate over this point, which the theory-ML students are exposed to in our weekly seminar, is subject for a longer post...

What is missing, then? Compare the above syllabus with that of supervised learning in most undergrad courses and books: the difference in terms of theoretical foundations is stark. For supervised learning we have Vapnik's statistical learning theory - a deep and elegant mathematical theory that characterizes exactly what is learnable from labeled examples. Valiant's computational learning theory adds the crucial element of computation, and on top of the combined result we have hundreds of scholarly articles describing in detail problems that are learnable efficiently / learnable but inefficiently / improperly learnable / various other notions.
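
For concreteness, the exact characterization alluded to here is the fundamental theorem of statistical learning (stated for binary classification; the constants below are up to the usual log factors):

```latex
% A hypothesis class H is (agnostically) PAC-learnable
% if and only if d = \mathrm{VCdim}(H) < \infty, in which case the
% number of labeled examples needed to reach excess error \epsilon
% with confidence 1 - \delta is
m(\epsilon, \delta) \;=\; \Theta\!\left( \frac{d + \log(1/\delta)}{\epsilon^{2}} \right)
% (in the realizable case the dependence improves to roughly
% (d \log(1/\epsilon) + \log(1/\delta)) / \epsilon).
```

No quantity of this kind - a single combinatorial parameter governing both the possibility of learning and its sample cost - is known for the unsupervised setting.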

Is there meaning, in the pure mathematical sense that Vapnik and Valiant instilled into supervised learning, to the unsupervised world?

I like to point out in my ML courses that computational learning theory says something philosophical about science: the concept of "learning", or of a "theory" such as a physical theory of the world, has a precise mathematical meaning independent of humans. While many of us scientists seek a "meaningful" theory in the human sense, there need not be one! It could very well be that a physical theory, for example that of condensed matter, has a "theory" in the Vapnik-Valiant sense, but not one that is as "meaningful" and "elegant" in the Feynman sense.

How, then, can we give mathematical meaning to unsupervised learning in a way that:

  1. Allows natural notions of generalization from examples
  2. Improves future supervised learning tasks regardless of their nature
  3. Allows for a natural family of algorithms (hopefully, but not necessarily, based on convex optimization)

This will be the starting point for our next post...


  1. I am really looking forward to the post:)
    In the meanwhile, let me point out one work that I happen to know about as I was part of it (sorry for the plug, I could not help it).
    So what is happening in this work? Well, we study ICA *without* any generative assumptions. I think what we do can be done in many unsupervised settings, but we choose ICA as it is a setting that seems to be closely tied to generative and stochastic assumptions, so in some sense it looked especially challenging. So what do we do? We derive performance bounds for some algorithm (building on previous work, including Arora's) in terms of how close the data that the algorithm sees is to "ideal data". I find this approach quite satisfactory as finally we don't need to make any generative/stochastic assumptions, and the bounds tell one exactly what we need: how performance (recovery of some mixing matrix in this case) will be impacted by deviations from the ideal situation. As a bonus, one can show that the bounds can also recover the usual bounds available in the generative setting. Details are here:
    Huang, R., György, A., and Szepesvári, Cs., Deterministic Independent Component Analysis, ICML, pp. 2521--2530, 2015.

    1. Thanks Csaba!
      Your paper looks cool and in the direction of changing the statistical assumptions of ICA to some form of "closeness" to a signal in standard input-ICA form. This is certainly in the correct direction of removing statistical assumptions.
      What I'm arguing for is even more extreme: can we "step out of the model" (ICA in this case) completely and, regardless of any special form of the input, attain worst-case guarantees? The analogy would be to perform low-rank matrix completion (usually studied under uniform distribution over the inputs and incoherence assumptions) by low-trace (or low max-norm) relaxations.
      More to come soon...

    2. Cool:) Looking forward to it (and I see the followup will have a followup:))

  2. Interesting post. This reminds me of the joke "Classification of mathematical problems as linear and nonlinear is like classification of the Universe as bananas and non-bananas". Like nonlinear math, unsupervised learning is a very large class of loosely connected ideas, so I am intrigued as to what can be said about it.

    Given that, even mathematically formulating sub-problems like clustering is hard. While some work has been done, it only captures part of what we consider clustering.

