Machine Learning Ph.D. Seminar

Spring 2018 Schedule

January 30 - Sebastien Bubeck, Microsoft Research
Title: k-server and metrical task systems on trees

Feb 27 - Chicheng Zhang, Microsoft Research
Title: Computationally and Statistically Efficient Active Learning of Linear Separators

March 6 - Alina Beygelzimer, Yahoo! Research
Title: Practical Agnostic Active Learning [slides]

March 13 - Spring break, no seminar.

March 20 - Elad Hazan, Princeton University
Title: Taking Control by Convex Optimization

April 3 - Samory Kpotufe, Princeton University
Title: Some Recent Insights on Transfer-Learning in Nonparametric Settings

April 10 - Satyen Kale, Google Research
Title: Parameter-Free Online Learning via Model Selection

April 24 - Claudio Gentile, Google Research and INRIA (Lille, France)
Title: Leveraging Structure in Nonstochastic Bandit Problems: Some Examples [slides]


Sebastien Bubeck: k-server and metrical task systems on trees

In the last decade the mirror descent strategy of Nemirovski and Yudin has emerged as a powerful tool for online learning. I will argue that in fact the reach of this technique is far broader than expected, and that it can be useful for online computation problems in general. I will illustrate this on two classical problems in online decision making, the k-server problem and its generalization to metrical task systems. Both problems have long-standing conjectures about the optimal competitive ratio in arbitrary metric spaces, namely O(log(k)) for k-server and O(log(n)) for MTS. We will show that mirror descent, with a certain multiscale entropy regularization, yields competitive ratios of O(log^2(k)) and O(log(n)), respectively, for a very general class of metric spaces (namely hierarchically separated trees, which in particular implies the same bounds up to an additional log(n) factor for arbitrary metric spaces).
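For readers unfamiliar with the technique, here is a minimal illustrative sketch of mirror descent with an entropy regularizer on the probability simplex (the simplest instance of the regularization idea the abstract mentions, not the paper's multiscale algorithm): with negative entropy as the mirror map, the update becomes multiplicative.

```python
import math

def entropic_mirror_descent(loss_rounds, n, eta=0.1):
    """Mirror descent on the probability simplex with the negative-entropy
    regularizer. The mirror step is multiplicative in the loss gradient:
    x_i <- x_i * exp(-eta * loss_i), followed by renormalization."""
    x = [1.0 / n] * n              # start at the uniform distribution
    total_loss = 0.0
    for loss in loss_rounds:       # each `loss` is a vector of per-coordinate losses
        total_loss += sum(xi * li for xi, li in zip(x, loss))
        x = [xi * math.exp(-eta * li) for xi, li in zip(x, loss)]
        z = sum(x)
        x = [xi / z for xi in x]   # project back onto the simplex
    return x, total_loss
```

Run against a fixed loss vector, the iterate concentrates on the lowest-loss coordinate, which is the behavior the regret analysis formalizes.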

Joint work with Michael B. Cohen, James R. Lee, Yin Tat Lee, and Aleksander Madry.

Chicheng Zhang: Computationally and Statistically Efficient Active Learning of Linear Separators

Active learning is a machine learning paradigm that aims at reducing label requirements by allowing the learning algorithm to perform interactive label queries. The performance of an active learning algorithm is measured by its label complexity, i.e., how many labels are needed to obtain a classifier with error epsilon. In this talk, we give a computationally efficient, noise-tolerant algorithm for actively learning homogeneous linear separators in R^d. Specifically, if the unlabeled distribution is uniform over the unit sphere, and every label is flipped with probability at most eta < 1/2 (a.k.a. Massart's noise condition), our algorithm achieves an optimal label complexity of O( d/(1-2 eta)^2 log 1/epsilon ). This is the first algorithm that achieves computational efficiency and label optimality simultaneously in this setting. Our algorithm is based on an extension of the active Perceptron algorithm (Dasgupta, Kalai and Monteleoni, 2005), which may be of independent interest.
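To make the idea concrete, here is a hypothetical sketch of a margin-based active Perceptron loop (illustrative only; the margin threshold and update are simplifications, not the algorithm analyzed in the paper): labels are requested only for points near the current hyperplane, since far-away points are already classified confidently.

```python
import math
import random

def margin_based_active_perceptron(data, oracle, margin=0.3):
    """Illustrative active-learning loop for a homogeneous linear separator:
    skip the label query when the point lies far from the current hyperplane,
    otherwise query the oracle and make a mistake-driven Perceptron update."""
    w = [0.0] * len(data[0])
    queries = 0
    for x in data:
        score = sum(wi * xi for wi, xi in zip(w, x))
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > 0 and abs(score) / norm > margin:
            continue                # confident prediction: no label needed
        y = oracle(x)               # interactive label request (+1 or -1)
        queries += 1
        if y * score <= 0:          # mistake (or zero margin): Perceptron update
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w, queries
```

On a separable stream, the query counter stays well below the stream length, which is the label-complexity saving that active learning targets.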

This talk is based on our NIPS 2017 paper (joint work with Songbai Yan at UCSD).

Alina Beygelzimer: Practical Agnostic Active Learning

In active learning, a learner processes unlabeled data and can interactively request the label of any unlabeled example. The hope is that by carefully and adaptively choosing what to label, one can substantially reduce the number of labels required to learn an accurate predictor. As the learner creates the data that it in turn learns from, bias issues become problematic. I will discuss computationally efficient techniques for addressing this setting. I will also formalize the common practice of using seed examples and search in active learning, and demonstrate its benefits in situations where the standard active learning framework fails to bring any.
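A standard device for the bias issue mentioned above is importance weighting: label each example with some probability and reweight labeled examples by the inverse probability, so weighted statistics remain unbiased for the full distribution. A minimal sketch under these assumptions (illustrative; not the exact algorithms of the talk):

```python
import random

def importance_weighted_label_stream(stream, query_prob):
    """Label each (x, y) pair with probability p = query_prob(x) > 0 and
    attach the importance weight 1/p, so that weighted averages over the
    labeled subsample are unbiased estimates over the whole stream."""
    weighted = []
    for x, y in stream:
        p = query_prob(x)                      # adaptive query probability
        if random.random() < p:
            weighted.append((x, y, 1.0 / p))   # importance weight 1/p
    return weighted
```

For example, a weighted mean of labels computed from the subsample concentrates around the true mean even when the query probability depends heavily on x.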

Based on joint work with Sanjoy Dasgupta, Daniel Hsu, John Langford, Chicheng Zhang and Tong Zhang.

Elad Hazan: Taking Control by Convex Optimization

Linear dynamical systems (LDSs), a.k.a. Kalman filtering, are a class of time-series models widely used in robotics, finance, engineering, and meteorology. In its general form (unknown system), learning an LDS is a classic non-convex problem, typically tackled with heuristics like gradient descent ("backpropagation through time") or the EM algorithm. I will present our new "spectral filtering" approach to the identification and control of discrete-time general LDSs with multi-dimensional inputs, outputs, and a latent state. This approach yields a simple, efficient, and practical algorithm for low-regret prediction (i.e. asymptotically vanishing MSE).
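For reference, the model class itself is simple to write down. A minimal scalar sketch of a discrete-time LDS (illustrative only; the talk's spectral-filtering algorithm learns to predict the outputs without recovering the system parameters):

```python
def simulate_lds(a, b, c, inputs, h0=0.0):
    """Simulate a scalar discrete-time linear dynamical system:
    hidden state  h_{t+1} = a * h_t + b * u_t
    observation   y_t     = c * h_t
    The learner sees (u_t, y_t); the state h_t and parameters are latent."""
    h = h0
    outputs = []
    for u in inputs:
        outputs.append(c * h)   # observe before the state transition
        h = a * h + b * u       # hidden-state update
    return outputs
```

An impulse input traces out the system's impulse response, geometric with ratio a, which is exactly the structure that makes the unknown-system learning problem non-convex in (a, b, c).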

The talk will cover a series of results, which are joint work with Karan Singh, Cyril Zhang, Sanjeev Arora, Holden Lee, and Yi Zhang.

Samory Kpotufe: Some Recent Insights on Transfer-Learning in Nonparametric Settings

Transfer-Learning aims to use data from a ‘source’ distribution S towards improving learning performance w.r.t. a target distribution T. One can, for instance, think of using a large database of images from the web (the source) to help classify images from a specific domain, say traffic in New York (the target). Much research effort has therefore gone into understanding which relations between S and T allow information transfer in such prediction tasks, and, more practically, into understanding the relative benefits of source and target data.

In this talk we consider ‘nonparametric’ settings, i.e., we make few assumptions about S and T, and instead aim to characterize (S,T) so as to capture a continuum from easy to hard transfer problems, and furthermore to tightly capture the relative benefits of source and target data.

In the first part of the talk, we will discuss nonparametric insights on importance sampling methods, a very common algorithmic approach to transfer that involves estimating density ratios. We show that these methods can greatly benefit from structured data (such as manifold or sparse data), attesting to their practical success; however, they might be solving too hard a problem in some situations where transfer is easy, and they are ill-defined in common situations where transfer is hard but still possible at reasonable rates.
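The importance sampling identity behind these methods is short: the target risk E_T[loss] equals E_S[(dT/dS) * loss], so source data reweighted by the density ratio estimates target risk. A minimal sketch assuming the ratio is given (in practice it must be estimated, which is the hard part discussed in the talk):

```python
def reweighted_target_risk(source_sample, loss, density_ratio):
    """Estimate the target risk E_T[loss] from source data via the identity
    E_T[loss(X, Y)] = E_S[(dT/dS)(X) * loss(X, Y)], reweighting each source
    example by the (here assumed known) density ratio dT/dS."""
    total = sum(density_ratio(x) * loss(x, y) for x, y in source_sample)
    return total / len(source_sample)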

In the second part of the talk, I will argue that an asymmetric notion of relative dimension between S and T tightly captures the minimax rates of transfer. In particular, this notion reveals a rich set of situations where transfer is possible even at fast rates, while traditional measures of divergence between S and T might remain large or undefined. Surprisingly, our minimax analysis shows that unlabeled target data have minor benefits in transfer, while even a few labeled target data can greatly improve the rates of transfer. Furthermore, we are able to characterize sharp thresholds at which the benefits of source data saturate given the availability of target data. Finally, we show that such a threshold (a priori unknown) can nearly be achieved by an adaptive sampling procedure with no knowledge of distributional parameters.

The talk is partly based on recent work with Guillaume Martinet.

Satyen Kale: Parameter-Free Online Learning via Model Selection

We introduce an efficient algorithmic framework for model selection in online learning, also known as parameter-free online learning. Departing from previous work, which has focused on highly structured function classes such as nested balls in Hilbert space, we propose a generic meta-algorithm framework that achieves online model selection oracle inequalities under minimal structural assumptions. We give the first computationally efficient parameter-free algorithms that work in arbitrary Banach spaces under mild smoothness assumptions; previous results applied only to Hilbert spaces. We further derive new oracle inequalities for matrix classes, non-nested convex sets, and R^d with generic regularizers. Finally, we generalize these results by providing oracle inequalities for arbitrary non-linear classes in the online supervised learning model. These results are all derived through a unified meta-algorithm scheme using a novel "multi-scale" algorithm for prediction with expert advice based on random playout, which may be of independent interest.
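To give a flavor of the multi-scale idea, here is a hedged sketch (not the paper's algorithm, which additionally uses random playout): in prediction with expert advice, give each expert its own learning rate inversely proportional to its loss scale, instead of the single worst-case rate of standard exponential weights.

```python
import math

def multiscale_exponential_weights(loss_rounds, scales):
    """Exponential-weights experts with a per-expert learning rate
    eta_i = 1 / c_i, where c_i bounds expert i's per-round loss. This is
    an illustrative simplification of the 'multi-scale' experts idea:
    regret against expert i should scale with c_i, not with max_j c_j."""
    n = len(scales)
    etas = [1.0 / c for c in scales]
    logw = [0.0] * n
    for losses in loss_rounds:
        for i in range(n):
            logw[i] -= etas[i] * losses[i]   # each expert updated at its own rate
    m = max(logw)                            # normalize in log space for stability
    w = [math.exp(l - m) for l in logw]
    z = sum(w)
    return [wi / z for wi in w]
```

Even with very different scales, the weight vector still concentrates on the expert with the smallest cumulative loss.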

This is joint work with Dylan J. Foster, Mehryar Mohri and Karthik Sridharan.

Claudio Gentile: Leveraging Structure in Nonstochastic Bandit Problems: Some Examples

We survey recent results in nonstochastic bandit problems where extra structure (or extra information) is available to guide the learning process. We give examples of (intentionally diverse) structured bandit scenarios by presenting mathematical settings, algorithms, and associated regret analyses. Joint with: N. Alon, N. Cesa-Bianchi, P. Gaillard, S. Gerchinovitz, Y. Mansour, A. Minora, S. Mannor, O. Shamir.
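As background for the unstructured case these works build on, here is a minimal sketch of Exp3, the basic nonstochastic bandit algorithm (illustrative; the talk concerns structured variants): exponential weights over arms, with an importance-weighted estimate standing in for the unobserved rewards.

```python
import math
import random

def exp3(reward, n_arms, T, eta=0.1):
    """Basic Exp3 sketch for nonstochastic bandits: sample an arm from the
    exponential-weights distribution, observe only that arm's reward, and
    update it with the importance-weighted estimate g / p[arm]."""
    logw = [0.0] * n_arms
    total = 0.0
    for t in range(T):
        m = max(logw)                              # log-space normalization
        w = [math.exp(l - m) for l in logw]
        z = sum(w)
        p = [wi / z for wi in w]
        arm = random.choices(range(n_arms), weights=p)[0]
        g = reward(t, arm)                         # only the pulled arm is observed
        total += g
        logw[arm] += eta * g / p[arm]              # importance-weighted update
    return total
```

Against a fixed best arm, the cumulative reward approaches that arm's total, which is the no-regret guarantee the structured settings refine.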