Program

Introduction to the Summer School

Introduction to Statistical Learning and Convex Optimization,
Arnak Dalalyan

Abstract: TBA

 

 

 

Main Program

 

Introduction to machine learning for natural language processing,
Adam Bittlingmayer and Erik Arakelyan

Abstract: Recent progress in machine learning for natural language is significant; however, language poses some unique challenges.


In this course we will start with the fundamentals of natural language processing and current state-of-the-art results across tasks. We will focus on deep learning: the motivations behind word vectors and sequence output, and how to apply effective approaches to real tasks with industrial-strength libraries and datasets.

We will also keep in view how natural language tasks relate to tasks in other areas of machine learning.
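
To make the word-vector idea concrete, here is a minimal, self-contained numpy sketch (our illustration, not course material): it builds LSA-style word vectors by factorizing a toy co-occurrence matrix, whereas the course will use predictive models and industrial-strength libraries.

# Toy word vectors: factorize a word co-occurrence matrix so that
# distributional similarity becomes geometric similarity (LSA-style).
import numpy as np

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences within a +/-2 word window.
C = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 2), min(len(sent), i + 3)):
            if j != i:
                C[idx[w], idx[sent[j]]] += 1.0

# Rows of U * S are dense word vectors; keep two dimensions.
U, S, _ = np.linalg.svd(C, full_matrices=False)
vectors = U[:, :2] * S[:2]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# "cat" and "dog" occur in similar contexts, so their vectors align.
print(cosine(vectors[idx["cat"]], vectors[idx["dog"]]))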

 

 

Fundamentals of machine learning theory,
Shai Ben-David

Abstract: I will explain some fundamentals of machine learning theory.
The main focus of my lectures will be on demonstrating how a mathematical analysis of ML tasks and proposed solutions can provide useful insights, performance guarantees and awareness of inherent limitations.

I will argue for the inherent need for prior knowledge in successful learning (no hope for a “general AI” algorithm!).

Most of my lectures will address classification prediction, but I will also discuss clustering and more general unsupervised learning.
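
One classical instance of such a performance guarantee (our illustration, stated for a finite hypothesis class) is the uniform convergence bound that follows from Hoeffding's inequality and a union bound: with probability at least $1-\delta$ over an i.i.d. sample $S$ of size $m$,

\[
\forall h \in \mathcal{H}:\qquad L_{\mathcal{D}}(h) \;\le\; L_{S}(h) + \sqrt{\frac{\ln|\mathcal{H}| + \ln(2/\delta)}{2m}},
\]

where $L_{\mathcal{D}}$ is the true risk and $L_{S}$ the empirical risk. The $\ln|\mathcal{H}|$ term is one formal expression of the role of prior knowledge: the guarantee exists only because the hypothesis class was restricted in advance.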

 

Stochastic optimization for machine learning,
Robert M. Gower

Abstract: Modern machine learning relies heavily on optimization tools, typically to minimize so-called loss functions on training sets. The objective of this short course is to give an introduction to the training problem in machine learning and to present optimization methods that exploit the structure of this problem. The methods we will cover are plain (vanilla) gradient descent, stochastic gradient descent, and the proximal gradient method. If time permits, we may also see modern variance-reduced methods. We will cover both the necessary theoretical results of convex optimization and computational exercises through Python notebooks (please bring a laptop).
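
As a taste of the computational part, here is a minimal numpy sketch (our illustration, not the course's notebooks) contrasting plain gradient descent with stochastic gradient descent on the least-squares loss f(w) = (1/2n)||Aw - b||^2:

# Compare full-batch and stochastic gradient descent on least squares.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
b = A @ w_true + 0.1 * rng.normal(size=n)

def grad(w):                 # full gradient, averaged over all n examples
    return A.T @ (A @ w - b) / n

def stoch_grad(w, i):        # unbiased estimate from the i-th example only
    return A[i] * (A[i] @ w - b[i])

w_gd, w_sgd = np.zeros(d), np.zeros(d)
for t in range(1000):
    w_gd -= 0.1 * grad(w_gd)                 # constant step size
    i = rng.integers(n)                      # sample one training example
    w_sgd -= 0.1 / (1 + 0.01 * t) * stoch_grad(w_sgd, i)  # decaying step

print(np.linalg.norm(w_gd - w_true), np.linalg.norm(w_sgd - w_true))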

Exercise 1
Exercise 2
Python Notebook

 

 

 

Sparse estimation and dictionary learning,
Julien Mairal

Abstract: In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection, that is, automatically selecting a simple model among a large collection of them. In signal processing, sparse coding consists of representing data with linear combinations of a few dictionary elements. Subsequently, the corresponding tools have been widely adopted by several scientific communities such as image processing, neuroscience, bioinformatics, and computer vision. The goal of this course is to offer a short introduction to sparse modeling. More specifically, we focus on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts.
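
As a concrete illustration (our example using scikit-learn, an assumed tool choice rather than the course's own software), dictionary learning recovers a dictionary D and sparse codes so that X is approximately codes @ D with mostly zero coefficients:

# Generate signals that truly are sparse combinations of unknown atoms,
# then learn a dictionary and sparse codes from the data alone.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
true_D = rng.normal(size=(15, 20))                       # 15 atoms in R^20
true_codes = rng.normal(size=(100, 15)) * (rng.random((100, 15)) < 0.1)
X = true_codes @ true_D + 0.01 * rng.normal(size=(100, 20))

dl = DictionaryLearning(n_components=15, alpha=1.0, max_iter=100,
                        transform_algorithm="lasso_lars", random_state=0)
codes = dl.fit_transform(X)          # sparse codes, shape (100, 15)
D = dl.components_                   # learned dictionary, shape (15, 20)

print("fraction of zero coefficients:", np.mean(codes == 0))
print("relative reconstruction error:",
      np.linalg.norm(X - codes @ D) / np.linalg.norm(X))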

Slides

 

Introduction to the Bayesian framework and its application to large-scale machine learning,
Dmitry Vetrov

Abstract: The Bayesian framework is a powerful tool for building data processing models. It allows one to deal with missing data, account for uncertainty, construct latent variable models, and regularize machine learning algorithms. It can be treated not just as an alternative approach to probability theory but also as a kind of generalization of the frequentist approach, Boolean logic, and deterministic modelling. During the last seven years, a significant breakthrough has been made in developing scalable tools for approximate Bayesian inference that make it possible to build Bayesian models on top of modern deep neural networks.

In the talk we will give an introduction to the Bayesian paradigm, review its advantages, and show how inference techniques can be scaled. In the end we will consider several interesting applications of Bayesian inference techniques to deep neural networks and show how popular regularization methods can be treated as particular cases of Bayesian techniques.
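
As a minimal concrete instance of the paradigm (our toy example, far simpler than the deep models above), here is exact Bayesian updating of a coin's bias, where a conjugate prior gives the posterior in closed form:

# Beta-Bernoulli updating: posterior is proportional to likelihood x prior.
# Conjugacy makes this exact; the scalable approximate-inference tools
# mentioned above are needed when no closed form exists.
import numpy as np

alpha, beta = 2.0, 2.0                        # Beta(2, 2) prior on the bias
flips = np.array([1, 1, 0, 1, 1, 1, 0, 1])    # observed data (1 = heads)

alpha_post = alpha + flips.sum()              # add the number of heads
beta_post = beta + len(flips) - flips.sum()   # add the number of tails

print("posterior mean bias:", alpha_post / (alpha_post + beta_post))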

Slides