Complex Feedback in Online Learning

Workshop at the International Conference on Machine Learning (ICML) 2022

This workshop aims to present a broad overview of the feedback types being actively researched, highlight recent advances and provide a networking forum for researchers and practitioners.

While online learning has become one of the most successful and studied approaches in machine learning, in particular with reinforcement learning, online learning algorithms still interact with their environments in a very simple way. The complexity and diversity of the feedback coming from the environment in real applications is often reduced to the observation of a scalar reward. More and more researchers now seek to exploit fully the available feedback to allow faster and more human-like learning.

Online learning, in its broad sense, is the task of continuously learning from feedback gathered about an environment. Reinforcement learning (RL) and bandits are prominent examples that have attracted considerable attention in recent years. Learning online may be necessary if the algorithm's environment changes and the behavior to be learned changes with it. It is also a framework that has been used to learn sequentially how to act in non-changing settings: learning to act optimally in games can be done by RL, as famously illustrated by AlphaGo.

The standard task abstraction in online learning is the maximization of reward, which is also the feedback to the algorithm: the learner performs an action, observes whether it got a high reward, and improves its behavior based on that feedback. However, this model oversimplifies the feedback available in complex real-world applications, where observables beyond the reward abound. Examples include the actions of other players in games. Feedback can further result from the interaction of several past actions, or be delayed. Moreover, the reward might not be observable: the algorithm could learn from indirect signals like preferences instead. The result of an action can be incompletely observed, as in auctions. The algorithm might want to learn from examples or guidance provided by humans.

Invited Speakers

The talks will cover a variety of unconventional feedback types encountered in real-world applications, ranging from economics to music recommendation tools.

Anca Dragan

Anca is an associate professor in the EECS Department at UC Berkeley. Her goal is to enable robots to work with, around, and in support of people. She runs the InterACT Lab, which focuses on algorithms for human-robot interaction that are aligned with what people actually want the robot to do.

Ciara Pike-Burke

Ciara is a lecturer at Imperial College London. Her research interests include multi-armed bandits, online learning, and reinforcement learning. In general, she is interested in sequential decision making under uncertainty and potentially limited feedback.

Nicolò Cesa-Bianchi

Nicolò is a professor at the University of Milan. His main research areas are: design and analysis of machine learning algorithms; algorithms for multi-armed bandit problems with applications to personalized recommendations and online auctions; graph analytics with applications to social networks and bioinformatics.

Thorsten Joachims

Thorsten is a professor at Cornell University. His research interests include machine learning methods and theory, learning from human behavioral data and implicit feedback, and machine learning for search engines, recommendation, education, and other human-centered tasks.

Emilie Kaufmann

Emilie is a CNRS researcher at the CRIStAL laboratory at Université de Lille. She is also a member of the Inria team Scool. She is interested in statistics and machine learning, with a particular focus on sequential learning and bandit algorithms.

Julien Pérolat

Julien is a research scientist at DeepMind. His main research interests include game theory and reinforcement learning.

Aleksandrs Slivkins

Alex is a Principal Researcher at Microsoft Research New York City. His research interests are in algorithms and theoretical computer science, spanning learning theory, algorithmic economics, and networks. He is particularly interested in online machine learning and the exploration-exploitation tradeoff, and their manifestations in socioeconomic environments.


We have planned 7 invited talks of 25 minutes each, with the goal of providing a broad overview of the theme. We will host two poster sessions of 1h30 each, showcasing contributed work. Our review process will further select 6 submissions for contributed talks of 15 minutes each.

Project flip-overs: We consider our workshop a success if it inspires students to embark on new projects. To help new ideas take root, we will provide flip-overs during the poster sessions, around which we will encourage participants to propose, sketch and discuss new starting points, ideas, questions or applications.

Call for papers

Our goal is to reach participants from across machine learning, focusing on the transverse theme of the feedback from which algorithms learn. We accept two forms of contributions: papers containing new, unpublished results, and open problems (in particular new, understudied feedback types). Papers should follow the ICML style instructions. Open problem submissions should be up to 4 pages long, excluding references.

Submissions can be of a theoretical or an empirical nature, and should focus on the subject of feedback in sequential learning, which includes but is not limited to:

  • missing, delayed, partial or otherwise altered reward feedback,
  • structured, richer feedback, in which an action provides complex information on the whole system,
  • learning from preferences, or from examples provided by humans, and more generally from feedback which is only indirectly linked to the performance of the learner,
  • learning from the actions of other players in a game or other multiplayer environment.

These subjects could arise, for example, from the fields of online learning, reinforcement learning, bandit theory, sequential games, or any application with interesting feedback in which an agent tries to learn.

Contributions from outside the online learning community are very welcome, as long as they present interesting forms of feedback encountered in real-world problems.

Submit here (CMT website):

Submission site opens: May 6, 2022
Submission deadline: May 27, 2022, 11:30 PM Pacific Time
Decisions announced: June 13, 2022
Camera-ready and video submission due: July 1, 2022, 11:30 PM Pacific Time
Day of workshop: July 23, 2022


Rémy Degenne

Inria, France

Pierre Gaillard

Inria, France

Wouter Koolen

CWI, Netherlands

Aadirupa Saha

Microsoft, USA