Complex Feedback in Online Learning


Workshop at the International Conference on Machine Learning (ICML) 2022

This workshop aims to present a broad overview of the feedback types being actively researched, highlight recent advances, and provide a networking forum for researchers and practitioners.

While online learning has become one of the most successful and widely studied approaches in machine learning, in particular with reinforcement learning, online learning algorithms still interact with their environments in a very simple way. The complexity and diversity of the feedback coming from the environment in real applications is often reduced to the observation of a scalar reward. More and more researchers now seek to fully exploit the available feedback to allow faster and more human-like learning.

Online learning, in its broad sense, is the task of continuously learning from feedback gathered about an environment. Reinforcement learning (RL) and bandits are prominent examples which have attracted considerable attention in recent years. Learning online might be a necessity if the environment of the algorithm changes and the behavior to be learned changes with it. It is also a framework which has been used to sequentially learn to act in non-changing settings: learning to act optimally in games can be done by RL, as famously illustrated by AlphaGo.

The standard task abstraction in online learning is the maximization of reward, which is also the feedback to the algorithm: the learner performs an action, observes whether it got a high reward, and improves its behavior based on that feedback. However, this model oversimplifies the feedback available in complex real-world applications, where observables beyond the reward abound. Examples include the actions of other players in games. Feedback can further result from the interaction of several past actions, or be delayed. Moreover, the reward might not be observable at all: the algorithm could instead learn from indirect signals such as preferences. The result of an action can be incompletely observed, as in auctions. The algorithm might also want to learn from examples or guidance provided by humans.
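
To make this abstraction concrete, the following is a minimal, purely illustrative sketch of the scalar-reward loop described above: an epsilon-greedy multi-armed bandit with made-up Bernoulli reward probabilities (none of the names or values come from the workshop material).

```python
import random

def epsilon_greedy_bandit(reward_probs, horizon=1000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit for `horizon` rounds with epsilon-greedy."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    counts = [0] * n_actions        # how many times each action was played
    estimates = [0.0] * n_actions   # empirical mean reward per action

    for _ in range(horizon):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])

        # The only feedback from the environment is this single scalar reward.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Incremental update of the empirical mean for the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates

# Three actions with illustrative success probabilities 0.2, 0.5 and 0.8.
print(epsilon_greedy_bandit([0.2, 0.5, 0.8]))
```

The complex feedback types discussed at the workshop (delays, preferences, partial observations, the actions of other players) all change what the environment returns in place of the single scalar reward in this loop.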

Invited Speakers

Topics will cover a variety of unconventional feedback types encountered in real-world applications, ranging from economics to music recommendation tools.

Andreea Bobu

Andreea is a Ph.D. student at the University of California, Berkeley, working with Anca Dragan in the InterACT Lab. Her interests lie at the intersection of machine learning, robotics, and human-robot interaction, with a focus on robot learning under uncertainty.



Ciara Pike-Burke

Ciara is a lecturer at Imperial College London. Her research interests include multi-armed bandits, online learning, and reinforcement learning. In general, she is interested in sequential decision making under uncertainty and potentially limited feedback.



Nicolò Cesa-Bianchi

Nicolò is a professor at the University of Milan. His main research areas are: design and analysis of machine learning algorithms; algorithms for multi-armed bandit problems with applications to personalized recommendations and online auctions; graph analytics with applications to social networks and bioinformatics.

Thorsten Joachims

Thorsten is a professor at Cornell University. His research interests include machine learning methods and theory, learning from human behavioral data and implicit feedback, and machine learning for search engines, recommendation, education, and other human-centered tasks.


Vianney Perchet

Vianney has been a professor at the Centre de recherche en économie et statistique (CREST) at ENSAE since October 2019. His research focuses mainly on the interplay between machine learning and game theory, at the junction of mathematics, computer science, and economics. He is also a part-time principal researcher in the Criteo AI Lab in Paris, working on efficient exploration in recommender systems.

Julien Pérolat

Julien is a research scientist at DeepMind. His main research interests include game theory and reinforcement learning.

Aleksandrs Slivkins

Alex is a Principal Researcher at Microsoft Research New York City. His research interests are in algorithms and theoretical computer science, spanning learning theory, algorithmic economics, and networks. He is particularly interested in online machine learning and the exploration-exploitation tradeoff, and their manifestations in socioeconomic environments.

Schedule

The full schedule is available on the ICML conference website.

In addition to the 7 presentations by our invited speakers, we selected 6 contributions for 15-minute talks. These contributions are:

  • ActiveHedge: Hedge meets Active Learning,
    Bhuvesh Kumar · Jacob Abernethy · Venkatesh Saligrama
  • Near-optimal Regret for Adversarial MDP with Delayed Bandit Feedback,
    Tiancheng Jin · Tal Lancewicki · Haipeng Luo · Yishay Mansour · Aviv Rosenberg
  • Contextual Inverse Optimization: Offline and Online Learning,
    Omar Besbes · Yuri Fonseca · Ilan Lobel
  • Giving Complex Feedback in Online Student Learning with Meta-Exploration,
    Evan Liu · Moritz Stephan · Allen Nie · Chris Piech · Emma Brunskill · Chelsea Finn
  • Threshold Bandit Problem with Link Assumption between Pulls and Duels,
    Keshav Narayan · Aarti Singh
  • Meta-learning from Learning Curves Challenge: Lessons learned from the First Round and Design of the Second Round,
    Manh Hung Nguyen · Lisheng Sun · Nathan Grinsztajn · Isabelle Guyon

All contributed works will be showcased during the poster session, from 3:00 PM to 4:30 PM (local time).

Call for Papers

Our goal is to reach participants from across machine learning, focusing on the cross-cutting theme of the feedback from which algorithms learn. We accept two forms of contributions:

  • Papers containing new, unpublished results (papers accepted to ICML 2022 are eligible). Papers should follow the ICML style instructions (up to 8 pages, followed by unlimited pages for references and an appendix, all in a single file).
  • Open problems, in particular on new, understudied feedback types. Open problem submissions should be up to 4 pages long, excluding references.

Submissions can be of both theoretical and empirical nature, and should focus on the subject of feedback in sequential learning, which includes but is not limited to:

  • missing, delayed, partial or otherwise altered reward feedback,
  • structured, richer feedback, in which an action provides complex information on the whole system,
  • learning from preferences, or from examples provided by humans, and more generally from feedback which is only indirectly linked to the performance of the learner,
  • learning from the actions of other players in a game or other multiplayer environment.

The above subjects could arise, for example, from the fields of online learning, reinforcement learning, bandit theory, sequential games, or any other application in which an agent tries to learn from interesting feedback.

Contributions from outside the online learning community will be very welcome, as long as they concern interesting feedback encountered in real-world problems.

Submit here (CMT website): https://cmt3.research.microsoft.com/CFOL2022/

Submission site opens: May 6, 2022
Submission deadline: May 27, 2022, 11:30 PM Pacific Time
Decisions announced: June 13, 2022
Video submission due: July 1, 2022, 11:30 PM Pacific Time
Day of workshop: July 23, 2022

Organizers

Rémy Degenne

Inria, France

Pierre Gaillard

Inria, France

Wouter Koolen

CWI, Netherlands

Aadirupa Saha

Microsoft, USA