Timo Kaufmann

I’m a PhD student in the AI+ML group at LMU Munich, advised by Eyke Hüllermeier. My research focuses on developing novel techniques that combine reinforcement learning and preference learning to ensure that agents solve the right problems efficiently and robustly. Specifically, I’m interested in the intersection of these two fields and how it can help agents quickly learn the right behaviors in novel environments. Through my work, I hope to contribute to the advancement of AI and its ability to solve complex problems in the real world.

News

Sep 14, 2023	I will present our workshop paper on the challenges and practices of reinforcement learning from real human feedback at the HLDM’23 ECML workshop on Friday next week (September 22nd)!
Apr 4, 2023	I am organizing an invited session on Reinforcement Learning from Human Feedback (RLHF) at ECDA 2023.
Mar 28, 2023	I presented our short paper at the ML4CPS conference. In the paper, we discuss the potential of leveraging self-supervised pretraining for reinforcement learning from human feedback in the context of cyber-physical systems.
Mar 15, 2022	I started my PhD at Eyke Hüllermeier’s AI+ML group at LMU Munich!

Selected Publications

arXiv
A Survey of Reinforcement Learning from Human Feedback

Timo Kaufmann, Paul Weng, Viktor Bengs, and 1 more author

2023

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function. Building on prior work on the related setting of preference-based reinforcement learning (PbRL), it stands at the intersection of artificial intelligence and human-computer interaction. This positioning offers a promising avenue to enhance the performance and adaptability of intelligent systems while also improving the alignment of their objectives with human values. The training of large language models (LLMs) has impressively demonstrated this potential in recent years, where RLHF played a decisive role in targeting the model’s capabilities toward human objectives. This article provides a comprehensive overview of the fundamentals of RLHF, exploring the intricate dynamics between machine agents and human input. While recent focus has been on RLHF for LLMs, our survey adopts a broader perspective, examining the diverse applications and wide-ranging impact of the technique. We delve into the core principles that underpin RLHF, shedding light on the symbiotic relationship between algorithms and human feedback, and discuss the main research trends in the field. By synthesizing the current landscape of RLHF research, this article aims to provide researchers as well as practitioners with a comprehensive understanding of this rapidly growing field of research.
@misc{kaufmann2023survey,
  title = {A {{Survey}} of {{Reinforcement Learning}} from {{Human Feedback}}},
  author = {Kaufmann, Timo and Weng, Paul and Bengs, Viktor and H{\"u}llermeier, Eyke},
  year = {2023},
  eprint = {2312.14925},
  archiveprefix = {arXiv},
}

ICLR
Inverse Constitutional AI: Compressing Preferences into Principles

Arduin Findeis, Timo Kaufmann, Eyke Hüllermeier, and 2 more authors

In Proceedings of the International Conference on Learning Representations (ICLR), 2025

Feedback data plays an important role in fine-tuning and evaluating state-of-the-art AI models. Often pairwise text preferences are used: given two texts, human (or AI) annotators select the "better" one. Such feedback data is widely used to align models to human preferences (e.g., reinforcement learning from human feedback), or to rank models according to human preferences (e.g., Chatbot Arena). Despite its widespread use, prior work has demonstrated that human-annotated pairwise text preference data often exhibits unintended biases. For example, human annotators have been shown to prefer assertive over truthful texts in certain contexts. Models trained or evaluated on this data may implicitly encode these biases in a manner hard to identify. In this paper, we formulate the interpretation of existing pairwise text preference data as a compression task: the Inverse Constitutional AI (ICAI) problem. In constitutional AI, a set of principles (or constitution) is used to provide feedback and fine-tune AI models. The ICAI problem inverts this process: given a dataset of feedback, we aim to extract a constitution that best enables a large language model (LLM) to reconstruct the original annotations. We propose a corresponding initial ICAI algorithm and validate its generated constitutions quantitatively based on reconstructed annotations. Generated constitutions have many potential use cases – they may help identify undesirable biases, scale feedback to unseen data or assist with adapting LLMs to individual user preferences. We demonstrate our approach on a variety of datasets: (a) synthetic feedback datasets with known underlying principles; (b) the AlpacaEval dataset of cross-annotated human feedback; and (c) the crowdsourced Chatbot Arena dataset. We release the code for our algorithm and experiments at https://github.com/rdnfn/icai.
@inproceedings{findeis2025inverse,
  title = {Inverse {{Constitutional AI}}: {{Compressing Preferences}} into {{Principles}}},
  booktitle = {Proceedings of the International Conference on Learning Representations ({{ICLR}})},
  shorttitle = {Inverse {{Constitutional AI}}},
  author = {Findeis, Arduin and Kaufmann, Timo and H{\"u}llermeier, Eyke and Albanie, Samuel and Mullins, Robert},
  year = {2025},
}

AAAI

DUO: Diverse, Uncertain, On-Policy Query Generation and Selection for Reinforcement Learning from Human Feedback

Xuening Feng, Zhaohui Jiang, Timo Kaufmann, and 4 more authors

In Proceedings of the AAAI Conference on Artificial Intelligence, 2025

@inproceedings{feng2025duo,
  author = {Feng, Xuening and Jiang, Zhaohui and Kaufmann, Timo and Xu, Puchen and H{\"u}llermeier, Eyke and Weng, Paul and Zhu, Yifei},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  title = {DUO: Diverse, Uncertain, On-Policy Query Generation and Selection for Reinforcement Learning from Human Feedback},
  year = {2025},
}

MHFAIA
Comparing Comparisons: Informative and Easy Human Feedback with Distinguishability Queries

Xuening Feng, Zhaohui Jiang, Timo Kaufmann, and 3 more authors

In ICML 2024 Workshop on Models of Human Feedback for AI Alignment (MHFAIA), 2024

Learning human objectives from preference feedback has significantly advanced reinforcement learning (RL) in domains with hard-to-formalize objectives. Traditional methods with pairwise trajectory comparisons face challenges: trajectories with subtle differences are hard to compare, and comparisons are ordinal, limiting direct inference of preference strength. In this paper, we introduce the distinguishability query, where humans compare two pairs of trajectories and indicate which pair is easier to compare and then give preference feedback on the easier pair. This type of query directly infers preference strength and is expected to reduce cognitive load on the labeler. We also connect this query to cardinal utility and difference relations, and develop an efficient query selection scheme to achieve a better trade-off between query informativeness and easiness. Experimental results demonstrate the potential of our method for faster, data-efficient learning and improved user-friendliness on RLHF benchmarks.
@inproceedings{feng2024comparing,
  title = {Comparing {{Comparisons}}: {{Informative}} and {{Easy Human Feedback}} with {{Distinguishability Queries}}},
  shorttitle = {Comparing {{Comparisons}}},
  booktitle = {{{ICML}} 2024 {{Workshop}} on {{Models}} of {{Human Feedback}} for {{AI Alignment}} ({{MHFAIA}})},
  author = {Feng, Xuening and Jiang, Zhaohui and Kaufmann, Timo and H{\"u}llermeier, Eyke and Weng, Paul and Zhu, Yifei},
  year = {2024},
}

RLBRew
OCALM: Object-Centric Assessment with Language Models

Timo Kaufmann, Jannis Blüml, Antonia Wüst, and 3 more authors

In RLC 2024 Workshop on Reinforcement Learning Beyond Rewards (RLBRew), 2024

Properly defining a reward signal to efficiently train a reinforcement learning (RL) agent is a challenging task. Designing balanced objective functions from which a desired behavior can emerge requires expert knowledge, especially for complex environments. Learning rewards from human feedback or using large language models (LLMs) to directly provide rewards are promising alternatives, allowing non-experts to specify goals for the agent. However, black-box reward models make it difficult to debug the reward. In this work, we propose Object-Centric Assessment with Language Models (OCALM) to derive inherently interpretable reward functions for RL agents from natural language task descriptions. OCALM uses the extensive world-knowledge of LLMs while leveraging the object-centric nature common to many environments to derive reward functions focused on relational concepts, providing RL agents with the ability to derive policies from task descriptions.
@inproceedings{kaufmann2024ocalm,
  title = {{{OCALM}}: {{Object-Centric Assessment}} with {{Language Models}}},
  booktitle = {{{RLC}} 2024 {{Workshop}} on {{Reinforcement Learning Beyond Rewards}} ({{RLBRew}})},
  author = {Kaufmann, Timo and Bl{\"u}ml, Jannis and W{\"u}st, Antonia and Delfosse, Quentin and Kersting, Kristian and H{\"u}llermeier, Eyke},
  year = {2024},
}

HLDM
On the Challenges and Practices of Reinforcement Learning from Real Human Feedback

Timo Kaufmann, Sarah Ball, Jacob Beck, and 2 more authors

In Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that does not require an engineered reward function but instead learns from human feedback. Due to its increasing popularity, various authors have studied how to learn an accurate reward model from only a few samples, making optimal use of this feedback. Because of the cost and complexity of user studies, however, this research is often conducted with synthetic human feedback. Such feedback can be generated by evaluating behavior based on ground-truth rewards, which are available for some benchmark tasks. While this setting can help evaluate some aspects of RLHF, it differs from practical settings in which synthetic feedback is not available. Working with real human feedback brings additional challenges that cannot be observed with synthetic feedback, including fatigue, inter-rater inconsistencies, delay, misunderstandings, and modality-dependent difficulty. We describe and discuss some of these challenges together with current practices and opportunities for further research in this paper.
@inproceedings{kaufmann2023challenges,
  title = {On the~{{Challenges}} and~{{Practices}} of~{{Reinforcement Learning}} from~{{Real Human Feedback}}},
  booktitle = {Machine {{Learning}} and {{Principles}} and {{Practice}} of {{Knowledge Discovery}} in {{Databases}}},
  author = {Kaufmann, Timo and Ball, Sarah and Beck, Jacob and H{\"u}llermeier, Eyke and Kreuter, Frauke},
  pages = {276--294},
  editor = {Meo, Rosa and Silvestri, Fabrizio},
  publisher = {Springer Nature Switzerland},
  doi = {10.1007/978-3-031-74627-7_21},
  year = {2023},
}
