Comparing Comparisons: Informative and Easy Human Feedback with Distinguishability Queries

Xuening Feng, Zhaohui Jiang, Timo Kaufmann, Eyke Hüllermeier, Paul Weng, and Yifei Zhu
In ICML 2024 Workshop on Models of Human Feedback for AI Alignment (MHFAIA), 2024

Abstract

Learning human objectives from preference feedback has significantly advanced reinforcement learning (RL) in domains with hard-to-formalize objectives. Traditional methods based on pairwise trajectory comparisons face challenges: trajectories with subtle differences are hard to compare, and comparisons are ordinal, limiting direct inference of preference strength. In this paper, we introduce the distinguishability query, where humans compare two pairs of trajectories, indicate which pair is easier to compare, and then give preference feedback on the easier pair. This type of query allows direct inference of preference strength and is expected to reduce the cognitive load on the labeler. We also connect this query to cardinal utility and difference relations, and develop an efficient query selection scheme to achieve a better trade-off between query informativeness and easiness. Experimental results empirically demonstrate the potential of our method for faster, data-efficient learning and improved user-friendliness on RLHF benchmarks.

Cite

@inproceedings{feng2024comparing,
  slug = {comp-comp-workshop},
  title = {Comparing {{Comparisons}}: {{Informative}} and {{Easy Human Feedback}} with {{Distinguishability Queries}}},
  shorttitle = {Comparing {{Comparisons}}},
  booktitle = {{{ICML}} 2024 {{Workshop}} on {{Models}} of {{Human Feedback}} for {{AI Alignment}} ({{MHFAIA}})},
  author = {Feng, Xuening and Jiang, Zhaohui and Kaufmann, Timo and H{\"u}llermeier, Eyke and Weng, Paul and Zhu, Yifei},
  year = {2024}
}