Probabilistic multileave for online retrieval evaluation

Anne Schuth, Robert-Jan Bruintjes, Fritjof Büttner, Joost van Doorn, Carla Groenland, Harrie Oosterhuis, Cong-Nguyen Tran, Bas Veeling, Jos van der Velde, Roger Wechsler, David Woudenberg, Maarten de Rijke
Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)

Bibtex

@inproceedings{schuth2015probabilistic,
  title={Probabilistic multileave for online retrieval evaluation},
  author={Schuth, Anne and Bruintjes, Robert-Jan and B{\"u}ttner, Fritjof and van Doorn, Joost and Groenland, Carla and Oosterhuis, Harrie and Tran, Cong-Nguyen and Veeling, Bas and van der Velde, Jos and Wechsler, Roger and others},
  booktitle={Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  pages={955--958},
  year={2015}
}

Abstract

Online evaluation methods for information retrieval use implicit signals from users, such as clicks, to infer preferences between rankers. A highly sensitive way of inferring these preferences is through interleaved comparisons. Recently, interleaved comparison methods that allow more than two rankers to be evaluated simultaneously have been introduced. These so-called multileaving methods are even more sensitive than their interleaving counterparts. Probabilistic interleaving, whose main selling point is the potential for reusing historical data, has no multileaving counterpart yet. We propose probabilistic multileave and empirically show that it is highly sensitive and unbiased. An important implication of this result is that historical interactions with multileaved comparisons can be reused, allowing for ranker comparisons that need much less user interaction data. Furthermore, we show that our method, unlike earlier sensitive multileaving methods, scales well as the number of rankers increases.
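To make the idea of multileaving in a probabilistic style more concrete, here is a minimal Python sketch of how one multileaved result list could be built from several rankings. It is not taken from the paper and rests on stated assumptions: each ranker weights a document by 1/rank^τ (a rank-based softmax in the spirit of probabilistic interleaving, with τ = 3 as an assumed default), a ranker is picked uniformly at random for every position, and a sampled document is then removed from all rankers, with the remaining weights renormalized. The function names `rank_weights` and `probabilistic_multileave` are hypothetical. The paper's exact procedure may differ, and the step that infers ranker preferences from clicks is not shown.

```python
import random


def rank_weights(ranking, tau=3.0):
    """Hypothetical helper: unnormalized weights 1 / rank**tau, ranks starting at 1."""
    return {doc: 1.0 / rank ** tau for rank, doc in enumerate(ranking, start=1)}


def probabilistic_multileave(rankings, length, tau=3.0, rng=None):
    """Hypothetical sketch: build one multileaved list from several rankings.

    Returns the multileaved list and, for each position, the index of the
    ranker whose distribution the document was sampled from.
    """
    rng = rng or random.Random()
    # Weights use each document's ORIGINAL rank in each ranker's list.
    weights = [rank_weights(r, tau) for r in rankings]
    pool = set().union(*rankings)
    target = min(length, len(pool))
    placed, multileaved, contributors = set(), [], []
    while len(multileaved) < target:
        j = rng.randrange(len(rankings))  # assumed: pick a ranker uniformly at random
        remaining = [d for d in rankings[j] if d not in placed]
        if not remaining:
            continue  # ranker j has nothing left to contribute; draw another ranker
        # random.choices renormalizes the weights over the still-available documents.
        doc = rng.choices(remaining, weights=[weights[j][d] for d in remaining], k=1)[0]
        placed.add(doc)  # the document is no longer available to any ranker
        multileaved.append(doc)
        contributors.append(j)
    return multileaved, contributors


if __name__ == "__main__":
    rankers = [
        ["d1", "d2", "d3", "d4"],
        ["d3", "d1", "d5", "d2"],
        ["d5", "d4", "d3", "d1"],
    ]
    shown, who = probabilistic_multileave(rankers, length=4, rng=random.Random(0))
    print(shown, who)
```

Because every ranker assigns non-zero probability to each of its documents, any document could in principle have come from several rankers. That property is what makes probabilistic comparison methods attractive for reusing logged interactions, as the abstract notes.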