Research

I am a full-time Machine Learning Researcher at Criteo AI Lab, working on improving large-scale interactive systems. Prior to that, I completed my PhD at Institut Polytechnique de Paris (affiliated with CREST) and Criteo AI Lab, under the supervision of Nicolas Chopin and David Rohde. I also hold an M.Eng. degree in Applied Mathematics from CentraleSupelec as well as the MVA M.Sc. degree from ENS Paris-Saclay.

My research is on large-scale interactive systems, with a strong statistical learning theory component. More specifically, my work revolves around understanding the offline formulation of RL/bandits in order to improve these systems. Lately, I’m focusing on improving the RLHF pipeline. More details about my work can be found below.

Publications

For an exhaustive list of my publications, you can check my Google Scholar page.

Research Papers

(Offline RL) Logarithmic Smoothing for Pessimistic Off-Policy Evaluation, Selection and Learning — Otmane Sakhi, Imad Aouali, Nicolas Chopin, Pierre Alquier – NeurIPS ‘24 (Spotlight), CONSEQUENCES @ RecSys ‘24 (Oral).
(Large Scale) Fast Slate Policy Optimization: Going Beyond Plackett-Luce — Otmane Sakhi, David Rohde, Nicolas Chopin – TMLR December ‘23.
(Offline RL) PAC-Bayesian Offline Contextual Bandits With Guarantees — Otmane Sakhi, Nicolas Chopin, Pierre Alquier – ICML ‘23: 40th International Conference on Machine Learning.
(Large Scale) Probabilistic Rank and Reward: A Scalable Model for Slate Recommendation — Imad Aouali, Achraf Ait Sidi Hammou, Sergey Ivanov, Otmane Sakhi, David Rohde, Flavian Vasile – Preprint.
(Large Scale) Fast Offline Policy Optimization for Large Scale Recommendation — Otmane Sakhi, David Rohde, Alexandre Gilotte – Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI-23).
(Offline RL) Improving Offline Contextual Bandits with Distributional Robustness — Otmane Sakhi, Louis Faury, Flavian Vasile – RecSys Workshop on Reinforcement Learning and Robust Estimators for Recommendation Systems (REVEAL ‘20).
(Large Scale) BLOB: A Probabilistic Model for Recommendation that Combines Organic and Bandit Signals — Otmane Sakhi, Stephan Bonner, David Rohde, Flavian Vasile – KDD ‘20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
(Bayesian DL) Reconsidering Analytical Variational Bounds for Output Layers of Deep Networks — Otmane Sakhi, Stephan Bonner, David Rohde, Flavian Vasile – 4th Workshop on Bayesian Deep Learning (NeurIPS 2019), Vancouver, Canada.

PhD Thesis

Offline Contextual Bandit: Theory and Large Scale Applications — Otmane Sakhi, under the supervision of Nicolas Chopin and David Rohde. 2023, Institut Polytechnique de Paris.

Tutorials

Reward Optimizing Recommendation using Deep Learning and Fast Maximum Inner Product Search — Imad Aouali, Amine Benhalloum, Martin Bompaire, Achraf Ait Sidi Hammou, Sergey Ivanov, Benjamin Heymann, David Rohde, Otmane Sakhi, Flavian Vasile, Maxime Vono – KDD ‘22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. [Github Repo]
Recommender Systems through the lens of Decision Theory — Flavian Vasile, David Rohde, Olivier Jeunen, Amine Benhalloum, Otmane Sakhi – WWW ‘21: Companion Proceedings of the Web Conference 2021. [Website]
Bayesian Value Based Recommendation: A modelling based alternative to proxy and counterfactual policy based recommendation — David Rohde, Flavian Vasile, Sergey Ivanov, Otmane Sakhi – RecSys ‘20: Proceedings of the 14th ACM Conference on Recommender Systems. [Github Repo]

Contact me

If you have any questions about my work, do not hesitate to send me an email. I will be happy to discuss it.

My email address.