List of Publications

The bibtex entries of my papers are available on this page.

n°28
Oct. 2017
pp.223-237
Keywords:

We propose the kl-UCB ++ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the... more

n°2 017
Sep. 2017
Keywords:

This paper is devoted to the study of the max K-armed bandit problem, which consists in sequentially allocating resources in order to detect extreme values. Our contribution is twofold.
We... more

Sep. 2017
Over the past few years, the multi-armed bandit model has become increasingly popular in the machine learning community, in part because of applications including online content optimization. This... more
Mar. 2017
Quel est le point commun entre un site de vente sur Internet, un système de navigation GPS, un essai clinique et un moteur de publicité en ligne ? Petit article de 2 pages pour présenter le domaine... more
Mar. 2017
Recommender systems (RS) aim at recommending either one or several items simultaneously : the former type of RS is named single play RS, while the latter is named multiple-play RS. The... more
n°30
Dec. 2016

We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration... more

n°29
Jun. 2016
pp.1 028-1 050

We study an original problem of pure exploration in a strategic bandit model motivated by Monte Carlo Tree Search. It consists in identifying the best action in a game, when the player may sample... more

vol.
4
(1)
Mar. 2016
pp.76-108
Keywords:
In this work, we extend some parameters built on a probability distribution introduced before to the case where the proximity between real numbers is measured by using a Bregman divergence. This... more
Feb. 2016
Keywords:

We revisit lower bounds on the regret in the case of multi-armed bandit problems. We obtain non-asymptotic, distribution-dependent bounds and provide straightforward proofs based only on well-... more

Feb. 2016
This paper is devoted to the estimation of conditional quantile, more precisely the quantile of the output of a real stochastic code whose inputs are in R d. In this purpose, we introduce a... more
vol.
17
(1)
Jan. 2016
pp.1-42
The stochastic multi-armed bandit model is a simple abstraction that has proven useful in many different contexts in statistics and machine learning. Whereas the achievable limit in terms of regret... more
vol.
51
Oct. 2015
Depuis 1996, les Journées du groupe Modélisation Aléatoire et Statistique (MAS) de la SMAI réunissent tous les deux ans une large part des communautés probabilistes et statistiques françaises. L... more
n°2 015
Jun. 2015
Keywords:

Les systèmes de recommandation à très grande échelle sont aujourd'hui omniprésents sur internet : ouvrages conseillés à l'achat dans les librairies en ligne, articles recommandés sur les... more

n°28
May. 2015
pp.67-72

For several web tasks such as ad placement or e-commerce, recommender systems must recommend multiple items to their users---such problems can be modeled as bandits with multiple plays. State-of-... more

vol.
46
(2)
Mar. 2015
pp.300-319
We describe a new algorithm for the perfect simulation of variable length Markov chains and random systems with perfect connections. This algorithm generalizes Propp and Wilson's simulation... more
n°12
Mar. 2015
pp.73-88

The multiple-play recommender systems (RS) are RS which recommend several items to the users. RS are based on learning models in order to choose the items to recommend. Among these models, the... more

vol.
18
Feb. 2015
pp.59-79
Keywords:
Les systèmes de recommandation (SR) à tirages multiples font référence aux SR recommandant plusieurs objets en même temps aux utilisateurs. La plupart des SR s'appuient sur des modèles d'... more
n°2 014
Jun. 2014
Keywords:

Après obtention de leur DUT, de nombreux étudiants des IUT STID souhaitent poursuivre leurs études pour obtenir un Master qui leur donne la possibilité de travailler comme ingénieur-expert ou chef... more

n°27
Jun. 2014
pp.461-481

A/B testing refers to the task of determining the best option among two alternatives that yield random outcomes. We provide distribution-dependent lower bounds for the performance of A/B testing... more

vol.
19
(3)
Mar. 2014
pp.93-105
Keywords:
The rapid evolution of information systems managing more and more voluminous data has caused profound paradigm shifts in the job of statistician, becoming successively data miner, bioinformatician... more
Sep. 2013
pp.489-493

We present deviation bounds for self-normalized averages and applications to estimation with a random number of observations.
The results rely on a peeling argument in exponential martingale... more

vol.
41
(3)
Jun. 2013
pp.1 516-1 541
We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins (1979), based on upper... more
vol.
40
(2)
Jun. 2013
pp.344-362
We study a problem of model selection for data produced by two different context tree sources. Motivated by linguistic questions, we consider the case where the probabilistic context trees... more
vol.
14
Feb. 2013
pp.601-623
Keywords:
We consider an original problem that arises from the issue of security analysis of a power system and that we name optimal discovery with probabilistic expert advice. We address it with an algorithm... more
n°51
Dec. 2012
pp.6 808-6 812
Keywords:

Motivated by issues of security analysis for power systems, we analyze a new problem, called optimal discovery with probabilistic expert advice. We address it with an algorithm based on the... more

n°15
Apr. 2012
pp.592-600

Stochastic bandit problems have been analyzed from two different perspectives: a frequentist view, where the parameter is a deterministic unknown quantity, and a Bayesian approach, where the... more

vol.
21
(6)
Dec. 2011
pp.2 109-2 145
Computing smoothing distributions, the distributions of one or more states conditional on past, present, and future observations is a recurring problem when operating on general hidden Markov models... more
vol.
121
(11)
Nov. 2011
pp.2 488-2 506
Context tree models have been introduced by Rissanen as a parsimonious generalization of Markov models. Since then, they have been widely used in applied probability and statistics. The present paper... more
n°22
Oct. 2011
pp.174-188

Many problems, such as cognitive radio, parameter control of a scanning tunnelling microscope or internet advertisement, can be modelled as non-stationary bandit problems where the distributions... more

n°24
Jul. 2011
pp.359-376

This paper presents a finite-time analysis of the KL-UCB algorithm, an online, horizon-free index policy for stochastic bandit problems. We prove two distinct results: first, for arbitrary bounded... more

vol.
5
(1)
Feb. 2011
pp.68-76
We consider the task of optimally sensing a two-state Markovian channel with an observation cost and without any prior information regarding the channel's transition probabilities. This task is... more
n°24
Dec. 2010

We consider structured multi-armed bandit problems based on the Generalized Linear Model (GLM) framework of statistics. For these bandits, we propose a new algorithm, called GLM-UCB. We derive... more

n°48
Sep. 2010
pp.115-122

We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. In MDPs, optimism can be implemented by carrying out... more

Jun. 2010
Keywords:

Nous considérons des problèmes de bandits multibras structurés dans lesquels l’agent est guidé par une connaissance a priori sur la structure de la récompense qui peut être exploitée de manière à... more

vol.
57
(11)
Nov. 2009
pp.4 247-4 259
We discuss approximate maximum-likelihood methods for blind identification and deconvolution. These algorithms are based on particle approximation versions of the expectation-maximization (EM)... more
vol.
11
(4)
Oct. 2009
pp.634-642
Keywords:
We show that the maximin average redundancy in pattern coding is eventually larger than 1.84 (n/log n)1/3 for messages of length n. This improves recent results on pattern redundancy, although it... more
n°15
Aug. 2009
pp.465-468
Keywords:

This paper is devoted to extend the regenerative block-bootstrap (RBB) proposed for regenerative Markov chains to Hidden Markov Models {(Xn, Yn)}. In the HMM setup, regeneration times of the... more

vol.
139
(3)
Mar. 2009
pp.962-977
We address the issue of order identification for hidden Markov models with Poisson and Gaussian emissions. We prove information-theoretic BIC-like mixture inequalities in the spirit of Finesso [1991... more
vol.
55
(1)
Jan. 2009
pp.358-373
This paper describes universal lossless coding strategies for compressing sources on countably infinite alphabets. Classes of memoryless sources defined by an envelope condition on the marginal... more
n°9
Jun. 2008
pp.639-643

We discuss approximate maximum likelihood methods for blind identification and deconvolution. These algorithms are based on particle approximation versions of the EM algorithm. We consider two... more

n°9
Jun. 2008
pp.634-638

Particle filtering has been successfully used to approximate the fixed-lag or fixed-interval smoothing distributions in digital communication and to perform approximate maximum likelihood... more

vol.
52
(12)
Dec. 2006
pp.5 579-5 586
The Context Tree Weighting method (CTW) is shown to be almost adaptive on the classes of renewal and Markov renewal processes. Up to logarithmic factor, ctw achieves the minimax pointwise redundancy... more
vol.
52
(10)
Oct. 2006
pp.4 630-4 635
The Bayesian information criterion (BIC) and Krichevsky- Trofimov (KT) version of minimum description length (MDL) principle are popular in the study of model selection. For order estimation of... more
Rapport de DEA (Master's dissertation)
Sep. 2003
Keywords:
Codage universel sans perte par transformée grammaticale.
n°8
Apr. 2002
pp.47-56

Events in self-timed rings can propagate evenly spaced or as bursts. By studying these phenomena, we obtain a better understanding of the underlying dynamics of self-timed pipelines, which is a... more

We introduce a general approach to prove oracle properties in context tree selection. The results derive from a concentration condition that is verified, for example, by mixing processes. Moreover,... more