From classical inference...
... to post hoc inference

Rationale of the project

The number and size of available data sets of different types have increased dramatically over the past twenty years. This “data deluge” has been accompanied by a shift from hypothesis-driven research to data-driven research in many scientific fields, including astronomy, biology, genetics, and medicine. Analyzing and interpreting such data requires innovative approaches for the simultaneous testing of a large number of biological hypotheses.

This project gathers specialists in multiple testing theory, high-dimensional data analysis, and genomics. It aims to fill the gap between the statistical guarantees provided by state-of-the-art multiple testing procedures and the actual needs of practitioners.

We propose to develop “post hoc” procedures (in the sense of Goeman and Solari, Statistical Science, 2011), which provide confidence statements on the number or proportion of false positives among any subset of hypotheses chosen by the user after analyzing the data. Both theoretical and applied aspects of post hoc multiple testing will be covered.
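To make the idea of a post hoc confidence statement concrete, here is a minimal sketch of one classical bound of this type, based on the Simes inequality. It is an illustration only, not the project's implementation: the function name `simes_post_hoc_bound` and the example p-values are hypothetical, and the bound is assumed valid only under the usual conditions for the Simes inequality (for instance independent or positively dependent p-values). For a user-chosen subset S of the m hypotheses, it returns an upper bound on the number of false positives in S that holds simultaneously for all subsets with probability at least 1 - alpha.

```python
from typing import Sequence


def simes_post_hoc_bound(
    p_values: Sequence[float],
    subset: Sequence[int],
    alpha: float = 0.05,
) -> int:
    """Upper bound on the number of false positives among `subset`.

    Sketch of a Simes-based post hoc bound (hypothetical helper):
        min over k of ( #{i in S : p_i > alpha * k / m} + k - 1 ),
    capped at |S|, where m is the total number of hypotheses.
    """
    m = len(p_values)
    selected = [p_values[i] for i in subset]
    s = len(selected)
    best = s  # trivial bound: every selected hypothesis may be a false positive
    # For k > s the term k - 1 already reaches s, so k = 1..s is enough.
    for k in range(1, s + 1):
        threshold = alpha * k / m
        outside = sum(1 for p in selected if p > threshold)
        best = min(best, outside + k - 1)
    return best


# Hypothetical p-values for m = 8 hypotheses.
p_values = [0.0001, 0.0004, 0.002, 0.03, 0.2, 0.5, 0.7, 0.9]

# The subset can be chosen after looking at the data.
print(simes_post_hoc_bound(p_values, [0, 1, 2, 3]))  # 1
print(simes_post_hoc_bound(p_values, [4, 5, 6, 7]))  # 4
print(simes_post_hoc_bound(p_values, range(8)))      # 5
```

In this toy example, at most one of the four smallest p-values is declared a possible false positive, whereas the bound is uninformative for the four largest ones. Because the guarantee is simultaneous over all subsets, the user may inspect several candidate selections without losing the confidence level.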

Main events

  • Jun 15-19, 2020: Participation in the scientific committee of the Mathematical Methods of Modern Statistics 2 conference at CIRM (Luminy, France). The conference was held virtually.

  • Mar 10-12, 2020: ANR meeting, Paris. With G. Blanchard, M. Perrot-Dockès, P. Neuvial, E. Roquain.

  • Dec 12-15, 2019: Participation of M. Perrot-Dockès, P. Neuvial, E. Roquain and F. Villers at MCP 2019 in Taiwan. Organization of a session on post-selection inference and multiple testing.

  • Apr 8, 2019: ANR meeting, Paris. With G. Blanchard, G. Durand, M. Perrot-Dockès, P. Neuvial, G. Rigaill, E. Roquain, B. Sadacca.

  • Feb 7-9, 2018: Workshop “Post-selection inference and multiple testing” in Toulouse. This event was part of the thematic semester “Mathematics and Computer Science for biology” organized by CIMI, the International Centre for Mathematics and Computer Science in Toulouse.

  • Jan 6, 2017: Kick-off meeting, Evry.

Preprints

  1. Abraham K, Castillo I, Roquain E: Empirical Bayes cumulative $\ell$-value multiple testing procedure for sparse sequences, 2021 [arXiv] [pdf] [bib]
  2. Roquain E, Verzelen N: False discovery rate control with unknown null distribution: is it possible to mimic the oracle?, 2019 [arXiv] [pdf] [bib]
  3. Döhler S, Roquain E: Controlling false discovery exceedance for heterogeneous tests, 2019 [arXiv] [pdf] [bib]
  4. Rebafka T, Roquain E, Villers F: Graph inference with clustering and false discovery rate control, 2019 [arXiv] [pdf] [bib]
  5. Blanchard G, Neuvial P, Roquain E: On agnostic post hoc approaches to false positive control, 2019 Submitted book chapter.
    [hal] [pdf] [url] [bib]
  6. Durand G, Junge F, Döhler S, Roquain E: DiscreteFDR: An R package for controlling the false discovery rate for discrete test statistics arXiv preprint, 2019 [arXiv] [pdf] [bib]

Papers

  1. [SJS] Durand G, Blanchard G, Neuvial P, Roquain E: Post hoc false positive control for structured hypotheses Scandinavian Journal of Statistics, to appear.
    [hal] [arXiv] [pdf] [url] [bib]
  2. [AoS] Carpentier A, Delattre S, Roquain E, Verzelen N: Estimating minimum effect with outlier selection The Annals of Statistics 49: 272–294, 2021 [hal] [pdf] [bib]
  3. [AoS] Blanchard G, Neuvial P, Roquain E: Post Hoc Confidence Bounds on False Positives Using Reference Families The Annals of Statistics 48: 1281–1303, 2020 [hal] [arXiv] [pdf] [url] [bib]
  4. [AoS] Castillo I, Roquain E: On spike and slab empirical Bayes multiple testing The Annals of Statistics, 2020 [arXiv] [pdf] [url] [bib]
  5. [EJS] Durand G: Adaptive p-value weighting with power optimality Electronic Journal of Statistics 13: 3336–3385, 2019 [hal] [arXiv] [pdf] [bib]
  6. [EJS] Bachoc F, Blanchard G, Neuvial P: On the post selection inference constant under restricted isometry properties Electronic Journal of Statistics 12: 3736–3757, 2018 [hal] [arXiv] [pdf] [url] [bib]
  7. [Bio-k] Picard F, Reynaud-Bouret P, Roquain E: Continuous testing for Poisson process intensities: a new perspective on scanning statistics Biometrika 105: 931–944, 2018 [arXiv] [pdf] [bib]
  8. [EJS] Döhler S, Durand G, Roquain E: New FDR bounds for discrete and heterogeneous tests Electronic Journal of Statistics 12: 1867–1900, 2018 [pdf] [url] [bib]

Participants

Toulouse  
Mélisande Albert INSA, Institut de Mathématiques de Toulouse
François Bachoc Université Paul Sabatier, Institut de Mathématiques de Toulouse
Maria Martinez INSERM UMR 1043
Pierre Neuvial CNRS, Institut de Mathématiques de Toulouse


Evry  
Cyril Dalmasso Université d’Evry, Laboratoire de Mathématiques et Modélisation d’Evry
Jean-François Deleuze Centre National de Génotypage
Edith Le Floch Centre National de Génotypage
Guillem Rigaill INRA, Laboratoire de Mathématiques et Modélisation d’Evry
Franck Samson INRA, Laboratoire de Mathématiques et Modélisation d’Evry


Paris  
Sylvain Delattre Université Paris 7, Laboratoire de Probabilités et Modèles Aléatoires
Marie Perrot-Dockès Laboratoire de Probabilités, Statistique et Modélisation
Etienne Roquain Université Paris 6, Laboratoire de Probabilités, Statistique et Modélisation


Orsay  
Gilles Blanchard Universität Potsdam, Institut für Mathematik


Alumni

Guillermo Durand Université Paris 6, Laboratoire de Probabilités, Statistique et Modélisation
Benjamin Sadacca Institut Curie, Immune responses to cancer

Open source software

  • The R package sansSouci implements most of the methods developed in the course of the project.

  • The IIDEA Shiny application implements interactive differential analyses (volcano plots and set enrichment analyses).

  • The R package DiscreteFDR implements the procedures adapted to discrete tests, as described in Döhler et al. (2018) and Durand et al. (2019).

Funding

Funded by ANR, CNRS, and Labex CIMI.

‘SansSouci’ was identified as one of the best acronyms for ANR projects in 2016 by the Agence Nationale de l’Excellence Scientifique (ANES). See the official announcement on Twitter.