## Statistics Seminar


Organizers: Mélisande Albert, Dominique Bontemps, Pierre Neuvial

Usual day and place: Tuesdays at 11:15, room 106 (building 1R1).

• ### Tuesday 6 November 2018 11:00-12:00 - Peter D. Grünwald - CWI, Amsterdam

A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity

Abstract: We present a novel notion of complexity that interpolates between and generalizes some classic complexity notions in learning theory. For estimators such as empirical risk minimization (ERM) with arbitrary bounded losses, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information complexity (also known as stochastic or PAC-Bayesian KL(posterior ∥ prior) complexity). For (penalized) ERM, the new complexity reduces to (generalized) normalized maximum likelihood (NML) complexity, i.e. a minimax log-loss individual-sequence regret. Our first main result bounds the excess risk in terms of the new complexity. Our second main result links the new complexity, via Rademacher complexity, to L2(P) entropy, thereby generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi, who treated the log-loss case with L∞ entropy. Together, these results recover optimal bounds for VC classes and for large (polynomial entropy) classes, replacing localized Rademacher complexity by a simpler analysis that almost completely separates the two aspects determining the achievable rates: 'easiness' (Bernstein) conditions and model complexity.

Venue: Room 106, Building 1R1
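
Several of the bounds above are phrased through Rademacher complexity. As a reminder of that classical quantity only (the new complexity notion of the talk is not reproduced here), the minimal sketch below estimates the empirical Rademacher complexity $\hat{\mathcal{R}}_n(\mathcal{F}) = \mathbb{E}_\sigma \sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^n \sigma_i f(x_i)$ of a finite class by Monte Carlo over the random signs $\sigma_i$. The threshold class, the sample, the function name `empirical_rademacher`, and the number of sign draws are illustrative assumptions.

```python
import numpy as np


def empirical_rademacher(values, n_draws=2000, seed=0):
    """Monte Carlo estimate of E_sigma[ sup_f (1/n) sum_i sigma_i f(x_i) ].

    values: array of shape (num_functions, n); row j holds f_j(x_1), ..., f_j(x_n).
    """
    rng = np.random.default_rng(seed)
    _, n = values.shape
    # Rademacher signs: each entry is +1 or -1 with probability 1/2.
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))
    # Empirical correlation of every sign vector with every function.
    correlations = sigma @ values.T / n  # shape (n_draws, num_functions)
    # Supremum over the finite class, then average over the sign draws.
    return correlations.max(axis=1).mean()


# Illustrative class: threshold classifiers x -> +1 if x > c else -1, on a 1-D sample.
rng = np.random.default_rng(1)
x = rng.uniform(size=50)
thresholds = np.linspace(0.0, 1.0, 21)
values = np.where(x[None, :] > thresholds[:, None], 1.0, -1.0)
print(empirical_rademacher(values))
```

For a finite class of $\pm 1$-valued functions, Massart's lemma bounds this quantity by $\sqrt{2\log|\mathcal{F}|/n}$, which gives a quick sanity check on the estimate.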

• ### Tuesday 20 November 2018 11:00-12:00 - Franziska Göbel - University of Potsdam

On Graph Wavelets

Abstract: In this talk I will present a multiscale approach to constructing a data-adapted, basis-like set of functions F (a Parseval frame) that allows for a decomposition of every square-integrable function defined on the vertices of a finite undirected weighted graph. To this end, we follow the idea of Coulhon et al. (2012) for constructing localized frames on relatively general spaces.
We look at the properties of F and at its application to denoising, which relies on F being a Parseval frame. Given noisy values $y_i = f(x_i) + \varepsilon_i$ of an unknown function $f$ at a finite number of points $x_i$, we want to recover $f$ at these points. Based on a neighborhood graph representation of the points $x_i$, we derive an estimate of $f$ using the frame decomposition and a thresholding method applied to the coefficients.
If time allows, we further show that, under some assumptions on the underlying space and the sampling, the random neighborhood graphs considered satisfy with high probability a doubling volume condition as well as a local Poincaré inequality. These two properties are essential for the spatial localization of the frame elements in the setting of Coulhon et al. (2012).
This talk is based on joint work with Gilles Blanchard and Ulrike von Luxburg.

Venue: Room 106, Building 1R1
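
The frame construction of Coulhon et al. is not reproduced here. As a much simpler stand-in, the sketch below uses the eigenvectors of a graph Laplacian, which form an orthonormal basis and hence a (trivial) Parseval frame, to illustrate the denoising pipeline described above: build a neighborhood graph on the points $x_i$, decompose the noisy values $y_i$, threshold the coefficients, and reconstruct. The k-nearest-neighbor graph, Gaussian edge weights, soft thresholding, the universal threshold $\sigma\sqrt{2\log n}$, and the helper names are all assumptions.

```python
import numpy as np


def knn_graph_laplacian(x, k=5, bandwidth=0.1):
    """Unnormalized Laplacian L = D - W of a symmetrized k-nearest-neighbor graph on 1-D points."""
    n = x.size
    dist = np.abs(x[:, None] - x[None, :])
    w = np.zeros((n, n))
    for i in range(n):
        # argsort puts the point itself (distance 0) first, so skip it.
        neighbors = np.argsort(dist[i])[1:k + 1]
        w[i, neighbors] = np.exp(-dist[i, neighbors] ** 2 / (2 * bandwidth ** 2))
    w = np.maximum(w, w.T)  # symmetrize: undirected weighted graph
    return np.diag(w.sum(axis=1)) - w


def denoise(x, y, threshold, k=5):
    """Soft-threshold the coefficients of y in the Laplacian eigenbasis, then reconstruct."""
    _, basis = np.linalg.eigh(knn_graph_laplacian(x, k=k))
    coeffs = basis.T @ y  # analysis: inner products with the orthonormal frame elements
    shrunk = np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)
    return basis @ shrunk  # synthesis: a Parseval frame is inverted by its own adjoint


rng = np.random.default_rng(0)
x = np.sort(rng.uniform(size=200))
f = np.sin(2 * np.pi * x)
sigma = 0.3
y = f + sigma * rng.standard_normal(x.size)
f_hat = denoise(x, y, threshold=sigma * np.sqrt(2 * np.log(x.size)))
print(np.mean((y - f) ** 2), np.mean((f_hat - f) ** 2))
```

Smooth functions on the point cloud concentrate on the low-frequency eigenvectors, so thresholding removes mostly noise; the localized frame of the talk is designed to behave better than this global basis for functions with local features.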

• ### Tuesday 27 November 2018 11:00-12:00 - Carin Ludeña - Universidad Central de Venezuela

Modular decomposition and random graph models

Abstract: Decompositions of finite structures have been studied in different branches of discrete mathematics for over 40 years. In 1967, T. Gallai gave a prime decomposition theorem for simple graphs, which has given rise to a rich literature on the subject, notably on the characterization of certain graph properties and on optimization algorithms over graphs. However, there has apparently been little connection between graph decompositions and applications in probability and statistics. In this talk we will give a brief overview of the general theory, provide some insight into what the decomposition looks like for common random graph models, and discuss a graph model based on this notion.

Venue: Room 106, Building 1R1
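
The building block of Gallai's decomposition is the module: a vertex set $M$ such that every vertex outside $M$ is adjacent either to all vertices of $M$ or to none of them. The helper below is a minimal sketch, assuming an adjacency-set representation of a simple graph, that tests this definition directly; it does not compute the full decomposition tree, and the name `is_module` is ours.

```python
def is_module(adj, module):
    """Return True if `module` is a module of the graph given by adjacency sets `adj`."""
    module = set(module)
    for v in adj:
        if v in module:
            continue
        # Neighbors of the outside vertex v that lie in the candidate module.
        hits = adj[v] & module
        if hits and hits != module:
            return False  # v sees part of the module but not all of it
    return True


# Path a-b-c-d: {b, c} is not a module (a sees b but not c).
# 4-cycle a-b-c-d-a: {a, c} is a module (b and d each see both a and c).
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(is_module(path, {"b", "c"}), is_module(cycle, {"a", "c"}))  # False True
```

Singletons and the whole vertex set are always (trivial) modules; a graph whose only modules are trivial is called prime.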

• ### Tuesday 4 December 2018 11:00-12:00 - Mathieu Ribatet - Université de Montpellier

On the structure of max-stable processes

Abstract: Max-stable processes play a fundamental role in the spatial modeling of rare events such as floods or heat waves. In this talk we will start from scratch by looking at their spectral representations, which are nothing more than a simple probabilistic construction of this class of processes. We will then study the particular structure induced by this spectral representation, which will allow us to discuss the difficulty of (conditionally) simulating these processes, and to introduce suitable (spatial) dependence measures as well as inference methods.

Venue: Room 106, Building 1R1
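
A standard form of the spectral representation writes a max-stable process with unit Fréchet margins as $Z(x) = \max_{i \ge 1} \zeta_i W_i(x)$, where $\{\zeta_i\}$ are the points of a Poisson process on $(0, \infty)$ with intensity $\zeta^{-2}\,d\zeta$ and the $W_i$ are i.i.d. copies of a nonnegative process with $\mathbb{E}[W(x)] = 1$. The sketch below simulates a truncated version on a 1-D grid, taking $W(x) = \sqrt{2\pi}\max(0, \varepsilon(x))$ with $\varepsilon$ a standard Gaussian process (Schlather's model). The exponential covariance, the grid, the fixed number of terms, and the function name are assumptions, and the truncation makes this an approximation rather than an exact simulation.

```python
import numpy as np


def simulate_schlather(grid, n_terms=500, scale=0.2, seed=0):
    """Truncated spectral simulation of a Schlather max-stable process on a 1-D grid."""
    rng = np.random.default_rng(seed)
    # Exponential covariance of the underlying standard Gaussian process.
    cov = np.exp(-np.abs(grid[:, None] - grid[None, :]) / scale)
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(grid.size))
    # zeta_i = 1 / Gamma_i, with Gamma_i the arrival times of a unit-rate Poisson process:
    # these are the points of a Poisson process with intensity zeta^{-2} d zeta.
    zeta = 1.0 / np.cumsum(rng.exponential(size=n_terms))
    # One Gaussian path per row, then W = sqrt(2 pi) * max(0, eps) so that E[W(x)] = 1.
    eps = rng.standard_normal((n_terms, grid.size)) @ chol.T
    w = np.sqrt(2 * np.pi) * np.maximum(eps, 0.0)
    # Pointwise maximum over the (truncated) family of spectral functions.
    return (zeta[:, None] * w).max(axis=0)


grid = np.linspace(0.0, 1.0, 50)
z = simulate_schlather(grid)
print(z.min(), z.max())
```

Because $\mathbb{E}[W(x)] = 1$, each margin satisfies $P(Z(x) \le z) = e^{-1/z}$ (unit Fréchet) in the untruncated construction; checking this empirically is a quick way to validate the truncation level.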

• ### Tuesday 11 December 2018 11:00-12:00 - Olivier Roustant - Mines Saint-Etienne

Sobol-Hoeffding decomposition: bounds and extremes.

Abstract: This talk presents applications of the Sobol-Hoeffding decomposition at the interplay between sensitivity analysis, functional inequalities and extreme value theory.
The Sobol-Hoeffding decomposition expands a multivariate (square-integrable) function $f(x_1, \dots, x_d)$ as a sum of orthogonal terms of increasing complexity. It yields a variance decomposition (ANOVA), which is useful for quantifying the influence of the inputs as well as their interactions. In particular, the variance component corresponding to a set of variables and all its supersets, called "superset importance", provides very useful information. Superset importances of single variables ("total effects") help screen out inessential variables among $x_1, \dots, x_d$, and superset importances of pairs ("total interactions") help screen out inessential second-order interactions and discover additive structures in $f$.
In the first part of the talk, we focus on computational shortcuts for approximating superset importance using the gradient of $f$. We present a synthesis of results on upper bounds, based on Poincaré inequalities, and show new results on lower bounds, based on geometry.
In the second part, we focus on a particular multivariate function, the stable tail dependence function (s.t.d.f.), which contains the dependence information for extreme values. In particular, asymptotic independence is related to additivity of the s.t.d.f. Superset importance of pairs can then be used to define a visualization tool (the "tail dependograph") for extremal dependence analysis. In addition to illustrations, we give computational details and inference results.

Venue: Room 106, Building 1R1
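
For independent inputs, the unnormalized total effect (the superset importance of $\{i\}$) can be estimated with Jansen's Monte Carlo formula $D_i^{\mathrm{tot}} = \tfrac12\,\mathbb{E}[(f(X) - f(X^{(i)}))^2]$, where $X^{(i)}$ equals $X$ except that $x_i$ is redrawn independently. For $x_i$ uniform on $[0,1]$, the Poincaré inequality yields the gradient-based upper bound $D_i^{\mathrm{tot}} \le \pi^{-2}\,\mathbb{E}[(\partial f/\partial x_i)^2]$. The sketch below compares both on a test function of our choosing; the function names and sample sizes are assumptions, and the lower bounds presented in the talk are not covered.

```python
import numpy as np


def total_effect_jansen(f, i, d, n=200_000, seed=0):
    """Jansen estimator of the unnormalized total effect of input i (inputs i.i.d. U[0, 1])."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(size=(n, d))
    b = a.copy()
    b[:, i] = rng.uniform(size=n)  # redraw only the i-th coordinate
    return 0.5 * np.mean((f(a) - f(b)) ** 2)


def poincare_bound(df_i, d, n=200_000, seed=1):
    """Poincare upper bound pi^{-2} E[(df/dx_i)^2] for inputs uniform on [0, 1]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=(n, d))
    return np.mean(df_i(x) ** 2) / np.pi ** 2


def f(x):
    """Illustrative test function f(x1, x2) = x1 * x2."""
    return x[:, 0] * x[:, 1]


def df_1(x):
    """Partial derivative of f with respect to x1."""
    return x[:, 1]


print(total_effect_jansen(f, 0, 2), poincare_bound(df_1, 2))
# exact values: 1/36 (about 0.0278) and 1/(3 pi^2) (about 0.0338)
```

Here the bound is fairly tight because $x_1 \mapsto x_1 x_2$ is linear in $x_1$; the bound only needs gradient evaluations, which is what makes it attractive as a screening shortcut.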

• ### Tuesday 18 December 2018 11:00-12:00 - Vincent Feuillard - Airbus

A multivariate extreme value theory approach to anomaly clustering and visualization

Abstract: Motivated by a wide variety of applications, from fraud detection to aviation safety management, unsupervised anomaly detection has received much attention in the machine-learning literature. We developed novel statistical techniques for anomaly detection that borrow concepts and tools from both machine learning and multivariate extreme value analysis. Anomaly detection algorithms usually declare extremes to be anomalies, whereas not all extreme values are anomalies. We study the dependence structure of rare events in high dimension and propose an algorithm to detect this structure under a sparsity assumption. This approach can drastically reduce the false alarm rate: anomalies then correspond to simultaneous very large (extreme) values for groups of variables that have not yet been identified. A data-driven methodology for learning the sparse representation of extreme behaviors was developed in Goix (2016). One advantage of this method is its straightforward interpretability. In addition, the representation of the dependence structure of the extremes designed in this way induces a specific notion of (dis)similarity among anomalies, which paves the way for visualization tools for operators in the spirit of those proposed for large graphs. We also describe how this approach applies to functional data collected for aircraft safety purposes, after an appropriate preliminary filtering stage.

Venue: Room 106, Building 1R1
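
The sparse representation of extremes mentioned above records which groups of variables are simultaneously large. The sketch below follows that idea under simplifying assumptions of ours: rank transform of each margin to approximately standard Pareto, a sup-norm radial threshold $n/k$, and a tolerance `eps` for declaring a coordinate "large". The function name, parameters, and simulated data are illustrative and are not taken from Goix (2016).

```python
from collections import Counter

import numpy as np


def extreme_faces(data, k, eps=0.1):
    """Count which subsets of coordinates are jointly large among the most extreme points."""
    n, _ = data.shape
    # Rank-transform each margin (ranks 1..n), then map to approximately standard Pareto.
    ranks = data.argsort(axis=0).argsort(axis=0) + 1
    pareto = 1.0 / (1.0 - ranks / (n + 1.0))
    threshold = n / k
    faces = Counter()
    for row in pareto:
        if row.max() > threshold:  # keep only radially extreme observations
            # Coordinates that are large relative to the threshold define the "face".
            face = tuple(int(j) for j in np.flatnonzero(row > eps * threshold))
            faces[face] += 1
    return faces


# Simulated example: variables 0 and 1 share a heavy-tailed factor, variable 2 is independent.
rng = np.random.default_rng(0)
n = 5000
common = rng.pareto(1.0, n) + 1.0
x = np.column_stack([
    common * rng.uniform(0.8, 1.2, n),
    common * rng.uniform(0.8, 1.2, n),
    rng.pareto(1.0, n) + 1.0,
])
print(extreme_faces(x, k=100).most_common(3))
```

In this simulated setting the dominant faces are `(0, 1)` and `(2,)`; a new extreme observation falling on a rarely seen face would be a natural anomaly candidate.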

• ### Tuesday 8 January 2019 11:15-12:15 - Gérard Letac - IMT

The quantiles of an exponential family

Abstract: Let $P$ be a probability measure on $\mathbb{R}$ and let $P_t(dx) = e^{xt} P(dx)/L(t)$, where $L(t) = \int e^{xt} P(dx)$. It is easy to see that $t$ is the mean of $P_t$ for all $t$ if and only if $P$ is the standard Gaussian distribution. It is much less easy if the word mean is replaced by median, or even by quantile. We also treat the analogous case of gamma distributions (see arXiv:1810.11917). This uses the 1960 result of Choquet and Deny, which states that if $H$ is a probability density and $f$ is nonnegative, then $f = f*H$ if and only if $f$ is a mixture (barycenter) of functions $x \mapsto e^{xt}$ such that $\int e^{xt}H(x)\,dx = 1$. Joint work with Mauro Piccioni and Bartosz Kolodziejek.

Venue: Room 106, Building 1R1
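
The easy characterization by the mean can be checked in a few lines, assuming $L(t) < \infty$ for every real $t$ (so that differentiation under the integral is allowed): the mean $m(t)$ of $P_t$ is the logarithmic derivative of $L$.

```latex
\begin{aligned}
m(t) &= \int x\,P_t(dx) = \frac{1}{L(t)}\int x\,e^{xt}\,P(dx) = \frac{L'(t)}{L(t)} = (\log L)'(t),\\
m(t) = t \ \text{for all } t,\ \ \log L(0) = 0
  &\;\Longrightarrow\; \log L(t) = \int_0^t s\,ds = \frac{t^2}{2}
  \;\Longrightarrow\; L(t) = e^{t^2/2}.
\end{aligned}
```

Since $e^{t^2/2}$ is the Laplace transform of $N(0,1)$, uniqueness of the Laplace transform gives $P = N(0,1)$. Conversely, if $P = N(0,1)$ then $P_t = N(t,1)$, whose mean is $t$. No such identity links the median of $P_t$ to $L$, which is why the median and quantile versions require other tools.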

• ### Tuesday 15 January 2019 11:15-12:15 - Nicolas Bousquet - Sorbonne Université

A view of well-posed stochastic inversion problems through sensitivity analysis

Abstract: In stochastic inversion problems, one wants to estimate a probability distribution from indirect, noisy observable information and from limited knowledge of an operator that links this distribution to the observations. Although such problems are characterized by strong identifiability conditions, well-defined "signal-to-noise" conditions take priority and must be met when collecting the observations. In addition to Hadamard's condition, a new condition is proposed, based on the transmission of uncertainty from the input to the output of the operator, which can be interpreted as the result a sensitivity analysis would provide if the problem were solved. This new condition should be built into the input model itself, which adds a constraint to frequentist or Bayesian approaches to stochastic inversion. Although the talk mainly deals with linear or linearizable operators, the absence of contrast typical of linear problems suggests that the proposed condition should be used in more general settings.

Venue: Room 106, Building 1R1
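
As a simple illustration of how input uncertainty is transmitted through a linear operator, consider $Y = HX + \varepsilon$ with $X \sim N(m, C)$ independent of $\varepsilon \sim N(0, R)$, so that $\operatorname{Var}(Y) = HCH^\top + R$. The share of each output's variance that comes from $X$ is one crude, per-output signal-to-noise measure. This ratio, the matrices, and the function name are our illustrative assumptions; this is not the condition proposed in the talk.

```python
import numpy as np


def input_variance_share(H, C, R):
    """Per-output fraction of Var(Y) due to X in the linear Gaussian model Y = H X + eps."""
    signal = np.diag(H @ C @ H.T)  # variance transmitted from the input X
    total = signal + np.diag(R)    # plus observation-noise variance
    return signal / total


H = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 0.01]])
C = np.eye(2)        # input covariance (illustrative)
R = 0.1 * np.eye(3)  # observation-noise covariance (illustrative)
print(input_variance_share(H, C, R))
```

The third output carries almost no information about $X$ (a share of about $10^{-3}$), so observations of that output alone could hardly constrain the input distribution, whatever the estimation method.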

• ### Tuesday 22 January 2019 09:15-10:45 - Marc Hallin (joint probability-statistics seminar) - ECARES and Département de Mathématique, Université libre de Bruxelles

Center-Outward Distribution Functions, Quantiles, Ranks, and Signs in $\mathbb{R}^d$: A Measure Transportation Approach

Abstract: Unlike the real line, the $d$-dimensional space $\mathbb{R}^d$, for $d \geq 2$, is not canonically ordered. As a consequence, such fundamental and strongly order-related univariate concepts as quantile and distribution functions, and their empirical counterparts involving ranks and signs, do not canonically extend to the multivariate context. Remedying that lack of a canonical ordering has remained an open problem for more than half a century and has generated an abundant literature, motivating, among others, the development of statistical depth and copula-based methods. We show that, unlike the many definitions that have been proposed in the literature, the measure transportation-based ones introduced in Chernozhukov, Galichon, Hallin and Henry (2017) enjoy all the properties (distribution-freeness and the maximal invariance property that entails preservation of semiparametric efficiency) that make univariate quantiles and ranks successful tools for semiparametric statistical inference. We therefore propose a new *center-outward* definition of multivariate distribution and quantile functions, along with their empirical counterparts, for which we establish a Glivenko-Cantelli result, the quintessential property of all distribution functions. Our approach, based on results by McCann (1995), is geometric rather than analytical and, contrary to the Monge-Kantorovich one in Chernozhukov et al. (2017) (which assumes compact supports, hence finite moments of all orders), does not require any moment assumptions. The resulting ranks and signs are shown to be strictly distribution-free and maximal invariant under the action of a data-driven class of (order-preserving) transformations generating the family of absolutely continuous distributions; in view of a general result by Hallin and Werker (2003), that maximal invariance is the theoretical foundation of the semiparametric efficiency preservation property of ranks.
The corresponding quantiles are equivariant under the same transformations.

Venue: Amphi Schwartz
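
In dimension $d = 2$, an empirical center-outward distribution function can be computed by optimally assigning the $n$ observations to a regular grid in the unit disk, minimizing the total squared Euclidean distance; the center-outward rank of an observation is then the radius index of its grid point and its sign is the grid direction. The sketch below assumes $n = n_R n_S$ (so no grid points are placed at the origin), uses `scipy.optimize.linear_sum_assignment` to solve the assignment problem, and uses illustrative grid sizes; the function name `center_outward` is ours.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def center_outward(data, n_radii, n_directions):
    """Empirical center-outward distribution function, ranks and signs for 2-D data."""
    n = data.shape[0]
    assert n == n_radii * n_directions, "this sketch assumes n = n_R * n_S"
    radii = np.arange(1, n_radii + 1) / (n_radii + 1)
    angles = 2 * np.pi * np.arange(n_directions) / n_directions
    directions = np.column_stack([np.cos(angles), np.sin(angles)])
    # Regular grid in the unit disk: every radius combined with every direction.
    grid = (radii[:, None, None] * directions[None, :, :]).reshape(-1, 2)
    rank_of_grid = np.repeat(np.arange(1, n_radii + 1), n_directions)
    # Squared Euclidean cost between each observation and each grid point.
    cost = ((data[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one assignment
    f_hat = np.empty_like(data, dtype=float)
    f_hat[rows] = grid[cols]
    ranks = np.empty(n, dtype=int)
    ranks[rows] = rank_of_grid[cols]
    signs = f_hat / np.linalg.norm(f_hat, axis=1, keepdims=True)
    return f_hat, ranks, signs


rng = np.random.default_rng(0)
data = rng.standard_normal((200, 2))
f_hat, ranks, signs = center_outward(data, n_radii=10, n_directions=20)
print(np.bincount(ranks)[1:])  # each rank is used exactly n_S = 20 times
```

Because every grid point is used exactly once, the vector of ranks is a fixed arrangement of the grid labels whatever the data distribution, which is the source of the distribution-freeness described in the abstract.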
