View online : Accès au bâtiment 1R1/Coming to building 1R1
Organizers: Mélisande Albert, Dominique Bontemps, Pierre Neuvial
Usual day and place: Tuesdays at 11:15 a.m. in room 106 (building 1R1).
Abstract: In this talk we study a general class of asymmetric distributions. Their probabilistic properties lead to explicit expressions for all main characteristics (mean, variance, skewness, kurtosis, …). Estimation of the parameters via the method of moments and the maximum likelihood method is discussed, and the asymptotic behaviour of the estimators is established, again in the general framework. The emphasis in the inference is on quantile estimation. Interesting examples include new asymmetric normal, logistic and Student t distributions.
Illustrations via real data examples are provided.
In a regression setting the interest is in estimating conditional quantiles. Starting from the above family of asymmetric densities, we consider a class of conditional density functions, in which the conditional quantile takes the form of a simple location-scale expression. Local likelihood techniques are then used to provide semiparametric estimates of the regression quantile curves.
This talk is based on joint work with Rezaul Karim and Anneleen Verhasselt.
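The abstract does not write out the family of densities. As a sketch only, one two-piece construction that matches the description (an assumption here, not confirmed by the abstract) starts from a density $f$ symmetric about 0, an index $\alpha \in (0,1)$, a location $\mu$ and a scale $\phi > 0$; the symbols are chosen for illustration:

```latex
f_\alpha(y;\mu,\phi) \;=\; \frac{2\alpha(1-\alpha)}{\phi}
\begin{cases}
  f\!\left(\dfrac{(1-\alpha)(\mu-y)}{\phi}\right), & y \le \mu,\\[2ex]
  f\!\left(\dfrac{\alpha\,(y-\mu)}{\phi}\right),   & y > \mu,
\end{cases}
\qquad\text{so that}\qquad
P(Y \le \mu)
  = \frac{2\alpha(1-\alpha)}{\phi}\cdot\frac{\phi}{1-\alpha}\int_0^\infty f(s)\,ds
  = \alpha .
```

Under this assumed form, $\mu$ is exactly the $\alpha$-th quantile, which is why a location-scale expression for conditional quantiles follows naturally once $\mu$ and $\phi$ are allowed to depend on a covariate.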
Venue: First-floor conference room (1R3)
Abstract: In this presentation, we are interested in unsupervised outlier detection in multivariate numerical data sets. We focus specifically on the case of a small proportion of outlying observations, such as fraud or manufacturing faults. In the industrial context of fault detection, this task is of great importance for ensuring high-quality production. In addition, with the exponential increase in the number of measurements on electronic components, high-dimensional data become a concern when identifying outlying observations. The ippon innovation company, an expert in industrial statistics and anomaly detection, sought to address this new situation and collaborated with the TSE-R research laboratory by funding a PhD thesis. This work led to several publications, some R packages and a proprietary algorithm already used by some customers. The main ideas, propositions and results will be presented.
The well-known Mahalanobis distance computes a score for each observation, taking into account the covariance structure of the data set. High scores indicate possible outliers. However, this method shows its limits when the dimension of the data increases while the structure of interest remains in a subspace of fixed dimension. The ICS method (Invariant Coordinate Selection) overcomes this drawback by selecting relevant components for outlier detection. The results will be illustrated on simulated and real data sets through the R package ICSOutlier and the shiny app ICSShiny that we implemented.
Going further, multicollinearity problems in high dimension may make the scatter matrices singular. In such a context, it is possible to generalize ICS by using a generalized singular value decomposition. This approach has some advantages over an alternative based on generalized inverses of the scatter matrices. In some examples where the structure of interest is contained in a subspace, the proposed method is able to recover that subspace, while other approaches may fail to identify it. These advantages are discussed in detail from a theoretical point of view and using some simulated examples.
Keywords: Mahalanobis distance, Invariant Coordinate Selection, Affine Invariance, Components selection, High-dimensional data.
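To make the Mahalanobis score mentioned in the abstract concrete, here is a minimal sketch in plain Python for two-dimensional data. It is not the ICSOutlier implementation: the function name `mahalanobis_scores_2d` and the toy points are hypothetical, and the classical sample mean and covariance are assumed as location and scatter estimates.

```python
from statistics import fmean


def mahalanobis_scores_2d(points):
    """Squared Mahalanobis distance of each 2-D point to the sample mean.

    Assumption: classical (non-robust) sample mean and covariance are used.
    """
    n = len(points)
    mx = fmean(p[0] for p in points)
    my = fmean(p[1] for p in points)
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    scores = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^T S^{-1} d, using the closed-form inverse of a 2x2 matrix.
        scores.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return scores


# Hypothetical toy data: nine points close to the line y = x, plus one point
# (the last) that is unremarkable in each coordinate but breaks the correlation.
toy_points = [
    (-2.0, -2.1), (-1.0, -0.9), (0.0, 0.1), (1.0, 1.0), (2.0, 1.9),
    (-1.5, -1.4), (1.5, 1.6), (0.5, 0.4), (-0.5, -0.6), (1.0, -1.0),
]

for point, score in zip(toy_points, mahalanobis_scores_2d(toy_points)):
    print(point, round(score, 2))
```

The last point receives the largest score even though neither of its coordinates is extreme on its own, which is the sense in which the score accounts for the covariance structure. The `det <= 0` check is the two-dimensional version of the singular scatter matrix problem raised in the abstract.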
Venue: First-floor conference room (1R3)
Abstract: Each year, influenza epidemics cause hundreds of thousands of deaths worldwide and put high loads on health care systems, in France and elsewhere. There is always a risk that an epidemic develops into an extreme and very dangerous pandemic. The size of an epidemic is measured by the number of visits to doctors caused by influenza-like illness (ILI), and health care planning relies on prediction of ILI rates. We use recent results on multivariate generalized Pareto (GP) distributions in extreme value statistics to develop methods for real-time prediction of the risk of exceeding very high levels, and for detection of anomalies. The GP method for real-time prediction is employed to predict ILI rates of the third week and the size of the epidemic for extreme influenza epidemics in France from observed rates of the first two weeks. The GP anomaly detection technique is applied to ILI rates of the first three weeks to help evaluate concerns that a new epidemic could escalate into a worldwide crisis. As an additional input to resource planning, we use standard methods from extreme value statistics to estimate the risk of exceedance of high ILI levels in future years. The new methods are expected to be broadly applicable in health care planning and in many other areas of science and technology.
Joint work with Holger Rootzén, Chalmers University of Technology, Sweden.
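The abstract does not say which standard method is used for exceedance risks. As an illustration only, the sketch below assumes a univariate peaks-over-threshold approach: excesses over a high threshold are modelled with a generalized Pareto distribution, fitted here by the method of moments (valid only for shape below 1/2). The function names are hypothetical and the data are simulated, not ILI rates.

```python
import math
import random
from statistics import fmean, variance


def fit_gpd_moments(excesses):
    """Method-of-moments estimates (xi, sigma) for generalized Pareto excesses.

    Uses mean = sigma / (1 - xi) and var = mean^2 / (1 - 2 xi),
    so it assumes the true shape xi is below 1/2.
    """
    m = fmean(excesses)
    ratio = m * m / variance(excesses)  # equals 1 - 2 * xi in the model
    return 0.5 * (1.0 - ratio), 0.5 * m * (1.0 + ratio)


def exceedance_probability(level, threshold, xi, sigma, rate):
    """Approximate P(X > level) for level >= threshold.

    rate is the estimated probability of exceeding the threshold itself.
    """
    z = (level - threshold) / sigma
    if abs(xi) < 1e-9:
        return rate * math.exp(-z)  # exponential limit as xi -> 0
    base = 1.0 + xi * z
    if base <= 0.0:
        return 0.0  # level lies beyond the finite upper endpoint (xi < 0)
    return rate * base ** (-1.0 / xi)


def main():
    # Simulated stand-in data (exponential), purely for illustration.
    random.seed(0)
    data = [random.expovariate(1.0) for _ in range(2000)]
    threshold = sorted(data)[int(0.9 * len(data))]  # empirical 90% quantile
    excesses = [x - threshold for x in data if x > threshold]
    rate = len(excesses) / len(data)
    xi, sigma = fit_gpd_moments(excesses)
    level = threshold + 3.0
    print("xi:", xi, "sigma:", sigma)
    print("P(X > level):", exceedance_probability(level, threshold, xi, sigma, rate))


if __name__ == "__main__":
    main()
```

The method of moments is used only because it has a closed form; maximum likelihood is the more common choice in practice. The multivariate GP prediction and anomaly detection methods described in the abstract are not reproduced here.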
Venue: Room 106, building 1R1
Venue: Room 106, building 1R1
Venue: First-floor conference room (1R3)