Data science, Signal analysis

A cross-cutting theme in the activities of the Institute's members is mathematics for data science and signal processing.

In this field, the methods studied and the mathematical tools chosen are motivated by statistical and deterministic problems in signal and image processing (2D and 3D), by the processing of structured data in the form of vectors or matrices, and by temporal, heterogeneous, or unstructured data. These data are often very large, even massive, and imperfectly collected.

The diversity of viewpoints within the Institute allows us to address every mathematical aspect of the problems studied:

- their modeling;

- theoretical guarantees on the performance of the methods;

- the numerical solution of the mathematical problem and the analysis of the performance of the numerical methods;

- the application of the methods to concrete problems, in interaction with collaborators from other scientific disciplines (medicine, biology, materials physics, earth observation, epidemiology, criminology) and from industry.

Examples include problems arising from statistical and machine learning, Bayesian learning, and mathematical statistics, as well as from deterministic modeling: modeling, inference, classification, regression, various inverse problems (including blind problems), data assimilation, sensitivity analysis, uncertainty quantification, detection of anomalies or atypical regions, robustness, fairness, explainability and interpretability, privacy-constrained inference, sequential learning, classical and compressed sampling, and survival analysis.

These problems are addressed, and the approaches validated, using a wide range of mathematical objects and tools, including neural networks, sparse models, optimal transport, geometric approaches to statistics, various matrix factorization models, representer theorems in finite and infinite dimension, integral operators, partial differential equations, stochastic processes, Markov chains, extreme values, rare-event simulation and random exploration of complex spaces, Gaussian processes, empirical processes, statistical tests and multiple testing, and data integration.

Solutions are computed using the methods best suited to each problem. Here again, the range is very broad: members of the Institute work on fast numerical methods; convex, non-convex, and non-smooth optimization; stochastic optimization; online, distributed, and incremental processing; and Monte Carlo simulation methods.

Finally, across our applications, we process satellite and spatial data; data on geophysical fluid flows; data from the analysis of computer simulation codes; data from robotics and predictive maintenance; various other industrial data; functional data; high-throughput biological data, including 'omics data (genomics, transcriptomics, proteomics, metabolomics, etc.); observational data from environmental monitoring and analysis; observational data from patient follow-up; and medical images, microscopy images, and hyperspectral images.