Institut de Mathématiques de Toulouse

Today's events


3 events


  • Picard PhD students' seminar

    Thursday, June 21, 10:30-11:30 - Dominique Mattei

    What is… the Bondal-Orlov reconstruction theorem?

    Abstract: Given a smooth projective algebraic variety X, we will define its bounded derived category of coherent sheaves D^b(X) and show that, when the (anti)canonical sheaf is ample, this category completely determines X.

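    For reference, a standard statement of the theorem (Bondal and Orlov, 2001), written in the notation of the abstract:

```latex
% Bondal-Orlov reconstruction theorem, stated in the notation of the abstract.
% Let $X$ and $Y$ be smooth projective varieties over a field $k$, and suppose
% that the canonical bundle $\omega_X$ or the anticanonical bundle
% $\omega_X^{-1}$ is ample. If there is an exact equivalence of triangulated
% categories between the bounded derived categories of coherent sheaves,
\[
  D^b(\mathrm{Coh}\, X) \simeq D^b(\mathrm{Coh}\, Y)
  \;\Longrightarrow\; X \cong Y .
\]
% In other words, under the ampleness hypothesis, $D^b(X)$ determines $X$
% up to isomorphism.
```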


  • Mathematics of Learning

    Thursday, June 21, 12:30-13:30 - Thi Thu Hang Nguyen - LAAS

    Reinforcement Learning: An Overview

    Abstract: In this talk, we give an overview of reinforcement learning. The first part introduces Markov decision processes, which motivate the learning techniques that follow. We then discuss temporal-difference (TD) learning, which learns a value function, denoted V or Q, with the update rule V(s) ← V(s) + α(r + V(s′) − V(s)), where α is the learning rate and r + V(s′) − V(s) is called the TD error. Two notations are needed because there are two types of value function: V depends only on the state, whereas Q depends on both the state and the action. When the action space is too large, or continuous, TD learning cannot be applied directly, since its basic idea is to store values in a table indexed by states or state/action pairs. To overcome this, we use function approximation: the value function is approximated by a parametric function, and the learning process updates its parameters. In the last part, we introduce Deep Q-Networks (DQN), where a neural network approximates the value function, for both discrete and continuous action spaces. (Minimal sketches of the tabular and function-approximation updates appear below.)

    Location: Room 207, building 1R2, UPS.
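    As a concrete illustration of the update rule quoted in the abstract, here is a minimal sketch of tabular TD(0) on a toy five-state chain MDP. The environment, state count, and constants are invented for illustration; a discount factor γ (GAMMA) is included, as is standard.

```python
import random

# Tabular TD(0): estimate the state-value function V of a fixed policy.
# Update rule from the abstract: V(s) <- V(s) + alpha * (r + gamma*V(s') - V(s)).

N_STATES = 5   # toy chain: states 0..4, state 4 is terminal
ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount factor

def step(state):
    """Toy environment: move right or left with probability 1/2 each."""
    if random.random() < 0.5:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

V = [0.0] * N_STATES  # value table, one entry per state

for episode in range(1000):
    s, done = 0, False
    while not done:
        s_next, r, done = step(s)
        td_error = r + GAMMA * V[s_next] - V[s]  # the "TD error"
        V[s] += ALPHA * td_error                 # tabular TD(0) update
        s = s_next

print([round(v, 2) for v in V])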

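    And a minimal sketch of the function-approximation variant the abstract mentions: semi-gradient TD(0) with a linear approximation V(s) ≈ w·φ(s), so the TD error updates the parameter vector w instead of a table entry. The one-hot features below reduce it to the tabular case and are purely illustrative; in DQN, the linear map is replaced by a neural network and V by Q.

```python
import numpy as np

# Semi-gradient TD(0) with linear function approximation: V(s) = w . phi(s).
# Same toy chain MDP as above; only the representation of V changes.

N_STATES, ALPHA, GAMMA = 5, 0.1, 0.9
rng = np.random.default_rng(0)

def phi(s):
    """Feature vector for state s (one-hot here, purely illustrative)."""
    x = np.zeros(N_STATES)
    x[s] = 1.0
    return x

def step(s):
    s_next = min(s + 1, N_STATES - 1) if rng.random() < 0.5 else max(s - 1, 0)
    r = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, r, s_next == N_STATES - 1

w = np.zeros(N_STATES)  # parameters of the linear value function

for episode in range(1000):
    s, done = 0, False
    while not done:
        s_next, r, done = step(s)
        v, v_next = w @ phi(s), w @ phi(s_next)
        td_error = r + GAMMA * v_next - v
        w += ALPHA * td_error * phi(s)  # gradient of V(s) w.r.t. w is phi(s)
        s = s_next

print(np.round(w, 2))
```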