Series
ISTE.
Subject
Artificial intelligence -- Mathematics.
Markov processes.
Alt Name
Sigaud, Olivier.
Buffet, Olivier.
Ohio Library and Information Network.
Uniform Title
Processus décisionnels de Markov en intelligence artificielle. English
Description
1 online resource (481 pages).
polychrome rdacc
Contents
Cover; Title Page; Copyright Page; Table of Contents; Preface; List of Authors; PART 1. MDPS: MODELS AND METHODS; Chapter 1. Markov Decision Processes; 1.1. Introduction; 1.2. Markov decision problems; 1.2.1. Markov decision processes; 1.2.2. Action policies; 1.2.3. Performance criterion; 1.3. Value functions; 1.3.1. The finite criterion; 1.3.2. The β-discounted criterion; 1.3.3. The total reward criterion; 1.3.4. The average reward criterion; 1.4. Markov policies; 1.4.1. Equivalence of history-dependent and Markov policies; 1.4.2. Markov policies and valued Markov chains
1.5. Characterization of optimal policies; 1.5.1. The finite criterion; 1.5.1.1. Optimality equations; 1.5.1.2. Evaluation of a deterministic Markov policy; 1.5.2. The discounted criterion; 1.5.2.1. Evaluation of a stationary Markov policy; 1.5.2.2. Optimality equations; 1.5.3. The total reward criterion; 1.5.4. The average reward criterion; 1.5.4.1. Evaluation of a stationary Markov policy; 1.5.4.2. Optimality equations; 1.6. Optimization algorithms for MDPs; 1.6.1. The finite criterion; 1.6.2. The discounted criterion; 1.6.2.1. Linear programming; 1.6.2.2. The value iteration algorithm
1.6.2.3. The policy iteration algorithm; 1.6.3. The total reward criterion; 1.6.3.1. Positive MDPs; 1.6.3.2. Negative MDPs; 1.6.4. The average criterion; 1.6.4.1. Relative value iteration algorithm; 1.6.4.2. Modified policy iteration algorithm; 1.7. Conclusion and outlook; 1.8. Bibliography; Chapter 2. Reinforcement Learning; 2.1. Introduction; 2.1.1. Historical overview; 2.2. Reinforcement learning: a global view; 2.2.1. Reinforcement learning as approximate dynamic programming; 2.2.2. Temporal, non-supervised and trial-and-error based learning; 2.2.3. Exploration versus exploitation
2.2.4. General preliminaries on estimation methods; 2.3. Monte Carlo methods; 2.4. From Monte Carlo to temporal difference methods; 2.5. Temporal difference methods; 2.5.1. The TD(0) algorithm; 2.5.2. The SARSA algorithm; 2.5.3. The Q-learning algorithm; 2.5.4. The TD(λ), SARSA(λ) and Q(λ) algorithms; 2.5.5. Eligibility traces and TD(λ); 2.5.6. From TD(λ) to SARSA(λ); 2.5.7. Q(λ); 2.5.8. The R-learning algorithm; 2.6. Model-based methods: learning a model; 2.6.1. Dyna architectures; 2.6.2. The E3 algorithm; 2.6.3. The Rmax algorithm; 2.7. Conclusion; 2.8. Bibliography
Chapter 3. Approximate Dynamic Programming; 3.1. Introduction; 3.2. Approximate value iteration (AVI); 3.2.1. Sample-based implementation and supervised learning; 3.2.2. Analysis of the AVI algorithm; 3.2.3. Numerical illustration; 3.3. Approximate policy iteration (API); 3.3.1. Analysis in L∞-norm of the API algorithm; 3.3.2. Approximate policy evaluation; 3.3.3. Linear approximation and least-squares methods; 3.3.3.1. TD(λ); 3.3.3.2. Least-squares methods; 3.3.3.3. Linear approximation of the state-action value function; 3.4. Direct minimization of the Bellman residual; 3.5. Towards an analysis of dynamic programming in Lp-norm
Summary
Markov Decision Processes (MDPs) are a mathematical framework for modeling sequential decision problems under uncertainty, as well as Reinforcement Learning problems. Written by experts in the field, this book provides a global view of current research using MDPs in Artificial Intelligence. It starts with an introductory presentation of the fundamental aspects of MDPs (planning in MDPs, Reinforcement Learning, Partially Observable MDPs, Markov games and the use of non-classical criteria), then presents more advanced research trends in the domain, illustrated with concrete examples.
Bibliography Note
Includes bibliographical references and index. |
Access
Available to OhioLINK libraries. |
Note
Description based upon print version of record. |
ISBN
9781118557426 (electronic bk.)
1118557425 (electronic bk.)
9781118619872 (electronic bk.)
1118619870 (electronic bk.)
OCLC #
830161640 |
Additional Format
Print version: Sigaud, Olivier. Markov Decision Processes in Artificial Intelligence. London : Wiley, c2013. 9781848211674.