Instructor: Jerome Le Ny

Postdoctoral Researcher

Levine 465, email: j e r o m e l [a t] s e a s . u p e n n . e d u

Office Hours: Mon. and Wed. 1-2pm, Levine 465

**Midterm: take-home, out Wed. 11/04 at noon, due Thu. 11/05 at noon (NO EXTENSION, NO COLLABORATION)**

- Pset 4: due Wed. 11/04.
- Pset 3: due Wed. 10/21 and Mon. 11/02.
- Pset 2: due Wed. 10/07.
- Pset 1: due Wed. 09/23. Check the corrections in the assignments section.

1) We first introduce the dynamic programming approach to deterministic and stochastic optimal control problems with a finite horizon. This tool allows us to solve certain problems exactly by establishing crucial properties of the optimal cost function and policy. As first applications, we derive the classical (s,S) policies for an inventory control problem and treat linear quadratic control problems. The case of imperfect information (output feedback) is considered as well, in the form of the Linear Quadratic Gaussian problem and of Partially Observable Markov Decision Processes for discrete state spaces. We then cover the two principal methods for solving infinite-horizon problems: value iteration and policy iteration.
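As a concrete taste of the value iteration method mentioned above, here is a minimal sketch on a toy two-state, two-action discounted MDP (the transition probabilities, stage costs, and discount factor below are invented for illustration and are not taken from the course notes):

```python
import numpy as np

# Toy discounted MDP (hypothetical numbers, for illustration only).
# P[a][s, s'] is the probability of moving from s to s' under action a;
# g[s, a] is the stage cost; alpha is the discount factor.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],     # transitions under action 0
              [[0.5, 0.5],
               [0.4, 0.6]]])    # transitions under action 1
g = np.array([[2.0, 0.5],
              [1.0, 3.0]])      # g[s, a]
alpha = 0.9

J = np.zeros(2)                  # initial cost-to-go guess
for _ in range(1000):
    # Bellman backup: Q[s, a] = g[s, a] + alpha * sum_{s'} P[a][s, s'] J[s']
    Q = g + alpha * np.tensordot(P, J, axes=([2], [0])).T
    J_new = Q.min(axis=1)        # minimize over actions
    if np.abs(J_new - J).max() < 1e-10:
        J = J_new
        break
    J = J_new

policy = Q.argmin(axis=1)        # greedy policy w.r.t. the converged J
```

Because the Bellman operator is an alpha-contraction in the sup norm, the iterates converge geometrically to the optimal cost-to-go, and the greedy policy extracted at the end is optimal for this toy problem.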

2) Outside of specific (and important) special cases, a direct application of the dynamic programming algorithm leads to intractable computations (the "curse of dimensionality"). However, the basic principles developed in the first part of the course guide the design of good heuristics, sometimes accompanied by performance bounds. In addition, a major topic of current research is the design of control policies when a mathematical model of the system is not available but is replaced by a simulator (reinforcement learning), a subject closely related to stochastic approximation algorithms. Finally, in the last part of the course we cover additional techniques developed specifically for the suboptimal control of complex dynamical networks.
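To make the reinforcement learning idea concrete, here is a minimal tabular Q-learning sketch in which the learner only queries a simulator of a toy two-state MDP and never reads the transition probabilities directly (all numbers are invented for illustration, not from the notes):

```python
import random

# Toy discounted-cost MDP used only through its simulator (hypothetical
# numbers, for illustration). P[s][a] is the next-state distribution,
# g[s][a] the stage cost, alpha the discount factor.
P = {0: {0: [0.9, 0.1], 1: [0.5, 0.5]},
     1: {0: [0.2, 0.8], 1: [0.4, 0.6]}}
g = {0: {0: 2.0, 1: 0.5},
     1: {0: 1.0, 1: 3.0}}
alpha = 0.9

def simulate(s, a):
    """Sample one transition from state s under action a."""
    return 0 if random.random() < P[s][a][0] else 1

random.seed(0)
Q = {s: {a: 0.0 for a in (0, 1)} for s in (0, 1)}
s = 0
for _ in range(200_000):
    # epsilon-greedy behavior policy (we are minimizing cost)
    if random.random() < 0.2:
        a = random.choice((0, 1))
    else:
        a = min(Q[s], key=Q[s].get)
    s_next = simulate(s, a)
    # Q-learning update toward the one-step Bellman target
    target = g[s][a] + alpha * min(Q[s_next].values())
    Q[s][a] += 0.05 * (target - Q[s][a])
    s = s_next

policy = {s: min(Q[s], key=Q[s].get) for s in (0, 1)}
```

The update is a stochastic approximation of the Bellman backup, which is why the convergence theory of Q-learning rests on the ODE method covered later in the course.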

Students should have some knowledge of probability theory and Markov chains, roughly at the level of [BT08]. Prior exposure to control theory and optimization is recommended. Students should also be familiar with a high-level programming language (such as MATLAB, Python, Java, or C), since a number of assignments require implementing the algorithms studied in class; the project also requires a computational component.

Date | Topic | Reading | Assignments |
---|---|---|---|
W. Sept. 9 | Introduction. Examples. The Dynamic Programming Algorithm. | notes, [Ber07, I-1], [BL84] | Pset 1 out |
M. Sept. 14 | Deterministic Optimal Control and DP for Deterministic Problems. | notes, [Ber07, I-2], DP for algorithm design | |
W. Sept. 16 | Inventory Control: (s,S) Policies. Properties of the Value Function. | notes, [Por02], [Ber07, I-4] | |
M. Sept. 21 | The Linear Quadratic Regulator (LQR, perfect information). | notes, [Ber07, I-4] | |
W. Sept. 23 | Partial Information Problems - Sufficient Statistics. | notes, [Ber07, I-5] | Pset 1 due. Pset 2 out |
M. Sept. 28 | The Linear Quadratic Gaussian Problem - Kalman Filtering. | notes, [Ber07, I-5] | |
W. Sept. 30 | Finite State Space Problems - POMDPs. | notes, [Ber07, I-5] | |
M. Oct. 5 | Finish POMDPs. Intro to Suboptimal Control. | notes, [Ber05], [Boyd] | |
W. Oct. 7 | Suboptimal Control. Rollout. | notes, [Ber07, I-7, II-1] | Pset 2 due. Pset 3 out |
M. Oct. 12 | MPC. Intro to Discounted Infinite-Horizon DP. | notes, [Ber07, II-1] | |
W. Oct. 14 | Discounted DP. Value Iteration. | notes, [Ber07, II-1] | |
M. Oct. 19 | Fall Break | | |
W. Oct. 21 | Value and Policy Iteration. | notes, [Ber07, II-1] | Proposal and Pset 3 due. Pset 4 out |
M. Oct. 26 | Class canceled | | |
W. Oct. 28 | Class canceled | | |
M. Nov. 2 | Intro to Stochastic Approximation Algorithms, ODE Method. | notes, [Bor08], [KY03] | |
W. Nov. 4 | Q-Learning. | notes, [BM00], [Ber07, II-6.4] | Pset 4 due. Midterm out |
M. Nov. 9 | Direct Policy Evaluation, Temporal Differences. | notes, [Ber07, II-6.1,2] | |
W. Nov. 11 | Projected Equation Methods. LSTD and LSPE. | notes, [Ber07, II-6.3] | |
M. Nov. 16 | Finish Simulation-Based Policy Evaluation. | notes, [Ber07, II-6.3] | |
W. Nov. 18 | Linear Programming Methods. | notes, [Den82], [Put94], [dFVR03] | |
M. Nov. 23 | Stochastic Shortest Path Problems. | notes, [Ber07, II-2,6.5] | Project Progress Report due |
W. Nov. 25 | Average-Cost Problems. | notes, [Ber07, II-4] | |
M. Nov. 30 | Average-Cost Problems. | notes, [Ber07, II-4] | |
W. Dec. 2 | Search in Policy Space. Wrap-up. | notes, [Ber07] | |
M. Dec. 7 | Project Presentations | | |
W. Dec. 9 | Project Presentations | | Final Project Report due |

[Bel57] R.E. Bellman, "Dynamic Programming", Dover, 2003

[Ber07] D.P. Bertsekas, "Dynamic Programming and Optimal Control", Vol I and II, 3rd edition, Athena Scientific, 2007. The main reference for the course.

[Bor08] V.S. Borkar, "Stochastic Approximation: A Dynamical Systems Viewpoint", Cambridge University Press, 2008.

[Boyd] S. Boyd: Stanford's EE364II lectures 15 and 16 on Model Predictive Control.

[BT96] D.P. Bertsekas and J.N. Tsitsiklis, "Neuro-Dynamic Programming", Athena Scientific, 1996.

[BT08] D.P. Bertsekas and J.N. Tsitsiklis, "Introduction to Probability", 2nd edition, Athena Scientific, 2008.

[Den82] E.V. Denardo, "Dynamic Programming: Models and Applications", Dover, 2003

[Mey07] S.P. Meyn, "Control Techniques for Complex Networks", Cambridge University Press, 2007.

[Por02] E.L. Porteus, "Foundations of Stochastic Inventory Theory", Stanford Business Books, 2002.

[Put94] M.L. Puterman, "Markov Decision Processes", Wiley, 1994. A classic reference.

[Rab89] L.R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", Proceedings of the IEEE, Vol. 77, No. 2, pp.257-286, Feb. 1989.

[Ros83] S. Ross, "Introduction to Stochastic Dynamic Programming", Academic Press, New York, 1983.

[SB98] R.S. Sutton and A.G. Barto, "Reinforcement Learning: An Introduction", MIT Press, 1998.

[Whi82] P. Whittle, "Optimization Over Time: Dynamic Programming and Stochastic Control", Wiley, 1982.

[Ber05] D.P. Bertsekas, "Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC", in Fundamental Issues in Control, European J. of Control, Vol. 11, Nos. 4-5, 2005.

[BL84] R. Bellman and E. Lee, "History and development of dynamic programming", IEEE Control Systems Magazine, Vol. 4, Issue 4, pp.24-28, Nov 1984.

[BS76] D.P. Bertsekas and S.E. Shreve, "Stochastic Optimal Control: The Discrete-Time Case", Academic Press, 1978.

[BM00] V.S. Borkar and S.P. Meyn, "The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning", SIAM Journal on Control and Optimization, Vol. 38, No. 2, pp. 447-469, 2000.

[dFVR03] D. P. de Farias and B. Van Roy, "The Linear Programming Approach to Approximate Dynamic Programming", Operations Research, Vol. 51, No. 6, November-December 2003, pp. 850-865.

[FS02] E.A. Feinberg and A. Schwartz, "Handbook of Markov Decision Processes: Methods and Applications", Kluwer's International Series, 2002.

[HL96] O. Hernandez-Lerma and J.B. Lasserre, "Discrete-Time Markov Control Processes: Basic Optimality Criteria", Springer, 1996.

[KY03] H.J. Kushner, G.G. Yin, "Stochastic Approximation and Recursive Algorithms and Applications", Springer, 2003.

[CD+09] W. Chen, D. Huang, A. Kulkarni, J. Unnikrishnan, Q. Zhu, P. Mehta, S. Meyn, and A. Wierman, "Approximate Dynamic Programming using Fluid and Diffusion Approximations with Applications to Power Management", 48th IEEE Conference on Decision and Control, 2009.

Decision Making in Large-Scale Systems @ MIT

MS&E339/EE377b Approximate Dynamic Programming @ Stanford

ECE 555 Control of Stochastic Systems @ UIUC

Learning for robotics and control @ Berkeley

Topics in AI: Dynamic Programming @ UBC

Optimization and Control @ University of Cambridge