Dynamic Programming and Stochastic Control

Fall 2009


Course Info

Schedule: Monday and Wednesday, 9:00-10:30am, Room: Towne 305

Instructor: Jerome Le Ny
Postdoctoral Researcher
Levine 465, email: j e r o m e l [a t] s e a s . u p e n n . e d u
Office Hours: Mon. and Wed. 1-2pm, Levine 465

Announcements

  • Midterm: take-home, out Wed. 11/04 at noon, due Th. 11/05 at noon (NO EXTENSION, NO COLLABORATION)
  • Pset 4: due Wed. 11/04.
  • Pset 3: due Wed. 10/21 and Mon. 11/02.
  • Pset 2: due Wed. 10/07.
  • Pset 1: due Wed. 09/23. Check the corrections in the assignments section.


Course Description

The course focuses on discrete-time optimal control of stochastic systems, with a strong emphasis on computational techniques for large-scale problems. It also exposes students to various tools used for modeling and studying stochastic systems.
1) We begin with a quick introduction to the dynamic programming approach to deterministic and stochastic optimal control problems over a finite horizon. This tool lets us solve certain problems by establishing key structural properties of the optimal cost function and policy. As first applications, we derive the classical (s,S) policies for an inventory control problem and treat linear quadratic control problems. The case of imperfect information (output feedback) is considered as well, in the form of the Linear Quadratic Gaussian problem and of Partially Observable Markov Decision Processes for discrete state spaces. We then cover the two principal methods for solving infinite-horizon problems: value iteration and policy iteration.
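To make the backward recursion concrete, here is a minimal Python sketch of the finite-horizon DP algorithm on a toy inventory problem in the spirit of the (s,S) example; the horizon, cost coefficients, demand distribution, and state/order bounds are invented for illustration and are not course material.

import numpy as np

N = 10                                     # horizon length
states = np.arange(21)                     # inventory levels 0..20
orders = np.arange(11)                     # admissible order quantities 0..10
demand = np.array([0, 1, 2, 3])            # possible demands w
prob = np.array([0.1, 0.3, 0.4, 0.2])      # their probabilities
c_order, c_hold, c_short = 1.0, 0.1, 4.0   # illustrative cost coefficients

J = np.zeros(len(states))                  # terminal cost J_N(x) = 0
policy = np.zeros((N, len(states)), dtype=int)

for k in range(N - 1, -1, -1):             # backward DP recursion
    J_next, J = J, np.empty_like(J)        # keep J_{k+1}, fill J_k
    for i, x in enumerate(states):
        best = np.inf
        for u in orders:
            # expected stage cost plus expected cost-to-go, averaged over demand w
            exp_cost = 0.0
            for w, p in zip(demand, prob):
                y = x + u - w              # post-demand inventory position
                stage = c_order * u + c_hold * max(y, 0) + c_short * max(-y, 0)
                exp_cost += p * (stage + J_next[int(np.clip(y, 0, states[-1]))])
            if exp_cost < best:
                best, policy[k, i] = exp_cost, u
        J[i] = best

print("optimal expected cost from empty inventory:", J[0])
print("optimal order quantities at time 0:", policy[0])

With the purely linear ordering cost used here, the computed policy typically has a base-stock (order-up-to) form; adding a fixed ordering cost is what leads to the (s,S) policies developed in class.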
2) Outside of certain specific (and important) cases, a direct application of the dynamic programming algorithm leads to intractable computations (the "curse of dimensionality"). However, the basic principles developed in the first part of the course guide the design of good suboptimal heuristics, sometimes with accompanying performance bounds. In addition, a major topic of current research interest is the design of control policies when a mathematical model of the system is not available and is replaced by a simulator (reinforcement learning); this topic is closely related to stochastic approximation algorithms. The last part of the course covers additional techniques developed specifically for the suboptimal control of complex dynamical networks.
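As a preview of the simulation-based methods in the second half of the course, the following Python sketch runs tabular Q-learning against a simulator, using a diminishing step size of the kind analyzed in stochastic approximation; the toy 3-state, 2-action simulator, exploration rate, and step-size schedule are illustrative assumptions, not course code.

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 3, 2, 0.95

def simulate(x, u):
    # hypothetical simulator: returns a sampled next state and stage cost
    next_x = int(rng.integers(n_states))            # toy dynamics: uniform next state
    cost = float(x == 0) + 0.5 * u + rng.normal(scale=0.1)
    return next_x, cost

Q = np.zeros((n_states, n_actions))
x = 0
for t in range(1, 50001):
    # epsilon-greedy exploration (costs are minimized, so "greedy" means argmin)
    u = int(rng.integers(n_actions)) if rng.random() < 0.1 else int(Q[x].argmin())
    next_x, cost = simulate(x, u)
    alpha = 1.0 / (1.0 + t / 1000.0)                # diminishing step size
    # Q-learning update: move Q(x,u) toward cost + gamma * min_u' Q(x',u')
    Q[x, u] += alpha * (cost + gamma * Q[next_x].min() - Q[x, u])
    x = next_x

print("learned Q-factors:\n", Q)
print("greedy policy:", Q.argmin(axis=1))

The point is that the update uses only sampled transitions (x, u, cost, x') from the simulator; no transition probabilities are ever formed, which is what makes this class of methods attractive for large-scale problems.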

Grading and Prerequisites
~5 Problem Sets (20%), Midterm (30%), Project (50%).
Students should have some knowledge of probability theory and Markov chains, roughly at the level of [BT08]. Prior exposure to control theory and optimization is recommended. Students should also be familiar with a high-level programming language (such as MATLAB, Python, Java, C, ...), since a number of assignments require implementing the algorithms studied in class and the project also has a computational component.



Course Schedule

Date | Topic | Reading | Assignments
W. Sept. 9 | Introduction. Examples. The Dynamic Programming Algorithm. | notes, [Ber07, I-1], [BL84] | Pset 1 out
M. Sept. 14 | Deterministic Optimal Control and DP for Deterministic Problems. | notes, [Ber07, I-2], DP for algorithm design |
W. Sept. 16 | Inventory Control: (s,S) Policies. Properties of the Value Function. | notes, [Por02], [Ber07, I-4] |
M. Sept. 21 | The Linear Quadratic Regulator (LQR, perfect information). | notes, [Ber07, I-4] |
W. Sept. 23 | Partial Information Problems - Sufficient Statistics. | notes, [Ber07, I-5] | Pset 1 due. Pset 2 out
M. Sept. 28 | The Linear Quadratic Gaussian Problem - Kalman Filtering. | notes, [Ber07, I-5] |
W. Sept. 30 | Finite State Space Problems - POMDPs. | notes, [Ber07, I-5] |
M. Oct. 5 | Finish POMDPs. Intro to "Suboptimal Control". | notes, [Ber05], [Boyd] |
W. Oct. 7 | Suboptimal Control. Rollout. | notes, [Ber07, I-7, II-1] | Pset 2 due. Pset 3 out
M. Oct. 12 | MPC. Intro to Discounted Infinite-Horizon DP. | notes, [Ber07, II-1] |
W. Oct. 14 | Discounted DP. Value Iteration. | notes, [Ber07, II-1] |
M. Oct. 19 | Fall Break | |
W. Oct. 21 | Value and Policy Iteration. | notes, [Ber07, II-1] | Proposal and Pset 3 due. Pset 4 out
M. Oct. 26 | Class canceled | |
W. Oct. 28 | Class canceled | |
M. Nov. 2 | Intro to Stochastic Approximation Algorithms, ODE Method. | notes, [Bor08], [KY03] |
W. Nov. 4 | Q-Learning. | notes, [BM00], [Ber07, II-6.4] | Pset 4 due. Midterm out
M. Nov. 9 | Direct Policy Evaluation, Temporal Differences. | notes, [Ber07, II-6.1,2] |
W. Nov. 11 | Projected Equation Methods. LSTD and LSPE. | notes, [Ber07, II-6.3] |
M. Nov. 16 | Finish Simulation-Based Policy Evaluation. | notes, [Ber07, II-6.3] |
W. Nov. 18 | Linear Programming Methods. | notes, [Den82], [Put94], [dFVR03] |
M. Nov. 23 | Stochastic Shortest Path Problems. | notes, [Ber07, II-2, 6.5] | Project Progress Report due
W. Nov. 25 | Average-Cost Problems. | notes, [Ber07, II-4] |
M. Nov. 30 | Average-Cost Problems. | notes, [Ber07, II-4] |
W. Dec. 2 | Search in Policy Space. Wrap-up. | notes, [Ber07] |
M. Dec. 7 | Project Presentations | |
W. Dec. 9 | Project Presentations | | Final Project Report due

References

Textbooks, Course Material, Tutorials
[Ath71] M. Athans, The role and use of the stochastic linear-quadratic-Gaussian problem in control system design, IEEE Transactions on Automatic Control, 16-6, pp. 529-552, Dec. 1971.
[Bel57] R.E. Bellman, "Dynamic Programming", Princeton University Press, 1957 (Dover reprint, 2003).
[Ber07] D.P. Bertsekas, "Dynamic Programming and Optimal Control", Vol I and II, 3rd edition, Athena Scientific, 2007. The main reference for the course.
[Bor08] V.S. Borkar, "Stochastic Approximation: A Dynamical Systems Viewpoint", Cambridge University Press, 2008.
[Boyd] S. Boyd: Stanford's EE364II lectures 15 and 16 on Model Predictive Control.
[BT96] D.P. Bertsekas and J.N. Tsitsiklis, "Neuro-Dynamic Programming", Athena Scientific, 1996.
[BT08] D.P. Bertsekas and J.N. Tsitsiklis, "Introduction to Probability", 2nd edition, Athena Scientific, 2008.
[Den82] E.V. Denardo, "Dynamic Programming: Models and Applications", Prentice-Hall, 1982 (Dover reprint, 2003).
[Mey07] S.P. Meyn, "Control Techniques for Complex Networks", Cambridge University Press, 2007.
[Por02] E.L. Porteus, "Foundations of Stochastic Inventory Theory", Stanford Business Books, 2002.
[Put94] M.L. Puterman, "Markov Decision Processes", Wiley, 1994. A classic reference.
[Rab89] L.R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", Proceedings of the IEEE, Vol. 77, No. 2, pp.257-286, Feb. 1989.
[Ros83] S. Ross, "Introduction to Stochastic Dynamic Programming", Academic Press, New York, 1983.
[SB98] R.S. Sutton and A.G. Barto, "Reinforcement Learning: An Introduction", MIT Press, 1998.
[Whi82] P. Whittle, "Optimization Over Time: Dynamic Programming and Stochastic Control", Wiley, 1982.

Research Monographs and Papers
[Alt99] E. Altman, "Constrained Markov Decision Processes", Chapman & Hall/CRC, 1999.
[Ber05] D.P. Bertsekas, "Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC", in Fundamental Issues in Control, European J. of Control, Vol. 11, Nos. 4-5, 2005.
[BL84] R. Bellman and E. Lee, "History and development of dynamic programming", IEEE Control Systems Magazine, Vol. 4, Issue 4, pp.24-28, Nov 1984.
[BS78] D.P. Bertsekas and S.E. Shreve, "Stochastic Optimal Control: The Discrete-Time Case", Academic Press, 1978.
[BM00] V.S. Borkar and S.P. Meyn, "The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning", SIAM Journal on Control and Optimization, Vol. 38, No. 2, pp. 447-469, 2000.
[dFVR03] D. P. de Farias and B. Van Roy, "The Linear Programming Approach to Approximate Dynamic Programming", Operations Research, Vol. 51, No. 6, November-December 2003, pp. 850-865.
[FS02] E.A. Feinberg and A. Schwartz, "Handbook of Markov Decision Processes: Methods and Applications", Kluwer's International Series, 2002.
[HL96] O. Hernandez-Lerma and J.B. Lasserre, "Discrete-Time Markov Control Processes: Basic Optimality Criteria", Springer, 1996.
[KY03] H.J. Kushner, G.G. Yin, "Stochastic Approximation and Recursive Algorithms and Applications", Springer, 2003.
[CD+09] W. Chen, D. Huang, A. Kulkarni, J. Unnikrishnan, Q. Zhu, P. Mehta, S. Meyn, and A. Wierman, "Approximate Dynamic Programming using Fluid and Diffusion Approximations with Applications to Power Management", 48th IEEE Conference on Decision and Control, 2009.

Other Similar Courses
6.231 Dynamic Programming and Stochastic Control @ MIT
Decision Making in Large-Scale Systems @ MIT
MS&E339/EE377b Approximate Dynamic Programming @ Stanford
ECE 555 Control of Stochastic Systems @ UIUC
Learning for robotics and control @ Berkeley
Topics in AI: Dynamic Programming @ UBC
Optimization and Control @ University of Cambridge