Introduction to Reinforcement Learning: A Short Course

Welcome! This course is jointly taught by UC Berkeley and the Tsinghua-Berkeley Shenzhen Institute (TBSI).

Instructors

Course Schedule

China Time | California Time
July 7, 8, 9, 10 (Tu-F) | July 6, 7, 8, 9 (M-Th)
July 14, 15, 16, 17 (Tu-F) | July 13, 14, 15, 16 (M-Th)
All at 08:30-10:05 China Time | All at 5:30pm-7:05pm PT

Day-by-Day Schedule

Day | Topic | Speaker | Pre-recorded Lecture | Slides / Notes | Real-time Lecture Recordings
1 | 1a. Introduction - Course Org | Scott Moura | Zoom Recording PW: 1e*OV@Re | LEC1a Slides | Recording Link PW: 9L%JePa=
  | 1b. Introduction - History of RL | Scott Moura | Zoom Recording PW: 1k.E69^o | LEC1a Slides |
  | 1c. Optimal Control Intro | Scott Moura | Zoom Recording PW: 2B&=2@*@ | |
2 | 2a. Dynamic Programming | Scott Moura | Zoom Recording PW: 3F*1rg%? | LEC2a Notes | Recording Link PW: 8Q?#51=J
  | 2b. Case Study: Linear Quadratic Regulator (LQR) | Scott Moura | Zoom Recording PW: 5Y#4=58& | LEC2b Notes |
3 | 3a. Policy Evaluation & Policy Improvement | Scott Moura | Zoom Recording PW: 9N@%H4&@ | LEC3a Notes | Recording Link PW: 1A@@0G63
  | 3b. Policy Iteration Algo | Scott Moura | Zoom Recording PW: 6y+!+6#9 | LEC3b Notes |
  | 3c. Case Study: LQR | Scott Moura | Zoom Recording PW: 6D@YkC&= | LEC3c Notes |
4 | 4a. Approximate DP: TD Error & Value Function Approx. | Scott Moura | Zoom Recording PW: 6v&78$We | LEC4a Notes | Recording Link PW: 4t=#ye7T
  | 4b. Case Study: LQR | Scott Moura | Zoom Recording PW: 1O^fh.8+ | LEC4b Notes | Installation Recording PW: 2s+83!eQ
  | 4c. Online RL with ADP | Scott Moura | Zoom Recording PW: 0q=.4378 | LEC4c Notes |
5 | 5a. Actor-Critic Method | Scott Moura | Zoom Recording PW: 2y!@@#$7 | LEC5a Notes | Recording Link PW: 1Z^6B28+
  | 5b. Case Study: Offshore Wind | Scott Moura | | LEC5b Notes |
6 | 6a. Markov Decision Process | Saehong Park | Zoom Recording PW: 5L=*%&2i | LEC6 Notes | Recording Link PW: 4L*=91?@
  | 6b. Q-Learning | Saehong Park | Zoom Recording PW: 3K!+fj^V | |
7 | 7a. Policy Optimization | Saehong Park | Zoom Recording PW: 0W$fa0$M | LEC7a Notes | Recording Link PW: 9j++=3$5
  | 7b. Policy Gradient | Saehong Park | Zoom Recording PW: 2N++5&I3 | LEC7b Notes |
  | 7c. Policy Gradient | Saehong Park | Zoom Recording PW: 3j%n80** | LEC7c Notes |
8 | 8a. Actor Critic | Saehong Park | Zoom Recording PW: 2F!WI9$8 | LEC8a Notes | Recording Link PW: 0W$+=9P*
  | 8b. Actor Critic | Saehong Park | Zoom Recording PW: 9r$HH%59 | LEC8b Notes |
  | 8c. RL for Energy Systems: Battery Fast-charging | Saehong Park | Zoom Recording PW: 9r$HH%59 | Slides |

Topic Outline

  1. Optimal Control
  2. Dynamic Programming
    1. Principle of Optimality & Value Functions
      • Case Study: Linear Quadratic Regulator (LQR)
  3. Policy Evaluation & Policy Improvement
    1. Policy Iteration Algo & Variants
      • Case Study: LQR
  4. Approximate Dynamic Programming (ADP)
    1. Temporal Difference (TD) Error
    2. Value Function Approximation
      • Case Study: LQR
    3. Online RL with ADP
    4. Actor-Critic Method
      • Case Study: Offshore Wind
  5. Q-Learning
    1. Q-learning algorithm
    2. Advanced Q-learning algorithm, e.g., DQN
  6. Policy Gradient
    1. Policy Optimization
    2. Vanilla policy gradient (REINFORCE)
  7. Actor-Critic using Policy Gradient
    1. Actor-Critic using Policy Gradient
    2. Advanced Actor-Critic algorithm, e.g., DDPG
  8. RL for Energy Systems
    1. Case Study: Battery Fast-charging

Lecture Notes

Jupyter Notebook