Lecture 04

Reinforcement Learning

“Understanding how agents learn to optimize decisions through interactions with their environment.”

Financial Machine Learning · Lecture 04

Outline

Financial Machine Learning · Lecture 04

Part 1 · Introduction to Reinforcement Learning

Motivation

  • Reinforcement Learning (RL) focuses on learning through interactions with an environment.
  • RL is well-suited for complex decision-making scenarios with delayed feedback.
  • Its applications in finance include adaptive trading, portfolio management, and risk assessment.

This introduction lays the groundwork for understanding RL's relevance in finance.

Financial Machine Learning · Lecture 04

What Is Reinforcement Learning?

Reinforcement Learning (RL) is a branch of machine learning that enables agents to learn optimal actions based on rewards from the environment.

  1. RL learns from interaction with the environment
  2. Goal: learn policy to maximize expected rewards
  3. Core elements: states, actions, rewards, transitions, discount factor
  • Unlike supervised learning, RL is driven by the trade-off between exploration and exploitation.

Reinforcement learning mimics decision-making processes in real-world scenarios, making it applicable in finance.

Financial Machine Learning · Lecture 04

Applications of Reinforcement Learning in Finance

  • Algorithmic Trading: Develop strategies to buy/sell assets based on predicted market movements.

  • Portfolio Optimization: Automatically adjust asset allocations to achieve desired return profiles.

  • Risk Management: Create adaptive systems to monitor and mitigate financial risks dynamically.

  • Mathematical formulation: the common objective is to maximize the expected (discounted) cumulative return, $\max_\pi \mathbb{E}\big[\sum_{t} \gamma^{t} r_t\big]$.

The practical use of RL showcases its capabilities in addressing financial challenges.

Financial Machine Learning · Lecture 04

RL Spectrum in Finance: From Dynamic Programming to Deep RL

  • Classical Dynamic Programming (DP) established the foundation for sequential decision-making under uncertainty.

    In finance, DP solves problems such as portfolio optimization, option pricing, or consumption–investment planning — but these models require a known transition structure and suffer from the curse of dimensionality.

  • Reinforcement Learning (RL) removes the reliance on explicit models.

    By learning from interactions or simulations, RL estimates value functions and policies directly, enabling data-driven control for trading, execution, and risk management tasks.

  • Deep Reinforcement Learning (Deep RL) merges neural networks with RL to approximate complex value or policy functions, scaling to high-dimensional features like historical returns, order-book data, or textual sentiment.

    This evolution—from theory-driven DP to data-driven Deep RL—allows automated agents to operate effectively in realistic, uncertain financial markets.

Financial Machine Learning · Lecture 04

Why This Evolution Matters

Each stage advances our capacity to represent complexity and uncertainty in financial systems:

  • DP: mathematically exact, interpretable, but computationally intractable for high-dimensional markets.
  • RL: model-free and flexible, learns directly from data but may require extensive exploration and careful reward design.
  • Deep RL: scalable and expressive, capturing nonlinear dependencies among financial variables, though prone to instability and overfitting.

Together, these methods reveal a clear trajectory — from model-based optimization to experience-based learning. This progression provides practical tools to tackle portfolio allocation, dynamic hedging, and algorithmic trading in environments where traditional models no longer suffice.

Financial Machine Learning · Lecture 04

Overview of Key Concepts in RL

  • Core Components:
    • Agent: Learner making decisions.
    • Environment: External factors affecting the agent.
    • State ($s$): Current situation of the agent.
    • Action ($a$): Decisions made by the agent.
    • Reward ($r$): Feedback received from the environment.

Understanding these elements is crucial for applying RL techniques effectively in finance.
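
To make these components concrete, the sketch below wires them together in a toy interaction loop. Everything here is hypothetical: `TradingEnv`, its `reset`/`step` interface, and the random policy are illustrative placeholders rather than a specific library.

```python
import numpy as np

class TradingEnv:
    """Toy environment: the state is a single price; the action is -1 (sell), 0 (hold), or +1 (buy)."""
    def __init__(self, n_steps=100, seed=0):
        self.n_steps = n_steps
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t = 0
        self.price = 100.0
        return np.array([self.price])            # initial state

    def step(self, action):
        price_change = self.rng.normal(0.0, 1.0)
        reward = action * price_change           # P&L of a one-step position
        self.price += price_change
        self.t += 1
        done = self.t >= self.n_steps
        return np.array([self.price]), reward, done

env = TradingEnv()
state, done, total_reward = env.reset(), False, 0.0
while not done:
    action = np.random.choice([-1, 0, 1])        # random policy; an RL agent would learn this mapping
    state, reward, done = env.step(action)
    total_reward += reward
print(f"episode reward: {total_reward:.2f}")
```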

Financial Machine Learning · Lecture 04

Part 2 · Formulating Finance Problems as RL

Motivation for MDP in Finance

  • Many finance problems can be framed using the Markov Decision Process (MDP).
  • MDPs provide a structured way to represent states and actions under uncertainty.
  • Reinforcement learning helps find optimal strategies in complex financial situations.

This section highlights the suitability of MDPs for financial decision-making.

Financial Machine Learning · Lecture 04

Markov Decision Process (MDP)

MDPs consist of several components:

  • States ($S$): All possible situations.
  • Actions ($A$): The set of all possible choices.
  • Transition probabilities ($P(s' \mid s, a)$): Probability of moving from one state to another after an action.
  • Reward function ($R(s, a)$): The reward received after taking an action in a state.

  Component        Financial example
  States           Market state, wealth
  Actions          Allocation, trade
  Transition       Price/wealth evolution
  Reward           Profit or utility

The MDP framework is fundamental to implementing RL in financial applications.
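
As a minimal illustration of these components, the snippet below writes down a toy two-regime market MDP as plain arrays. The regimes, transition probabilities, and reward numbers are invented for illustration only.

```python
import numpy as np

# A toy two-state market MDP: states are market regimes, actions are portfolio choices.
states  = ["bull", "bear"]
actions = ["risky", "riskless"]

# P[a][s, s'] : probability of moving from regime s to s' when taking action a
# (here the action does not move the market; it only changes the reward exposure)
P = {
    "risky":    np.array([[0.8, 0.2],
                          [0.3, 0.7]]),
    "riskless": np.array([[0.8, 0.2],
                          [0.3, 0.7]]),
}

# R[a][s] : expected one-period reward (e.g. return) of action a in regime s
R = {
    "risky":    np.array([0.05, -0.03]),   # risky asset pays off in a bull market
    "riskless": np.array([0.01,  0.01]),   # cash earns a small constant rate
}

gamma = 0.95  # discount factor
```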

Financial Machine Learning · Lecture 04

Designing State and Action Spaces

  • State Space Design:

    • Incorporate essential financial metrics such as asset prices, volatility, and indicators.
  • Action Space Design:

    • Define potential actions, e.g., buy, sell, or hold.
    • Specify trading quantities or rebalancing strategies.

Careful design of these spaces is critical for the effectiveness of RL agents in finance.

Financial Machine Learning · Lecture 04

Reward Function in Financial Contexts

  • The reward function drives the learning process of the agent by evaluating the effectiveness of its actions.

    • Common formulations include return on investment and risk-adjusted returns.
  • A well-designed reward structure can guide the agents toward long-term profitability.

The formulation of the reward function is essential to align the agent's behavior with financial objectives.
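
For concreteness, here is a sketch of two common reward formulations for a trading agent: raw one-period profit net of transaction costs, and a simple risk-adjusted variant that penalizes volatility. The penalty coefficient and cost parameters are hypothetical choices.

```python
import numpy as np

def pnl_reward(position, price_change, cost_per_trade=0.0, trade_size=0.0):
    """Raw one-period profit and loss, minus proportional transaction costs."""
    return position * price_change - cost_per_trade * abs(trade_size)

def risk_adjusted_reward(recent_returns, risk_penalty=0.1):
    """Mean return penalized by volatility over a trailing window (a simple risk-adjusted formulation)."""
    r = np.asarray(recent_returns)
    return r.mean() - risk_penalty * r.std()

# example usage
print(pnl_reward(position=1, price_change=0.4, cost_per_trade=0.01, trade_size=1))
print(risk_adjusted_reward([0.01, -0.02, 0.015, 0.005]))
```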

Financial Machine Learning · Lecture 04

Mathematical Formulation

  • $S$ denotes the state space of the system. A state $s \in S$ is the information available to the controller at time $n$. Given this information, an action has to be selected.

  • $A$ denotes the action space. Given a specific state $s$ at time $n$, only a certain subclass $A_n(s) \subset A$ of actions may be admissible.

  • $Q_n(B \mid s, a)$ is a stochastic transition kernel which gives the probability that the next state at time $n+1$ lies in the set $B$ if the current state is $s$ and action $a$ is taken at time $n$.

  • $r_n(s, a)$ gives the (discounted) one-stage reward of the system at time $n$ if the current state is $s$ and action $a$ is taken.

  • $g_N(s)$ gives the (discounted) terminal reward of the system at the end of the planning horizon.

Financial Machine Learning · Lecture 04
  • A control is a sequence of decision rules $(f_0, \ldots, f_{N-1})$ with $f_n : S \to A$, where $f_n$ determines for each possible state the next action at time $n$. Such a sequence is called a policy or strategy.
  • Formally, the Markov Decision Problem is given by

    $\sup_{\pi} \; \mathbb{E}_s^{\pi} \Big[ \sum_{n=0}^{N-1} r_n(X_n, f_n(X_n)) + g_N(X_N) \Big]$

  • Types of MDP problems:
    • finite horizon ($N < \infty$) vs. infinite horizon ($N = \infty$)
    • complete state observation vs. partial state observation
    • problems with constraints vs. without constraints
    • total (discounted) cost criterion vs. average cost criterion
  • Research questions:
    • Does an optimal policy exist?
    • Does it have a particular form?
    • Can an optimal policy be computed efficiently?
    • Is it possible to derive properties of the optimal value function analytically?
Financial Machine Learning · Lecture 04

Applications

  • Consumption Problem: Suppose there is an investor with a given initial capital. At the beginning of each of $N$ periods she can decide how much of the capital she consumes and how much she invests into a risky asset. The consumed amount is evaluated by a utility function, as is the terminal wealth. The remaining capital is invested into the risky asset, where we assume that the investor is small (and thus cannot influence the asset price) and that the asset is liquid. How should she consume and invest in order to maximize the sum of her expected discounted utilities?
  • Cash Balance or Inventory Problem: Imagine a company which tries to find the optimal level of cash over a finite number of periods. We assume that there is a random change in the cash reserve each period (due to withdrawals or earnings). Since the firm does not earn interest on the cash position, there are holding costs for the cash reserve if it is positive, but also interest costs in case it is negative. The cash reserve can be increased or decreased by the management at each decision epoch, which incurs transfer costs. What is the optimal cash balance policy?
  • Mean-Variance Problem: Consider a small investor who acts on a given financial market. Her aim is to choose, among all portfolios which yield at least a certain expected return (benchmark) after $N$ periods, the one with the smallest portfolio variance. What is the optimal investment strategy?

Financial Machine Learning · Lecture 04
  • Dividend Problem in Risk Theory: Consider the risk reserve of an insurance company which earns premia on the one hand but has to pay out possible claims on the other hand. At the beginning of each period the insurer can decide upon paying a dividend. A dividend can only be paid when the risk reserve at that time point is positive. Once the risk reserve becomes negative, the company is ruined and has to stop its business. Which dividend pay-out policy maximizes the expected discounted dividends until ruin?

  • Bandit Problem: Suppose we have two slot machines with unknown success probabilities. At each stage we have to choose one of the arms. We receive one Euro if the arm wins, otherwise no cash flow appears. How should we play in order to maximize our expected total reward over $N$ trials?

  • Pricing of American Options
    In order to find the fair price of an American option and its optimal exercise time, one has to solve an optimal stopping problem. In contrast to a European option the buyer of an American option can choose to exercise any time up to and including the expiration time. Such an optimal stopping problem can be solved in the framework of Markov Decision Processes.

Financial Machine Learning · Lecture 04

Defining a Markov Decision Model

Financial Machine Learning · Lecture 04

An equivalent definition of MDP

A Markov Decision Model can equivalently be described by the data $(S, A, Z, T_n, Q_n^Z, r_n, g_N)$ with the following meaning:

  • $S$, $A$, $r_n$, $g_N$ are as in the definition on the previous slide.

  • $Z$ is the disturbance space, endowed with a $\sigma$-algebra.

  • $Q_n^Z(\cdot \mid s, a)$ is a stochastic transition kernel for $s \in S$ and $a \in A$, and $Q_n^Z(B \mid s, a)$ denotes the probability that the disturbance at time $n+1$ lies in $B$ if the current state is $s$ and action $a$ is taken.

  • $T_n : S \times A \times Z \to S$ is a measurable function and is called the transition or system function. $T_n(s, a, z)$ gives the next state of the system at time $n+1$ if at time $n$ the system is in state $s$, action $a$ is taken, and the disturbance $z$ occurs at time $n+1$.

Financial Machine Learning · Lecture 04

Example: Consumption Problem

We denote by $Z_{n+1}$ the random return of the risky asset over period $n$. Further we suppose that $Z_1, \ldots, Z_N$ are non-negative, independent random variables and that consumption is evaluated by utility functions $U_n$. The final capital is also evaluated by a utility function $U_N$. Thus we choose the following data:

  • $S = \mathbb{R}_+$, where $s$ denotes the wealth of the investor at time $n$
  • $A = \mathbb{R}_+$, where $a$ denotes the wealth which is consumed at time $n$
  • $A_n(s) = [0, s]$ for all $s$, i.e. we are not allowed to borrow money
  • $Z = \mathbb{R}_+$, where $z$ denotes the random return of the asset
  • $T_n(s, a, z) = (s - a)\, z$ is the transition function
  • $Q_n^Z$ = distribution of $Z_{n+1}$ (independent of $(s, a)$)
  • $r_n(s, a) = U_n(a)$ is the one-stage reward and $g_N(s) = U_N(s)$ is the terminal reward
Financial Machine Learning · Lecture 04
  • Decision rule & strategy
    • A measurable mapping $f_n : S \to A$ with the property $f_n(s) \in A_n(s)$ for all $s \in S$ is called a decision rule at time $n$. We denote by $F_n$ the set of all decision rules at time $n$.
    • A sequence of decision rules $\pi = (f_0, f_1, \ldots, f_{N-1})$ with $f_n \in F_n$ is called an $N$-stage policy or $N$-stage strategy.
  • Value function: for a policy $\pi = (f_0, \ldots, f_{N-1})$ define $V_{n\pi}(s) := \mathbb{E}_s^{\pi} \big[ \sum_{k=n}^{N-1} r_k(X_k, f_k(X_k)) + g_N(X_N) \big]$ and $V_n(s) := \sup_{\pi} V_{n\pi}(s)$.

  • Theorem: For it holds:

Financial Machine Learning · Lecture 04

Finite Horizon Markov Decision Models

Integrability Assumption: for every stage $n$ and every policy, the expected sum of the positive parts of the remaining rewards is finite.

This assumption is assumed to hold for the $N$-stage Markov Decision Problems considered below.

Financial Machine Learning · Lecture 04

Example: (Consumption Problem) In the consumption problem Assumption () is satisfied if we assume that the utility functions are increasing and concave and for all , because then and can be bounded by an affine-linear function with and since a.s. under every policy, the function satisfies

  • $V_{n\pi}(s)$ is the expected total reward at time $n$ over the remaining stages $n$ to $N$ if we use policy $\pi$ and start in state $s$ at time $n$.

  • The value function $V_n(s)$ is the maximal expected total reward at time $n$ over the remaining stages $n$ to $N$ if we start in state $s$ at time $n$.

  • The functions and are well-defined since:
Financial Machine Learning · Lecture 04

The Bellman Equation

Let us denote by $\mathbb{M}(S)$ the set of measurable functions $v : S \to \mathbb{R}$; we define the following operators:

  • $L_n$: for $v \in \mathbb{M}(S)$ define $(L_n v)(s, a) := r_n(s, a) + \int v(s')\, Q_n(\mathrm{d}s' \mid s, a)$ (whenever the integral exists)

  • $T_{n f}$: for $v \in \mathbb{M}(S)$ and $f \in F_n$ define $(T_{n f}\, v)(s) := (L_n v)(s, f(s))$

  • $T_n$: for $v \in \mathbb{M}(S)$ define $(T_n v)(s) := \sup_{a \in A_n(s)} (L_n v)(s, a)$ (the maximal reward operator at time $n$)

Theorem (Reward Iteration): Let $\pi = (f_0, \ldots, f_{N-1})$ be an $N$-stage policy. For $n = 0, 1, \ldots, N-1$ it holds:

  • $V_{N\pi} = g_N$ and $V_{n\pi} = T_{n f_n} V_{n+1, \pi}$,
  • $V_{0\pi} = T_{0 f_0} T_{1 f_1} \cdots T_{N-1\, f_{N-1}}\, g_N$.
Financial Machine Learning · Lecture 04

Example: (Consumption Problem) Note that for the operator in this example reads

Now let us assume that for all and . Moreover, we assume that the return distribution is independent of and has finite expectation . Then () is satisfied as we have shown before. If we choose the -stage policy with and , i.e. we always consume a constant fraction of the wealth, then the Reward Iteration implies by induction on that

Hence, with and maximizes the expected log-utility (among all linear consumption policies).

Financial Machine Learning · Lecture 04

Maximizer, the Bellman Equation & Verification Theorem

  • Definition of a maximizer: Let $v \in \mathbb{M}(S)$. A decision rule $f_n \in F_n$ is called a maximizer of $v$ at time $n$ if $T_{n f_n} v = T_n v$, i.e. for all $s \in S$, $f_n(s)$ is a maximum point of the mapping $a \mapsto (L_n v)(s, a)$.

  • The Bellman Equation: $V_N = g_N$ and $V_n = T_n V_{n+1}$ for $n = N-1, \ldots, 0$.

  • Verification Theorem: Let $(v_n)$ be a solution of the Bellman equation. Then it holds:

    • $v_n \ge V_n$ for $n = 0, \ldots, N$
    • If $f_n^*$ is a maximizer of $v_{n+1}$ for $n = 0, \ldots, N-1$, then $v_n = V_n$ and the policy $\pi^* = (f_0^*, \ldots, f_{N-1}^*)$ is optimal for the $N$-stage Markov Decision Problem.
Financial Machine Learning · Lecture 04

The Structure Assumption & Structure Theorem

  • Structure Assumption: There exist sets $\mathbb{M}_n \subset \mathbb{M}(S)$ of value functions and sets $\Delta_n \subset F_n$ of decision rules such that for all $n$:
    • If then is well-defined and
    • For all there exists a maximizer of with
  • Structure Theorem: Let the Structure Assumption be satisfied. Then it holds that $V_n \in \mathbb{M}_n$, and the sequence $(V_n)$ satisfies the Bellman equation, i.e. for $n = N-1, \ldots, 0$:

    • For $n = 0, \ldots, N-1$ there exist maximizers $f_n$ of $V_{n+1}$ with $f_n \in \Delta_n$, and every sequence of such maximizers defines an optimal policy for the $N$-stage Markov Decision Problem
  • A corollary: Let () be satisfied. If then it holds:

Financial Machine Learning · Lecture 04
  • Principle of Dynamic Programming: Let the Structure Assumption be satisfied. Then it holds:

    i.e. if is optimal for the time period then is optimal for .
Financial Machine Learning · Lecture 04

Value iteration vs policy iteration
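
The comparison can be made concrete with a short sketch. The code below implements both procedures for a generic finite MDP specified by a transition array `P[s, a, s']` and a reward array `R[s, a]`; the toy MDP at the bottom uses invented numbers.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Iterate the Bellman optimality operator until the value function converges."""
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    while True:
        Q = R + gamma * P @ V          # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

def policy_iteration(P, R, gamma=0.95):
    """Alternate exact policy evaluation (a linear solve) with greedy policy improvement."""
    n_states, n_actions = R.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        # policy evaluation: solve (I - gamma * P_pi) V = R_pi
        P_pi = P[np.arange(n_states), policy]          # (n_states, n_states)
        R_pi = R[np.arange(n_states), policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # policy improvement: act greedily with respect to the evaluated value function
        Q = R + gamma * P @ V
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return V, policy
        policy = new_policy

# toy two-state, two-action MDP (illustrative numbers)
P = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.4, 0.6], [0.1, 0.9]]])   # P[s, a, s']
R = np.array([[1.0, 0.5],
              [0.0, 2.0]])                 # R[s, a]
print(value_iteration(P, R)[1], policy_iteration(P, R)[1])
```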

Financial Machine Learning · Lecture 04

Modeling The Financial Markets with MDP

  • Asset Dynamics and Portfolio Strategies: We assume that asset prices are monitored in discrete time

    • time is divided into periods of length and
    • multiplicative model for asset prices:
    • The binomial model (Cox-Ross-Rubinstein) and the discretized Black-Scholes-Merton model are two important special cases of the multiplicative model
  • An N-period financial market with risky assets and one riskless bond

    • A riskless bond with and

    • There are risky assets and the price process of asset is given by and

Financial Machine Learning · Lecture 04
  • A portfolio or a trading strategy is an ()-adapted stochastic process where and for . The quantity denotes the amount of money which is invested into asset during the time interval .
  • The vector is called the initial portfolio of the investor. The value of the initial portfolio is given by

  • Let be a portfolio strategy and denote by the value of the portfolio at time before trading. Then

  • The value of the portfolio at time after trading is given by

Financial Machine Learning · Lecture 04
  • Self-financing: A portfolio strategy is called self-financing if

    for all , i.e. the current wealth is just reassigned to the assets.

  • Arbitrage opportunity: An arbitrage opportunity is a self-financing portfolio strategy with the following property: and

  • A theorem: Consider an -period financial market. The following two statements are equivalent:

    • There are no arbitrage opportunities.
    • For and for all -measurable it holds:

Financial Machine Learning · Lecture 04

Summary: Modeling and Solution Approaches with MDP

  • Modeling approach: specify the elements of the MDP

    • state space , action space
    • transition function:
    • value function
    • the Bellman equation:
  • Solution approaches

    • backward induction (with the Bellman equation)
    • the existence and form of optimal policy
    • we are interested in structural properties that are preserved by the backward induction
Financial Machine Learning · Lecture 04

MDP Applications in Finance: A Cash Balance Problem

  • involves the decision about the optimal cash level of a firm over a finite number of periods, with a random change in the cash reserve each period (which can be positive or negative).
  • costs
    • holding cost or opportunity cost for the cash reserve if it is positive
    • out-of-pocket expense
  • The cash reserve can be increased or decreased by the management at the beginning of each period
    • transfer costs

    • random changes in the cash flow are given by independent and identically distributed random variables $(Z_n)$ with finite expectation
  • The following costs have to be paid at the beginning of a period for cash level $s$:

Financial Machine Learning · Lecture 04

Problem Formulation

  • Elements of MDP
    • where denotes the cash level,
    • where denotes the new cash level after transfer,
    • ,
    • where denotes the cash change,
    • ,
    • ,
    • ,
Financial Machine Learning · Lecture 04
  • The state transition function:

  • The value function

  • The Bellman equation

Financial Machine Learning · Lecture 04

Solution

  • Solution is worked through by backward induction
  • We verify the Integrability Assumption and the corresponding Structure Assumption at each stage
  • The solution to the cash balance problem (Theorem 2.6.2):
    • There exist critical levels and such that for

      with .
    • There exist critical levels and such that for

Financial Machine Learning · Lecture 04

MDP Applications in Finance: Consumption and Investment Problems

The investor has an initial wealth, and at the beginning of each of $N$ periods she can decide how much of the wealth she consumes and how much she invests into the financial market.

The amount which is consumed at time is evaluated by a utility function . The remaining wealth is invested in the risky assets and the riskless bond, and the terminal wealth yields another utility .

How should the agent consume and invest in order to maximize the sum of her expected utilities?

Financial Machine Learning · Lecture 04

Formulation

  • Assumption (FM)
    • There is no arbitrage opportunity.
    • for all .
  • The utility functions and satisfy .
  • The wealth process evolves as follows

    where the consumption and investment processes form a consumption-investment strategy, i.e. both processes are adapted to the filtration.
  • The consumption-investment problem is then given by

Financial Machine Learning · Lecture 04
  • Elements of MDP
    • where denotes the wealth,
    • where is the amount of money invested in the risky assets and the amount which is consumed,
    • is given by

    • where denotes the relative risk
    • ,
    • .
  • The value function

Financial Machine Learning · Lecture 04

Solution

  • Let and be utility functions with . Then it holds:
    • There are no arbitrage opportunities if and only if there exists a measurable function such that

    • is strictly increasing, strictly concave and continuous on .
  • For the multiperiod consumption-investment problem it holds:
    • The value functions are strictly increasing, strictly concave and continuous.
    • The value functions can be computed recursively by the Bellman equation

    • There exist maximizers of and the strategy is optimal for the -stage consumption-investment problem.
Financial Machine Learning · Lecture 04

MDP Applications in Finance: Terminal Wealth Problems

Suppose we have an investor with a utility function and initial wealth. A financial market with risky assets and one riskless bond is given. Here we assume that the random return vectors are independent but not necessarily identically distributed. Moreover, we assume that the filtration is generated by the stock prices. We make the (FM) assumption for the financial market.

Our agent has to invest all the money into this market and is allowed to rearrange her portfolio over stages. The aim is to maximize the expected utility of her terminal wealth.

Financial Machine Learning · Lecture 04

Formulation

  • The wealth process () evolves as follows

    where is a portfolio strategy. The optimization problem is then

  • Elements of the MDP
    • where denotes the wealth,
    • where is the amount of money invested in the risky assets,
    • ,
    • where denotes the relative risk,
    • ,
    • ,
    • , and .

Financial Machine Learning · Lecture 04

Solution

For the multiperiod terminal wealth problem it holds:

  • The value functions are strictly increasing, strictly concave and continuous.

  • The value functions can be computed recursively by the Bellman equation

  • There exist maximizers of and the strategy is optimal for the -stage terminal wealth problem.

Financial Machine Learning · Lecture 04

MDP Applications in Finance: Portfolio Selection with Transaction Costs

We consider now the utility maximization problem under proportional transaction costs. For the sake of simplicity we restrict to one bond and one risky asset. If an additional amount of (positive or negative) is invested in the stock, then proportional transaction costs of are incurred which are paid from the bond position. We assume that . In order to compute the transaction costs, not only is the total wealth interesting, but also the allocation between stock and bond matters. Thus, in contrast to the portfolio optimization problems so far, the state space of the Markov Decision Model is two-dimensional and consists of the amounts held in the bond and in the stock.

Financial Machine Learning · Lecture 04

Formulation

  • Elements of the MDP
    • where denotes the amount invested in bond and stock,
    • where denotes the amount invested in bond and stock after transaction,
    • ,
    • where denotes the relative price change of the stock,
    • ,
    • ,
    • ,
    • .
Financial Machine Learning · Lecture 04

MDP Applications in Finance: Dynamic Mean-Variance Problems

We use the same non-stationary financial market as for the Terminal Wealth Problems with independent relative risk variables. Our investor has initial wealth . This wealth can be invested into risky assets and one riskless bond. How should the agent invest over periods in order to find a portfolio with minimal variance which yields at least an expected return of ?

Financial Machine Learning · Lecture 04

Formulation

  • Elements of the MDP
    • where denotes the wealth,
    • where is the amount of money invested in the risky assets,
    • ,
    • where denotes the relative risk,
    • ,
    • distribution of (independent of ).
  • The original formulation (MV)

  • An equivalent formulation (MV)

Financial Machine Learning · Lecture 04
  • Assumption (FM):

    • and for all .
    • The covariance matrix of the relative risk process

      is positive definite for all .
  • Problem (MV) can be solved via the well-known Lagrange multiplier technique. Let $L$ denote the Lagrange function, i.e.

  • The Lagrange-problem for the parameter

Financial Machine Learning · Lecture 04
  • A stochastic LQ-problem

    If is optimal for ,then is optimal for with .

  • Elements of the MDP

    • ,
Financial Machine Learning · Lecture 04
  • Solution: For the mean-variance problem (MV) it holds:

    • The value of is given by

      where is given in (4.34). Note that .
    • The optimal portfolio strategy is given by

  • Dynamic Mean-Risk Problems

Financial Machine Learning · Lecture 04

MDP Applications in Finance: Index Tracking

Suppose we have a financial market with one bond and d risky assets. Besides the tradeable assets there is a non-tradable asset whose price process evolves according to

The positive random variable which is the relative price change of the non-traded asset may be correlated with . It is assumed that the random vectors are independent and the joint distribution of is given.

The aim now is to track the non-traded asset as closely as possible by investing into the financial market. The tracking error is measured in terms of the quadratic distance of the portfolio wealth to the price process , i.e. the optimization problem is then

where is -measurable.

Financial Machine Learning · Lecture 04

Formulation

  • Elements of the MDP
    • where and is the wealth and the value of the non-traded asset,
    • where is the amount of money invested in the risky assets,
    • ,
    • where and is the relative risk of the traded assets and is the relative price change of the non-traded asset.
    • The transition function is given by

    • joint distribution of (independent of ),
    • .
  • Value function (cost-to-go)

Financial Machine Learning · Lecture 04

Part 3 · Reinforcement Learning Algorithms

Value-based vs. Policy-based Methods

  • Value-based Methods:

    • Focus on learning a value function to make decisions (e.g., Q-learning).
  • Policy-based Methods:

    • Directly optimize the action policy that defines the agent’s behavior (e.g., REINFORCE).

These methods offer different strengths and suit various financial contexts.

Financial Machine Learning · Lecture 04

Q-Learning Overview

  • Q-learning is an off-policy value-based algorithm:
    • Learns the value of actions without requiring a model of the environment.
    • Updates value estimates iteratively based on experiences.

Q-learning is a versatile algorithm applicable in various trading strategy developments.
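
A minimal tabular Q-learning sketch, assuming a small environment with integer-valued states and a `reset`/`step` interface like the toy environment shown earlier; the learning rate, discount factor, and exploration rate are illustrative hyperparameters.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    """Off-policy TD control: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy behavior policy
            if rng.random() < epsilon:
                action = int(rng.integers(n_actions))
            else:
                action = int(Q[state].argmax())
            next_state, reward, done = env.step(action)
            # Q-learning target uses the greedy value of the next state (off-policy)
            target = reward + (0.0 if done else gamma * Q[next_state].max())
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q
```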

Financial Machine Learning · Lecture 04

Deep Q-Networks (DQN)

  • DQN employs deep learning to approximate the Q-value function:
    • Overcomes the limitations of traditional Q-learning in high-dimensional environments.
    • Enables learning from raw market data such as price movements.

DQN has shown significant success in complex financial applications, improving trading strategies.
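
The core DQN update can be sketched in a few lines of PyTorch: an online Q-network, a slowly refreshed target network, and a mean-squared TD loss over a minibatch of transitions. Network sizes and hyperparameters are illustrative, and the replay buffer is reduced to one randomly generated batch for brevity.

```python
import torch
import torch.nn as nn

state_dim, n_actions, gamma = 8, 3, 0.99

q_net      = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())   # target starts as a copy of the online network
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(batch):
    """One gradient step on the TD error for a minibatch (s, a, r, s', done) from a replay buffer."""
    s, a, r, s_next, done = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)        # Q(s, a) for the taken actions
    with torch.no_grad():
        q_next = target_net(s_next).max(dim=1).values           # max_a' Q_target(s', a')
        target = r + gamma * (1.0 - done) * q_next
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# illustrative random minibatch of 32 transitions
batch = (torch.randn(32, state_dim), torch.randint(0, n_actions, (32,)),
         torch.randn(32), torch.randn(32, state_dim), torch.zeros(32))
dqn_update(batch)
```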

Financial Machine Learning · Lecture 04

Policy Gradient Methods

  • Policy gradient methods optimize the actual policy:
    • Use gradient ascent on expected returns to improve decisions directly (e.g., REINFORCE).
  • Particularly effective in scenarios with continuous action spaces like trading.

These methods provide more flexibility and adaptability to various financial strategies.
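
A minimal REINFORCE sketch in PyTorch, assuming an episodic environment with the `reset`/`step` interface used earlier and a softmax policy over a discrete action set; the hyperparameters and the return normalization are illustrative choices.

```python
import torch
import torch.nn as nn

def reinforce_episode(env, policy_net, optimizer, gamma=0.99):
    """Run one episode, then ascend the REINFORCE gradient: sum_t grad log pi(a_t|s_t) * G_t."""
    log_probs, rewards = [], []
    state, done = env.reset(), False
    while not done:
        logits = policy_net(torch.as_tensor(state, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, reward, done = env.step(int(action))
        rewards.append(reward)

    # discounted returns-to-go G_t, computed backwards through the episode
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns = torch.as_tensor(list(reversed(returns)), dtype=torch.float32)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalization to reduce variance

    loss = -(torch.stack(log_probs) * returns).sum()   # minimizing this ascends expected return
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(sum(rewards))

# example setup (dimensions are illustrative):
# policy_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 3))
# optimizer  = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
```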

Financial Machine Learning · Lecture 04

Part 4 · Deep Reinforcement Learning and Applications

Combining RL with Deep Learning

  • Deep reinforcement learning blends neural networks with RL principles:

    • Capable of processing high-dimensional inputs like market data and images.
  • This fusion significantly enhances the performance of RL in complex environments.

Deep RL methods are revolutionizing the way financial decisions are automated.

Financial Machine Learning · Lecture 04

Application: Optimal Execution

  • the problem
    • a trader who wishes to buy or sell a given amount of a single asset within a given time period
    • the trader seeks strategies that maximize their return from, or alternatively, minimize the cost of, the execution of the transaction
  • The Almgren-Chriss Model
    • a trader is required to sell an amount of an asset, with a given price at time 0, over the time period, with trading decisions made at discrete time points
    • The final inventory is required to be zero
    • the goal is to determine the liquidation strategy
    • two types of price impact
      • a temporary impact which refers to any temporary price movement due to the supply-demand imbalance caused by the selling
      • a permanent impact, which is a long-term effect on the 'equilibrium' price due to the trading
Financial Machine Learning · Lecture 04
  • The Almgren-Chriss Model
    • asset price dynamics

      • is the (constant) volatility parameter
      • are independent random variables drawn from a distribution with zero mean and unit variance
      • is a function of the trading strategy that measures the permanent impact
    • The inventory process

    • the actual price per share received considering the temporary price impact

Financial Machine Learning · Lecture 04
  • The Almgren-Chriss Model
    • The cost of this trading trajectory
      • the difference between the initial book value and the revenue, given by
      • mean and variance

    • the trader's objective function

Financial Machine Learning · Lecture 04
  • Solution
    • linear price impacts

    • the general solution for the Almgren-Chriss model

      with

    • The corresponding optimal inventory trajectory is given in closed form (see the sketch below)
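
Under linear price impact the Almgren-Chriss schedule has a well-known closed form in terms of a decay rate kappa. Since the equations above did not survive extraction, the sketch below follows the standard published solution and uses the continuous-time approximation for kappa; all parameter values are illustrative.

```python
import numpy as np

def almgren_chriss_inventory(X, T, N, sigma, eta, lam):
    """Optimal inventory path x_j = X * sinh(kappa*(T - t_j)) / sinh(kappa*T) under linear impact.

    X      : initial number of shares to liquidate
    T, N   : trading horizon and number of trading intervals
    sigma  : volatility of the asset price
    eta    : temporary impact coefficient
    lam    : risk-aversion parameter (lambda)
    Uses the continuous-time approximation kappa = sqrt(lam * sigma**2 / eta).
    """
    kappa = np.sqrt(lam * sigma**2 / eta)
    t = np.linspace(0.0, T, N + 1)
    inventory = X * np.sinh(kappa * (T - t)) / np.sinh(kappa * T)
    trades = -np.diff(inventory)          # shares sold in each interval
    return inventory, trades

inventory, trades = almgren_chriss_inventory(X=1e6, T=1.0, N=10, sigma=0.3, eta=2.5e-6, lam=1e-6)
print(np.round(inventory).astype(int))
```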

Financial Machine Learning · Lecture 04
  • Evaluation Criteria

    • the Profit and Loss (PnL): the final profit or loss induced by a given execution algorithm over the whole time period, which is made up of transactions at all time points
    • the Implementation Shortfall: the difference between the PnL of the algorithm and the PnL received by trading the entire amount of the asset instantly
    • the Sharpe ratio: the ratio of expected return to standard deviation of the return
  • Benchmark Algorithms

    • Time-Weighted Average Price (TWAP)
    • Volume-Weighted Average Price (VWAP)
    • Submit and Leave (SnL) policy
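
A sketch of the evaluation quantities and the TWAP benchmark listed above, computed for a given execution schedule and realized price path; the arrival price and the toy numbers are assumed inputs.

```python
import numpy as np

def execution_pnl(prices, shares_sold):
    """Revenue of an execution schedule: sum over intervals of shares sold times the realized price."""
    return float(np.dot(prices, shares_sold))

def implementation_shortfall(prices, shares_sold, arrival_price):
    """Difference between the value at the arrival price and the revenue actually received."""
    return arrival_price * np.sum(shares_sold) - execution_pnl(prices, shares_sold)

def sharpe_ratio(returns):
    """Ratio of mean return to standard deviation of returns (per period, not annualized)."""
    r = np.asarray(returns)
    return float(r.mean() / (r.std() + 1e-12))

def twap_schedule(total_shares, n_intervals):
    """Time-weighted average price benchmark: trade the same quantity in every interval."""
    return np.full(n_intervals, total_shares / n_intervals)

# illustrative comparison against a TWAP schedule
prices = np.array([100.0, 99.8, 99.9, 99.7, 99.6])
schedule = twap_schedule(total_shares=10_000, n_intervals=5)
print(implementation_shortfall(prices, schedule, arrival_price=100.0))
```
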
Financial Machine Learning · Lecture 04
  • RL Approach
    • RL methods
      • Q-learning algorithms
      • (double) DQN
      • Policy-based algorithms: (deep) policy gradient methods (e.g., A2C, PPO, and DDPG)
    • states: time stamp, the market attributes including (mid-) price of the asset and/or the spread, the inventory process and past returns
    • controls: the amount of asset (using market orders) to trade and/or the relative price level (using limit orders) at each time point
    • rewards
      • cash inflow or outflow (depending on whether we sell or buy)
      • implementation shortfall
      • profit, Sharpe ratio, and return
Financial Machine Learning · Lecture 04

Application: Multi-Period Mean-Variance Portfolio Optimization

  • setting
    • risky assets in the market
    • an investor enters the market at time 0 with initial wealth
    • reallocate his wealth at each time point among the assets to achieve the optimal trade off between the return and the risk of the investment
    • the random rates of return of the assets at :
      • The vectors , are assumed to be statistically independent
      • mean:
      • standard deviation: for and
      • The covariance matrix:
    • the wealth of the investor at time :
    • the amount invested in the -th asset at time :
    • the amount invested in the -th asset:
    • investment strategy: for

Financial Machine Learning · Lecture 04
  • the model
    • objective

    • constraints

  • the solution

Financial Machine Learning · Lecture 04
  • RL Approach
    • RL methods
      • value-based methods (Q-learning, SARSA, and DQN)
      • policy-based algorithms (DPG and DDPG)
    • states: time, asset prices, asset past returns, current holdings of assets, and remaining balance
    • controls: the amount/proportion of wealth invested in each component of the portfolio
    • rewards
      • portfolio return
      • (differential) Sharpe ratio
      • profit
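
As one concrete choice from the reward list above, the sketch below computes a one-period portfolio return net of proportional transaction costs on the rebalanced amount; the cost rate is a hypothetical value.

```python
import numpy as np

def portfolio_reward(weights, prev_weights, asset_returns, cost_rate=0.001):
    """One-period portfolio return minus proportional transaction costs on the turnover."""
    gross_return = float(np.dot(weights, asset_returns))
    turnover = float(np.abs(weights - prev_weights).sum())
    return gross_return - cost_rate * turnover

# rebalance from equal weights toward the first asset
print(portfolio_reward(weights=np.array([0.6, 0.2, 0.2]),
                       prev_weights=np.array([1/3, 1/3, 1/3]),
                       asset_returns=np.array([0.02, -0.01, 0.005])))
```
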
Financial Machine Learning · Lecture 04

Application: Option Pricing and Hedging

  • The Black-Scholes Model
    • The underlying stock price follows a geometric Brownian motion: $dS_t = \mu S_t\, dt + \sigma S_t\, dW_t$

    • the Black-Scholes-Merton partial differential equation: $\frac{\partial V}{\partial t} + \tfrac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + r S \frac{\partial V}{\partial S} - r V = 0$

    • solution: the Black-Scholes formula for a European option (see the sketch below)
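
The closed-form solution referenced above is the Black-Scholes price of a European call; the sketch below implements the textbook formula together with the delta used for hedging.

```python
from math import log, sqrt, exp
from statistics import NormalDist

def black_scholes_call(S, K, T, r, sigma):
    """European call price under Black-Scholes: C = S*N(d1) - K*exp(-rT)*N(d2)."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    N = NormalDist().cdf
    return S * N(d1) - K * exp(-r * T) * N(d2)

def black_scholes_delta(S, K, T, r, sigma):
    """Hedge ratio (delta) of the call: N(d1)."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    return NormalDist().cdf(d1)

print(black_scholes_call(S=100, K=100, T=1.0, r=0.02, sigma=0.2))
print(black_scholes_delta(S=100, K=100, T=1.0, r=0.02, sigma=0.2))
```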

Financial Machine Learning · Lecture 04
  • RL Approach
    • RL methods: (deep) Q-learning, PPO, and DDPG
    • states: asset price, current positions, option strikes, and time remaining to expiry.
    • controls: the change in holdings
    • rewards
      • (risk-adjusted) expected wealth/return
      • option payoff
      • (risk-adjusted) hedging cost
Financial Machine Learning · Lecture 04

Market Making

  • The objective in market making is to profit from earning the bid-ask spread without accumulating undesirably large positions (known as inventory)
  • A market maker faces three major sources of risk
    • The inventory risk: the risk of accumulating an undesirable large net inventory, which significantly increases volatility due to market movements
    • The execution risk: the risk that limit orders may not get filled over a desired horizon
    • The adverse selection risk: the situation where there is a directional price movement that sweeps through the limit orders submitted by the market maker such that the price does not revert back by the end of the trading horizon.
Financial Machine Learning · Lecture 04
  • Stochastic Control Approach.
    • a high-frequency market maker trading on a single stock over a finite horizon
    • the mid-price of this stock follows an arithmetic Brownian motion

    • The market maker will continuously propose bid and ask prices, and respectively
    • She will buy and sell shares according to the rate of arrival of market orders at the quoted prices
    • Her inventory

    • quoted prices: and
    • the intensities and
      • depend on the difference between the quoted prices and the reference price (i.e. and
      • the functional form of the intensities is typically exponential in the quoted distance, e.g. $\Lambda(\delta) = A e^{-k \delta}$ (see the sketch below)
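
Under this model (CARA utility and exponential arrival intensities), the Avellaneda-Stoikov approximation yields an inventory-adjusted reservation price and a closed-form spread. The sketch below restates that published result with illustrative parameter values; it is not a derivation of the HJB solution on the next slide.

```python
from math import log

def avellaneda_stoikov_quotes(mid_price, inventory, gamma, sigma, k, time_left):
    """Approximate optimal bid/ask quotes for a CARA market maker (Avellaneda-Stoikov, 2008).

    gamma     : risk-aversion coefficient
    sigma     : mid-price volatility
    k         : decay rate of the exponential order-arrival intensity
    time_left : time remaining until the end of the trading horizon
    """
    # reservation price: mid price shifted against the current inventory
    reservation = mid_price - inventory * gamma * sigma**2 * time_left
    # total optimal spread around the reservation price
    spread = gamma * sigma**2 * time_left + (2.0 / gamma) * log(1.0 + gamma / k)
    bid = reservation - spread / 2.0
    ask = reservation + spread / 2.0
    return bid, ask

# a long inventory of 5 units pushes both quotes down, encouraging sells
print(avellaneda_stoikov_quotes(mid_price=100.0, inventory=5, gamma=0.1, sigma=2.0, k=1.5, time_left=0.5))
```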

Financial Machine Learning · Lecture 04
  • Stochastic Control Approach.
    • the cash process of the market maker

    • the market maker optimizes a constant absolute risk aversion (CARA) utility function:

    • the value function solves the following Hamilton-Jacobi-Bellman equation:

Financial Machine Learning · Lecture 04
  • RL Approach
    • RL methods
      • value-based methods (Q-learning algorithm and SARSA)
      • policy-based methods (deep policy gradient method)
    • states: bid and ask prices, current holdings of assets, order-flow imbalance, volatility, and some sophisticated market indices
    • controls: the spread to post a pair of limit buy and limit sell orders
    • rewards
      • PnL with inventory cost
      • Implementation Shortfall with inventory cost
Financial Machine Learning · Lecture 04

Application: Robo-advising

Stochastic Control Approach

  • the framework
    • a regime switching model of market returns
    • a mechanism of interaction between the client and the robo-advisor
    • a dynamic model (i.e., risk aversion process) for the client's risk preferences
    • an optimal investment criterion
  • the robo-advisor interacts repeatedly with the client and learns about changes in her risk profile
  • The robo-advisor adopts a multi-period mean-variance investment criterion with a finite investment horizon based on the estimate of the client's risk aversion level

Financial Machine Learning · Lecture 04

Application: Smart Order Routing

  • Dark Pools vs. Lit Pools
    • Dark pools are private exchanges for trading securities that are not accessible by the investing public
      • Dark pools were created in order to facilitate block trading by institutional investors who did not wish to impact the markets with their large orders and obtain adverse prices for their trades
      • three types of dark pools: (1) Broker-Dealer-Owned Dark Pools, (2) Agency Broker or Exchange-Owned Dark Pools, and (3) Electronic Market Makers Dark Pools.
    • Lit pools, by contrast, display bid and ask offers in different stocks
      • primary exchanges operate in such a way that available liquidity is displayed at all times and form the bulk of the lit pools available to traders.
Financial Machine Learning · Lecture 04
  • the most important characteristics of different dark pools
    • the chances of being matched with a counterparty
    • the price (dis)advantages
  • characteristics of lit pools
    • the order flows
    • queue sizes
    • cancellation rates
Financial Machine Learning · Lecture 04
  • Increasing adoption of RL for automated trading agents that integrate learning mechanisms.
  • The use of multi-agent systems for modeling competitive trading and enhancing market making.

These trends highlight the rapid evolution and growing importance of RL in the financial landscape.

Financial Machine Learning · Lecture 04

Part 5 · Summary and Discussion

Summary of Key Takeaways

  • Reinforcement Learning is a powerful tool for optimizing financial decisions.
  • The convergence of RL with deep learning is yielding innovative strategies in finance.
  • Ongoing research and advancements will continue to explore new applications and methodologies.

Discussion encourages reflection on RL's transformative potential in finance and its future trajectory.

Financial Machine Learning · Lecture 04

Final Takeaways

  • RL changes decision-making paradigms in finance.
  • Integration of modern RL techniques supports financial strategies.
  • Next lecture: Big Data & ML in Finance
    → Explore big data analytical methods in financial contexts.
Financial Machine Learning · Lecture 04

Application Case Studies in Financial DRL

  • Algorithmic Trading: Employ DRL to develop adaptive trading algorithms responding in real time to market shifts.
  • Portfolio Management: Use DRL for optimizing asset allocations based on evolving market conditions and investor preferences.

Illustrating successful implementations underscores the practical utility of DRL in finance.