L06 MDP in Finance

DO NOT DISTRIBUTE.

MDP Theory

Introduction

Introduction

The Markov Decision Process is the sequence of random variables $(X_n)$ which describes the stochastic evolution of the system states. Of course, the distribution of $(X_n)$ depends on the chosen actions.





A control $\pi$ is a sequence of decision rules $(f_n)$ with $f_n: E \to A$, where $f_n(x) \in D_n(x)$ determines for each possible state $x \in E$ the next action $f_n(x)$ at time $n$. Such a sequence $\pi = (f_n)$ is called a policy or strategy. Formally, the Markov Decision Problem is given by

$$V_0(x)=\sup_{\pi}\mathbb{E}_x^{\pi}\left[\sum_{k=0}^{N-1}r_k\left(X_k,f_k(X_k)\right)+g_N(X_N)\right],\quad x\in E.$$
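The optimization above can be carried out by backward induction. The following is a minimal sketch for a Markov Decision Model with finitely many states; the function names and the toy interface (`D`, `Q`, `r`, `g` as callables) are illustrative assumptions, not part of the lecture.

```python
def solve_finite_mdp(states, D, Q, r, g, N):
    """Backward induction for a finite-horizon, finite-state MDP.

    D(n, x)    -> iterable of admissible actions in state x at time n
    Q(n, x, a) -> dict {next_state: probability}
    r(n, x, a) -> one-stage reward r_n(x, a)
    g(x)       -> terminal reward g_N(x)

    Returns value functions V[0..N] and a policy (f_0, ..., f_{N-1}),
    each represented as a dict over states.
    """
    V = [dict() for _ in range(N + 1)]
    policy = [dict() for _ in range(N)]
    for x in states:
        V[N][x] = g(x)                       # terminal condition V_N = g_N
    for n in range(N - 1, -1, -1):           # step backwards in time
        for x in states:
            best_val, best_a = float("-inf"), None
            for a in D(n, x):
                # one-stage reward plus expected value of the next state
                val = r(n, x, a) + sum(p * V[n + 1][y]
                                       for y, p in Q(n, x, a).items())
                if val > best_val:
                    best_val, best_a = val, a
            V[n][x] = best_val
            policy[n][x] = best_a
    return V, policy
```

On a trivial one-state example with two actions, the solver returns the action with the larger one-stage reward at every stage, and $V_0$ equals the sum of the stage rewards.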


Applications: Consumption Problem

A Reference Book




You may buy it via Amazon, or find it via Springer Link


Markov Decision Models

Defining a Markov Decision Model

A Markov Decision Model with planning horizon $N\in\mathbb{N}$ consists of a set of data $(E,A,D_n,Q_n,r_n,g_N)$ with the following meaning for $n=0,1,2,\dots,N-1$:


An equivalent definition of MDP

A Markov Decision Model is equivalently described by the set of data $(E,A,D_n,Z,T_n,Q^Z_n,r_n,g_N)$ with the following meaning:


Example: Consumption Problem

We denote by $Z_{n+1}$ the random return of our risky asset over the period $[n, n+1)$. Further we suppose that $Z_1,\dots,Z_N$ are non-negative, independent random variables, and we assume that consumption is evaluated by utility functions $U_n:\mathbb{R}_+\to\mathbb{R}$. The final capital is also evaluated by a utility function $U_N$. Thus we choose the following data:


Finite Horizon Markov Decision Models

Finite Horizon Markov Decision Models

Integrability Assumption $(A_N)$: For $n=0,1,\dots,N$
$$\delta^N_n(x)=\sup_{\pi}\mathbb{E}_{nx}^{\pi}\left[\sum_{k=n}^{N-1}r_k^+\left(X_k,f_k(X_k)\right)+g_N^+(X_N)\right]<\infty,\quad x\in E.$$
Assumption $(A_N)$ is assumed to hold for the $N$-stage Markov Decision Problem.

Example: (Consumption Problem) In the consumption problem, Assumption $(A_N)$ is satisfied if we assume that the utility functions are increasing and concave and $\mathbb{E} Z_n < \infty$ for all $n$: then $r_n$ and $g_N$ can be bounded by an affine-linear function $c_1 + c_2 x$ with $c_1, c_2 \geq 0$, and since $X_n \leq x Z_1 \cdots Z_n$ a.s. under every policy, the function $\delta_n^N$ satisfies
$$\begin{aligned}\delta^N_n(x)&=\sup_{\pi}\mathbb{E}_{nx}^{\pi}\left[\sum_{k=n}^{N-1}U_k^+\left(f_k(X_k)\right)+U_N^+(X_N)\right]\\ &\leq N c_1 + x c_2\sum_{k=n}^N \mathbb{E} Z_1\cdots\mathbb{E} Z_k<\infty,\quad x>0.\end{aligned}$$

For $n = 0,1,\dots,N$ and a policy $\pi = (f_0,\dots,f_{N-1})$ let $V_{n\pi}(x)$ be defined by
$$V_{n\pi}(x)=\mathbb{E}_{nx}^{\pi}\left[\sum_{k=n}^{N-1}r_k\left(X_k,f_k(X_k)\right)+g_N(X_N)\right],\quad x\in E.$$
$V_{n\pi}(x)$ is the expected total reward at time $n$ over the remaining stages $n$ to $N$ if we use policy $\pi$ and start in state $x\in E$ at time $n$. The value function $V_n$ is defined by
$$V_n(x)=\sup_{\pi}V_{n\pi}(x),\quad x\in E.$$
$V_n(x)$ is the maximal expected total reward at time $n$ over the remaining stages $n$ to $N$ if we start in state $x\in E$ at time $n$. The functions $V_{n\pi}$ and $V_n$ are well-defined since
$$V_{n\pi}(x)\leq V_n(x)\leq\delta_n^N(x)<\infty,\quad x\in E.$$


The Bellman Equation

The Bellman Equation

Let us denote by $\mathbb{M}(E)=\{v: E\to[-\infty,\infty) \mid v \text{ is measurable}\}$ the set of measurable functions on $E$; we define the following operators:
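In the standard formulation, and consistent with the use of $\mathcal{T}_{nf_n}$ in the consumption example below, these operators read (a sketch; here $v \in \mathbb{M}(E)$ and $f \in F_n$, the set of decision rules at time $n$):

```latex
\begin{aligned}
(\mathcal{T}_{nf}\,v)(x) &= r_n\big(x, f(x)\big) + \int v(y)\, Q_n\big(dy \mid x, f(x)\big),
  && x \in E,\\
(\mathcal{T}_{n}\,v)(x) &= \sup_{a \in D_n(x)} \Big\{ r_n(x, a)
  + \int v(y)\, Q_n(dy \mid x, a) \Big\}, && x \in E.
\end{aligned}
```

$\mathcal{T}_{nf}$ evaluates one stage under a fixed decision rule $f$, while $\mathcal{T}_n$ is the corresponding maximal reward operator appearing in the Bellman equation.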


Reward Iteration

Theorem
Let $\pi=(f_0,\dots,f_{N-1})$ be an $N$-stage policy. For $n=0,1,\dots,N-1$ it holds:

Example: (Consumption Problem) Note that for $f_n\in F_n$ the operator $\mathcal{T}_{nf_n}$ in this example reads
$$(\mathcal{T}_{nf_n}v)(x)=U_n(f_n(x))+\mathbb{E}\, v\left(\left(x-f_n(x)\right)Z_{n+1}\right).$$
Now let us assume that $U_n(x) = \log x$ for all $n$ and $g_N(x) = \log x$. Moreover, we assume that the return distribution is independent of $n$ and has finite expectation $\mathbb{E} Z$. Then $(A_N)$ is satisfied, as we have shown before. If we choose the $N$-stage policy $\pi = (f_0,\dots,f_{N-1})$ with $f_n(x) = cx$ and $c\in[0,1]$, i.e. we always consume a constant fraction of the wealth, then the Reward Iteration implies by induction on $N$ that
$$V_{0\pi}(x)=(N+1)\log x+N\log c+\frac{(N+1)N}{2}\left(\log(1-c)+\mathbb{E}\log Z\right).$$
Hence, $\pi^*=(f_0^*,\dots,f_{N-1}^*)$ with $f_n^*(x)=c^*x$ and $c^*=\frac{2}{N+3}$ maximizes the expected log-utility (among all linear consumption policies).
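The maximizer $c^*=\frac{2}{N+3}$ can be checked numerically: only the terms $N\log c+\frac{(N+1)N}{2}\log(1-c)$ of $V_{0\pi}(x)$ depend on $c$ (the $\mathbb{E}\log Z$ term is additive in $c$-independent fashion), so a simple grid search suffices. This is a sanity check, not part of the lecture.

```python
import math

def best_fraction(N, grid_size=100000):
    """Grid-search maximizer of the c-dependent part of V_{0,pi}(x):
    N*log(c) + N(N+1)/2 * log(1-c), over c in (0, 1)."""
    best_c, best_val = None, float("-inf")
    for i in range(1, grid_size):
        c = i / grid_size
        val = N * math.log(c) + N * (N + 1) / 2 * math.log(1 - c)
        if val > best_val:
            best_c, best_val = c, val
    return best_c
```

For example, with $N=5$ the grid search returns a value close to $2/(5+3)=0.25$, matching the closed-form maximizer.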


Maximizer, the Bellman Equation & Verification Theorem


The Structure Assumption & Structure Theorem





The Financial Markets

Asset Dynamics and Portfolio Strategies

Asset Dynamics and Portfolio Strategies

for all $n=1,\dots,N-1$, i.e. the current wealth is just reassigned to the assets.


Modeling and Solution Approaches with MDP

Modeling and Solution Approaches with MDP


MDP Applications in Finance

A Cash Balance Problem

A Cash Balance Problem

The cash balance problem involves the decision about the optimal cash level of a firm over a finite number of periods. The aim is to use the firm's liquid assets efficiently. There is a random stochastic change in the cash reserve each period (which can be both positive and negative). Since the firm does not earn interest on the cash position, there are holding costs or opportunity costs for the cash reserve if it is positive. But also in case the cash reserve is negative the firm incurs an out-of-pocket expense and has to pay interest. The cash reserve can be increased or decreased by the management at the beginning of each period, which entails transfer costs. To keep the example simple we assume that the random changes in the cash flow are given by independent and identically distributed random variables $(Z_n)$ with finite expectation. The transfer costs are linear. More precisely, let us define a function $c:\mathbb{R}\to\mathbb{R}_+$ by
$$c(z)=c_u z^+ + c_d z^-$$
where $c_u, c_d > 0$. The transfer costs are then given by $c(z)$ if the amount $z$ is transferred. A cost $L(x)$ has to be paid at the beginning of a period for cash level $x$. We assume that
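The piecewise-linear cost $c(z)=c_u z^+ + c_d z^-$ is straightforward to encode; the sketch below is illustrative, using $z^+=\max(z,0)$ and $z^-=\max(-z,0)$:

```python
def transfer_cost(z, c_u, c_d):
    """Linear transfer cost c(z) = c_u * z^+ + c_d * z^-,
    where z^+ = max(z, 0) and z^- = max(-z, 0)."""
    return c_u * max(z, 0.0) + c_d * max(-z, 0.0)
```

An upward transfer of $z=3$ with $c_u=0.1$ thus costs $0.3$, while a downward transfer of the same size with $c_d=0.2$ costs $0.6$.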

Problem Formulation


Solution


Consumption and Investment Problems

Consumption and Investment Problems

We consider now the following extension of the consumption problem of Example 2.1.4. Our investor has an initial wealth $x>0$ and at the beginning of each of $N$ periods she can decide how much of the wealth she consumes and how much she invests into the financial market given as in Section 4.2. In particular $\mathcal{F}_n=\mathcal{F}^S_n$. The amount $c_n$ which is consumed at time $n$ is evaluated by a utility function $U_c(c_n)$. The remaining wealth is invested in the risky assets and the riskless bond, and the terminal wealth $X_N$ yields another utility $U_p(X_N)$. How should the agent consume and invest in order to maximize the sum of her expected utilities?


Formulation


Solution


Terminal Wealth Problems

Terminal Wealth Problems

Suppose we have an investor with utility function $U:\operatorname{dom} U \to \mathbb{R}$, with $\operatorname{dom} U =[0,\infty)$ or $\operatorname{dom} U =(0,\infty)$, and initial wealth $x>0$. A financial market with $d$ risky assets and one riskless bond is given (for a detailed description see Section 3.1). Here we assume that the random vectors $R_1,\dots,R_N$ are independent but not necessarily identically distributed. Moreover we assume that $(\mathcal{F}_n)$ is the filtration generated by the stock prices, i.e. $\mathcal{F}_n = \mathcal{F}^S_n$. We make the (FM) assumption for the financial market.

Our agent has to invest all the money into this market and is allowed to rearrange her portfolio over NN stages. The aim is to maximize the expected utility of her terminal wealth.


Formulation


Solution

For the multiperiod terminal wealth problem it holds:


Portfolio Selection with Transaction Costs

Portfolio Selection with Transaction Costs

We consider now the utility maximization problem of Section 4.2 (Terminal Wealth Problems) under proportional transaction costs. For the sake of simplicity we restrict ourselves to one bond and one risky asset. If an additional amount $a$ (positive or negative) is invested in the stock, then proportional transaction costs of $c|a|$ are incurred, which are paid from the bond position. We assume that $0\leq c<1$. In order to compute the transaction costs, not only the total wealth but also its allocation between stock and bond matters. Thus, in contrast to the portfolio optimization problems so far, the state space of the Markov Decision Model is two-dimensional and consists of the amounts held in the bond and in the stock.
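The two-dimensional bookkeeping of a rebalancing action can be sketched as follows; the function name and interface are illustrative assumptions, chosen to mirror the convention above that the cost $c|a|$ is paid from the bond position:

```python
def rebalance(bond, stock, a, c):
    """Move amount a into the stock (a < 0 means selling stock);
    the proportional transaction cost c*|a| is paid from the bond.
    State is the two-dimensional pair (bond, stock)."""
    return bond - a - c * abs(a), stock + a
```

For instance, transferring $a=50$ out of a bond position of $100$ at cost rate $c=0.01$ leaves $(49.5, 50)$ in bond and stock.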


Formulation


Dynamic Mean-Variance Problems

Dynamic Mean-Variance Problems

An alternative approach to finding an optimal investment strategy was introduced by Markowitz in 1952, and indeed a little earlier by de Finetti. In contrast to utility functions, the idea is now to measure risk by the portfolio variance and to incorporate this measure as follows: among all portfolios which yield at least a certain expected return (benchmark), choose the one with the smallest portfolio variance. The single-period problem was solved in the 1950s. It still has great importance in real-life applications and is widely applied in the risk management departments of banks. The problem of multiperiod portfolio selection was proposed in the late 1960s and early 1970s and has been solved only recently. The difficulty here is that the original formulation of the problem involves a side constraint. However, this problem can be transformed into one without constraint by the Lagrange multiplier technique. Then we solve this stochastic Lagrange problem by a suitable Markov Decision Model.
We use the same non-stationary financial market as in Section 4.2 (Terminal Wealth Problems) with independent relative risk variables. Our investor has initial wealth $x_0 > 0$. This wealth can be invested into $d$ risky assets and one riskless bond. How should the agent invest over $N$ periods in order to find a portfolio with minimal variance which yields at least an expected return of $\mu$?
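The Lagrange multiplier technique mentioned above can be illustrated in the single-period case: minimizing $w^\top\Sigma w$ subject to $\mu^\top w = m$ and $\mathbf{1}^\top w = 1$ leads to the linear KKT system $2\Sigma w + \lambda_1\mu + \lambda_2\mathbf{1} = 0$ together with the two constraints. The sketch below (illustrative, with an equality return constraint for simplicity, and a hand-rolled linear solver to stay self-contained) solves that system:

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def min_variance_weights(cov, mu, target):
    """Single-period Markowitz: minimize w' Sigma w subject to
    mu'w = target and sum(w) = 1, via the KKT system
    [2*Sigma  mu  1; mu'  0  0; 1'  0  0] [w; l1; l2] = [0; target; 1]."""
    d = len(mu)
    A = [[2.0 * cov[i][j] for j in range(d)] + [mu[i], 1.0] for i in range(d)]
    A.append(list(mu) + [0.0, 0.0])
    A.append([1.0] * d + [0.0, 0.0])
    b = [0.0] * d + [target, 1.0]
    return solve_linear(A, b)[:d]
```

With uncorrelated unit-variance assets and a target return equal to the average asset return, the minimum-variance portfolio is the equally weighted one, which gives a quick way to verify the solver.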


Formulation


Index Tracking

Index Tracking

The problem of index tracking which is formulated below can be seen as an application of mean-variance hedging in an incomplete market. Suppose we have a financial market with one bond and $d$ risky assets as in Section 3.1. Besides the tradeable assets there is a non-tradable asset whose price process $(\tilde{S}_n)$ evolves according to
$$\tilde{S}_{n+1} =\tilde{S}_n\tilde{R}_{n+1}.$$
The positive random variable $\tilde{R}_{n+1}$, which is the relative price change of the non-traded asset, may be correlated with $R_{n+1}$. It is assumed that the random vectors $(R_1,\tilde{R}_1),(R_2,\tilde{R}_2),\dots$ are independent and that the joint distribution of $(R_n,\tilde{R}_n)$ is given. The aim now is to track the non-traded asset as closely as possible by investing into the financial market. The tracking error is measured in terms of the quadratic distance of the portfolio wealth to the price process $(\tilde{S}_n)$, i.e. the optimization problem is
$$\left\{\begin{array}{l} \mathbb{E}_{x\tilde{S}}\left[\sum_{n=0}^N\left(X^{\phi}_n-\tilde{S}_n\right)^2\right]\to\min\\ \phi=(\phi_n)\text{ is a portfolio strategy} \end{array}\right.$$
where $\phi_n$ is $\mathcal{F}_n = \sigma(R_1,\dots,R_n,\tilde{R}_1,\dots,\tilde{R}_n)$-measurable.
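For a fixed strategy and one realized scenario, the inner sum of squared deviations can be evaluated directly. The sketch below is illustrative and restricted to one risky asset and a constant-mix strategy (a hypothetical choice of $\phi$, not the optimal one), with the bond return held fixed at $1+i$:

```python
def squared_tracking_error(R, R_tilde, x0, s0, alpha, i=0.0):
    """Sum of squared deviations sum_{n=0}^{N} (X_n - S~_n)^2 along one
    scenario, for a constant-mix strategy holding fraction alpha in the
    risky asset. R and R_tilde are the relative price changes
    R_1, ..., R_N of the traded and the non-traded asset."""
    X, S = x0, s0
    err = (X - S) ** 2                      # deviation at time n = 0
    for Rn, Rt in zip(R, R_tilde):
        X = X * ((1 - alpha) * (1 + i) + alpha * Rn)   # portfolio wealth
        S = S * Rt                                     # non-traded asset
        err += (X - S) ** 2
    return err
```

Averaging this quantity over simulated scenarios of $(R_n,\tilde{R}_n)$ approximates the objective above for the chosen $\alpha$; if both assets start at the same value and never move, the error is zero.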


Formulation