The Markov Decision Process is the sequence of random variables (Xn) which describes the stochastic evolution
of the system states. Of course the distribution of (Xn) depends on the chosen actions.
E denotes the state space of the system. A state x∈E is the information available to the controller at time n. Given this information, an action has to be selected.
A denotes the action space. Given a specific state x∈E at time n, only actions in a certain subset Dn(x)⊂A may be admissible.
Qn(B∣x,a) is a stochastic transition kernel which gives the probability that the next state at time n+1 is in the set B if the current state is x and action a is taken at time n.
rn(x,a) gives the (discounted) one-stage reward of the system at time n if the current state is x and action a is taken
gN(x) gives the (discounted) terminal reward of the system at the end of the planning horizon.
A control π is a sequence of decision rules (fn) with fn:E→A, where fn(x)∈Dn(x) determines for each possible state x∈E the action fn(x) at time n. Such a sequence π=(fn) is called a policy or strategy. Formally, the Markov Decision Problem consists in maximizing the expected total reward sup_π Exπ[∑_{k=0}^{N−1} rk(Xk,fk(Xk)) + gN(XN)] over all policies π. Important variants include:
complete state observation vs. partial state observation
problems with constraints vs. without constraints
total (discounted) cost criterion vs. average cost criterion
Research questions:
Does an optimal policy exist?
Does it have a particular form?
Can an optimal policy be computed efficiently?
Is it possible to derive properties of the optimal value function analytically?
Applications
Consumption Problem
Suppose there is an investor with given initial capital. At the beginning of each of N periods she can decide how much of the capital she consumes and how much she invests into a risky asset. Both the amount she consumes and the terminal wealth are evaluated by a utility function U. The remaining capital is invested into a risky asset, where we assume that the investor is small, and thus not able to influence the asset price, and that the asset is liquid. How should she consume and invest in order to maximize her expected discounted total utility?
Cash Balance or Inventory Problem
Imagine a company which tries to find the optimal level of cash over a finite number of N periods. We assume that there is a random change in the cash reserve each period (due to withdrawals or earnings). Since the firm does not earn interest on the cash position, there are holding costs for the cash reserve if it is positive, but also interest costs in case it is negative. The cash reserve can be increased or decreased by the management at each decision epoch, which implies transfer costs. What is the optimal cash balance policy?
Mean-Variance Problem
Consider a small investor who acts on a given financial market. Her aim is to choose among all portfolios which yield at least a certain expected return (benchmark) after N periods, the one with smallest portfolio variance. What is the optimal investment strategy?
Dividend Problem in Risk Theory
Imagine we consider the risk reserve of an insurance company which earns premia on the one hand but has to pay out possible claims on the other hand. At the beginning of each period the insurer can decide upon paying a dividend. A dividend can only be paid when the risk reserve at that time point is positive. Once the risk reserve becomes negative, we say that the company is ruined and has to stop its business. Which dividend pay-out policy maximizes the expected discounted dividends until ruin?
Bandit Problem
Suppose we have two slot machines with unknown success probabilities θ1 and θ2. At each stage we have to choose one of the arms. We receive one Euro if the arm wins, otherwise no cash flow appears. How should we play in order to maximize our expected total reward over N trials?
Pricing of American Options
In order to find the fair price of an American option and its optimal exercise time, one has to solve an optimal stopping problem. In contrast to a European option the buyer of an American option can choose to exercise any time up to and including the expiration time. Such an optimal stopping problem can be solved in the framework of Markov Decision Processes.
A Markov Decision Model with planning horizon N∈N consists of a set of data (E,A,Dn,Qn,rn,gN) with the following meaning for n=0,1,2,…,N−1:
E is the state space, endowed with a σ-algebra E. The elements (states) are denoted by x∈E
A is the action space, endowed with a σ-algebra A. The elements (actions) are denoted by a∈A
Dn⊂E×A is a measurable subset of E×A and denotes the set of possible state-action combinations at time n. We assume that Dn contains the graph of a measurable mapping fn:E→A, i.e. (x,fn(x))∈Dn for all x∈E. For x∈E, the set Dn(x)={a∈A:(x,a)∈Dn} is the set of admissible actions in state x at time n.
Qn is a stochastic transition kernel from Dn to E, i.e. for any fixed pair (x,a)∈Dn, the mapping B↦Qn(B∣x,a) is a probability measure on E and (x,a)↦Qn(B∣x,a) is measurable for all B∈E. The quantity Qn(B∣x,a) gives the probability that the next state at time n+1 is in B if the current state is x and action a is taken at time n. Qn describes the transition law.
rn:Dn→R is a measurable function. rn(x,a) gives the (discounted) one-stage reward of the system at time n if the current state is x and action a is taken.
gN:E→R is a measurable mapping. gN(x) gives the (discounted) terminal reward of the system at time N if the state is x.
An equivalent definition of MDP
A Markov Decision Model is equivalently described by the set of data (E,A,Dn,Z,Tn,QnZ,rn,gN) with the following meaning:
E, A, Dn, rn, gN are as in the definition above.
Z is the disturbance space, endowed with a σ-algebra Z.
QnZ is a stochastic transition kernel from Dn to Z; QnZ(B∣x,a) denotes the probability that the disturbance Zn+1 lies in B∈Z if the current state is x and action a is taken.
Tn:Dn×Z→E is a measurable function, called the transition or system function. Tn(x,a,z) gives the next state of the system at time n+1 if at time n the system is in state x, action a is taken and the disturbance z occurs at time n+1. The state process then satisfies Xn+1 = Tn(Xn, fn(Xn), Zn+1).
Example: Consumption Problem
We denote by Zn+1 the random return of our risky asset over period [n,n+1). Further we suppose that Z1,…,ZN are non-negative, independent random variables and we assume that the consumption is evaluated by utility functions Un:R+→R. The final capital is also evaluated by a utility function UN. Thus we choose the following data:
E=R+ where xn∈E denotes the wealth of the investor at time n
A=R+ where an∈A denotes the wealth which is consumed at time n
Dn(x)=[0,x] for all x∈E, i.e. we are not allowed to borrow money
Z=R+ where z denotes the random return of the asset
Tn(xn,an,zn+1)=(xn−an)zn+1 is the transition function
QnZ(⋅∣x,a)= distribution of Zn+1 (independent of (x,a))
rn(x,a)=Un(a) is the one-stage reward
gN(x)=UN(x)
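To make the correspondence between the abstract model data and the example concrete, here is a minimal Python sketch of these elements. The horizon N=3, the lognormal return distribution and the consumption fraction 1/2 are illustrative assumptions, not part of the model:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 3                          # planning horizon (assumed for illustration)
U = [np.log] * N               # one-stage utilities Un (log utility assumed)
U_N = np.log                   # terminal utility gN = UN

def D(x):
    """Admissible actions Dn(x) = [0, x]: consume at most the current wealth."""
    return (0.0, x)

def T(x, a, z):
    """Transition function Tn(x, a, z) = (x - a) * z."""
    return (x - a) * z

def sample_Z():
    """Disturbance Zn+1: a nonnegative return; lognormal is an assumption."""
    return rng.lognormal(mean=0.0, sigma=0.2)

def r(n, x, a):
    """One-stage reward rn(x, a) = Un(a)."""
    return U[n](a)

# simulate one path under the decision rules fn(x) = x / 2
x, total = 10.0, 0.0
for n in range(N):
    a = 0.5 * x                # consume half of the wealth
    total += r(n, x, a)
    x = T(x, a, sample_Z())
total += U_N(x)                # add the terminal reward gN(XN)
print(total)
```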
Decision rule & strategy
A measurable mapping fn:E→A with the property fn(x)∈Dn(x) for all x∈E, is called a decision rule at time n. We denote by Fn the set of all decision rules at time n.
A sequence of decision rules π=(f0,f1,…,fN−1) with fn∈Fn is called an N-stage policy or N-stage strategy.
Value function: Vn(x) = sup_π Vnπ(x), x∈E.
A theorem: For n=0,…,N it holds: Vn(xn) = sup_{π∈ΠN} Vnπ(hn), hn=(x0,a0,x1,…,xn).
Finite Horizon Markov Decision Models
Integrability Assumption (AN): For n=0,1,…,N
δnN(x) = sup_π Enxπ[∑_{k=n}^{N−1} rk⁺(Xk,fk(Xk)) + gN⁺(XN)] < ∞, x∈E.
Assumption (AN) is assumed to hold for the N-stage Markov Decision Problems.
Example: (Consumption Problem) In the consumption problem Assumption (AN) is satisfied if we assume that the utility functions are increasing and concave and EZn<∞ for all n, because then rn and gN can be bounded by an affine-linear function c1+c2x with c1,c2≥0, and since Xn≤xZ1⋯Zn a.s. under every policy, the function δnN satisfies
δnN(x) = sup_π Enxπ[∑_{k=n}^{N−1} Uk⁺(fk(Xk)) + UN⁺(XN)] ≤ N c1 + x c2 ∑_{k=n}^{N} EZ1⋯EZk < ∞, x>0.
For n=0,1,…,N and a policy π=(f0,…,fN−1) let Vnπ(x) be defined by
Vnπ(x) = Enxπ[∑_{k=n}^{N−1} rk(Xk,fk(Xk)) + gN(XN)], x∈E.
Vnπ(x) is the expected total reward at time n over the remaining stages n to N if we use policy π and start in state x∈E at time n. The value function Vn is defined by
Vn(x) = sup_π Vnπ(x), x∈E.
Vn(x) is the maximal expected total reward at time n over the remaining stages n to N if we start in state x∈E at time n. The functions Vnπ and Vn are well-defined since Vnπ(x) ≤ Vn(x) ≤ δnN(x) < ∞, x∈E.
The Bellman Equation
Let us denote by M(E) = {v: E → [−∞,∞) ∣ v is measurable}. We define the following operators:
For v∈M(E) define (Lnv)(x,a) = rn(x,a) + ∫ v(x′) Qn(dx′∣x,a), (x,a)∈Dn,
whenever the integral exists.
For v∈M(E) and f∈Fn define (Tnfv)(x)=(Lnv)(x,f(x)),x∈E.
For v∈M(E) define (Tnv)(x) = sup_{a∈Dn(x)} (Lnv)(x,a), x∈E. Tn is called the maximal reward operator at time n.
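For a finite state and action space these operators reduce to array operations, which is how they are typically implemented. A minimal sketch; the sizes and the randomly generated reward and kernel data are placeholder assumptions:

```python
import numpy as np

nE, nA = 5, 3                                    # |E| states, |A| actions (assumed)
rng = np.random.default_rng(1)
r_n = rng.normal(size=(nE, nA))                  # one-stage rewards rn(x, a)
Q_n = rng.dirichlet(np.ones(nE), size=(nE, nA))  # kernel Qn(x' | x, a), rows sum to 1

def L(v):
    """(Ln v)(x, a) = rn(x, a) + sum over x' of v(x') Qn(x' | x, a)."""
    return r_n + Q_n @ v                         # shape (nE, nA)

def T_f(v, f):
    """(Tnf v)(x) = (Ln v)(x, f(x)) for a decision rule f: E -> A."""
    return L(v)[np.arange(nE), f]

def T(v):
    """Maximal reward operator: (Tn v)(x) = sup over a in Dn(x) of (Ln v)(x, a)."""
    return L(v).max(axis=1)

v1 = T(np.zeros(nE))                             # one application of Tn to v = 0
```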
Reward Iteration
Theorem
Let π=(f0,…,fN−1) be an N-stage policy. For n=0,1,…,N−1 it holds:
VNπ = gN and Vnπ = Tnfn Vn+1π,
Vnπ=Tnfn…TN−1fN−1gN.
Example: (Consumption Problem) Note that for fn∈Fn the operator Tnfn in this example reads (Tnfn v)(x) = Un(fn(x)) + E v((x−fn(x)) Zn+1).
Now let us assume that Un(x)=log x for all n and gN(x)=log x. Moreover, we assume that the return distribution is independent of n and has finite expectation EZ. Then (AN) is satisfied as we have shown before. If we choose the N-stage policy π=(f0,…,fN−1) with fn(x)=cx and c∈[0,1], i.e. we always consume a constant fraction of the wealth, then the Reward Iteration implies by induction on N that
V0π(x) = (N+1) log x + N log c + (N(N+1)/2)(log(1−c) + E log Z).
Hence, π∗=(f0∗,…,fN−1∗) with fn∗(x)=c∗x and c∗ = 2/(N+3) maximizes the expected log-utility (among all linear consumption policies).
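The closed form can be checked by simulation. A quick Monte Carlo sketch; the lognormal return distribution (so that E log Z = 0.05) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
N, x0 = 5, 10.0
c = 2.0 / (N + 3)                        # the optimal fraction c* = 2/(N+3)
mu_logZ = 0.05                           # E log Z for Z lognormal with mu = 0.05

def simulate(n_paths=200_000):
    x = np.full(n_paths, x0)
    total = np.zeros(n_paths)
    for _ in range(N):
        a = c * x                        # consume the fraction c of current wealth
        total += np.log(a)
        x = (x - a) * rng.lognormal(mu_logZ, 0.2, size=n_paths)
    return (total + np.log(x)).mean()    # add terminal utility log(XN)

closed_form = (N + 1) * np.log(x0) + N * np.log(c) \
    + N * (N + 1) / 2 * (np.log(1 - c) + mu_logZ)
print(simulate(), closed_form)           # the two numbers should nearly agree
```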
Maximizer, the Bellman Equation & Verification Theorem
Definition of a maximizer: Let v∈M(E). A decision rule f∈Fn is called a maximizer of v at time n if Tnf v = Tnv, i.e. for all x∈E, f(x) is a maximum point of the mapping a↦(Lnv)(x,a), a∈Dn(x).
The Bellman equation: VN = gN, Vn = Tn Vn+1, n=0,1,…,N−1.
Verification Theorem: Let (vn)⊂M(E) be a solution of the Bellman equation. Then it holds:
vn≥Vn for n=0,1,…,N.
If fn∗ is a maximizer of vn+1 for n=0,1,…,N−1, then vn=Vn and the policy π∗=(f0∗,f1∗,…,fN−1∗) is optimal for the N-stage Markov Decision Problem.
The Structure Assumption & Structure Theorem
Structure Assumption (SAN): There exist sets Mn⊂M(E) and Δn⊂Fn such that for all n=0,1,…,N−1:
gN∈MN
If v∈Mn+1 then Tnv is well-defined and Tnv∈Mn
For all v∈Mn+1 there exists a maximizer fn of v with fn∈Δn
Structure Theorem: Let (SAN) be satisfied. Then it holds:
Vn∈Mn and the sequence (Vn) satisfies the Bellman equation, i.e. for n=0,1,…,N−1:
VN(x) = gN(x),
Vn(x) = sup_{a∈Dn(x)} {rn(x,a) + ∫ Vn+1(x′) Qn(dx′∣x,a)}, x∈E.
Vn=TnTn+1…TN−1gN
For n=0,1,…,N−1 there exist maximizers fn of Vn+1 with fn∈Δn, and every sequence of maximizers fn∗ of Vn+1 defines an optimal policy (f0∗,f1∗,…,fN−1∗) for the N-stage Markov Decision Problem.
A corollary: Let (SAN) be satisfied. If n≤m≤N then it holds:
Vn(x) = sup_π Enxπ[∑_{k=n}^{m−1} rk(Xk,fk(Xk)) + Vm(Xm)], x∈E.
Backward Induction Algorithm
Principle of Dynamic Programming: Let (SAN) be satisfied. Then it holds for n≤m≤N: Vnπ∗(x) = Vn(x) ⟹ Vmπ∗ = Vm Pnxπ∗-a.s.,
i.e. if (fn∗,…,fN−1∗) is optimal for the time period [n,N] then (fm∗,…,fN−1∗) is optimal for [m,N].
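For finite state and action spaces the backward induction algorithm is a direct loop over the Bellman equation. A sketch under the same placeholder finite-model data as in the operator sketch above:

```python
import numpy as np

def backward_induction(r, Q, g):
    """Backward induction for a finite model.
    r[n]: (nE, nA) rewards rn, Q[n]: (nE, nA, nE) kernels Qn, g: (nE,) terminal gN.
    Returns value functions V[0..N] and maximizers f[0..N-1]."""
    N, nE = len(r), g.shape[0]
    V = [None] * (N + 1)
    f = [None] * N
    V[N] = g                               # VN = gN
    for n in range(N - 1, -1, -1):         # Bellman recursion Vn = Tn Vn+1
        Lv = r[n] + Q[n] @ V[n + 1]        # (Ln Vn+1)(x, a)
        f[n] = Lv.argmax(axis=1)           # maximizer fn*(x)
        V[n] = Lv[np.arange(nE), f[n]]     # Vn(x) = (Ln Vn+1)(x, fn*(x))
    return V, f
```

With the placeholder data r_n, Q_n from the operator sketch, `backward_induction([r_n] * 4, [Q_n] * 4, np.zeros(5))` returns the optimal values and an optimal policy for a 4-stage problem.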
The Financial Markets
Asset Dynamics and Portfolio Strategies
We assume that asset prices are monitored in discrete time
time is divided into periods of length Δt and tn=nΔt
multiplicative model for asset prices: Sn+1=SnR~n+1
The binomial model (Cox-Ross-Rubinstein model) and the discretization of the Black-Scholes-Merton model are two important special cases of the multiplicative model.
In what follows we will consider an N-period financial market with d risky assets and one riskless bond. We assume that all random variables are defined on a probability space (Ω,F,P) with filtration (Fn) and F0={∅,Ω}. The financial market is given by:
A riskless bond with S00≡1 and Sn+10=Sn0(1+in+1),n=0,1,…,N−1
where in+1 denotes the deterministic interest rate for the time period [n,n+1). If the interest rate is constant, i.e. in≡i, then Sn0=(1+i)n.
There are d risky assets and the price process of asset k is given by S0k=s0k and Sn+1k=SnkR~n+1k,n=0,1,…,N−1.
The processes (Snk) are assumed to be adapted with respect to the filtration (Fn) for all k. Moreover, we suppose that R~n+1k>0P−a.s. for all k and n and that s0k is deterministic. R~n+1k is the relative price change in the time interval [n,n+1) for the risky asset k.
A portfolio or a trading strategy is an (Fn)-adapted stochastic process ϕ=(ϕn0,ϕn) where ϕn0∈R and ϕn=(ϕn1,…,ϕnd)∈Rd for n=0,1,…,N−1. The quantity ϕnk denotes the amount of money which is invested into asset k during the time interval [n,n+1).
The vector (ϕ00,ϕ0) is called the initial portfolio of the investor. The value of the initial portfolio is given by X0 = ∑_{k=0}^{d} ϕ0k = ϕ00 + ϕ0⋅e,
where x⋅y = ∑_{k=1}^{d} xk yk denotes the inner product of the vectors x,y∈Rd and e=(1,…,1)∈Rd.
Let ϕ be a portfolio strategy and denote by Xn− the value of the portfolio at time n before trading. Then
Xn = Xn− = ∑_{k=0}^{d} ϕn−1k R~nk = ϕn−10(1+in) + ϕn−1⋅R~n.
The value of the portfolio at time n after trading is given by
Xn+ = ∑_{k=0}^{d} ϕnk = ϕn0 + ϕn⋅e.
Self-financing: A portfolio strategy ϕ is called self-financing if Xn−ϕ = Xn+ϕ P-a.s.
for all n=1,…,N−1, i.e. the current wealth is just reassigned to the assets.
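A small simulation of this bookkeeping may help; one risky asset, a constant interest rate, lognormal relative price changes and a constant-mix rule are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
N, i = 10, 0.01                    # horizon and constant interest rate (assumed)
x = 100.0                          # initial wealth X0

for n in range(N):
    phi_risky = 0.3 * x            # constant-mix rule: 30% of wealth in the stock
    phi_bond = x - phi_risky       # self-financing: phi_n^0 + phi_n . e = Xn+ = Xn-
    R_tilde = rng.lognormal(0.02, 0.15)           # relative price change R~n+1
    x = phi_bond * (1 + i) + phi_risky * R_tilde  # Xn+1 = phi_n^0(1+in+1) + phi_n . R~n+1
print(x)                           # terminal wealth XN
```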
Arbitrage opportunity: An arbitrage opportunity is a self-financing portfolio strategy ϕ=(ϕn0,ϕn) with the following property: X0ϕ=0 and P(XNϕ≥0)=1 and P(XNϕ>0)>0.
A theorem: Consider an N-period financial market. The following two statements are equivalent:
There are no arbitrage opportunities.
For n=0,1,…,N−1 and for all Fn-measurable ϕn∈Rd it holds: ϕn⋅Rn+1 ≥ 0 P-a.s. ⟹ ϕn⋅Rn+1 = 0 P-a.s., where Rn+1 = (Rn+1k) with Rn+1k := R~n+1k/(1+in+1) − 1 denotes the relative risk process.
Modeling and Solution Approaches with MDP
Modeling approach: specify the elements of the MDP
state space E, action space A
transition function: xn+1 = Tn(xn,an,Zn+1)
value function: Vn(xn) = sup_{an,…,aN−1} E[∑_{k=n}^{N−1} gk(xk,ak) + gN(xN)]
the Bellman equation: Vn(xn) = sup_{an} {gn(xn,an) + E Vn+1(Tn(xn,an,Zn+1))}
Solution approaches
backward induction (with the Bellman equation)
the existence and form of an optimal policy
we are interested in structural properties that are preserved through the iterations
MDP Applications in Finance
A Cash Balance Problem
The cash balance problem involves the decision about the optimal cash level of a firm over a finite number of periods. The aim is to use the firm's liquid assets efficiently. There is a random change in the cash reserve each period (which can be both positive and negative). Since the firm does not earn interest on the cash position, there are holding costs or opportunity costs for the cash reserve if it is positive. But also in case the cash reserve is negative the firm incurs an out-of-pocket expense and has to pay interest. The cash reserve can be increased or decreased by the management at the beginning of each period, which implies transfer costs. To keep the example simple we assume that the random changes in the cash flow are given by independent and identically distributed random variables (Zn) with finite expectation. The transfer costs are linear. More precisely, let us define a function c:R→R+ by c(z) = cu z⁺ + cd z⁻,
where cu,cd>0. The transfer costs are then given by c(z) if the amount z is transferred. The cost L(x) has to be paid at the beginning of a period for cash level x. We assume that
L:R→R+, L(0)=0,
x↦L(x) is convex,
lim_{∣x∣→∞} L(x)/∣x∣ = ∞.
Problem Formulation
Elements of MDP
E=R where x∈E denotes the cash level,
A=R where a∈A denotes the new cash level after transfer,
D(x)=A,
Z=R where z∈Z denotes the cash change,
QZ(⋅∣x,a)=distribution of Zn+1 (independent of (x,a)),
r(x,a)=−c(a−x)−L(a),
g≡0,
β∈(0,1] is the discount factor.
The state transition function: xk+1 = T(xk,ak,zk+1) = ak − zk+1, for k=0,…,N−1.
The value function: Vn(xn) = sup_{ak, k=n,…,N−1} E[∑_{k=n}^{N−1} β^{k−n}(−c(ak−xk) − L(ak))], for n=0,…,N−1 (the terminal term vanishes since g≡0).
The Bellman equation: Vn(xn) = sup_{an∈R} E[−c(an−xn) − L(an) + β Vn+1(an−Zn+1)], for n=0,…,N−1.
Solution
The solution is obtained by backward induction.
One verifies the Integrability Assumption (AN) and the corresponding Structure Assumption (SAN) for each Vn(xn).
The solution to the cash balance problem (Theorem 2.6.2):
There exist critical levels Sn− and Sn+ such that for n=1,…,N
Jn(x) = (Sn− − x)cu + L(Sn−) + βEJn−1(Sn− − Z)  if x < Sn−,
Jn(x) = L(x) + βEJn−1(x − Z)  if Sn− ≤ x ≤ Sn+,
Jn(x) = (x − Sn+)cd + L(Sn+) + βEJn−1(Sn+ − Z)  if x > Sn+,
with J0≡0.
There exist critical levels Sn− and Sn+ such that for n=1,…,N
fn∗(x) = Sn− if x < Sn−,
fn∗(x) = x if Sn− ≤ x ≤ Sn+,
fn∗(x) = Sn+ if x > Sn+.
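A grid-based sketch of this backward induction; the normal cash-flow distribution, the quadratic holding cost and the grid are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
beta, cu, cd = 0.95, 1.0, 1.5           # discount factor and transfer rates (assumed)
Z = rng.normal(0.0, 1.0, size=10_000)   # samples of the cash change (normal assumed)
grid = np.linspace(-10.0, 10.0, 401)    # common grid for cash levels x and actions a

def L_cost(x):
    """Convex holding/shortage cost; quadratic is an illustrative choice."""
    return 0.5 * x ** 2

def transfer(z):
    """Linear transfer costs c(z) = cu z+ + cd z-."""
    return cu * np.maximum(z, 0.0) + cd * np.maximum(-z, 0.0)

def step(J):
    """One backward step: Jn(x) = min_a { c(a - x) + L(a) + beta E Jn-1(a - Z) }."""
    EJ = np.array([np.interp(a - Z, grid, J).mean() for a in grid])  # E J(a - Z)
    cost_of_a = L_cost(grid) + beta * EJ            # the part depending on a only
    total = transfer(grid[None, :] - grid[:, None]) + cost_of_a[None, :]
    a_idx = total.argmin(axis=1)                    # optimal new cash level per x
    return total[np.arange(grid.size), a_idx], grid[a_idx]

J = np.zeros_like(grid)                             # J0 = 0
for n in range(1, 6):                               # N = 5 stages (assumed)
    J, f_star = step(J)
# f_star is constant (= Sn-) below the band, the identity inside, constant (= Sn+) above
```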
Consumption and Investment Problems
We consider now the following extension of the consumption problem of Example 2.1.4. Our investor has an initial wealth x>0 and at the beginning of each of N periods she can decide how much of the wealth she consumes and how much she invests into the financial market given as in Section 4.2. In particular Fn=FnS. The amount cn which is consumed at time n is evaluated by a utility function Uc(cn). The remaining wealth is invested in the risky assets and the riskless bond, and the terminal wealth XN yields another utility Up(XN). How should the agent consume and invest in order to maximize the sum of her expected utilities?
Formulation
Assumption (FM)
There is no arbitrage opportunity.
E∥Rn∥<∞ for all n=1,…,N.
As in Section 4.2 we impose the Assumption (FM) on the financial market. Moreover, we assume that the utility functions Uc and Up satisfy domUc = domUp = [0,∞). Analogously to (3.1) the wealth process (Xn) evolves as follows:
Xn+1 = (1+in+1)(Xn − cn + ϕn⋅Rn+1),
where (c,ϕ)=(cn,ϕn) is a consumption-investment strategy, i.e. (ϕn) and (cn) are (Fn)-adapted and 0≤cn≤Xn. The consumption-investment problem is then given by
Ex[∑_{n=0}^{N−1} Uc(cn) + Up(XNc,ϕ)] → max
over all consumption-investment strategies (c,ϕ) with XNc,ϕ ∈ domUp P-a.s.
Elements of MDP
E=[0,∞) where x∈E denotes the wealth,
A=R×Rd where a∈Rd is the amount of money invested in the risky assets and c∈R the amount which is consumed,
Dn(x) is given by Dn(x) = {(c,a)∈A : 0≤c≤x and (1+in+1)(x−c+a⋅Rn+1) ∈ E P-a.s.},
Z=[−1,∞)d where z∈Z denotes the relative risk,
Tn(x,c,a,z)=(1+in+1)(x−c+a⋅z)
QnZ(⋅∣x,c,a)=distribution of Rn+1 (independent of (x,c,a)),
rn(x,c,a)=Uc(c)
gN(x)=Up(x).
The value function: Vn(x) = sup_π Enxπ[∑_{k=n}^{N−1} Uc(ck(Xk)) + Up(XN)].
Solution
Let Uc and Up be utility functions with domUc=domUp=[0,∞). Then it holds:
There are no arbitrage opportunities if and only if there exists a measurable function f∗:domUp→A such that u(x,f∗(x)) = v(x) for all x∈domUp.
v(x) is strictly increasing, strictly concave and continuous on domUp.
For the multiperiod consumption-investment problem it holds:
The value functions Vn are strictly increasing, strictly concave and continuous.
The value functions can be computed recursively by the Bellman equation
VN(x) = Up(x),
Vn(x) = sup_{(c,a)∈Dn(x)} {Uc(c) + E Vn+1((1+in+1)(x−c+a⋅Rn+1))}.
There exist maximizers fn∗ of Vn+1, and the strategy (f0∗,f1∗,…,fN−1∗) is optimal for the N-stage consumption-investment problem.
Terminal Wealth Problems
Suppose we have an investor with utility function U:domU→R with domU=[0,∞) or domU=(0,∞) and initial wealth x>0. A financial market with d risky assets and one riskless bond is given (for a detailed description see Section 3.1). Here we assume that the random vectors R1,…,RN are independent but not necessarily identically distributed. Moreover we assume that (Fn) is the filtration generated by the stock prices, i.e. Fn=FnS. We make the (FM) assumption for the financial market.
Our agent has to invest all the money into this market and is allowed to rearrange her portfolio over N stages. The aim is to maximize the expected utility of her terminal wealth.
Formulation
The wealth process (Xn) evolves as follows Xn+1=(1+in+1)(Xn+ϕn⋅Rn+1)
where ϕ=(ϕn) is a portfolio strategy. The optimization problem is then
Ex U(XNϕ) → max
over all portfolio strategies ϕ with XNϕ ∈ domU P-a.s.
Elements of the MDP
E=domU where x denotes the wealth,
A=Rd where a is the amount of money invested in the risky assets,
Dn(x) = {a∈Rd : (1+in+1)(x+a⋅Rn+1) ∈ domU P-a.s.},
Z=[−1,∞)d where z denotes the relative risk,
Tn(x,a,z)=(1+in+1)(x+a⋅z),
QnZ(⋅∣x,a) = distribution of Rn+1 (independent of (x,a)),
rn≡0, and gN(x)=U(x).
Solution
For the multiperiod terminal wealth problem it holds:
The value functions Vn are strictly increasing, strictly concave and continuous.
The value functions can be computed recursively by the Bellman equation
VN(x) = U(x),
Vn(x) = sup_{a∈Dn(x)} E Vn+1((1+in+1)(x+a⋅Rn+1)), x∈E.
There exist maximizers fn∗ of Vn+1 and the strategy (f0∗,f1∗,…,fN−1∗) is optimal for the N-stage terminal wealth problem.
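A grid sketch of this recursion for one risky asset; the return distribution, the utility U(x) = sqrt(x), and the restriction of the actions to a ∈ [0, x] (which is sufficient for admissibility here since R ≥ −1) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
N, i = 3, 0.0                                   # horizon and interest rate (assumed)
R = rng.lognormal(0.0, 0.25, size=2000) - 1.0   # samples of the relative risk, R >= -1
x_grid = np.linspace(0.01, 10.0, 120)           # wealth grid inside domU (truncated)

def U(x):
    """Utility function; U(x) = sqrt(x) is an illustrative assumption."""
    return np.sqrt(x)

V = U(x_grid)                                   # VN = U
for n in range(N - 1, -1, -1):
    V_new = np.empty_like(V)
    for j, x in enumerate(x_grid):
        a_grid = np.linspace(0.0, x, 40)        # a in [0, x] keeps x + a R >= 0
        X_next = (1 + i) * (x + a_grid[:, None] * R[None, :])
        # np.interp clamps values beyond the grid; acceptable for a sketch
        vals = np.interp(X_next, x_grid, V).mean(axis=1)   # E Vn+1(...) per action
        V_new[j] = vals.max()                   # Bellman: Vn(x) = sup_a E Vn+1(...)
    V = V_new
```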
Portfolio Selection with Transaction Costs
We consider now the utility maximization problem of Section 4.2 (Terminal Wealth Problems) under proportional transaction costs. For the sake of simplicity we restrict ourselves to one bond and one risky asset. If an additional amount a (positive or negative) is invested in the stock, then proportional transaction costs of c∣a∣ are incurred, which are paid from the bond position. We assume that 0≤c<1. In order to compute the transaction costs, not only the total wealth but also the allocation between stock and bond matters. Thus, in contrast to the portfolio optimization problems so far, the state space of the Markov Decision Model is two-dimensional and consists of the amounts held in the bond and in the stock.
Formulation
Elements of the MDP
E=R+2 where x=(x0,x1)∈E denotes the amounts invested in bond and stock,
A=R2 where a=(a0,a1)∈A denotes the amounts invested in bond and stock after the transaction,
Z=R+ where z∈Z denotes the relative price change of the stock,
Tn(x,(a0,a1),zn+1)=(a0(1+in+1),a1zn+1),
QnZ(⋅∣x,a0,a1)=distribution of R~n+1 (independent of (x,a0,a1)),
rn≡0,
gN(x)=U(x0+x1),x=(x0,x1)∈E.
Dynamic Mean-Variance Problems
An alternative approach towards finding an optimal investment strategy was introduced by Markowitz in 1952 and indeed a little bit earlier by de Finetti. In contrast to utility functions the idea is now to measure the risk by the portfolio variance and incorporate this measure as follows: Among all portfolios which yield at least a certain expected return (benchmark), choose the one with smallest portfolio variance. The single-period problem was solved in the 1950s. It still has great importance in real-life applications and is widely applied in risk management departments of banks. The problem of multiperiod portfolio-selection was proposed in the late 1960s and early 1970s and has been solved recently. The difficulty here is that the original formulation of the problem involves a side constraint. However, this problem can be transformed into one without constraint by the Lagrange multiplier technique. Then we solve this stochastic Lagrange problem by a suitable Markov Decision Model.
We use the same non-stationary financial market as in Section 4.2 (Terminal Wealth Problems) with independent relative risk variables. Our investor has initial wealth x0>0. This wealth can be invested into d risky assets and one riskless bond. How should the agent invest over N periods in order to find a portfolio with minimal variance which yields at least an expected return of μ?
Formulation
Elements of the MDP
E=R where x∈E denotes the wealth,
A=Rd where a∈A is the amount of money invested in the risky assets,
Dn(x)=A,
Z=[−1,∞)d where z∈Z denotes the relative risk,
Tn(x,a,z)=(1+in+1)(x+a⋅z),
QnZ(⋅∣x,a)= distribution of Rn+1 (independent of (x,a)).
The original formulation (MV):
Varx0π[XN] → min
s.t. Ex0π[XN] ≥ μ, π∈FN.
An equivalent formulation (MV=):
Varx0π[XN] → min
s.t. Ex0π[XN] = μ, π∈FN.
Assumption (FM):
E∥Rn∥2<∞ and ERn≠0 for all n=1,…,N.
The covariance matrix of the relative risk process (Cov(Rnj,Rnk))1≤j,k≤d
is positive definite for all n=1,…,N.
x0SN0<μ.
Problem (MV) can be solved via the well-known Lagrange multiplier technique. Let Lx0(π,λ) be the Lagrange function, i.e.
Lx0(π,λ) = Varx0π[XN] + 2λ(μ − Ex0π[XN]) for π∈FN, λ≥0.
The Lagrange problem for the parameter λ>0:
P(λ): Lx0(π,λ) → min over π∈FN.
A stochastic LQ-problem:
QP(b): Ex0π[(XN−b)2] → min over π∈FN.
If π∗ is optimal for P(λ), then π∗ is optimal for QP(b) with b = Ex0π∗[XN] + λ.
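The connection rests on the elementary identity
Ex0π[(XN−b)2] = Varx0π[XN] + (Ex0π[XN] − b)2,
so QP(b) penalizes the variance together with the squared deviation of the mean from b. This identity is what links the two problems and motivates the choice b = Ex0π∗[XN] + λ in the implication above.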
Elements of the MDP
rn(x,a)=0,
gN(x)=−(x−b)2
Solution
Dynamic Mean-Risk Problems
Index Tracking
The problem of index-tracking, which is formulated below, can be seen as an application of mean-variance hedging in an incomplete market. Suppose we have a financial market with one bond and d risky assets as in Section 3.1. Besides the tradeable assets there is a non-tradeable asset whose price process (S~n) evolves according to S~n+1 = S~n R~n+1.
The positive random variable R~n+1, which is the relative price change of the non-traded asset, may be correlated with Rn+1. It is assumed that the random vectors (R1,R~1),(R2,R~2),… are independent and the joint distribution of (Rn,R~n) is given. The aim now is to track the non-traded asset as closely as possible by investing into the financial market. The tracking error is measured in terms of the quadratic distance of the portfolio wealth to the price process (S~n), i.e. the optimization problem is
Exs~[∑_{n=0}^{N} (Xnϕ − S~n)2] → min
over all portfolio strategies ϕ=(ϕn),
where ϕn is Fn=σ(R1,…,Rn,R~1,…,R~n)-measurable.
Formulation
Elements of the MDP
E=R×R where (x,s~)∈E and x is the wealth and s~ the value of the non-traded asset,
A=Rd where a∈A is the amount of money invested in the risky assets,
D(x,s~)=A,
Z=(−1,∞)d×R+ where z=(z1,z2)∈Z and z1 is the relative risk of the traded assets and z2 is the relative price change of the non-traded asset.
The transition function is given by Tn((x,s~),a,(z1,z2)) = ((1+in+1)(x + a⋅z1), s~ z2).
QnZ(⋅∣(x,s~),a) = joint distribution of (Rn+1,R~n+1) (independent of ((x,s~),a)),
rn((x,s~),a) = −(x−s~)2,
gN(x,s~) = −(x−s~)2.
Value function (cost-to-go): Vn(x,s~) = inf_π Enxs~π[∑_{k=n}^{N} (Xk − S~k)2].
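A minimal simulation of the tracked pair (Xn, S~n) under a fixed strategy; the correlated lognormal disturbances and the naive rule a = s~ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
N, i = 10, 0.0                       # horizon and interest rate (assumed)
x, s = 1.0, 1.0                      # initial wealth X0 and index value S~0
err = (x - s) ** 2                   # squared tracking error at n = 0

for n in range(N):
    # correlated disturbances: z1 relative risk of the traded asset, z2 index change
    eps = rng.multivariate_normal([0.0, 0.0], [[0.04, 0.03], [0.03, 0.04]])
    z1 = np.exp(eps[0]) - 1.0        # z1 in (-1, inf)
    z2 = np.exp(eps[1])              # z2 > 0
    a = s                            # naive rule: invest the current index value
    x = (1 + i) * (x + a * z1)       # Tn: wealth of the traded portfolio
    s = s * z2                       # Tn: non-traded asset
    err += (x - s) ** 2
print(err)                           # realized sum of squared tracking errors
```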