L05d Sequence Modeling

Materials are adapted from "Dixon, Matthew F., Igor Halperin, and Paul Bilokon. Machine Learning in Finance. Springer International Publishing, 2020." This handout is for teaching only. DO NOT DISTRIBUTE.

Chapter Objectives
By the end of this chapter, the reader should expect to accomplish the following:

- Formulate a hidden Markov model (HMM) for a sequence of observations driven by hidden states or regimes.
- Estimate the most likely realization of the hidden state sequence using the Viterbi algorithm.
- Describe state-space models and apply particle filtering, including sequential importance resampling (SIR) with multinomial resampling.

Hidden Markov Modeling

Figure: The Bayesian network representing the conditional dependence relations between the observed and the hidden variables in the HMM. The conditional dependence relationships define the edges of a graph between parent nodes, $S_t$, and child nodes, $Y_t$.


$$p(\mathbf{s}, \mathbf{y}) = p(s_1)\, p(y_1 \mid s_1) \prod_{t=2}^{T} p(s_t \mid s_{t-1})\, p(y_t \mid s_t). \tag{7.1}$$

Example 7.1 Bull or Bear Market?
Suppose that the market is either in a Bear or a Bull market regime, represented by $s=0$ or $s=1$, respectively. Such states or regimes are assumed to be hidden. Over each period, the market is observed to go up or down, represented by $y=1$ or $y=-1$, respectively. Assume that the emission probability matrix, i.e., the matrix of conditional probabilities of the observed variable given the hidden state, is independent of time and given by
$$\mathbb{P}(y_t = y \mid s_t = s) = \left[\begin{array}{c|cc} y \backslash s & 0 & 1 \\ \hline -1 & 0.8 & 0.2 \\ 1 & 0.2 & 0.8 \end{array}\right]$$
and the transition probability matrix for the Markov process $\{S_t\}$ is given by
$$A = \left[\begin{array}{cc} 0.9 & 0.1 \\ 0.1 & 0.9 \end{array}\right], \qquad [A]_{ij} := \mathbb{P}(S_t = s_i \mid S_{t-1} = s_j).$$
Given the observed sequence $\{-1, 1, 1\}$ (i.e., $T=3$), we can compute the probability of a realization of the hidden state sequence $\{1, 0, 0\}$ using Eq. 7.1. Assuming that $\mathbb{P}(s_1 = 0) = \mathbb{P}(s_1 = 1) = \frac{1}{2}$, the computation is
$$\begin{aligned} \mathbb{P}(\mathbf{s}, \mathbf{y}) &= \mathbb{P}(s_1 = 1)\, \mathbb{P}(y_1 = -1 \mid s_1 = 1)\, \mathbb{P}(s_2 = 0 \mid s_1 = 1)\, \mathbb{P}(y_2 = 1 \mid s_2 = 0) \\ &\quad \times \mathbb{P}(s_3 = 0 \mid s_2 = 0)\, \mathbb{P}(y_3 = 1 \mid s_3 = 0) \\ &= 0.5 \cdot 0.2 \cdot 0.1 \cdot 0.2 \cdot 0.9 \cdot 0.2 = 0.00036. \end{aligned}$$
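As a quick check, Eq. 7.1 can be evaluated directly in code. The short sketch below is not from the original text; it encodes the initial, transition, and emission probabilities above as dictionaries (the names `init`, `trans`, `emit`, and `joint_prob` are illustrative) and reproduces the value 0.00036.

```python
# States: 0 = Bear, 1 = Bull; observations: -1 = down, +1 = up.
init = {0: 0.5, 1: 0.5}                  # assumed uniform initial distribution
trans = {0: {0: 0.9, 1: 0.1},            # trans[s_prev][s_next] = P(s_next | s_prev)
         1: {0: 0.1, 1: 0.9}}
emit = {0: {-1: 0.8, 1: 0.2},            # emit[s][y] = P(y | s)
        1: {-1: 0.2, 1: 0.8}}

def joint_prob(states, obs):
    """Joint probability p(s, y) from the HMM factorization in Eq. 7.1."""
    p = init[states[0]] * emit[states[0]][obs[0]]
    for t in range(1, len(states)):
        p *= trans[states[t - 1]][states[t]] * emit[states[t]][obs[t]]
    return p

print(joint_prob(states=[1, 0, 0], obs=[-1, 1, 1]))  # 0.00036
```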

The Viterbi Algorithm

Estimate the most likely realization of the hidden state sequence using the Viterbi algorithm.

Example 7.2 The Crooked Dealer
A dealer has two coins: a fair coin, with $\mathbb{P}(\text{Heads}) = \frac{1}{2}$, and a loaded coin, with $\mathbb{P}(\text{Heads}) = \frac{4}{5}$. The dealer starts with the fair coin with probability $\frac{3}{5}$. The dealer then tosses the coin several times. After each toss, there is a $\frac{2}{5}$ probability of a switch to the other coin. The observed sequence is Heads, Tails, Heads, Tails, Heads, Heads, Heads, Tails, Heads.

In this case, the state space and observation space are, respectively,
$$\mathcal{S} = \{s_1 = \text{Fair},\ s_2 = \text{Loaded}\}, \qquad O = \{o_1 = \text{Heads},\ o_2 = \text{Tails}\},$$
with initial probabilities
$$\pi = \{\pi_1 = 0.6,\ \pi_2 = 0.4\},$$
transition probabilities
$$A = \left(\begin{array}{cc} 0.6 & 0.4 \\ 0.4 & 0.6 \end{array}\right),$$
and the emission matrix is
$$B = \left(\begin{array}{cc} 0.5 & 0.5 \\ 0.8 & 0.2 \end{array}\right).$$
Given the sequence of observations
$$\mathbf{y} = (\text{Heads, Tails, Heads, Tails, Heads, Heads, Heads, Tails, Heads}),$$

we would like to find the most likely sequence of hidden states, $\mathbf{s} = \{s_1, \ldots, s_T\}$, i.e., determine which of the two coins generated each of the coin tosses.

The most likely path can be computed with the Viterbi algorithm, which is sketched below.
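The Viterbi algorithm proceeds by dynamic programming: at each time step it keeps, for every state, the (log-)probability of the best path ending in that state together with a back-pointer to the best predecessor, and then backtracks from the most likely final state. The following is a minimal sketch, not taken from the text, applied to the parameters above; the names `pi`, `A`, `B`, `obs`, and `viterbi` are illustrative, and log probabilities are used to avoid numerical underflow on longer sequences.

```python
import numpy as np

pi = np.array([0.6, 0.4])            # initial probabilities: [Fair, Loaded]
A = np.array([[0.6, 0.4],            # A[i, j] = P(next state j | current state i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.5],            # B[i, k] = P(observation k | state i),
              [0.8, 0.2]])           # observation columns: [Heads, Tails]
obs = [0, 1, 0, 1, 0, 0, 0, 1, 0]    # Heads = 0, Tails = 1

def viterbi(pi, A, B, obs):
    """Return the most likely hidden state sequence for the observations."""
    n_states, T = len(pi), len(obs)
    delta = np.zeros((T, n_states))           # best log-probability ending in each state
    psi = np.zeros((T, n_states), dtype=int)  # back-pointer to the best predecessor

    delta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        for j in range(n_states):
            scores = delta[t - 1] + np.log(A[:, j])   # best paths arriving at state j
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] + np.log(B[j, obs[t]])

    # Backtrack from the most likely final state
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

print(["Fair" if s == 0 else "Loaded" for s in viterbi(pi, A, B, obs)])
```

The printed list labels each toss with the coin most likely to have generated it under this decoding.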

State-Space Models

Particle Filtering

Sequential Importance Resampling (SIR)

Multinomial Resampling

Notice from above that we are using the normalized weights computed before resampling, ${}^{\mathrm{br}}\lambda_t^{(1)}, \ldots, {}^{\mathrm{br}}\lambda_t^{(M)}$, to form the cumulative sums

$${}^{\mathrm{br}}\Lambda_t^{(i)} := \sum_{j=1}^{i} {}^{\mathrm{br}}\lambda_t^{(j)}, \qquad i = 1, \ldots, M,$$

so that, by construction, ${}^{\mathrm{br}}\Lambda_t^{(M)} = 1$.

Thus we end up with $M$ new particles (children), $\mathbf{x}_{t \mid t}^{(1)}, \ldots, \mathbf{x}_{t \mid t}^{(M)}$, sampled from the existing set $\mathbf{x}_{t \mid t-1}^{(1)}, \ldots, \mathbf{x}_{t \mid t-1}^{(M)}$, so that some of the existing particles may disappear, while others may appear multiple times. For each $i = 1, \ldots, M$, the number of times $\mathbf{x}_{t \mid t-1}^{(i)}$ appears in the resampled set of particles is known as the particle's replication factor, $N_t^{(i)}$.

We set the normalized weights after resampling to $\lambda_t^{(i)} := \frac{1}{M}$. We can view this algorithm as sampling the replication factors $N_t^{(1)}, \ldots, N_t^{(M)}$ from the multinomial distribution with probabilities ${}^{\mathrm{br}}\lambda_t^{(1)}, \ldots, {}^{\mathrm{br}}\lambda_t^{(M)}$, respectively. Hence the name of the method.
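In code, multinomial resampling amounts to drawing the replication factors from a multinomial distribution over the pre-resampling weights and copying each particle accordingly. The sketch below is not from the original text; the names `multinomial_resample`, `particles`, and `weights` are illustrative.

```python
import numpy as np

def multinomial_resample(particles, weights, rng=None):
    """Resample particles (shape (M, d)) using normalized pre-resampling weights."""
    rng = np.random.default_rng() if rng is None else rng
    M = len(weights)
    # Draw the replication factors N_t^(1), ..., N_t^(M) from a multinomial
    # distribution with probabilities given by the pre-resampling weights.
    counts = rng.multinomial(M, weights)
    # Particle i appears counts[i] times in the resampled (children) set.
    children = np.repeat(particles, counts, axis=0)
    # After resampling, all normalized weights are reset to 1/M.
    new_weights = np.full(M, 1.0 / M)
    return children, new_weights, counts

# Toy usage with M = 5 one-dimensional particles
particles = np.array([[0.1], [0.4], [0.7], [1.0], [1.3]])
weights = np.array([0.05, 0.10, 0.50, 0.30, 0.05])
children, new_w, n_rep = multinomial_resample(particles, weights)
print(n_rep, n_rep.sum())  # replication factors; they sum to M = 5
```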

Summary