Representation learning: automatically extract latent factors or features (e.g., nonlinear risk factors) from raw inputs.
Scalability to high dimensions: models have many parameters but can still be trained with stochastic optimization and regularization.
Inductive biases:
| MLP | CNN | RNN | GNN |
|---|---|---|---|
| flexible for general tabular data. | local patterns and translation invariance (useful for structured signals like term structures or limit order books). | sequential dependence in returns, volatility, or flows. | relationships on graphs (counterparty, supply-chain, ownership). |
Example models
MLPs can be used to perform classification and regression for many kinds of data. We give some examples below. Try it for yourself via: https://playground.tensorflow.org
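As a minimal sketch (layer sizes, feature count, and data below are illustrative placeholders, not part of the lecture), an MLP for tabular classification in PyTorch stacks linear layers with nonlinear activations:

```python
import torch
import torch.nn as nn

# Minimal MLP sketch for tabular classification
# (hypothetical sizes: 20 input features, 2 output classes).
class MLP(nn.Module):
    def __init__(self, n_features=20, n_classes=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),   # logits; pair with CrossEntropyLoss
        )

    def forward(self, x):
        return self.net(x)

model = MLP()
x = torch.randn(32, 20)    # a batch of 32 synthetic observations
logits = model(x)          # shape (32, 2)
```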
Pros vs Cons (MLP)
| Aspect | Pros | Cons / Risks |
|---|---|---|
| Flexibility | Universal approximator, rich nonlinearities | Easy to overfit with small samples |
| Data format | Works well on tabular, cross-sectional data | No built-in inductive bias for sequence/graph |
| Optimization | SGD-based training scales to large datasets | Nonconvex; local minima, saddle points |
| Interpret. | Can embed economic constraints via architecture | Harder to explain than linear / trees |
Pros vs Cons (CNN)
| Aspect | Pros | Cons / Risks |
|---|---|---|
| Inductive bias | Captures local patterns, translation invariance | Less suitable if no local structure |
| Efficiency | Fewer parameters than dense layers | Architecture choices can be ad-hoc |
| Data types | Works well on sequences and grids | May need many filters/levels |
| Interpret. | Filters sometimes interpretable as “motifs” | Still less transparent than linear |
CNNs introduce local receptive fields and parameter sharing, making them efficient for data with spatial or temporal structure.
1D convolutions are natural for time-series–like financial inputs, where nearby time points or maturities are strongly related.
A typical architecture stacks multiple convolutional layers and then uses fully connected layers for final predictions.
CNNs can be viewed as learned filters that detect recurring motifs in financial signals.
Limitations arise when the problem does not exhibit clear local patterns, or when long-range dependencies dominate, which motivates RNNs and attention.
More broadly, CNNs illustrate how network structure shapes what is easy to learn: with convolutional structure, the model is biased toward learning local, translation-invariant features.
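As a minimal sketch (assuming a univariate input window of, say, 60 past observations; channel counts and kernel sizes are illustrative), a stack of 1D convolutions followed by a fully connected head might look like this:

```python
import torch
import torch.nn as nn

# 1D CNN sketch: convolutional filters scan a time window for local motifs,
# then a linear head produces a single prediction (e.g., next-period return).
class Conv1DNet(nn.Module):
    def __init__(self, in_channels=1):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),        # pool over the time dimension
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x):                   # x: (batch, channels, time)
        z = self.conv(x).squeeze(-1)        # (batch, 32)
        return self.head(z)

model = Conv1DNet()
x = torch.randn(8, 1, 60)                   # 8 windows of 60 observations each
y_hat = model(x)                            # (8, 1)
```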
LSTM cell (core idea; biases omitted for brevity):

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1}), \quad
i_t = \sigma(W_i x_t + U_i h_{t-1}), \quad
o_t = \sigma(W_o x_t + U_o h_{t-1}), \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1}), \quad
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \quad
h_t = o_t \odot \tanh(c_t).
\end{aligned}
$$

GRU cell (simplified):

$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1}), \quad
r_t = \sigma(W_r x_t + U_r h_{t-1}), \\
\tilde{h}_t &= \tanh(W_h x_t + U_h (r_t \odot h_{t-1})), \quad
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t.
\end{aligned}
$$

Intuition: the gates decide what to forget, what to update, and what to output, helping the model retain long-term dependencies.
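The gate equations above map directly into code. A minimal sketch of a single LSTM step (biases omitted to match the equations; dimensions and random weights are illustrative, not a production implementation):

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W, U):
    """One LSTM step with biases omitted; W and U are dicts of weight
    matrices keyed by gate name ('f', 'i', 'o', 'c')."""
    f_t = torch.sigmoid(x_t @ W['f'] + h_prev @ U['f'])    # forget gate
    i_t = torch.sigmoid(x_t @ W['i'] + h_prev @ U['i'])    # input gate
    o_t = torch.sigmoid(x_t @ W['o'] + h_prev @ U['o'])    # output gate
    c_tilde = torch.tanh(x_t @ W['c'] + h_prev @ U['c'])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                     # forget old, add new
    h_t = o_t * torch.tanh(c_t)                            # expose part of the cell
    return h_t, c_t

d_in, d_h = 4, 8
W = {k: torch.randn(d_in, d_h) for k in 'fioc'}
U = {k: torch.randn(d_h, d_h) for k in 'fioc'}
h, c = torch.zeros(1, d_h), torch.zeros(1, d_h)
h, c = lstm_step(torch.randn(1, d_in), h, c, W, U)
```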
Pros vs Cons (RNN / LSTM / GRU)
| Aspect | Pros | Cons / Risks |
|---|---|---|
| Sequence | Natural for time series and sequences | Hard to parallelize across time |
| Memory | Can capture medium/long-term dependencies | Still struggles with very long range |
| Flexibility | Many variants (stacked, bidirectional) | Many hyperparameters, tuning heavy |
Finance uses:
Forecasting returns, volatility, or risk measures based on past time series.
Modeling order book dynamics and execution cost profiles.
Multi-step forecasting of macro-financial variables.
RNNs are designed for sequence data, with a hidden state that evolves over time as new inputs arrive.
Vanilla RNNs are conceptually simple but face optimization issues for long sequences.
LSTM and GRU introduce gating mechanisms to help retain or forget information, improving the modeling of medium to long-term dependencies.
In finance, RNN-based architectures are natural choices whenever temporal dynamics and history dependence are central.
Compared with CNNs, RNNs emphasize ordered dependence over time rather than local patterns in a fixed grid; again, the network structure encodes what kind of regularity is easiest to learn.
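In practice one would rely on a library implementation; a minimal sketch of an LSTM-based one-step-ahead forecaster (feature count, hidden size, and data are placeholders):

```python
import torch
import torch.nn as nn

# LSTM forecaster sketch: read a sequence of features (e.g., past returns,
# volume, volatility proxies) and predict the next-period value.
class LSTMForecaster(nn.Module):
    def __init__(self, n_features=3, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])    # use the last hidden state

model = LSTMForecaster()
x = torch.randn(16, 120, 3)                # 16 sequences of length 120
y_hat = model(x)                           # (16, 1)
```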
Pros vs Cons (Attention)
| Aspect | Pros | Cons / Risks |
|---|---|---|
| Long-range | Better handles long-range dependencies | Adds complexity and parameters |
| Interpret. | Attention weights can be read as importance scores | Not always truly causal/explanatory |
| Flexibility | Works with RNN encoders/decoders, sets, etc. | Still sequence-length dependent |
Attention mechanisms augment sequence models (typically RNNs) with the ability to selectively focus on different parts of the input.
Technically, attention computes similarity scores between a “query” state and a set of “keys”, normalizes the scores into weights (e.g., via a softmax), and uses them to form a context vector as a weighted average of the corresponding “values”.
Pre-Transformer attention arose in encoder–decoder architectures and remains a useful conceptual tool for designing finance models that handle long and complex sequences.
In later lectures (on big data and large language models), we will revisit attention in the context of Transformers; here we focus on its core idea and pre-Transformer form.
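A minimal sketch of the core computation (scaled dot-product attention, with the keys and values taken to be the same encoder states; shapes and data are illustrative):

```python
import torch
import torch.nn.functional as F

def attention(query, keys, values):
    """query: (batch, d); keys, values: (batch, time, d).
    Returns a context vector (batch, d) and the attention weights."""
    d = query.size(-1)
    scores = torch.bmm(keys, query.unsqueeze(-1)).squeeze(-1) / d ** 0.5  # (batch, time)
    weights = F.softmax(scores, dim=-1)            # nonnegative, sum to 1 over time
    context = torch.bmm(weights.unsqueeze(1), values).squeeze(1)          # weighted sum
    return context, weights

q = torch.randn(4, 32)          # e.g., the current decoder/RNN state
enc = torch.randn(4, 50, 32)    # e.g., 50 encoder hidden states
context, w = attention(q, enc, enc)
```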
VAEs posit a latent-variable model: observations $x$ are generated from latent variables $z \sim p(z)$ through a decoder $p_\theta(x \mid z)$, so that $p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz$.
Training maximizes a variational lower bound (ELBO) that balances reconstruction accuracy against keeping the encoder $q_\phi(z \mid x)$ close to the prior $p(z)$:
$$
\mathcal{L}(\theta, \phi) = \mathbb{E}_{q_\phi(z \mid x)}\!\big[\log p_\theta(x \mid z)\big] - \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big).
$$
The reparameterization trick, $z = \mu_\phi(x) + \sigma_\phi(x) \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, enables gradient-based optimization of the ELBO.
The encoder computes a low-dimensional latent representation of financial observations (e.g., yield curves, market states, portfolios), which can be used even without sampling from the decoder.
Thus, VAEs provide both a generative model for scenarios and a tool for nonlinear dimension reduction and representation learning.
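A minimal sketch of a VAE with the reparameterization trick and ELBO loss (a Gaussian decoder is assumed; input dimension and latent size are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=30, z_dim=4, hidden=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, z_dim)
        self.logvar = nn.Linear(hidden, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        return self.dec(z), mu, logvar

def neg_elbo(x, x_hat, mu, logvar):
    recon = F.mse_loss(x_hat, x, reduction='sum')                 # reconstruction term
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL(q(z|x) || N(0, I))
    return recon + kl

x = torch.randn(16, 30)          # e.g., a batch of yield-curve observations
x_hat, mu, logvar = VAE()(x)
loss = neg_elbo(x, x_hat, mu, logvar)
```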
GANs set up a two-player game between a generator, which maps random noise to synthetic samples, and a discriminator, which tries to distinguish real data from generated data.
The minimax objective encourages the generator to match the real data distribution and the discriminator to become a powerful classifier.
Practical training uses variants (e.g., Wasserstein GAN) to improve stability and reduce mode collapse.
In finance, GANs can generate realistic joint scenarios of returns, volatilities, or yield curves for stress testing and data augmentation.
The discriminator also acts as a representation learner: its internal layers learn features that distinguish typical from atypical patterns in markets, which can sometimes be reused for risk monitoring or anomaly detection.
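A minimal sketch of one training iteration under the basic (non-Wasserstein) objective; the networks, optimizer settings, and data are illustrative stand-ins:

```python
import torch
import torch.nn as nn

z_dim, x_dim = 8, 30
G = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))  # generator
D = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, 1))      # discriminator (logits)
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

real = torch.randn(32, x_dim)    # stand-in for a batch of real return vectors

# Discriminator step: push D(real) toward 1 and D(fake) toward 0.
fake = G(torch.randn(32, z_dim)).detach()
loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
opt_d.zero_grad()
loss_d.backward()
opt_d.step()

# Generator step: try to make D label generated samples as real.
fake = G(torch.randn(32, z_dim))
loss_g = bce(D(fake), torch.ones(32, 1))
opt_g.zero_grad()
loss_g.backward()
opt_g.step()
```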
Pros vs Cons
| Model | Pros | Cons / Risks |
|---|---|---|
| VAE | Probabilistic, explicit latent structure | Reconstructions may be too “smooth” |
| GAN | Sharp, realistic samples | Training instability, mode collapse |
VAEs and GANs are central deep generative models that learn to approximate complex data distributions.
VAEs use a probabilistic latent-variable framework and optimize an ELBO via an encoder and decoder.
GANs frame generation as a two-player game between a generator and a discriminator.
In finance, they are primarily useful for scenario generation, stress testing, and data augmentation, not as direct replacements for structural models or risk factor frameworks.
Pros vs Cons (GNN)
| Aspect | Pros | Cons / Risks |
|---|---|---|
| Structure | Respects network topology | Needs graph data and quality edges |
| Flexibility | Learns complex neighborhood interactions | Over-smoothing with many layers |
| Finance | Natural for systemic risk, contagion, spillover | Interpretability can be challenging |
Finance uses:
Systemic risk: predict default probabilities or losses using interbank networks.
Credit risk: incorporate supply-chain or ownership networks.
Market structure: model spillovers among firms or sectors linked by customer–supplier relationships.
GNNs generalize neural networks to graph-structured data via message passing and aggregation over neighbors.
They learn representations of nodes, edges, or entire graphs that implicitly capture network structure and interactions.
In finance, GNNs are promising tools for modeling interconnected systems, such as banking networks, ownership structures, and supply-chain relationships.
Compared with MLPs on tabular data, GNNs encode a strong prior that outcomes depend critically on relations rather than only on standalone characteristics.
Again, network architecture shapes what is easy to learn: with a graph structure, the model is biased toward capturing relational patterns and contagion effects.
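A minimal sketch of one message-passing layer with mean aggregation over neighbors, assuming node features X and a dense adjacency matrix A (real applications would typically use a graph library such as PyTorch Geometric):

```python
import torch
import torch.nn as nn

class GraphLayer(nn.Module):
    """One message-passing layer: average neighbor features, concatenate
    with the node's own features, then apply a shared linear map."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(2 * d_in, d_out)

    def forward(self, X, A):                       # X: (n, d_in), A: (n, n)
        deg = A.sum(dim=1, keepdim=True).clamp(min=1)
        neigh = (A @ X) / deg                      # mean over neighbors
        return torch.relu(self.lin(torch.cat([X, neigh], dim=1)))

# Toy example: 5 nodes (e.g., banks) with 6 features each (e.g., balance-sheet ratios).
X = torch.randn(5, 6)
A = (torch.rand(5, 5) > 0.5).float()               # stand-in for interbank exposures
A = ((A + A.t()) > 0).float().fill_diagonal_(0)    # symmetric, no self-loops
H = GraphLayer(6, 16)(X, A)                        # node embeddings, shape (5, 16)
```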