| Method | Penalty | Typical Use |
|---|---|---|
| Ridge Regression | $\ell_2$: $\lambda \lVert\beta\rVert_2^2$ | Shrinks coefficients, handles multicollinearity |
| LASSO | $\ell_1$: $\lambda \lVert\beta\rVert_1$ | Variable selection → sparse model |
| Elastic Net | Mix of $\ell_1$ and $\ell_2$ | Combines both advantages |
Empirical Finance Applications
Key Takeaway:
“Use Machine Learning for selection, Econometrics for estimation.”
Motivation: LASSO can select erratically among highly correlated predictors, while Ridge keeps everything; Elastic Net blends the two penalties.
Formulation:
$$\hat\beta = \arg\min_\beta \; \lVert y - X\beta\rVert_2^2 + \lambda\left(\alpha \lVert\beta\rVert_1 + \tfrac{1-\alpha}{2}\lVert\beta\rVert_2^2\right),$$
where $\lambda \ge 0$ sets the overall penalty strength and $\alpha \in [0,1]$ mixes the L1 (LASSO) and L2 (Ridge) components.
Key Idea: the L1 term induces sparsity while the L2 term stabilizes estimates among correlated predictors, so correlated variables tend to enter or leave the model together (group‑wise selection).
Benefits: Higher stability, better economic interpretability, improved prediction under grouped designs.
| Aspect | Ridge | LASSO | Elastic Net |
|---|---|---|---|
| Variable Selection | No | Yes | Group‑wise |
| Handles Collinearity | Strong | May drop one of correlated vars | Good compromise |
| Bias vs Variance | Low variance, high bias | Higher variance, low bias for selected vars | Balanced |
| Parameter Count | All kept | Few non‑zero | Moderate |
| Interpretation | Shrunk coefficients | Sparse interpretation | Group selection |
| Finance Use Case | Yield curve, risk premia with collinearity | High‑dim. feature screening | Thematic factor groups |
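To make the contrast concrete, here is a minimal scikit‑learn sketch on a synthetic design with two highly correlated predictors; the penalty strengths (`alpha`, `l1_ratio`) are illustrative, not tuned:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)   # two highly correlated predictors
y = 2.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=n)

X = StandardScaler().fit_transform(X)           # penalties assume comparable scales
for model in (Ridge(alpha=1.0), Lasso(alpha=0.1), ElasticNet(alpha=0.1, l1_ratio=0.5)):
    coefs = model.fit(X, y).coef_
    print(type(model).__name__, np.round(coefs, 2))
```

On data like this, LASSO will typically zero out one of the correlated pair, while Elastic Net tends to split weight across both.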
Typical Financial Applications
Key Takeaways
Piecewise Polynomials
Constraints and Splines
Local Regression: computing the fit at a target point $x_0$ uses only the nearby training observations, weighted by their distance from $x_0$.
GAM (Generalized Additive Model): extends the multiple linear regression model
$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \epsilon_i$$
by replacing each linear component $\beta_j x_{ij}$ with a smooth non‑linear function $f_j(x_{ij})$:
$$y_i = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij}) + \epsilon_i.$$
Example (ISLP wage data): $\text{wage} = \beta_0 + f_1(\text{year}) + f_2(\text{age}) + f_3(\text{education}) + \epsilon$, where $f_1$ and $f_2$ can be natural splines or smoothing splines and $f_3$ is a step function for the qualitative education variable.
- GAMs automatically model non-linear relationships that standard linear regression will miss.
- The non-linear fits can potentially make more accurate predictions for the response.
- Because the model is additive, we can examine the effect of each predictor on the response individually, holding the others fixed.
- The smoothness of each function $f_j$ can be summarized via effective degrees of freedom.
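A minimal sketch of such a fit, assuming the `pygam` package is installed; the wage‑like data and column roles (two smooth terms plus one factor term) are synthetic stand‑ins:

```python
import numpy as np
from pygam import LinearGAM, s, f

rng = np.random.default_rng(0)
n = 500
year = rng.uniform(2003, 2009, n)
age = rng.uniform(18, 80, n)
edu = rng.integers(0, 5, n)                        # coded education level
wage = 0.5 * (year - 2003) + 10 * np.sin(age / 15) + 2 * edu + rng.normal(0, 1, n)

X = np.column_stack([year, age, edu])
gam = LinearGAM(s(0) + s(1) + f(2)).fit(X, wage)   # smooths for year and age, factor for education
gam.summary()                                      # reports effective degrees of freedom per term
```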
Key Idea:
Always evaluate on an out‑of‑sample or hold‑out set to avoid spurious fit.
Classification as a foundation for binary decision‑making under risk and uncertainty.
Goal: predict a discrete label $y \in \{1, \dots, K\}$ from a feature vector $x$.
Examples in finance: default vs. no default, fraudulent vs. legitimate transaction, bull vs. bear market regime.
Linear regression is not appropriate for classification tasks: fitted values can fall outside $[0, 1]$, and numeric codings of more than two classes impose an arbitrary ordering.
(Figures: ISLP)
Prediction: once $\hat\beta_0, \hat\beta_1$ are estimated, logistic regression outputs $\hat p(X) = \dfrac{e^{\hat\beta_0 + \hat\beta_1 X}}{1 + e^{\hat\beta_0 + \hat\beta_1 X}}$.

Multinomial logistic regression: to classify a response variable that has more than two classes, the model becomes
$$\Pr(Y = k \mid X = x) = \frac{e^{\beta_{k0} + \beta_{k1} x_1 + \cdots + \beta_{kp} x_p}}{\sum_{l=1}^{K} e^{\beta_{l0} + \beta_{l1} x_1 + \cdots + \beta_{lp} x_p}},$$
so the log odds (for class $k$ relative to a baseline class $K$) is linear in the predictors.
The big idea of generative models for classification: model the distribution of the predictors within each class, $f_k(x) = \Pr(X = x \mid Y = k)$, together with the prior $\pi_k = \Pr(Y = k)$, and apply Bayes' theorem:
$$\Pr(Y = k \mid X = x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)}.$$
Why do we need generative models for classification? When the classes are well separated, logistic regression estimates are unstable; with small $n$ and approximately normal predictors, generative models are more accurate; and they extend naturally to more than two classes.
The multivariate Gaussian distribution: $X \sim N(\mu, \Sigma)$, with density
$$f(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right).$$
LDA assumes each class is Gaussian with its own mean $\mu_k$ but a common covariance matrix $\Sigma$; QDA allows a class‑specific $\Sigma_k$, producing quadratic boundaries.
Naïve Bayes. Assumption: within the $k$-th class, the $p$ predictors are independent, so $f_k(x) = f_{k1}(x_1) \times f_{k2}(x_2) \times \cdots \times f_{kp}(x_p)$. The posterior probability then becomes
$$\Pr(Y = k \mid X = x) = \frac{\pi_k \prod_{j=1}^{p} f_{kj}(x_j)}{\sum_{l=1}^{K} \pi_l \prod_{j=1}^{p} f_{lj}(x_j)}.$$
Estimating each one-dimensional density function $f_{kj}$ is far easier than estimating a joint $p$-dimensional density: use a Gaussian, a kernel estimate, or class frequencies for qualitative predictors.
Model the log odds ratio as a generalized additive model:
$$\log\frac{p(X)}{1 - p(X)} = \beta_0 + f_1(X_1) + \cdots + f_p(X_p).$$
Hyperplane: in $p$ dimensions, the set of points satisfying $\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p = 0$; it divides the space into two half‑spaces according to the sign of $\beta_0 + \beta^\top X$.
The support vector classifier, with its linear decision boundary, cannot handle non-linear class structure.
What can we do?
| name | function |
|---|---|
| Linear kernel | $K(x, x') = \langle x, x' \rangle$ |
| Polynomial kernel | $K(x, x') = (1 + \langle x, x' \rangle)^d$ |
| Radial kernel | $K(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$ |
| Gaussian kernel | $K(x, x') = \exp(-\lVert x - x' \rVert^2 / 2\sigma^2)$ |
| Laplacian kernel | $K(x, x') = \exp(-\lVert x - x' \rVert_1 / \sigma)$ |
| Sigmoid kernel | $K(x, x') = \tanh(\kappa \langle x, x' \rangle + c)$ |
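As a hedged illustration of why kernels matter, a small scikit‑learn comparison on a toy non‑linearly separable problem (parameters `C` and `gamma` are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear = SVC(kernel="linear", C=1.0).fit(X_tr, y_tr)   # linear boundary underfits the moons
radial = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X_tr, y_tr)
print("linear:", linear.score(X_te, y_te), "radial:", radial.score(X_te, y_te))
```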
Suppose the classifier can be expressed through inner products between observations.

SVC (support vector classifier): the linear classifier can be written as
$$f(x) = \beta_0 + \sum_{i \in \mathcal{S}} \alpha_i \langle x, x_i \rangle,$$
where $\mathcal{S}$ indexes the support vectors.

SVM (support vector machine): replace the inner products with kernels,
$$f(x) = \beta_0 + \sum_{i \in \mathcal{S}} \alpha_i K(x, x_i),$$
which implicitly enlarges the feature space and yields non-linear boundaries.

Inner products / kernels are the only quantities the algorithm needs; the enlarged feature space is never computed explicitly.
One-Versus-One (OVO) Classification: fit all $\binom{K}{2}$ pairwise binary classifiers and assign each observation to the class that wins the most pairwise contests.
One-Versus-All (OVA) Classification: fit $K$ classifiers, each comparing one class against the remaining $K-1$, and assign to the class with the largest decision value.
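Both schemes are available as wrappers in scikit‑learn; a minimal sketch on the Iris data:

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)   # K(K-1)/2 pairwise classifiers
ova = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)  # K one-vs-rest classifiers
print(ovo.score(X, y), ova.score(X, y))
```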
| Metric | Aim |
|---|---|
| Confusion Matrix | True/False pos & neg rates |
| Accuracy | Overall classification rate |
| Precision & Recall (PR) | Trade‑off critical for imbalanced data |
| ROC / AUC | Ranking quality of probabilities |
| KS Statistic | Discrimination for credit risk |
Financial practice → PD model evaluation, fraud detection rates, risk-control sensitivity analysis.
| Method | Mathematical Idea / Assumption | Non-linear Capacity | Output | Main Advantage |
|---|---|---|---|---|
| Logistic Regression | Linear decision boundary; estimates $\Pr(Y = 1 \mid X)$ | Low | Probability | Highly interpretable; standard errors available |
| LDA | Class-conditional normality with a common covariance | Linear | Probability | Stable; near-minimal error bound |
| QDA | Class-conditional normality with class-specific covariances | Medium-high | Probability | Fits differently shaped boundaries |
| Naïve Bayes | Conditional independence of features | Medium | Probability | Simple; suits high-dimensional text data |
| GAM | Additive non-linear effects across variables | High | Probability or expectation | Flexible yet interpretable |
| SVM | Margin maximization; kernel mapping | High | Discrete decision | Robust to noise; clean boundaries |
| Scenario | Suitable Methods | Notes |
|---|---|---|
| Credit scoring / default prediction | Logistic, GAM | Regulator-accepted; interpretable probability output |
| Corporate bankruptcy / risk grades | LDA, QDA | Classical statistical discriminant approach |
| Fraud detection | SVM, Naïve Bayes | High-dimensional features; complex class boundaries |
| Market regime identification (bull/bear) | SVM, GAM | Can model non-linear or time-varying boundaries |
| Text sentiment classification | Naïve Bayes, SVM | High-dimensional sparse word vectors |
| Macro policy stance classification | Logistic, GAM | Probability output aids economic interpretation |
| Dimension | Logistic | LDA | QDA | Naïve Bayes | GAM | SVM |
|---|---|---|---|---|---|---|
| Interpretability | High | High | Medium | Medium | High | Low |
| Non-linear capacity | Medium | Weak | | | | |
| Small-sample performance | May overfit | Needs regularization | Medium | | | |
| High-dimensional tolerance | Needs regularization | Poor | Poor | | | Kernel-dependent |
| Output form | Probability | Probability | Probability | Probability | Probability/expectation | Class or margin score |
| Computational efficiency | High | High | Medium | High | Medium | Relatively slow |
| Regulatory acceptance | Medium | Medium | Low | | | |
| Typical data structure | Tabular | Continuous features | Continuous features, differing variances | Discrete text / categorical | Multivariate non-linear time series | High-dimensional non-linear |
| Task | Data Characteristics | Recommended Algorithm | Reason |
|---|---|---|---|
| Credit scoring / default probability | Small-to-medium samples; interpretability required | Logistic or GAM | Probability output, visual interpretation, compliance |
| Firm classification / financial risk tiers | Multivariate, approximately normal | LDA/QDA | Classical empirical tradition |
| Text or document classification | High-dimensional sparse word counts | Naïve Bayes or SVM | Strong performance on high-dimensional text |
| Macroeconomic regime identification | Non-linear, multi-factor | GAM or SVM | Captures non-linearity or shifting boundaries |
| Market manipulation / fraud detection | Noisy, complex patterns | SVM or GAM | Strong non-linear discrimination |
Recommendations:
Tree‑based methods combine accuracy with interpretability, bridging prediction and explanation in finance.
Training objective: find splits that maximize the reduction in MSE. Categorical inputs: splits compare subsets of feature categories rather than numeric thresholds.
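A minimal sketch (synthetic step-function data; depth capped for readability) showing the split chosen to maximize MSE reduction:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.where(X[:, 0] < 5, 1.0, 3.0) + rng.normal(0, 0.3, 300)  # step function plus noise

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print("first split at x =", tree.tree_.threshold[0])           # should land near 5
```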
Example recap
CART: Classification Trees
Training objective: instead of minimizing squared error (as in regression trees), classification trees minimize impurity at each split. Common impurity measures: the Gini index $G_m = \sum_k \hat p_{mk}(1 - \hat p_{mk})$, cross-entropy $D_m = -\sum_k \hat p_{mk} \log \hat p_{mk}$, and the misclassification error rate.
Categorical inputs: splits compare subsets of categories rather than numeric thresholds.
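A hedged sketch with scikit-learn, whose `DecisionTreeClassifier` uses the Gini index by default:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, max_depth=2))   # inspect the first Gini-chosen splits
```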
Example recap:
Core Idea: Combine multiple base learners to produce a stronger, more stable model.
Why It Works
Common Ensemble Methods
| Category | How models are combined | Goal | Examples |
|---|---|---|---|
| Averaging | Train models independently and average or vote their predictions | Reduce variance | Bagging, Random Forests |
| Boosting | Train models sequentially, each focusing on previous errors | Reduce bias | AdaBoost, Gradient Boosting |
| Stacking | Learn a meta-model to optimally combine base model outputs | Leverage diverse learners | Stacking, Blending |
Regression Ensembles: average the predictions, $\hat f_{\mathrm{avg}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat f^{(b)}(x)$.
Classification Ensembles: majority vote across models, or average the predicted class probabilities.
Why Bagging Works: averaging $B$ roughly independent fits reduces variance (by up to a factor of $1/B$ for uncorrelated predictions); bootstrap resampling supplies the required model diversity.
Notes
Why Boosting Works: each new learner concentrates on the errors the current ensemble still makes, so the sequential updates steadily reduce bias.
Key Characteristics
| Aspect | Boosting | Bagging / RF |
|---|---|---|
| Training style | Sequential, adaptive | Parallel, independent |
| Main goal | Reduce bias | Reduce variance |
| Model dependency | Later models depend on earlier ones | Models trained independently |
| Typical base learners | Weak (e.g., shallow trees) | Unstable (e.g., deep trees) |
| Example algorithms | AdaBoost, Gradient Boosting | Bagging, Random Forests |
Notes
Model Definition: We seek an additive model combining multiple base learners:
$$F_M(x) = \sum_{m=1}^{M} \beta_m \, b(x; \gamma_m),$$
where each $b(\cdot; \gamma_m)$ is a base learner with parameters $\gamma_m$ and weight $\beta_m$.
Optimization Framework: We minimize an empirical loss $\sum_{i=1}^{n} L(y_i, F(x_i))$. At each stage $m$, the new learner and its weight are chosen to best reduce the loss given the current fit:
$$(\beta_m, \gamma_m) = \arg\min_{\beta, \gamma} \sum_{i=1}^{n} L\big(y_i, F_{m-1}(x_i) + \beta \, b(x_i; \gamma)\big).$$
This is a forward stagewise additive approach — each step performs a local optimization to reduce the total loss.
Special Cases
| Algorithm | Loss Function | Interpretation |
|---|---|---|
| AdaBoost | Exponential loss: $L(y, F) = \exp(-yF)$ | Weight update: misclassified observations receive exponentially larger weights in the next stage |
| Gradient Boosting | Arbitrary differentiable loss $L(y, F)$ | Learners fit the negative gradient of the loss (the pseudo-residuals) at each stage |
Summary
Core Idea
Example: Quadratic loss and least squares boosting
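A from-scratch sketch of this special case (stumps as base learners; `nu` and the round count are illustrative): each round fits the current residuals, which for squared loss are exactly the negative gradient.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, 400)

nu, n_rounds = 0.1, 200                      # shrinkage and number of stages
F = np.zeros_like(y)                         # current additive fit
learners = []
for _ in range(n_rounds):
    r = y - F                                # residuals = negative gradient of (1/2)(y - F)^2
    stump = DecisionTreeRegressor(max_depth=1).fit(X, r)
    F += nu * stump.predict(X)               # forward stagewise update
    learners.append(stump)
print("train MSE:", np.mean((y - F) ** 2))
```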
Insight
Typical Hyperparameters
| Parameter | Role | Common Range |
|---|---|---|
| $M$ | Number of boosting rounds | 100–1000 |
| Learning rate $\nu$ | Shrinks step size of each update | 0.01–0.1 |
| Tree depth | Base learner capacity | 3–8 |
| Subsampling rate | Regularization by randomness | 0.5–1.0 |
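These map directly onto scikit-learn's gradient boosting; a hedged configuration sketch with values picked from the middle of the ranges above, not tuned:

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_friedman1(n_samples=500, random_state=0)
gbm = GradientBoostingRegressor(
    n_estimators=500,     # number of boosting rounds M
    learning_rate=0.05,   # shrinkage nu
    max_depth=3,          # base learner capacity
    subsample=0.8,        # regularization by randomness (stochastic boosting)
    random_state=0,
).fit(X, y)
print(gbm.score(X, y))
```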
Notes
Why Regularization Matters
Three Core Regularization Techniques
| Technique | Key Idea | Practical Effect | Example in Financial Research |
|---|---|---|---|
| (1) Shrinkage (learning rate) | Reduce the update step after each iteration: $F_m = F_{m-1} + \nu \, \beta_m b(x; \gamma_m)$ with $0 < \nu \le 1$. | Each model contributes less individually. More iterations → smoother convergence. | In credit risk modeling, a small learning rate (e.g., 0.05) prevents the model from fitting extreme or rare default cases too aggressively. |
| (2) Subsampling (stochastic sampling) | Use a random subset (e.g., 50–80%) of observations at each iteration. | Adds randomness, reducing variance. Works like stochastic gradient descent. | In high‑frequency trading forecasts, random subsampling mitigates market micro‑noise and avoids overfitting transient patterns. |
| (3) Tree Constraints (structural control) | Limit tree complexity—depth, number of leaves, or minimum leaf size. | Reduces model flexibility, controlling overfitting. | In macroeconomic forecasting, using shallow trees (depth ≤ 4) prevents the model from reacting to short‑term, non‑structural fluctuations. |
Regularization in Practice
Practical Implications: Robustness over Perfection
| Research Context | Overfitting Risk | Recommended Strategy |
|---|---|---|
| Credit scoring (small sample, many predictors) | High | Small learning rate + shallow trees |
| Macroeconomic forecasting | Medium | Subsampling + depth constraint |
| High‑frequency trading prediction | Very high | Strong regularization + time‑segmented training |
| Portfolio risk modeling | Medium | Conservative parameters + repeated cross‑validation |
Why Stacking Works
Key Characteristics
| Aspect | Stacking | Bagging | Boosting |
|---|---|---|---|
| Architecture | Hierarchical (multi‑level) | Parallel | Sequential |
| Base models | Heterogeneous (different types) | Homogeneous | Homogeneous |
| Dependency | Meta‑model depends on base outputs | Independent | Step‑wise dependent |
| Main goal | Combine diverse modeling strengths | Reduce variance | Reduce bias |
| Example meta‑model | Linear / Ridge regression | – | – |
Data Splitting Strategy — Avoiding Information Leakage
Base Models — How to Choose
Design principle: choose diverse but complementary learners that reflect different economic structures.
Meta‑Model — How to Train
Workflow Summary
Step 1 Split data into training folds
Step 2 Train diverse base models → get out-of-fold predictions
Step 3 Construct meta-features from base predictions
Step 4 Train meta-model on these meta-features
Step 5 Apply trained pipeline to test data or new time periods
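A minimal scikit-learn sketch of this pipeline (base models and `cv=5` are illustrative; for time series a rolling or expanding split should replace plain K-fold):

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=500, random_state=0)
stack = StackingRegressor(
    estimators=[
        ("linear", LinearRegression()),
        ("forest", RandomForestRegressor(n_estimators=200, random_state=0)),
        ("svr", SVR(kernel="rbf")),
    ],
    final_estimator=RidgeCV(),  # regularized meta-model guards against base-model noise
    cv=5,                       # out-of-fold predictions keep leakage out of the meta-features
).fit(X, y)
print(stack.score(X, y))
```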
Common Pitfalls
| Issue | Description | Mitigation |
|---|---|---|
| Data leakage | Meta‑model uses in‑sample predictions | Strict fold separation or rolling‑window setup |
| Over‑complex meta‑model | Learns base model noise rather than signal | Use regularized regressors |
| Limited sample | Too few observations to estimate second layer | Reduce base model number or fold count |
| Inconsistent scaling | Base model outputs on different scales | Standardize before meta‑training |
Summary
Three Main Approaches at a Glance
| Aspect | Bagging | Boosting | Stacking |
|---|---|---|---|
| Core Strategy | Parallel resampling and voting | Sequential error correction | Hierarchical model integration |
| Model Dependency | Independent learners | Each learner depends on prior errors | Meta‑level depends on base outputs |
| Main Objective | Reduce variance | Reduce bias | Combine diverse model strengths |
| Typical Base Learners | Unstable models (e.g., deep trees) | Weak models (e.g., shallow trees) | Mixed models (linear + nonlinear) |
| Combining Rule | Averaging / voting | Weighted additive updates | Meta‑model learns optimal weights |
| Representative Algorithms | Random Forest | AdaBoost, GBM, XGBoost | Stacked Generalization |
| Bias–Variance–Diversity View | ↓ Variance | ↓ Bias | ↑ Model Diversity |
How They Complement Each Other
Guidance for Financial & Economic Applications
| Scenario | Preferred Method | Reason / Goal |
|---|---|---|
| Credit risk modeling (imbalanced labels, tabular data) | Boosting (e.g., XGBoost) | Focuses on hard‑to‑classify defaults; handles feature interactions. |
| Macroeconomic forecasting (few features, temporal structure) | Bagging / Random Forest | Reduces variance from small samples; robust to outliers. |
| Market microdata or multi‑source models (prices, text, fundamentals) | Stacking | Integrates heterogeneous models; combines interpretability and flexibility. |
| Portfolio optimization or volatility forecasting | Stacking / Bagging mix | Balances predictive stability with adaptability across regimes. |
Unsupervised learning as data‑driven exploration for structure discovery in financial systems.
From Supervised to Unsupervised
Core Philosophy
| Supervised | Unsupervised |
|---|---|
| Learns from known targets | Learns from input structure only |
| Goal: minimize error or loss | Goal: maximize pattern clarity or compactness |
| Typical tasks: regression, classification | Typical tasks: clustering, dimensionality reduction, anomaly detection, association rules |
| Oriented toward prediction | Oriented toward understanding / exploration |
Example Intuitions
Why It Matters in Finance & Economics
| Application Area | Example Use Case | Benefit |
|---|---|---|
| Market Structure Analysis | Identify groups of stocks with similar return behavior | reveal sector co‑movements |
| Consumer Finance / Credit | Cluster borrowers by spending and repayment patterns | better segmentation, risk profiling |
| Macroeconomics | Extract hidden economic factors from multiple indicators | simplify large datasets for policy analysis |
| Fraud / Crisis Detection | Spot anomalies in transaction or macro trends | early warning and control |
Conceptual Analogy
Supervised → teacher provides correct answers
Unsupervised → students self-organize into study groups
The “teacher” (labels) is absent — yet insights emerge from how data points relate to each other.
This makes unsupervised methods ideal for exploratory analysis and hypothesis generation.
Transition
In the following pages, we will explore major unsupervised methods:
- Clustering → discovering similarity structures,
- Dimensionality Reduction → summarizing complex variables,
- Association Rules & Anomaly Detection → uncovering hidden relationships and outliers.
These techniques turn raw, unlabeled data into interpretable economic knowledge.
Main Approaches
| Method | Key Principle | Strengths | Limitations |
|---|---|---|---|
| K‑Means | Minimize within‑cluster variance (inertia) | Fast, simple, widely used | Must pre‑specify K; sensitive to scale and outliers |
| Hierarchical Clustering | Merge or split clusters based on distance linkage (single, complete, average) | Visual dendrogram; no preset K | Computationally heavy for large N |
| DBSCAN | Density‑based: clusters are dense regions separated by sparse ones | Detects irregular shapes and noise | Parameter tuning (ε, MinPts) required |
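A hedged sketch contrasting the two most common choices on standardized toy data (`eps`, `min_samples`, and `n_clusters` are illustrative):

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
X = StandardScaler().fit_transform(X)          # clustering is scale-sensitive

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
db = DBSCAN(eps=0.5, min_samples=5).fit(X)     # label -1 marks noise points

print("K-Means silhouette:", silhouette_score(X, km.labels_))
print("DBSCAN labels found:", sorted(set(db.labels_)))
```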
How to Choose a Method
| Data Characteristic | Recommended Method | Reason |
|---|---|---|
| Clear cluster boundaries, roughly spherical | K‑Means | Efficient, stable centroids |
| Unknown number of groups, need hierarchy | Hierarchical | Reveals nested structure |
| Irregular shapes or noise present | DBSCAN | Density‑based robustness |
| Very large N, high‑dimensional | Start with MiniBatch K‑Means | Scalable approximation |
Evaluating Cluster Quality
In financial applications, interpretability of clusters is as important as numerical compactness.
Financial & Economic Relevance
| Domain | Example Use | Outcome |
|---|---|---|
| Financial Markets | Group stocks or investors by return correlation patterns | Identify market regimes or style clusters |
| Consumer Behavior | Segment clients by transaction history | Target marketing and credit strategies |
| Macro Policy | Cluster countries by macro indicators | Reveal structural similarity or divergence |
Objective
The cluster structure often reveals latent regimes or business strategies invisible to simple averages.
Example Dataset
| Entity | Features Used for Clustering | Economic Meaning |
|---|---|---|
| Firms (Marketing / Retail) | Sales growth, advertising ratio, product diversity, digital channel usage | Reflect market behavior and innovation intensity |
| Consumers (Finance / Banking) | Spending frequency, average transaction size, credit utilization rate | Reveal different consumption / risk profiles |
| Stocks (Market Data) | Average return, volatility, turnover, correlation to index | Identify style clusters or behavioral regimes |
Standardization is essential — scale all features to comparable units before clustering.
Workflow
Illustrative Result
| Cluster ID | Profile Summary | Representative Behavior |
|---|---|---|
| 1 | High sales & high digital usage | Digital Leaders |
| 2 | Moderate growth & traditional channels | Conventional Players |
| 3 | Small firms, low marketing spend | Niche Survivors |
Economic Interpretation
In research terms, clustering can act as an unsupervised labeling mechanism for subsequent models.
Extension Ideas
| Direction | Purpose in Economic Research |
|---|---|
| Cluster stability over time | Study structural change or market regime shifts |
| Cluster transition matrix | Evaluate mobility among behavior types |
| Combine with supervised models | Use cluster label as explanatory or control variable |
| Hybrid approach (K‑Means + DBSCAN) | Capture both core groups and fringe anomalies |
Dimensionality Reduction compresses data while preserving its most important variance or structure.
Two Main Philosophies: Principal Component Analysis and t‑Distributed Stochastic Neighbor Embedding
| Method | Type | Key Idea | Output Space |
|---|---|---|---|
| PCA | Linear Projection | Rotates axes to maximize variance explained | Components are linear combos of original variables |
| t‑SNE | Nonlinear Manifold Learning | Preserves local neighbor relationships in low‑D space | 2‑D/3‑D embedding suitable for visualization |
PCA — Core Mechanism: the first principal component is the normalized linear combination $z_1 = X\phi_1$ with maximal sample variance, $\phi_1 = \arg\max_{\lVert\phi\rVert = 1} \operatorname{Var}(X\phi)$; each subsequent component solves the same problem subject to orthogonality with the earlier ones.
Example Interpretation:
PCA uncovers latent orthogonal directions that best summarize the dataset.
The total variance of the data decomposes into the variance of the first $M$ principal components plus the variance left unexplained; we can interpret the PVE (proportion of variance explained) as the $R^2$ of approximating $X$ with its first $M$ components.
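A minimal sketch of the PVE computation with scikit-learn; the synthetic correlated column simply manufactures a dominant direction:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
X[:, 1] = 0.9 * X[:, 0] + 0.1 * rng.normal(size=300)   # induce a dominant direction

pca = PCA().fit(StandardScaler().fit_transform(X))
print(pca.explained_variance_ratio_)            # PVE per component
print(pca.explained_variance_ratio_.cumsum())   # cumulative PVE (scree criterion)
```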
Comparison Summary
| Aspect | PCA | t‑SNE |
|---|---|---|
| Linear / Nonlinear | Linear | Nonlinear |
| Goal | Maximize global variance | Preserve local similarities |
| Output interpretability | High | Low (no explicit factors) |
| Use case | Factor extraction, noise reduction | Visual exploration, clustering aid |
| Runtime scalability | Very fast | Slower for large N |
Financial & Economic Applications
| Context | How Used | Insight |
|---|---|---|
| Macroeconomics | Reduce 100 + indicators to a few principal components | Identify underlying economic cycles or shocks |
| Portfolio Risk | Decompose covariance matrix via PCA | Reveal dominant risk factors (market, size, sector) |
| ESG Analytics | Compress dozens of scores to a composite | Build interpretable sustainability indices |
| Consumer Analysis / Text Data | Visualize similarity in spending or opinions | Discover behavioral clusters |
Motivation
From Returns to Factors
Economic Interpretation of PCA Factors
| Principal Component | Possible Economic Meaning | Typical Pattern |
|---|---|---|
| PC1 | Market‑wide factor | Explains largest portion of price movement, highly correlated with index. |
| PC2 | Sector rotation factor | Distinguishes cyclical vs. defensive industries. |
| PC3 | Size or Liquidity factor | Captures small‑vs‑large or liquid‑vs‑illiquid contrast. |
Portfolio Risk Decomposition
Using the eigendecomposition $\Sigma = V \Lambda V^\top$ of the return covariance matrix, portfolio variance can be written as $\sigma_p^2 = w^\top \Sigma w = \sum_k \lambda_k (w^\top v_k)^2$. Each term corresponds to the contribution of one principal component to portfolio risk.
| Component | Variance Share | Interpretation |
|---|---|---|
| PC1 | 52 % | Systematic market risk |
| PC2 | 18 % | Sector rotation risk |
| PC3+ | 30 % | Idiosyncratic or noise |
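A hedged numerical sketch of the decomposition $\sigma_p^2 = \sum_k \lambda_k (w^\top v_k)^2$; the toy return panel and equal weights are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.normal(size=(500, 5)) @ np.diag([3.0, 1.5, 1.0, 0.5, 0.5])  # toy return panel
Sigma = np.cov(R, rowvar=False)
w = np.full(5, 0.2)                                                 # equal-weight portfolio

eigval, eigvec = np.linalg.eigh(Sigma)          # eigenvalues in ascending order
contrib = eigval * (w @ eigvec) ** 2            # lambda_k * (w'v_k)^2 per component
shares = contrib[::-1] / contrib.sum()          # reorder so PC1 comes first
print(np.round(shares, 3))                      # variance share by PC1, PC2, ...
```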
Core Idea: Association Rule Learning discovers co-occurrence patterns among items or events, expressed as rules $A \Rightarrow B$.
Originally used in retail (shopping baskets), it has broad economic and financial applications — from consumer analytics to transaction networks and risk event detection.
Example Scenario In a supermarket dataset of transactions:
Goal → Find rules such as: {Bread, Milk} → {Beer}
| Measure | Formula | Meaning |
|---|---|---|
| Support | $\mathrm{supp}(A \Rightarrow B) = P(A \cap B)$ | Frequency of transactions containing both A and B |
| Confidence | $\mathrm{conf}(A \Rightarrow B) = \dfrac{P(A \cap B)}{P(A)} = P(B \mid A)$ | Probability of B given A |
| Lift | $\mathrm{lift}(A \Rightarrow B) = \dfrac{P(A \cap B)}{P(A)\,P(B)}$ | Strength of association beyond chance (> 1 = positive correlation) |
Example: if Lift = 1.8, customers buying A are 80% more likely to also buy B than average.
The Apriori Algorithm
Uses the “Apriori Property”: All subsets of a frequent itemset must also be frequent.
Popular extensions: FP‑Growth, ECLAT for scalability.
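A minimal sketch assuming the `mlxtend` implementation of Apriori; the tiny one-hot basket matrix and thresholds are illustrative:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# one-hot basket matrix: rows = transactions, columns = items
baskets = pd.DataFrame(
    [[1, 1, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0]],
    columns=["Bread", "Milk", "Beer"],
).astype(bool)

frequent = apriori(baskets, min_support=0.25, use_colnames=True)
rules = association_rules(frequent, metric="lift", min_threshold=1.0)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```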
Applications Beyond Retail
| Domain | Data Source | Insight Gained |
|---|---|---|
| E‑Commerce / Banking | Purchase or transaction logs | Cross‑selling & recommendation |
| Macroeconomics | Country macro indicators (e.g., inflation ↑ & energy price ↑ → policy tightening) | Detect co‑movement patterns |
| Finance & Risk | Fraud or loss event logs | Co‑occurring trigger analysis |
| Text Analytics | Keyword or topic co‑occurrence | Identify latent issue linkages |
Interpretation in Economic Context
Typical Economic & Financial Contexts
| Domain | Anomaly Example | Value of Detection |
|---|---|---|
| Banking & Payments | Unusual transaction pattern or amount | Fraud prevention, AML systems |
| Financial Markets | Abnormal return or volatility spike | Early signal of market stress |
| Macroeconomics | Sudden divergence in indicators (e.g., credit vs growth) | Crisis early warning |
| Corporate Finance | Unexpected accounting figures | Governance & audit inspections |
Unsupervised anomaly detection is crucial when labeled fraud or crisis data are limited.
Major Approaches
| Approach | Mechanism | When to Use |
|---|---|---|
| Statistical Thresholding | Identify points far from mean ( z‑score , IQR rule ) | Small datasets, interpretable |
| Distance‑Based Methods | Compute nearest neighbors → flag isolated points | Moderate datasets, clear metric space |
| Density‑Based (DBSCAN / LOF) | Low density = outlier | Non‑linear structure |
| Model‑Based (One‑Class SVM, Isolation Forest) | Learn boundary of “normal” region | High‑dimensional / complex data |
Isolation Forest Key Idea: randomly partition data; anomalies require fewer splits to isolate.
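A hedged sketch with scikit-learn; `contamination=0.05` is an assumed prior on the anomaly share, and the injected outliers are synthetic:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(300, 2))
outliers = rng.uniform(-6, 6, size=(10, 2))
X = np.vstack([normal, outliers])

iso = IsolationForest(contamination=0.05, random_state=0).fit(X)
labels = iso.predict(X)                    # -1 = anomaly, +1 = normal
print("flagged:", (labels == -1).sum())
```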
Quantitative Evaluation
| Metric | Meaning |
|---|---|
| Precision / Recall | Trade‑off between missed and false detections |
| ROC / PR Curve | Evaluates model discrimination if partial labels exist |
| Economic Validation | Check if flagged anomalies align with known events (e.g., financial crises 2008, COVID shock) |
Economic Interpretation
In both cases, anomalies = “weak signals” preceding major events.
Hybrid & Practical Systems
What We Learned
| Method Family | Main Goal | Economic Meaning |
|---|---|---|
| Clustering | Group similar observations | Market segmentation / structural regime identification |
| Dimensionality Reduction (PCA / t‑SNE) | Compress information, extract latent components | Factor extraction / risk decomposition |
| Association Rules | Discover co‑occurrence logic | Behavior linkages / policy indicator relations |
| Anomaly Detection | Identify irregular samples | Fraud screening / crisis early signals |
The common thread: finding order within apparent randomness.
Conceptual Integration
Advantages in Economic Analysis
| Aspect | Value Added by Unsupervised Learning |
|---|---|
| Exploratory Power | Reveals latent structures before setting hypotheses |
| Scalability | Handles large, multi‑dimensional datasets |
| Adaptivity | Works even with limited or no labeled data |
| Complementarity | Enhances traditional econometric models (e.g., factor analysis, structural breaks) |
Especially valuable in “data‑rich, theory‑light” contexts like finance and policy analytics.
Methodological Reflection
Practical Implementation Checklist
Always treat the algorithm as a lens, not as the truth itself.
Let $Z = X P_K$ collect the first $K$ principal component scores of the (standardized) predictors. Equivalently, regress $y$ on $Z$ rather than on all of $X$.
View as projection: PCR projects $X$ onto the span of the top $K$ eigenvectors of $X'X$ before regressing, discarding the low-variance directions.
Bias–variance mechanism: truncating components introduces bias but cuts variance, because the discarded directions are exactly those estimated least precisely.
| Step | Description |
|---|---|
| 1 | Standardize $X$ |
| 2 | Compute eigenvectors $P$ of $\Sigma_X = X'X / n$ |
| 3 | Keep first $K$ principal components $T = X P_K$ |
| 4 | Regress $Y$ on $T$ |
| 5 | Obtain fitted $\beta = P_K \beta_T$ |
Choose $K$ by cross-validation on prediction error.
Interpretation: PCR emphasizes variance structure, not predictive correlation → best suited to descriptive factor discovery.
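A minimal pipeline sketch of these steps (K = 4 is illustrative; in practice choose it by cross-validation):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
y = X[:, :3].sum(axis=1) + rng.normal(0, 0.5, 200)

pcr = make_pipeline(StandardScaler(), PCA(n_components=4), LinearRegression())
pcr.fit(X, y)                     # standardize -> project on top components -> OLS
print(pcr.score(X, y))
```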
Works when: the high-variance directions of $X$ are also the directions most related to $y$.
Fails when: $y$ loads on low-variance directions of $X$, which PCA discards along with the noise.
Takeaway: PCR is unsupervised dimension reduction; the components are built without ever looking at $y$.
PLS builds components using $Y$ information. First component: $z_1 = \sum_j \phi_{1j} x_j$ with weights $\phi_{1j} \propto \mathrm{Cov}(x_j, y)$, i.e., the simple regression coefficient of $y$ on $x_j$.
Interpretation: directions are tilted toward predictors that forecast $y$, not merely toward high-variance directions of $X$.
Component-wise (NIPALS/SIMPLS idea): extract one component at a time, deflate $X$ (and optionally $y$), and repeat.
Closed‑form model after $K$ components: $\hat y = X \hat\beta^{(K)}_{\mathrm{PLS}}$, with $\hat\beta^{(K)}_{\mathrm{PLS}}$ constrained to the Krylov subspace
$$\mathcal{K}_K = \mathrm{span}\{X'y,\ (X'X)X'y,\ \dots,\ (X'X)^{K-1}X'y\}.$$
Krylov subspace view: PLS approximates the least-squares solution within $\mathcal{K}_K$, in the spirit of conjugate-gradient iterations on the normal equations.
Geometry
Link to related methods:
Shrinkage profiles: PCR shrinks by truncation, keeping or discarding whole principal directions, while PLS shrinks smoothly and can even inflate some directions.
Implication: PLS usually attains a given fit with fewer components than PCR.
Asset‑pricing factors:
Macro forecasting:
Credit risk:
Preprocessing:
| Feature | PCR | PLS |
|---|---|---|
| Uses $Y$ | No | Yes |
| Objective | Maximize variance of the components | Maximize covariance with $Y$ |
| Predictive power | Medium | Higher |
| Interpretability | High | Moderate |
| Typical goal | Data summarization | Forecasting |
In financial econometrics PLS often yields factors that better forecast returns or macro variables.
| Method | Dimension Reduction Mechanism | Uses $Y$ Info | Variable Selection | Interpretability | Prediction Accuracy |
|---|---|---|---|---|---|
| PCR | PCA on $X$ | No | No | High | Moderate |
| PLS | Components built for covariance with $Y$ | Yes | No | Medium | High |
| LASSO | $\ell_1$ shrinkage of coefficients to zero | Yes | Yes (sparse) | Sparse | High |
| Ridge | $\ell_2$ shrinkage of all coefficients | Yes | No | Stable | Medium |
| Elastic Net | Mixed $\ell_1 + \ell_2$ shrinkage | Yes | Yes | Medium | High |
| Research Goal | Data Structure | Recommended Method | Reason |
|---|---|---|---|
| Explain structural relations | Moderate dimensionality | PCR | Captures underlying data structure |
| Forecast $Y$ (returns, macro variables) | High collinearity | PLS | Uses $Y$ information when building components |
| Feature selection / large $p$ | Sparse relevant signals | LASSO / Elastic Net | Automatic variable selection |
| Stable estimates / collinearity | $p \approx n$, large | Ridge / PLS | Shrinkage stabilizes estimates |
| Mixed objectives (explain + predict) | High‑dimensional, noisy | Hybrid PLS + Regularization | Emerging trend in finance |
Rule of thumb: use PCR to describe structure, PLS to forecast, and LASSO / Ridge / Elastic Net when variable selection or stability under collinearity is the priority.
Summary · Lecture 02
| Topic | Essence | Financial Applications |
|---|---|---|
| Regression | Linear + Regularized models for continuous targets | Return & risk prediction |
| Classification | Binary decisions based on features | Credit scoring, fraud detection |
| Tree‑Based Models | Ensemble methods (GBM, RF) for accuracy & interpretability | PD modeling, risk rating |
| Unsupervised Learning | Clustering & PCA to find hidden patterns | Regime analysis, factor extraction |
Shallow learning forms the foundation for later Deep Learning methods.
Recommended Readings
- Ridge tends to keep all variables but shrinks them toward zero.
- LASSO performs explicit feature selection.
- Elastic Net balances the bias–variance trade‑off.
> Example: combining a macro‑theory model with a data‑driven model to improve forecast stability.
> Together, these methods form a **continuum** from *variance control* → *bias correction* → *model diversification*.
> Think of unsupervised learning as *asking data to tell its own story.*
> It answers the question: “Who looks like whom?”
- **K‑Means**: partitions space around centroids.
- **Hierarchical**: builds a dendrogram; you "cut" it at a chosen level.
- **DBSCAN**: groups dense points, labels sparse outliers as noise.