Machine Learning vs. Statistical Methods
Supervised Learning vs. Unsupervised Learning
Deep Learning vs. Reinforcement Learning
Classification Problems
Iris Classification
Image Classification
Exploratory Data Analysis
Learning a Classifier
Likelihood function:
Log-likelihood function:
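For reference, with i.i.d. data and parameters $\theta$ these take the standard form:

$$L(\theta) = \prod_{i=1}^{n} p\left(y_i \mid x_i; \theta\right), \qquad \ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log p\left(y_i \mid x_i; \theta\right)$$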
Year | Journal | Title | Classification Algorithms | Research Question | Main Findings |
---|---|---|---|---|---|
2023 | RFS | When Does Machine Learning Help Corporate Credit Rating Prediction?^1 | XGBoost, Random Forest, SVM | Corporate credit rating prediction | Machine learning models significantly outperform traditional statistical methods at predicting corporate credit ratings, especially when the data are large and complex |
2022 | JF | Machine Learning for Active Management^2 | Random Forest, GBDT | Stock return prediction and portfolio management | Machine learning methods can significantly improve portfolios' risk-adjusted returns |
2021 | JFE | FinBERT: A Deep Learning Approach to Extracting Textual Information^3 | BERT, CNN | Financial text classification and sentiment analysis | Deep learning models perform strongly on financial text analysis and extract textual information effectively |
2020 | RFS | Man vs. Machine Learning: The Term Structure of Earnings Expectations and Conditional Biases^4 | Neural Networks, Boosting | Earnings forecast bias analysis | Machine learning models can effectively identify systematic biases in analyst forecasts |
Year | Journal | Title | Classification Algorithms | Research Question | Main Findings |
---|---|---|---|---|---|
2019 | JFE | Predicting Corporate Default with Machine Learning^5 | Random Forest, SVM, Neural Networks | Corporate default prediction | Machine learning models significantly outperform traditional methods at predicting corporate default |
2018 | JF | Machine Learning and the Stock Market^6 | Decision Trees, Random Forest | Identifying market anomalies | Machine learning can effectively identify and predict market anomalies |
2017 | RFS | Text-Based Network Industries and Endogenous Product Differentiation^7 | LDA, text classification | Product differentiation analysis | Text-analysis methods can effectively measure the degree of product differentiation |
2016 | JFE | Machine Learning and Prediction in Economics and Finance^8 | SVM, Neural Networks | Financial market prediction | An analysis of the advantages and limitations of machine learning in financial forecasting |
Institution | Period | Project | Classification Algorithms | Use Case | Key Results |
---|---|---|---|---|---|
HSBC | 2020-2023 | Anti-Money Laundering System | XGBoost, Random Forest, LightGBM | Anti-money-laundering transaction screening | - Suspicious-transaction detection accuracy of 95% - False positives down 70% - Investigation efficiency up 200% - Compliance costs down 45% |
American Express | 2019-2023 | Fraud Detection Engine | Gradient Boosting, SVM, Neural Networks | Credit card fraud detection | - Fraud detection accuracy of 92% - Real-time response < 10 ms - Fraud losses down 65% - Customer satisfaction up 40% |
Institution | Period | Project | Classification Algorithms | Use Case | Key Results |
---|---|---|---|---|---|
Visa | 2018-2023 | Transaction Risk Scorer | Random Forest, CatBoost, Deep Learning | Transaction risk assessment | - Risk identification accuracy of 90% - Transaction approval rate up 25% - Fraud losses down 55% - Processing efficiency up 300% |
Citigroup | 2017-2023 | Credit Application Classifier | XGBoost, LightGBM, Neural Nets | Credit approval classification | - Approval accuracy up 50% - Processing time down 80% - Bad-debt rate down 40% - Customer conversion up 35% |
PayPal | 2016-2023 | Risk Management System | Ensemble Methods, Deep Learning | Payment risk management | - Risk identification accuracy of 93% - False positives down 60% - Transaction success rate up 30% - User experience up 45% |
Regression Problems
Year | Journal | Title | Regression Algorithms | Research Question | Main Findings |
---|---|---|---|---|---|
2023 | JFE | Machine Learning Asset Pricing^1 | Neural Networks, LASSO, Ridge | Asset pricing prediction | Machine learning models outperform traditional methods for return prediction, especially when handling nonlinear relationships |
2022 | RFS | Machine Learning and Returns Prediction^2 | Elastic Net, Random Forest Regression | Stock return prediction | Ensemble methods better capture market anomalies and predict returns |
2021 | JF | Automated Financial Management^3 | Gradient Boosting Regression, XGBoost | Portfolio optimization | Machine learning algorithms show clear advantages in asset allocation and risk management |
2020 | JFE | Real Estate Values and Machine Learning^4 | Neural Networks, SVR | Real estate valuation | Deep learning models predict changes in real-estate values more accurately |
Year | Journal | Title | Regression Algorithms | Research Question | Main Findings |
---|---|---|---|---|---|
2019 | RFS | Empirical Asset Pricing via Machine Learning^5 | Neural Nets, Regression Trees, LASSO | Measuring asset risk premia | Machine learning methods better capture the predictive features of asset returns |
2018 | JF | Predicting Returns with Text Data^6 | Ridge Regression, Neural Networks | Text analysis and return prediction | Machine learning models that incorporate text analysis improve return-prediction accuracy |
2017 | JFE | Machine Learning for Stock Selection^7 | Boosting Regression, Random Forest | Stock selection | Machine learning methods outperform traditional factor models in stock selection |
2016 | RFS | Option Pricing with Machine Learning^8 | Support Vector Regression, Neural Networks | Option pricing | Machine learning models perform well in option pricing, especially for complex derivatives |
Institution | Period | Project | Regression Algorithms | Use Case | Key Results |
---|---|---|---|---|---|
BlackRock | 2020-2023 | Systematic Asset Pricing | XGBoost, Random Forest, LASSO | Asset pricing and return prediction | - Prediction accuracy up 45% - Pricing efficiency up 60% - Portfolio returns up 25% - Risk-adjusted returns up 30% |
Goldman Sachs | 2019-2023 | Credit Risk Assessment | Gradient Boosting, Neural Networks, Ridge | Credit risk assessment | - Default-prediction accuracy of 88% - Risk-assessment efficiency up 150% - Bad-debt rate down 35% - Credit-decision speed up 200% |
Institution | Period | Project | Regression Algorithms | Use Case | Key Results |
---|---|---|---|---|---|
JPMorgan | 2018-2023 | Market Impact Predictor | LightGBM, Elastic Net, SVR | Transaction cost prediction | - Trading costs down 40% - Market-impact prediction accuracy of 85% - Execution efficiency up 55% - Liquidity costs down 30% |
Morgan Stanley | 2017-2023 | Options Pricing Engine | Random Forest, AdaBoost, Neural Nets | Option pricing | - Pricing accuracy up 50% - Computation speed up 300% - Hedging efficiency up 45% - Trading profits up 25% |
Citadel | 2016-2023 | Factor Return Predictor | XGBoost, LASSO, Ridge | Factor return prediction | - Prediction accuracy up 40% - Strategy returns up 35% - Risk-adjusted returns up 28% - Turnover down 20% |
Year | Journal | Title | Deep Learning / LLM Algorithms | Research Question | Main Findings |
---|---|---|---|---|---|
2023 | RFS | Large Language Models in Finance: A Market Microstructure Perspective[^1] | GPT-3, BERT | Market microstructure analysis | LLMs can effectively analyze trading data and information flows, improving predictions of market efficiency |
2023 | JFE | FinBERT: Financial Sentiment Analysis with BERT[^2] | BERT, Transformer | Financial sentiment analysis | A BERT model adapted to the financial domain significantly outperforms traditional methods on sentiment analysis |
2022 | JF | Deep Learning in Asset Pricing[^3] | CNN, LSTM | Asset pricing | Deep learning models can capture complex pricing factors and improve predictive accuracy |
2022 | RFS | News and Corporate Bond Trading[^4] | BERT, RoBERTa | News impact analysis | Large language models can accurately assess the impact of news on bond trading |
2021 | JFE | Deep Portfolio Management[^5] | DNN, Reinforcement Learning | Portfolio management | Deep reinforcement learning performs well in dynamic portfolio management |
Year | Journal | Title | Deep Learning / LLM Algorithms | Research Question | Main Findings |
---|---|---|---|---|---|
2021 | RFS | Textual Analysis and Machine Learning in Finance[^6] | Transformer, LSTM | Text mining | Deep learning models excel at processing unstructured financial data |
2020 | JF | Machine Learning and Financial Crises[^7] | RNN, LSTM | Financial crisis prediction | Deep learning models can provide effective early warnings of systemic financial risk |
2020 | JFE | Deep Learning in Credit Risk[^8] | CNN, Attention Networks | Credit risk assessment | Deep learning models outperform traditional scoring models in credit risk assessment |
2019 | RFS | Natural Language Processing in Finance[^9] | BERT, Word2Vec | Text analysis | NLP techniques can effectively extract key information from financial text |
2018 | JFE | Deep Learning in Financial Markets[^10] | CNN, RNN | Market prediction | Deep learning shows advantages in high-frequency trading and market prediction |
Institution | Period | Project | Deep Learning / LLM Algorithms | Use Case | Key Results |
---|---|---|---|---|---|
Bloomberg | 2021-2023 | BloombergGPT | In-house LLM (GPT-style architecture) | Financial information processing and analysis | - Financial data-analysis accuracy up 65% - News-processing efficiency up 300% - Automated market-analysis reports - Real-time financial insights |
Goldman Sachs | 2020-2023 | Financial BERT | BERT, FinBERT | Financial text analysis and trading-signal generation | - Text-analysis accuracy of 92% - Trading-signal accuracy up 45% - Research-report analysis efficiency up 200% - Risk-warning accuracy up 60% |
Institution | Period | Project | Deep Learning / LLM Algorithms | Use Case | Key Results |
---|---|---|---|---|---|
BlackRock | 2019-2023 | Aladdin Neural Network | CNN, LSTM, Transformer | Asset pricing and risk assessment | - Pricing accuracy up 40% - Risk-assessment efficiency up 150% - Portfolio optimization up 35% - Anomaly-detection accuracy up 70% |
JPMorgan | 2018-2023 | AI Document Reader | BERT, GPT, RoBERTa | Document processing and compliance review | - Document-processing speed up 500% - Compliance-review accuracy of 95% - Labor costs down 60% - Business response speed up 300% |
Morgan Stanley | 2020-2023 | MS-AI Trading Platform | Deep Reinforcement Learning, Transformer | Intelligent trading and market prediction | - Trade-execution efficiency up 80% - Prediction accuracy up 50% - Trading costs down 35% - Investment returns up 25% |
Generalization gap:
Test risk
Two common guiding principles and two methods are used to reduce overfitting:
K-fold cross-validation
Leave-one-out cross-validation (LOOCV):
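A minimal sketch of both procedures with scikit-learn; the toy regression data and the ridge penalty are illustrative choices, not part of the original slides:

```python
# K-fold CV and LOOCV as estimators of test risk.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)
model = Ridge(alpha=1.0)

# 5-fold cross-validation: average MSE over 5 held-out folds
kfold_mse = -cross_val_score(
    model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring='neg_mean_squared_error').mean()

# Leave-one-out: each observation serves exactly once as the validation set
loo_mse = -cross_val_score(model, X, y, cv=LeaveOneOut(),
                           scoring='neg_mean_squared_error').mean()
print(f'5-fold MSE: {kfold_mse:.1f}, LOOCV MSE: {loo_mse:.1f}')
```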
All models are wrong, but some models are useful.
--- George Box
When we’re learning to see, nobody’s telling us what the right answers are — we just look. Every so often, your mother says “that’s a dog”, but that’s very little information. You’d be lucky if you got a few bits of information — even one bit per second — that way. The brain’s visual system has 10^14 neural connections. And you only live for 10^9 seconds. So it’s no use learning one bit per second. You need more like 10^5 bits per second. And there’s only one place you can get that much information: from the input itself.
--- Geoffrey Hinton, 1996
Clustering
Assume each observed high-dimensional output
Factor analysis (FA)
Principal component analysis (PCA):
Nonlinear models: neural networks
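A minimal PCA sketch with scikit-learn; the simulated 250-day, 10-asset return matrix is an illustrative stand-in for real data:

```python
# Extract a small number of latent factors from high-dimensional returns.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
returns = rng.standard_normal((250, 10))  # 250 days x 10 assets (toy data)

pca = PCA(n_components=3)
factors = pca.fit_transform(returns)      # low-dimensional factor series
print(pca.explained_variance_ratio_)      # share of variance per component
```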
Year | Journal | Title | Unsupervised Learning Algorithms | Research Question | Main Findings |
---|---|---|---|---|---|
2023 | JFE | Unsupervised Learning for Market Regimes[^1] | K-means++, GMM | Market regime identification | Unsupervised learning can effectively identify distinct market regimes, making investment strategies more adaptive |
2022 | RFS | Network Analysis of the Stock Market Using Clustering[^2] | Hierarchical clustering, DBSCAN | Stock market network analysis | Clustering algorithms can effectively uncover latent structure and correlations in the stock market |
2022 | JF | Dimension Reduction in Financial Markets[^3] | PCA, t-SNE, UMAP | Dimensionality reduction of financial data | Nonlinear dimension-reduction techniques outperform traditional methods at extracting market features |
2021 | JFE | Anomaly Detection in Financial Markets[^4] | Isolation Forest, AutoEncoder | Market anomaly detection | These methods perform well at identifying market anomalies and fraudulent behavior |
Year | Journal | Title | Unsupervised Learning Algorithms | Research Question | Main Findings |
---|---|---|---|---|---|
2020 | RFS | Asset Allocation with Factor Clustering[^5] | K-means, spectral clustering | Factor clustering analysis | Clustering methods can effectively identify true risk factors and improve asset allocation |
2020 | JF | Text Mining with Topic Models[^6] | LDA, NMF | Topic analysis of financial text | Topic models can effectively extract textual information and predict market movements |
2019 | JFE | Trading Networks and Market Structure[^7] | Community-detection algorithms | Trading network analysis | Unsupervised learning reveals the latent structure of trading networks |
2018 | RFS | Portfolio Selection with Dimensionality Reduction[^8] | PCA, Kernel PCA | Portfolio optimization | Dimension-reduction techniques improve the efficiency of portfolio construction |
2017 | JF | Market Segmentation Analysis[^9] | Hierarchical clustering, SOM | Market segmentation | Cluster analysis reveals the market's natural segments |
2016 | JFE | Risk Factors and Clustering in Asset Returns[^10] | GMM, t-SNE | Risk factor identification | Unsupervised learning can identify latent risk-factor structure |
Institution | Period | Project | Unsupervised Learning Algorithms | Use Case | Key Results |
---|---|---|---|---|---|
Morgan Stanley | 2020-2023 | Market Regime Detection System | K-means++, GMM, HMM | Market regime identification and risk early warning | - Accurately identifies 8 market regimes - Early-warning accuracy of 80% - Strategy returns up 25% - Risk-control efficiency up 40% |
Goldman Sachs | 2019-2023 | Client Segmentation Platform | Hierarchical clustering, DBSCAN, t-SNE | Client behavior analysis and service personalization | - Identified 12 client segments - Client satisfaction up 35% - Cross-selling revenue up 45% - Client churn down 20% |
Institution | Period | Project | Unsupervised Learning Algorithms | Use Case | Key Results |
---|---|---|---|---|---|
Deutsche Bank | 2018-2023 | Fraud Detection Engine | Isolation Forest, AutoEncoder, HDBSCAN | Anomalous transaction detection | - Fraud-detection accuracy up 60% - False positives down 40% - Real-time response 3x faster - Hundreds of millions of dollars in losses avoided annually |
UBS | 2017-2023 | Portfolio Analytics System | PCA, t-SNE, UMAP | Portfolio analysis and risk decomposition | - Dimension-reduction efficiency up 70% - Risk-factor identification accuracy up 50% - Portfolio-optimization efficiency up 45% |
Citadel | 2016-2023 | Market Pattern Recognition | Deep autoencoders, VAE, K-means | Market pattern recognition and strategy generation | - Identified over 100 market patterns - Strategy returns up 30% - Risk-control efficiency up 55% |
Year | Journal | Title | Reinforcement Learning Algorithms | Research Question | Main Findings |
---|---|---|---|---|---|
2023 | RFS | Deep Reinforcement Learning in Asset Trading[^1] | PPO, A3C | Asset trading strategies | Reinforcement learning models perform well in dynamic market environments and adapt their trading strategies |
2022 | JF | Reinforcement Learning for Portfolio Management[^2] | DQN, DDPG | Portfolio management | Deep reinforcement learning outperforms traditional methods in dynamic asset allocation, especially in highly volatile markets |
2022 | JFE | Market Making with Deep Reinforcement Learning[^3] | SAC, TD3 | Market-making strategies | Reinforcement learning can effectively optimize market makers' quoting strategies and improve market liquidity |
2021 | RFS | Optimal Trading with Reinforcement Learning[^4] | PPO, A2C | Optimal trade execution | RL algorithms can significantly reduce trading costs and improve execution efficiency |
Year | Journal | Title | Reinforcement Learning Algorithms | Research Question | Main Findings |
---|---|---|---|---|---|
2021 | JF | Dynamic Asset Allocation via RL[^5] | DDPG, TD3 | Dynamic asset allocation | Reinforcement learning performs well in dynamic allocation with transaction costs |
2020 | JFE | Algorithmic Trading with Deep RL[^6] | DQN, Double DQN | Algorithmic trading | Deep reinforcement learning can effectively cope with changes in market microstructure |
2019 | RFS | High-Frequency Trading with RL[^7] | A3C, TRPO | High-frequency trading strategies | RL shows strong adaptability in high-frequency trading environments |
2018 | JF | Risk Management with RL[^8] | DQN, DDPG | Risk management | Reinforcement learning can effectively balance return and risk objectives |
2017 | JFE | Options Trading via RL[^9] | Q-Learning, SARSA | Options trading | RL shows advantages in trading complex derivatives |
2016 | RFS | Portfolio Optimization with RL[^10] | DQN | Portfolio optimization | Reinforcement learning can handle multi-objective portfolio optimization |
An exploration-exploitation algorithm that learns investors' time-varying risk preferences by observing their portfolio choices in different market environments
A robo-advising framework composed of two agents
Institution | Period | Project | Reinforcement Learning Algorithms | Use Case | Key Results |
---|---|---|---|---|---|
JP Morgan | 2020-2023 | ATOM (Algorithmic Trading Optimization Machine) | DQN, PPO | Optimal execution strategies | - Trading costs cut by over 50% - More efficient execution of large orders - Reduced market impact - Adaptive trading strategies |
BlackRock | 2019-2023 | Aladdin (Asset, Liability, Debt and Derivative Investment Network) | DDPG, SAC | Portfolio management | - Manages over $10 trillion in assets - Portfolio Sharpe ratio up 15-20% - Significantly lower trading costs - Multi-objective dynamic optimization |
Institution | Period | Project | Reinforcement Learning Algorithms | Use Case | Key Results |
---|---|---|---|---|---|
Two Sigma | 2018-2023 | Venn Platform | A3C, PPO | Factor investing and risk management | - Identified 200+ new factors - Risk-adjusted returns up 30% - Optimized dynamic portfolio rebalancing - Improved risk-warning accuracy |
Goldman Sachs | 2017-2023 | Atlas Trading Platform | DQN, TRPO | Market-making strategy optimization | - Market-making profits up 40% - Improved market liquidity - Lower inventory risk - Optimized quoting strategies |
Citadel | 2016-2023 | Tactical Trading System | TD3, SAC | Tactical trading strategies | - Annualized returns up 25% - Lower trading slippage - More efficient order execution - Optimized multi-market arbitrage |
Property | Statistical Inference | Supervised Machine Learning |
---|---|---|
Goal | Causal models with explanatory power | Predictive performance, often with limited explanatory power |
Data | Data are generated by a model | Data-generating process unknown |
Framework | Probabilistic | Algorithmic and probabilistic |
Expressiveness | Typically linear | Nonlinear |
Model selection | Based on information criteria | Numerical optimization |
Scalability | Limited to low-dimensional data | Scales to high-dimensional input data |
Robustness | Prone to overfitting | Designed for out-of-sample performance |
Diagnostics | Extensive | Limited |
Selecting ML Algorithms
[1] Athey S. The impact of machine learning on economics[J]. The economics of artificial intelligence: An agenda, 2018: 507-547.
[2] Athey S, Imbens G W. Machine learning methods that economists should know about[J]. Annual Review of Economics, 2019, 11: 685-725.
[3] Mullainathan S, Spiess J. Machine learning: an applied econometric approach[J]. Journal of Economic Perspectives, 2017, 31(2): 87-106.
[4] Cohen, Samuel N. and Snow, Derek and Szpruch, Lukasz, Black-Box Model Risk in Finance (February 9, 2021). Available at SSRN: https://ssrn.com/abstract=3782412 or http://dx.doi.org/10.2139/ssrn.3782412
[5] Goldstein I, Spatt C S, Ye M. Big data in finance[J]. The Review of Financial Studies, 2021, 34(7): 3213-3225.
[10] Giglio, Stefano and Kelly, Bryan T. and Xiu, Dacheng, Factor Models, Machine Learning, and Asset Pricing (October 15, 2021). Available at SSRN: https://ssrn.com/abstract=3943284 or http://dx.doi.org/10.2139/ssrn.3943284
[12] Kelly B T, Pruitt S, Su Y. Characteristics are covariances: A unified model of risk and return[J]. Journal of Financial Economics, 2019, 134(3): 501-524.
[13] Kozak S, Nagel S, Santosh S. Shrinking the cross-section[J]. Journal of Financial Economics, 2020, 135(2): 271-292.
[14] Tobek O, Hronec M. Does it pay to follow anomalies research? machine learning approach with international evidence[J]. Journal of Financial Markets, 2021, 56: 100588.
[15] Baba Yara, Fahiz and Boyer, Brian H. and Davis, Carter, The Factor Model Failure Puzzle (November 19, 2021). Available at SSRN: https://ssrn.com/abstract=3967588 or http://dx.doi.org/10.2139/ssrn.3967588
[16] Chen L, Pelger M, Zhu J. Deep learning in asset pricing[J]. Management Science, 2023.
[18] Giglio S, Liao Y, Xiu D. Thousands of alpha tests[J]. The Review of Financial Studies, 2021, 34(7): 3456-3496.
[19] Duarte V, Fonseca J, Goodman A S, et al. Simple Allocation Rules and Optimal Portfolio Choice Over the Lifecycle[R]. National Bureau of Economic Research, 2021.
[20] Jiang, Jingwen and Kelly, Bryan T. and Xiu, Dacheng, (Re-)Imag(in)ing Price Trends (December 1, 2020). Chicago Booth Research Paper No. 21-01, Available at SSRN: https://ssrn.com/abstract=3756587 or http://dx.doi.org/10.2139/ssrn.3756587
[21] Aït-Sahalia Y, Xiu D. Using principal component analysis to estimate a high dimensional factor model with high-frequency data[J]. Journal of Econometrics, 2017, 201(2): 384-399.
[23] Kelly B T, Xiu D. Financial machine learning[R]. National Bureau of Economic Research, 2023.
[24] Lopez-Lira A, Tang Y. Can chatgpt forecast stock price movements? return predictability and large language models[J]. arXiv preprint arXiv:2304.07619, 2023.
[25] Yu S, Xue H, Ao X, et al. Generating Synergistic Formulaic Alpha Collections via Reinforcement Learning[J]. arXiv preprint arXiv:2306.12964, 2023.
[26] Blitz D, Hanauer M X, Hoogteijling T, et al. The Term Structure of Machine Learning Alpha[J]. Available at SSRN, 2023.
[27] Hambly B, Xu R, Yang H. Recent advances in reinforcement learning in finance[J]. Mathematical Finance, 2023, 33(3): 437-503.
02 Regression Algorithms
the normal equation (first-order condition, FOC)
the OLS solution
the solution is unique since the Hessian is positive definite
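A minimal numpy sketch of solving the normal equations directly; the simulated design matrix and coefficients are illustrative:

```python
# Solve X'X beta = X'y for the OLS estimate on toy data.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])  # add intercept
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.standard_normal(n)

# beta_hat = (X'X)^{-1} X'y; np.linalg.solve avoids forming an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to beta_true; unique because X'X (the Hessian) is positive definite
```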
Algorithms | Lagrangian | Constrained quadratic program |
---|---|---|
lasso | $\min_\beta \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1$ | $\min_\beta \lVert y - X\beta \rVert_2^2 \ \text{s.t.} \ \lVert \beta \rVert_1 \le t$ |
ridge | $\min_\beta \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2$ | $\min_\beta \lVert y - X\beta \rVert_2^2 \ \text{s.t.} \ \lVert \beta \rVert_2^2 \le t$ |
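A minimal scikit-learn sketch contrasting the two penalties; `alpha` plays the role of $\lambda$, and the toy data are illustrative:

```python
# Lasso produces sparse coefficients; ridge only shrinks them toward zero.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)
lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print('nonzero lasso coefs:', np.sum(lasso.coef_ != 0))  # a few
print('nonzero ridge coefs:', np.sum(ridge.coef_ != 0))  # all 20
```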
Consider the partial derivatives of the lasso objective
Plot the values
year | age | maritl | race | education | region | jobclass | health | health_ins | logwage | wage |
---|---|---|---|---|---|---|---|---|---|---|
2006 | 18 | 1. Never Married | 1. White | 1.0 | 2. Middle Atlantic | 1. Industrial | 1. <=Good | 2. No | 4.318063 | 75.043154 |
2004 | 24 | 1. Never Married | 1. White | 4.0 | 2. Middle Atlantic | 2. Information | 2. >=Very Good | 2. No | 4.255273 | 70.476020 |
2003 | 45 | 2. Married | 1. White | 3.0 | 2. Middle Atlantic | 1. Industrial | 1. <=Good | 1. Yes | 4.875061 | 130.982177 |
2003 | 43 | 2. Married | 3. Asian | 4.0 | 2. Middle Atlantic | 2. Information | 2. >=Very Good | 1. Yes | 5.041393 | 154.685293 |
2005 | 50 | 4. Divorced | 1. White | 2.0 | 2. Middle Atlantic | 2. Information | 1. <=Good | 1. Yes | 4.318063 | 75.043154 |
fitting separate low-degree polynomials over different regions of
example: piecewise cubic polynomial with a single knot at a point
degrees of freedom
Using more knots leads to a more flexible piecewise polynomial
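A minimal sketch of building such a basis with patsy's `bs()`; the knot at age 50 and the `wage_df` data frame (used in the code further below) are illustrative choices:

```python
# Cubic spline basis with a single knot, evaluated on the Wage data.
import patsy as pt

basis = pt.dmatrix('bs(age, knots=[50], degree=3)', wage_df)
```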
computing the fit at a target point
Algorithm: Local Regression at $x_0$ |
---|
1. Gather the fraction $s = k/n$ of training points whose $x_i$ are closest to $x_0$. |
2. Assign a weight $K_{i0} = K(x_i, x_0)$ to each point in this neighborhood, so that the point furthest from $x_0$ has weight zero and the closest has the highest weight; all but these $k$ nearest neighbors get weight zero. |
3. Fit a weighted least squares regression of the $y_i$ on the $x_i$ using these weights. |
4. The fitted value at $x_0$ is given by $\hat{f}(x_0) = \hat{\beta}_0 + \hat{\beta}_1 x_0$. |
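A minimal sketch of this procedure using the `lowess` smoother in statsmodels, a standard local-regression implementation; the simulated data and `frac` value are illustrative:

```python
# Local regression: frac is the fraction s of points in each neighborhood.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=200)

# Returns an array of (x, fitted-value) pairs sorted by x
fitted = sm.nonparametric.lowess(y, x, frac=0.3)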
GAMs automatically model non-linear relationships that standard linear regression will miss.
The non-linear fits can potentially make more accurate predictions for the response
We can examine the effect of each
The smoothness of the function
# Use patsy to generate the full matrix of spline basis functions
import numpy as np
import patsy as pt
import statsmodels.api as sm

X = pt.dmatrix('cr(year, df=4) + cr(age, df=5) + education', wage_df)
y = np.asarray(wage_df['wage'])
# Fit a linear regression (OLS) model on the basis expansion
model = sm.OLS(y, X).fit()
y_hat = model.predict(X)
conf_int = confidence_interval(X, y, y_hat)  # user-defined helper from earlier
# Model 1: age spline + education
X1 = pt.dmatrix('cr(age, df=5) + education', wage_df)
model1 = sm.OLS(y, X1).fit()

# Model 2: add year as a linear term
X2 = pt.dmatrix('year + cr(age, df=5) + education', wage_df)
model2 = sm.OLS(y, X2).fit()

# Model 3: add year as a spline
X3 = pt.dmatrix('cr(year, df=4) + cr(age, df=5) + education', wage_df)
model3 = sm.OLS(y, X3).fit()

# Compare the nested models with ANOVA
display(sm.stats.anova_lm(model1, model2, model3))
 | df_resid | ssr | df_diff | ss_diff | F | Pr(>F) |
---|---|---|---|---|---|---|
0 | 2994.0 | 3.750437e+06 | 0.0 | NaN | NaN | NaN |
1 | 2993.0 | 3.732809e+06 | 1.0 | 17627.473318 | 14.129318 | |
2 | 2991.0 | 3.731516e+06 | 2.0 | 1293.696286 | 0.518482 | |
Gaussian noise assumption
Robust regression: replace the Gaussian distribution for the response variable with a distribution that has heavy tails
Likelihood | Prior | Posterior | Name |
---|---|---|---|
Gaussian | Uniform | Point | Least squares |
Student | Uniform | Point | Robust regression |
Laplace | Uniform | Point | Robust regression |
Gaussian | Gaussian | Point | Ridge |
Gaussian | Laplace | Point | Lasso |
Gaussian | Gauss-Gamma | Gauss-Gamma | Bayesian linear regression |
It is equivalent to
The Huber loss function is everywhere differentiable.
Consequently, optimizing the Huber loss is much faster than using the Laplace likelihood.
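A minimal sketch with scikit-learn's `HuberRegressor`; the simulated data and injected outliers are illustrative:

```python
# Huber-loss regression is far less sensitive to outliers than squared error.
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 1))
y = 3.0 * X.ravel() + rng.normal(scale=0.5, size=100)
y[:5] += 30  # inject a few gross outliers

print(LinearRegression().fit(X, y).coef_)            # pulled away from 3
print(HuberRegressor(epsilon=1.35).fit(X, y).coef_)  # stays close to 3
```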
03 Classification Algorithms
regression methods cannot accommodate a qualitative response with more than two classes
regression methods cannot provide estimates of the conditional probabilities of the response classes
classify a response variable that has more than two classes
the model
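Presumably the multinomial logistic (softmax) model is intended here; in standard notation, for classes $k = 1, \dots, K$:

$$\Pr(Y = k \mid X = x) = \frac{e^{\beta_{k0} + \beta_k^\top x}}{\sum_{l=1}^{K} e^{\beta_{l0} + \beta_l^\top x}}$$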
Model the log odds ratio as a generalized additive model:
Hyperplane
For a separating hyperplane, $\beta_0 + \beta^\top x_i > 0$ if $y_i = 1$, and $\beta_0 + \beta^\top x_i < 0$ if $y_i = -1$.
Equivalently, a separating hyperplane has the property that $y_i(\beta_0 + \beta^\top x_i) > 0$
for all $i = 1, \ldots, n$.
You may try with other parameter values (e.g.
Linear support vector classifiers cannot handle nonlinearity.
What can we do?
name | function |
---|---|
Linear kernel | $K(x, x') = x^\top x'$ |
Polynomial kernel | $K(x, x') = (1 + x^\top x')^d$ |
Radial kernel | $K(x, x') = \exp(-\gamma \lVert x - x' \rVert_2^2)$ |
Gaussian kernel | $K(x, x') = \exp\left(-\frac{\lVert x - x' \rVert_2^2}{2\sigma^2}\right)$ |
Laplacian kernel | $K(x, x') = \exp\left(-\frac{\lVert x - x' \rVert_1}{\sigma}\right)$ |
Sigmoid kernel | $K(x, x') = \tanh(\gamma x^\top x' + r)$ |
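A minimal sketch comparing a linear and a radial-kernel SVM on data that are not linearly separable; the two-moons data and parameter values are illustrative:

```python
# The RBF kernel lets the SVM learn a nonlinear decision boundary.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
linear_svc = SVC(kernel='linear', C=1.0).fit(X, y)
rbf_svc = SVC(kernel='rbf', gamma=1.0, C=1.0).fit(X, y)

# The RBF kernel separates the moons far better than the linear one
print(linear_svc.score(X, y), rbf_svc.score(X, y))
```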
Suppose
 | SVC | SVM |
---|---|---|
inner products / kernels | $\langle x, x_i \rangle$ | $K(x, x_i)$ |
functional form | $f(x) = \beta_0 + \sum_{i \in \mathcal{S}} \alpha_i \langle x, x_i \rangle$ | $f(x) = \beta_0 + \sum_{i \in \mathcal{S}} \alpha_i K(x, x_i)$ |
You may try with other parameter values (e.g.
One-Versus-One (OVO) Classification
One-Versus-All (OVA) Classification
The hinge loss + penalty form of support-vector classifier optimization:
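In the notation used above, this objective is standardly written as:

$$\min_{\beta_0, \beta} \left\{ \sum_{i=1}^{n} \max\left[0,\, 1 - y_i\left(\beta_0 + \beta^\top x_i\right)\right] + \lambda \lVert \beta \rVert_2^2 \right\}$$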
SVM vs. Logistic Regression
04 Decision Trees and Ensemble Learning
where $\hat{\pi}_c$ is the fraction of training observations in the node that belong to class $c$.
Given this, we can then compute the Gini index $G = \sum_{c=1}^{C} \hat{\pi}_c (1 - \hat{\pi}_c)$. This is the expected error rate. To see this, note that $\hat{\pi}_c$ is the probability that a random entry in the leaf belongs to class $c$, and $1 - \hat{\pi}_c$ is the probability it would be misclassified.
The disadvantage of bootstrap:
each base model only sees, on average, about 63% of the unique training examples (the probability an example appears in a bootstrap sample is $1 - (1 - 1/n)^n \approx 1 - e^{-1} \approx 0.632$).
The remaining examples, roughly 37% of the training set, are called out-of-bag (oob) instances.
We can use the predicted performance of the base model on these oob instances as an estimate of test set performance.
This provides a useful alternative to cross validation.
The main advantage of bootstrap is that it prevents the ensemble from relying too much on any individual training example, which enhances robustness and generalization.
Bagging does not always improve performance. In particular, it relies on the base models being unstable estimators (e.g. decision trees), so that omitting some of the data significantly changes the resulting model fit.
Random forests: learn trees based on a randomly chosen subset of input variables (at each node of the tree), as well as a randomly chosen subset of data cases. Empirically, random forests work much better than bagged decision trees when many input features are irrelevant.
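A minimal sketch contrasting bagged trees with a random forest on data where most features are irrelevant; it also reads off the oob score mentioned above. The data and hyperparameters are illustrative:

```python
# Random forests add per-split feature subsampling on top of bagging.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=50, n_informative=5,
                           random_state=0)
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        oob_score=True, random_state=0).fit(X, y)
rf = RandomForestClassifier(n_estimators=100, max_features='sqrt',
                            oob_score=True, random_state=0).fit(X, y)

# oob score: test-set-like performance estimate without cross-validation
print(bag.oob_score_, rf.oob_score_)
```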
Boosting is an algorithm for sequentially fitting additive models where each
as long as each
forward stagewise additive modeling: sequentially optimize the objective for general (differentiable) loss functions
We then set
so
discrete AdaBoost
where
This can be found by applying the weak learner to a weighted version of the dataset, with weights
where
Thus we see that we exponentially increase the weights of misclassified examples. The resulting algorithm, shown in Algorithm 8, is known as AdaBoost.
where
where
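A minimal AdaBoost sketch with depth-1 trees (decision stumps) as the weak learner; the data and hyperparameters are illustrative:

```python
# AdaBoost: sequentially reweight examples that earlier stumps misclassify.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=200, learning_rate=1.0,
                         random_state=0).fit(X, y)
print(ada.score(X, y))
```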
source: beamlab.org
So far, we have learned:
linear input-output mapping
models/algorithms linear in parameters
where
The term "DNN" actually encompasses a larger family of models, in which we compose differentiable functions into any kind of DAG (directed acyclic graph, 有向无环图), mapping input to output.
$x_1$ | $x_2$ | $y$ |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
It is clear that the data is not linearly separable, so a perceptron cannot represent this mapping.
this problem can be overcome by stacking multiple perceptrons on top of each other: multilayer perceptron (MLP)
first hidden unit (AND operation) computes
the second hidden unit (OR operation) computes
the third unit computes the output
An MLP can represent any logical function. However, we obviously want to avoid having to specify the weights and biases by hand. In the rest of this chapter, we discuss ways to learn these parameters from data.
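A minimal numpy sketch of the hand-set XOR construction above, using Heaviside (step) activations:

```python
# Two hidden units (AND, OR) plus an output unit implement XOR.
import numpy as np

def step(a):  # Heaviside activation
    return (a > 0).astype(float)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
h_and = step(X @ np.array([1.0, 1.0]) - 1.5)  # fires only for (1, 1)
h_or = step(X @ np.array([1.0, 1.0]) - 0.5)   # fires unless (0, 0)
y = step(h_or - h_and - 0.5)                  # OR but not AND = XOR
print(y)                                      # [0. 1. 1. 0.]
```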
Example models
MLPs can be used to perform classification and regression for many kinds of data. We give some examples below. Try it for yourself via https://playground.tensorflow.org
an MLP with two hidden layers applied to a 2-d input vector
McCulloch-Pitts model of the neuron (1943):
We can combine multiple such neurons to make an artificial neural network (ANN)
ANNs differ from biological brains in many ways, including the following:
Most ANNs use backpropagation to modify the strength of their connections, while real brains do not use backprop
Most ANNs are strictly feedforward, but real brains have many feedback connections
Most ANNs use simplified neurons consisting of a weighted sum passed through a nonlinearity, but real biological neurons have complex dendritic tree structures (see Figure 13.8), with complex spatio-temporal dynamics
Most ANNs are smaller in size and number of connections than biological brains
Most ANNs are designed to model a single function, while biological brains are very complex systems that implement many different kinds of functions or behaviors
The intermediate steps needed to compute
We can compute the Jacobian
where the sum is over all children
It is important to tune the learning rate (step size), to ensure convergence to a good solution. (Section 8.4.3.)
Vanishing gradient problem: when training very deep models, the gradients become very small
Exploding gradient problem: when training very deep models, the gradients become very large
consider the gradient of the loss wrt a node at layer
the reason for the vanishing gradient problem (see the numerical sketch below)
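A minimal numerical sketch of why sigmoid units cause vanishing gradients: each layer contributes a factor $\sigma'(a) \le 0.25$ to the chain-rule product. The depth and the pre-activation value are illustrative:

```python
# The chain-rule product of sigmoid derivatives shrinks geometrically with depth.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

a = 0.5      # a typical pre-activation value
grad = 1.0
for _ in range(20):          # 20 layers deep
    s = sigmoid(a)
    grad *= s * (1 - s)      # each factor is at most 0.25
print(grad)                  # ~1e-13: the gradient has all but vanished
```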
Name | Definition | Range | Reference |
---|---|---|---|
Sigmoid | $\sigma(a) = \frac{1}{1 + e^{-a}}$ | $(0, 1)$ | |
Hyperbolic tangent | $\tanh(a)$ | $(-1, 1)$ | |
Softplus | $\log(1 + e^{a})$ | $(0, \infty)$ | [GBB11] |
Rectified linear unit | $\max(0, a)$ | $[0, \infty)$ | [GBB11; KSH12] |
Leaky ReLU | $\max(\alpha a, a)$, small $\alpha > 0$ | $(-\infty, \infty)$ | [MHN13] |
Exponential linear unit | $a$ if $a > 0$, else $\alpha(e^{a} - 1)$ | $(-\alpha, \infty)$ | [CUH16] |
Swish | $a\,\sigma(a)$ | $\approx (-0.28, \infty)$ | [RZL17] |
GELU | $a\,\Phi(a)$ | $\approx (-0.17, \infty)$ | [HG16] |
the leaky ReLU
the exponential linear unit (ELU)
SELU (self-normalizing ELU): a slight variant of ELU
Softplus function [Dugas et al., 2001]
Maxout units [Goodfellow et al., 2013] are also piecewise-linear functions. Whereas other activation functions take as input the net input of a neuron (a scalar), a Maxout unit takes as input the full raw outputs of the previous layer (a vector).
The Maxout nonlinearity is defined as $\mathrm{maxout}(x) = \max_{k \in [K]} z_k$, where $z_k = w_k^\top x + b_k$.
Dropout
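A minimal sketch of (inverted) dropout at training time; the drop probability and input are illustrative:

```python
# Zero each activation with probability p and rescale survivors by 1/(1-p),
# so the expected activation is unchanged.
import numpy as np

def dropout(h, p=0.5, seed=0):
    rng = np.random.default_rng(seed)
    mask = (rng.random(h.shape) >= p).astype(h.dtype)
    return h * mask / (1.0 - p)

h = np.ones((2, 8))
print(dropout(h))  # about half the units zeroed, survivors scaled to 2.0
```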
Fully Connected Feedforward Networks vs. Convolutional Neural Networks
Convolution
Given an input signal sequence
Stride is the step size by which the filter slides along the input; figure (a) shows a convolution with stride 2.
Zero padding appends zeros at both ends of the input vector; figure (b) shows a convolution after padding one zero at each end.
Convolutions can be divided into three types according to the output length (see the sketch after this list):
In the early literature, "convolution" by default meant the narrow convolution.
In the current literature, it generally means the equal-width (same) convolution.
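A minimal numpy sketch of the three cases; `np.convolve`'s modes 'valid', 'same', and 'full' correspond to the narrow, equal-width, and wide convolutions (the signal and filter are illustrative):

```python
# Narrow / equal-width / wide 1-D convolution via np.convolve.
# Note: np.convolve implements true convolution (it flips the filter),
# unlike the cross-correlation used in most deep learning libraries.
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)  # input sequence, length 5
w = np.array([1, 0, -1], dtype=float)       # filter, length 3

print(np.convolve(x, w, mode='valid'))  # narrow: length 5 - 3 + 1 = 3
print(np.convolve(x, w, mode='same'))   # equal-width: length 5
print(np.convolve(x, w, mode='full'))   # wide: length 5 + 3 - 1 = 7
```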
Convolution as a feature extractor
Cross-correlation
(figure: stride 1, zero padding 0)
(figure: stride 2, zero padding 0)
(figure: stride 1, zero padding 1)
(figure: stride 2, zero padding 1)
Pooling Layer
Simple Recurrent Network (SRN)
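A minimal numpy sketch of the SRN forward pass $h_t = \tanh(U x_t + W h_{t-1} + b)$; all shapes and random weights are illustrative:

```python
# One recurrent layer unrolled over a toy sequence.
import numpy as np

rng = np.random.default_rng(0)
T, d_in, d_h = 6, 3, 4                      # sequence length, input dim, hidden dim
U = rng.standard_normal((d_h, d_in)) * 0.1  # input-to-hidden weights
W = rng.standard_normal((d_h, d_h)) * 0.1   # hidden-to-hidden (recurrent) weights
b = np.zeros(d_h)

x = rng.standard_normal((T, d_in))
h = np.zeros(d_h)
for t in range(T):                          # the same weights are reused at every step
    h = np.tanh(U @ x[t] + W @ h + b)
print(h)
```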
Turing completeness
Parameter learning
Backpropagation Through Time (BPTT)
Vanishing/exploding gradients and long-range dependencies
Methods for mitigating the long-range dependency problem
Stacked recurrent neural networks
Bidirectional recurrent neural networks