Strategy | Final Return (%) | Sharpe Ratio | Volatility (%) | Sortino Ratio | Calmar Ratio |
---|---|---|---|---|---|
This Work | 53.173 | 0.287 | 0.762 | 0.208 | 1.052 |
XGBoost (Chen and Guestrin, 2016) | 9.532 | 0.038 | 1.019 | 0.067 | 0.103 |
LightGBM (Ke et al., 2017) | 7.125 | 0.030 | 0.993 | 0.053 | 0.066 |
MLP | 3.110 | 0.013 | 0.960 | 0.023 | 0.043 |
PPO_filter (Schulman et al., 2017) | 2.865 | 0.013 | 0.886 | 0.024 | 0.017 |
FinCon (Yu et al., 2024) | 22.474 | 0.077 | 1.196 | 0.126 | 0.232 |
SEP (Koa et al., 2024) | 17.891 | 0.060 | 1.217 | 0.103 | 0.157 |
SSE 50 | -13.22 | -0.063 | 0.859 | -0.111 | -0.043 |
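The table does not state its conventions for return frequency, annualization, or the risk-free rate. The sketch below computes the same five metrics under explicit assumptions: periodic simple returns, a zero risk-free rate, per-period (non-annualized) Sharpe and Sortino ratios, downside deviation measured against a target of 0, and a Calmar ratio defined as geometric annualized return over maximum drawdown with 252 periods per year. The function name `performance_metrics` is hypothetical.

```python
import numpy as np


def performance_metrics(returns, periods_per_year=252):
    """Backtest metrics from periodic simple returns (assumed daily, risk-free rate 0)."""
    r = np.asarray(returns, dtype=float)
    # Wealth path starting from 1 unit of capital (initial value kept for drawdowns).
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + r)))
    final_return = wealth[-1] - 1.0

    volatility = r.std(ddof=1)                 # per-period volatility
    sharpe = r.mean() / volatility             # per-period, not annualized

    # Downside deviation with a target return of 0.
    downside = np.sqrt(np.mean(np.minimum(r, 0.0) ** 2))
    sortino = r.mean() / downside if downside > 0 else np.inf

    # Maximum drawdown relative to the running peak of the wealth path.
    running_peak = np.maximum.accumulate(wealth)
    max_drawdown = np.max(1.0 - wealth / running_peak)

    # Calmar: annualized (geometric) return divided by maximum drawdown.
    annual_return = wealth[-1] ** (periods_per_year / len(r)) - 1.0
    calmar = annual_return / max_drawdown if max_drawdown > 0 else np.inf

    return {
        "final_return_pct": 100.0 * final_return,
        "sharpe": sharpe,
        "volatility_pct": 100.0 * volatility,
        "sortino": sortino,
        "calmar": calmar,
        "max_drawdown": max_drawdown,
    }
```

Under these conventions, the Sortino ratio exceeds the Sharpe ratio whenever losses are rarer or smaller than the total dispersion would suggest. This pattern appears in most rows of the table.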
Combining multi-source information:
Extracting sentiment indicators with natural language processing:
Lopez‑Lira, A., & Tang, Y. (2023). Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models. SSRN Working Paper. (First Version: April 6, 2023; This Version: April 9, 2023).
Prompt: Forget all your previous instructions. Pretend you are a financial expert. You are a financial expert with stock recommendation experience. Answer "YES" if good news, "NO" if bad news, or "UNKNOWN" if uncertain in the first line. Then elaborate with one short and concise sentence on the next line. Is this headline good or bad for the stock price of _company name_ in the _term_ term?
Headline: _headline_
Score mapping: YES → 1; UNKNOWN → 0; NO → -1.
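The prompt and score mapping above can be sketched as follows. This is an illustration, not the paper's code. The placeholders `{company}`, `{term}`, and `{headline}`, the default `term="short"`, and the rule that an empty or unparseable reply scores 0 (the same as UNKNOWN) are assumptions. The call to the language model itself is not shown.

```python
PROMPT_TEMPLATE = (
    "Forget all your previous instructions. Pretend you are a financial expert. "
    "You are a financial expert with stock recommendation experience. "
    'Answer "YES" if good news, "NO" if bad news, or "UNKNOWN" if uncertain '
    "in the first line. Then elaborate with one short and concise sentence "
    "on the next line. Is this headline good or bad for the stock price of "
    "{company} in the {term} term?\n"
    "Headline: {headline}"
)

# Mapping from the first word of the reply to the GPT score.
SCORE_MAP = {"YES": 1, "UNKNOWN": 0, "NO": -1}


def build_prompt(company, headline, term="short"):
    # term="short" is an assumed default, not taken from the source.
    return PROMPT_TEMPLATE.format(company=company, term=term, headline=headline)


def gpt_score(response):
    """Score a reply by its first word; anything unrecognized maps to 0."""
    lines = response.strip().splitlines()
    if not lines:
        return 0
    tokens = lines[0].strip().split()
    if not tokens:
        return 0
    word = tokens[0].strip(' .,:;!"').upper()
    return SCORE_MAP.get(word, 0)
```

Mapping parse failures to 0 keeps the score in {-1, 0, 1}, but it silently mixes "uncertain" with "malformed". Logging those cases separately is a reasonable alternative.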
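Clustered by date and permno; all results are out-of-sample. If a headline is released after the market close, the authors conservatively set the tradable time to two days later, which avoids look-ahead leakage.

Variable | Mean | SD | Min | P25 | Median | P75 | Max | N |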
---|---|---|---|---|---|---|---|---|
Daily Return (%) | 4.80 | 4.80 | -75.51 | -1.97 | -0.04 | 1.80 | 199.60 | 50767 |
Headline Length | 77.63 | 29.49 | 22 | 57 | 71 | 92 | 701 | 50767 |
ChatGPT Response Length | 153.64 | 38.50 | 0 | 124 | 151 | 179 | 303 | 50767 |
GPT Score | 0.25 | 0.47 | -1 | 0 | 0 | 1 | 1 | 50767 |
Event Sentiment Score | 0.16 | 0.34 | -1 | 0 | 0 | 0.50 | 1 | 50767 |
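A summary table of this shape can be produced directly from a panel with one row per news item. The sketch below assumes the variables are already numeric columns; for example, headline length could come from `df["headline"].str.len()`, where the column name is hypothetical. Note that `DataFrame.describe` reports the sample standard deviation (ddof=1) and uses linear interpolation for percentiles.

```python
import pandas as pd


def summary_table(df):
    """Descriptive statistics per numeric column, laid out like the table above."""
    stats = df.describe(percentiles=[0.25, 0.5, 0.75]).T
    stats = stats.rename(columns={
        "mean": "Mean", "std": "SD", "min": "Min", "25%": "P25",
        "50%": "Median", "75%": "P75", "max": "Max", "count": "N",
    })
    stats["N"] = stats["N"].astype(int)  # count of non-missing observations
    return stats[["Mean", "SD", "Min", "P25", "Median", "P75", "Max", "N"]]
```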
Han et al. (2022). Factor Models, Machine Learning, and Asset Pricing. *Annual Review of Financial Economics*.
- Price adjustment: `adjusted_price = raw_price / split_factor - dividend_adjustment`.
- Missing values: `forward_fill`, or drop the instrument if its missing ratio exceeds a threshold (e.g., 20%).
- Primitives:
  - `return_t = log(price_t) - log(price_{t-1})`
  - `momentum_k = price_t / price_{t-k} - 1`
  - `sma_k = rolling_mean(price, k)`
  - `zscore_k = (x - rolling_mean(x, k)) / rolling_std(x, k)`
- Each feature is stored as a `series` plus `meta` (window, freq, transform).
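The cleaning rules and primitives above map directly onto pandas operations. Here is a minimal sketch, assuming `prices` is a DataFrame with dates as rows and instruments as columns. The helper names, the `freq="D"` default, and the `transform` labels stored in `meta` are hypothetical.

```python
import numpy as np
import pandas as pd


def adjust_prices(raw_price, split_factor, dividend_adjustment):
    # Formula as given in the notes above; vendor conventions may differ.
    return raw_price / split_factor - dividend_adjustment


def clean_missing(prices, threshold=0.2):
    """Drop instruments whose missing ratio exceeds threshold, forward-fill the rest."""
    missing_ratio = prices.isna().mean()
    kept = prices.loc[:, missing_ratio <= threshold]
    return kept.ffill()  # uses only past values; leading NaNs stay NaN


def rolling_zscore(x, k):
    return (x - x.rolling(k).mean()) / x.rolling(k).std()


def make_primitives(price, k=20, freq="D"):
    """Each primitive is stored as {"series": ..., "meta": {window, freq, transform}}."""
    return {
        "log_return": {"series": np.log(price) - np.log(price.shift(1)),
                       "meta": {"window": 1, "freq": freq, "transform": "log_diff"}},
        f"momentum_{k}": {"series": price / price.shift(k) - 1,
                          "meta": {"window": k, "freq": freq, "transform": "ratio_minus_one"}},
        f"sma_{k}": {"series": price.rolling(k).mean(),
                     "meta": {"window": k, "freq": freq, "transform": "rolling_mean"}},
        f"zscore_{k}": {"series": rolling_zscore(price, k),
                        "meta": {"window": k, "freq": freq, "transform": "rolling_zscore"}},
    }
```

Every transform uses only current and past values (`shift` with a positive lag, trailing `rolling` windows). This property matters for the leakage risk discussed later.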
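`feature_importances_`; use LASSO to obtain sparse linear coefficients.

    for feat in candidates:
        # rolling 60-period correlation between the feature and the forward return
        ics = rolling_corr(feat.series, future_return, window=60)
        mean_ic = np.mean(ics)
        # t-statistic of the mean IC (treats the rolling ICs as independent)
        tstat = mean_ic / (np.std(ics) / np.sqrt(len(ics)))
        record(feat, mean_ic, tstat)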
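A runnable version of the rolling-IC screen for one feature uses pandas' rolling correlation. The sketch assumes `feature` and `future_return` are aligned Series, where `future_return` at t is the return realized after t (for example `returns.shift(-1)`). The function name is hypothetical, and the synthetic data exists only for illustration.

```python
import numpy as np
import pandas as pd


def rolling_ic_tstat(feature, future_return, window=60):
    """Mean rolling IC and its naive t-statistic for one candidate feature."""
    # Rolling Pearson correlation between the feature at t and the return after t.
    ics = feature.rolling(window).corr(future_return).dropna()
    mean_ic = ics.mean()
    # Naive t-stat: treats the rolling ICs as independent draws.
    tstat = mean_ic / (ics.std(ddof=1) / np.sqrt(len(ics)))
    return mean_ic, tstat


rng = np.random.default_rng(0)
feature = pd.Series(rng.standard_normal(500))
future_return = 0.5 * feature + pd.Series(rng.standard_normal(500))
mean_ic, tstat = rolling_ic_tstat(feature, future_return)
```

Consecutive 60-period windows share 59 observations, so the rolling ICs are strongly autocorrelated and this t-statistic overstates significance. Non-overlapping windows or autocorrelation-robust (e.g., Newey-West) standard errors are common remedies.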
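
    from collections import defaultdict
    import numpy as np
    from sklearn.linear_model import Lasso

    freq = defaultdict(int)
    for i in range(n_rounds):
        Xs, ys = subsample(X, y, frac=0.7)          # random 70% subsample
        model = Lasso(alpha=alpha).fit(Xs, ys)
        for feat in np.flatnonzero(model.coef_):   # indices of non-zero coefficients
            freq[feat] += 1
    # keep features selected in at least 70% of the rounds
    stable_feats = [f for f, c in freq.items() if c / n_rounds >= 0.7]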
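The stability-selection loop can be made self-contained by drawing the subsamples with NumPy. The sketch assumes `X` is a NumPy array (rows = observations, columns = candidate features) and returns column indices. The function name, its defaults, and the synthetic example are illustrative assumptions.

```python
from collections import defaultdict

import numpy as np
from sklearn.linear_model import Lasso


def stability_select(X, y, alpha=0.05, n_rounds=100, frac=0.7, threshold=0.7, seed=0):
    """Indices of features with non-zero Lasso coefficients in >= threshold of rounds."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    freq = defaultdict(int)
    for _ in range(n_rounds):
        # Subsample rows without replacement.
        idx = rng.choice(n, size=int(frac * n), replace=False)
        model = Lasso(alpha=alpha).fit(X[idx], y[idx])
        for j in np.flatnonzero(model.coef_):
            freq[int(j)] += 1
    return sorted(j for j, c in freq.items() if c / n_rounds >= threshold)


rng = np.random.default_rng(42)
X = rng.standard_normal((300, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * rng.standard_normal(300)
selected = stability_select(X, y, alpha=0.1)
```

Features should be standardized before fitting, because the L1 penalty treats all coefficients on the same scale. With time-series data, contiguous block subsamples are a safer choice than random rows.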

    for t in dates:
        signals = compute_signals(t)
        target_weights = construct_weights(signals)
        trades = rebalancing(target_weights, prev_weights, turnover_limit)
        pnl = execute_trades(trades, prices[t], costs_model)
        record_metrics(pnl)
        prev_weights = prev_weights + trades   # assumes trades are weight changes
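A compact, runnable stand-in for the loop above follows. The helper names and every modeling choice are assumptions for illustration. Weights are dollar-neutral with unit gross exposure, the turnover limit scales the trade vector proportionally, costs are linear in traded notional (`cost_bps`), and weight drift from realized returns is ignored. Signals are precomputed as a `(T, N)` array instead of calling `compute_signals(t)`, and `returns[t]` must be the return realized after the trade at t.

```python
import numpy as np


def construct_weights(signal):
    """Dollar-neutral weights with unit gross exposure (one simple choice)."""
    demeaned = signal - signal.mean()
    gross = np.abs(demeaned).sum()
    return demeaned / gross if gross > 0 else np.zeros_like(demeaned)


def rebalance(target, prev, turnover_limit):
    """Move toward target, scaling the trade so that sum(|trade|) <= turnover_limit."""
    trade = target - prev
    turnover = np.abs(trade).sum()
    if turnover > turnover_limit:
        trade = trade * (turnover_limit / turnover)
    return trade


def backtest(signals, returns, turnover_limit=0.5, cost_bps=10.0):
    """signals, returns: (T, N) arrays; returns[t] is realized after the trade at t."""
    n_assets = signals.shape[1]
    weights = np.zeros(n_assets)
    pnl = np.empty(len(signals))
    for t in range(len(signals)):
        trade = rebalance(construct_weights(signals[t]), weights, turnover_limit)
        weights = weights + trade
        cost = np.abs(trade).sum() * cost_bps / 1e4   # linear cost on traded notional
        pnl[t] = weights @ returns[t] - cost
    return pnl


signals = np.array([[1.0, -1.0], [1.0, -1.0]])
returns = np.array([[0.01, -0.01], [0.02, 0.0]])
pnl = backtest(signals, returns, turnover_limit=0.5, cost_bps=10.0)
```

In this example, the turnover cap halves the first trade, so the book reaches its target only in the second period. Running the same loop with `cost_bps=0.0` isolates the cost drag.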
Modules: `data_loader`, `primitive_generator`, `synthesizer`, `sanity_filter`, `model_evaluator`, `selector`, `backtester`, `monitor`.

    data = data_loader.load()
    primitives = primitive_generator(data)
    candidates = synthesizer(primitives)
    candidates = sanity_filter(candidates)
    scores = model_evaluator.score(candidates, returns)
    stable = selector.stability_select(scores)
    backtest_report = backtester.run(stable, data, cost_model)
    monitor.setup(backtest_report)
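- Three advantages: **nonlinearity**, **high dimensionality**, **alternative data**
- Three risks: **non-stationarity**, **leakage**, **temporal validation**
- Goal: close the loop "features → scoring → portfolio → backtest interface"

.footnote[This lecture uses a monthly cross-sectional example: low barrier to entry, reproducible, and validated out of sample.]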
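### Key Tables and Numbers

<font size=5>

- Table I (5-day holding, equal-weighted H-L): I5/R5 H-L annualized return = 0.83 (SR = 7.15); turnover ≈ 690% (high weekly turnover).
- Table II (20-day / 60-day holding): I5/R20 equal-weighted H-L SR ≈ 2.4; I5/R60 equal-weighted H-L SR ≈ 1.3 (lower but still significant).
- Table IX (model comparison, equal-weighted, 5-day): CNN SR = 7.15; Logistic (image-scale) = 5.56; Logistic (cum. ret) = 2.50; CNN1D (image-scale) = 7.20.
- Table X (international transfer, I5/R5): across 26 countries, directly applying the CNN trained on US stocks raises the average equal-weighted Sharpe from 2.3 (local retraining) to 3.6 (average gain ≈ +1.44). (See the corresponding tables in the paper for details.)

</font>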
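- **In-class task**: Draw a flowchart of your pipeline and mark the checkpoints where errors are most likely to occur.
- **In-class exercise**: Implement 10 primitives and output their distributions, seasonality plots, and missing-value ratios.
- **Teaching example**: Present 20 automatically generated candidate formulas and discuss their intuition and risks.
- **In-class exercise**: Cluster 200 candidate factors by correlation and select the top 50 representatives.
- **In-class task**: Compare cases where XGBoost importance and single-factor IC disagree, and discuss the reasons.
- **In-class exercise**: Apply stability selection to 300 candidate factors and output the 30 most stable factors.
- **In-class experiment**: Compare backtest results without and with transaction costs, and analyze the differences.
- **In-class practice**: Given a trained XGBoost model, draw SHAP summary plots and PDPs for 3 representative dates, and write an explanation grounded in financial intuition.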
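For the correlation-clustering exercise, one possible approach is hierarchical clustering on the distance 1 - |corr|, keeping the best-scoring factor in each cluster. The choice of average linkage, the absolute-correlation distance (which groups a factor with its sign-flipped twin), and the use of a score such as |mean IC| to pick representatives are assumptions, not part of the task statement. The function name is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_representatives(factors, scores, n_clusters=50):
    """factors: (T, K) factor values; scores: (K,) ranking score, e.g. |mean IC|."""
    corr = np.corrcoef(factors, rowvar=False)
    # Distance 1 - |corr|: strongly (anti-)correlated factors are close.
    dist = np.clip(1.0 - np.abs(corr), 0.0, None)
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    reps = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        reps.append(int(members[np.argmax(scores[members])]))  # best-scoring member
    return sorted(reps)


rng = np.random.default_rng(1)
b1, b2 = rng.standard_normal(500), rng.standard_normal(500)
factors = np.column_stack([b1, b1, b2, -b2]) + 0.1 * rng.standard_normal((500, 4))
scores = np.array([0.1, 0.3, 0.2, 0.05])
reps = cluster_representatives(factors, scores, n_clusters=2)
```

With 200 candidates, `n_clusters=50` yields at most 50 representatives. `criterion="maxclust"` caps the number of clusters rather than guaranteeing exactly that many.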
and evaluation metrics
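- **Evaluation criteria**: mean IC, IC stability (std), post-backtest information ratio (IR) / Sharpe (net of costs), and a factor interpretability score (graded by the instructor).
- **Deliverables**: code, result tables (IC, backtest performance), SHAP plots, and a short explanatory document (1-2 pages).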
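For the monthly cross-sectional setting, mean IC and IC stability are often computed from per-date rank ICs. The sketch assumes a long-format DataFrame with hypothetical columns `date`, `factor`, and `next_ret`, where `next_ret` is the following month's return. It reports IC_IR = mean / std as one common summary of stability. The backtest IR/Sharpe is a separate metric computed from portfolio returns.

```python
import pandas as pd


def rank_ic_stats(df, factor_col="factor", ret_col="next_ret", date_col="date"):
    """Cross-sectional Spearman rank IC per date, plus its mean, std and IC_IR."""
    def spearman(g):
        # Spearman correlation = Pearson correlation of the ranks.
        return g[factor_col].rank().corr(g[ret_col].rank())

    ic = df.groupby(date_col)[[factor_col, ret_col]].apply(spearman)
    return {
        "ic_mean": ic.mean(),
        "ic_std": ic.std(),            # sample std (ddof=1)
        "ic_ir": ic.mean() / ic.std(),
        "ic_series": ic,
    }


df = pd.DataFrame({
    "date": ["2020-01"] * 4 + ["2020-02"] * 4,
    "factor": [1, 2, 3, 4] * 2,
    "next_ret": [0.1, 0.2, 0.3, 0.4, 0.1, 0.3, 0.2, 0.4],
})
stats = rank_ic_stats(df)
```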