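Size neutralization

- (... `log(MktCap)`) consistency

Next-period returns and evaluation

- `Ret_t1 = shift(-1)`; Rank-IC = monthly Spearman rank correlation
- Split Score into N equal groups; compute Q1...Qn and the long-short spread Qn-Q1

Plot the cumulative return curve

- `cum = (1 + ret.fillna(0)).cumprod() - 1`
- `plt.plot(cum.index, cum.values)`

Export results

- `to_csv('./output/xxx.csv')`
- `rank_ic.csv`, `quintile_returns.csv`, `score_monthly.csv`, `weights_monthly.csv`

- Check `df.columns`; confirm the fields with `display(df.head())`
- `pd.to_numeric(..., errors='coerce')`; replace ±inf with NaN
- Use `groupby.transform`; if you use `apply`, call `reindex(df.index)` before assigning
- `Year_Fin = year(Trdmnt) - 1`
- `groupby('Stkcd')['Mretwd'].shift(-1)`
- `self` (the instance itself)
- `bt.Strategy` is a class; your strategy must inherit from it and override these methods: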
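- `__init__()`: initialize dependencies and state
- `next()` / `next_open()`: logic triggered on every bar (the latter fires before the open)
- `notify_order()` / `notify_trade()`: order and trade notifications

- Cerebro engine → DataFeed (market data) → Strategy (signals/orders) → Broker (matching/cash/positions) → Analyzer/Observer (statistics/visualization)
- Resampling (`resampledata`), parameter optimization (`optstrategy`), multiprocessing
- `order_target_percent`: rebalance to target weights
- `cheat_on_open=True` + `next_open()`: place orders before the open and fill at the same day's open
- `SharpeRatio_A`, `DrawDown`, `TimeReturn`, etc.
- `backtesting.py` (lightweight), `bt` (portfolio level), `zipline-reloaded` (heavyweight)

import backtrader as bt, pandas as pd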
class BuyAndHold(bt.Strategy):
    def next(self):
        if not self.position:
            self.order_target_percent(target=1.0)
df = pd.read_csv('single.csv', parse_dates=['date']).set_index('date')
data = bt.feeds.PandasData(dataname=df[['open','high','low','close','volume']])
cerebro = bt.Cerebro()
cerebro.adddata(data, name='TICKER')
cerebro.addstrategy(BuyAndHold)
cerebro.broker.setcommission(commission=0.001)  # 10 bps
res = cerebro.run()[0]
cerebro.plot()
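- Set `cheat_on_open=True` and place orders in `next_open()`

`daily_bt` (DataFrame)

- Columns: `Stkcd, datetime, open, high, low, close, volume`
- `datetime` is the trading day, in ascending order, timezone-naive

`./output/weights_monthly.csv`

- `RebalDate` is the index (or a column converted to the index); the values are weights
- `RebalDate` is the month-end trading date (e.g., 2020-01-31)

import backtrader as bt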
import pandas as pd
import numpy as np
def load_daily_from_panel(daily_bt: pd.DataFrame):
    # daily_bt columns: ['Stkcd','datetime','open','high','low','close','volume']
    feeds = []
    for sid, g in daily_bt.groupby('Stkcd'):
        df = g[['datetime','open','high','low','close','volume']].copy()
        df['openinterest'] = 0
        df = df.set_index('datetime').sort_index()
        # The feed name must match the weight-table column labels, which read_csv returns as strings
        feed = bt.feeds.PandasData(dataname=df, name=str(sid), timeframe=bt.TimeFrame.Days)
        feeds.append(feed)
    return feeds
def load_weights_csv(path='./output/weights_monthly.csv'):
    # Wide table: index = RebalDate (month-end), columns = stock codes, values = target weights
    w = {}
    df = pd.read_csv(path, parse_dates=['RebalDate']).set_index('RebalDate').sort_index()
    for dt, row in df.iterrows():
        weights = {c: float(v) for c, v in row.dropna().to_dict().items() if abs(v) > 1e-12}
        w[dt.date()] = weights  # keyed by month-end date
    return w
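# exec_map: month-end rebalance date -> next trading day on which the weights are executed
def map_eom_to_next_trading(weights_by_eom: dict, all_days: pd.DatetimeIndex):
    # all_days: DatetimeIndex of all trading days (ascending)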
    exec_map = {}
    for eom, weights in weights_by_eom.items():
        idx = all_days.searchsorted(pd.to_datetime(eom), side='right')
        if idx < len(all_days):
            exec_dt = all_days[idx].date()
            exec_map[exec_dt] = weights
    return exec_map
# Build the trading calendar (union of all dates in daily_bt)
# all_days = pd.DatetimeIndex(sorted(daily_bt['datetime'].unique()))
# exec_map = map_eom_to_next_trading(load_weights_csv(), all_days)
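To make the `searchsorted(..., side='right')` lookup in `map_eom_to_next_trading` concrete, here is a small sketch on a made-up calendar. The dates and weights are invented for illustration only. It shows two things: a month-end key that is itself a trading day still maps to the following trading day, and a key that falls after the last available trading day is dropped.

```python
import pandas as pd

# Hypothetical trading calendar: 2020-02-01/02 is a weekend, so those days are absent.
all_days = pd.DatetimeIndex(['2020-01-30', '2020-01-31', '2020-02-03', '2020-02-04'])

# Hypothetical month-end weights keyed by date, shaped like the output of load_weights_csv.
weights_by_eom = {
    pd.Timestamp('2020-01-31').date(): {'000001': 0.6, '000002': 0.4},
    pd.Timestamp('2020-02-29').date(): {'000001': 1.0},  # later than the calendar covers
}

exec_map = {}
for eom, weights in weights_by_eom.items():
    # side='right' gives the position of the first trading day strictly after eom
    idx = all_days.searchsorted(pd.Timestamp(eom), side='right')
    if idx < len(all_days):
        exec_map[all_days[idx].date()] = weights

print(exec_map)
```

Only the January weights survive, keyed by 2020-02-03. In a real run it is worth checking how many rebalance dates were dropped this way, since a silent drop at the end of the sample shortens the backtest.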
class StampDutyCommission(bt.CommInfoBase):
    params = (('stamp_duty', 0.001), ('commission', 0.00025), ('percabs', True),)
    def _getcommission(self, size, price, pseudoexec):
        comm = abs(size) * price * self.p.commission
        if size < 0:
            comm += abs(size) * price * self.p.stamp_duty
        return comm
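The fee logic in `StampDutyCommission` can be checked by hand with a plain function. The helper below, `a_share_cost`, is a hypothetical stand-in that mirrors the same arithmetic outside Backtrader, using the rates from the class (2.5 bps commission on both sides, 10 bps stamp duty on sells only). The trade sizes and price are made up.

```python
def a_share_cost(size, price, commission=0.00025, stamp_duty=0.001):
    """Hypothetical helper: commission on both sides, stamp duty on sells only."""
    notional = abs(size) * price
    cost = notional * commission
    if size < 0:  # a negative size is a sell
        cost += notional * stamp_duty
    return cost

buy_cost = a_share_cost(1000, 10.0)    # buy 1,000 shares at 10.00
sell_cost = a_share_cost(-1000, 10.0)  # sell 1,000 shares at 10.00
print(f'buy: {buy_cost:.2f}, sell: {sell_cost:.2f}')  # buy: 2.50, sell: 12.50
```

In the backtest itself, the class would be registered on the broker with `cerebro.broker.addcommissioninfo(StampDutyCommission())`.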
class TurnoverAnalyzer(bt.Analyzer):
    def start(self):
        self.turnovers, self.last_w = [], None
    def _weights_now(self):
        v = self.strategy.broker.getvalue()
        w = {}
        for d in self.strategy.datas:
            pos = self.strategy.getposition(d).size
            w[d._name] = 0.0 if v == 0 else (pos * d.close[0]) / v
        return w
    def next(self):
        w_now = self._weights_now()
        if self.last_w is not None:
            keys = set(self.last_w) | set(w_now)
            t = sum(abs(w_now.get(k,0) - self.last_w.get(k,0)) for k in keys)
            dt = bt.num2date(self.strategy.datas[0].datetime[0]).date()
            self.turnovers.append((dt, t))
        self.last_w = w_now
    def get_analysis(self):
        return {'turnover': self.turnovers}
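The turnover measure used by `TurnoverAnalyzer` is the sum of absolute weight changes over the union of holdings. A minimal sketch of that calculation on two made-up weight dictionaries (the function name `turnover` is introduced here for illustration):

```python
def turnover(w_prev, w_now):
    """Sum of absolute weight changes across the union of holdings (two-sided)."""
    keys = set(w_prev) | set(w_now)
    return sum(abs(w_now.get(k, 0.0) - w_prev.get(k, 0.0)) for k in keys)

w_prev = {'A': 0.5, 'B': 0.5}    # hypothetical weights before rebalancing
w_now = {'A': 0.25, 'C': 0.75}   # hypothetical weights after rebalancing
t = turnover(w_prev, w_now)
print(t)  # 0.25 + 0.5 + 0.75 = 1.5
```

Because this measure is two-sided, replacing the whole portfolio gives 2.0 rather than 1.0. The analyzer compares weights on every bar, so price drift between rebalances also appears as small non-zero values on non-rebalance days.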
cerebro.addanalyzer(bt.analyzers.SharpeRatio_A, _name='sharpe')
cerebro.addanalyzer(bt.analyzers.DrawDown,      _name='dd')
cerebro.addanalyzer(bt.analyzers.TimeReturn,    _name='tr')
cerebro.addanalyzer(TurnoverAnalyzer,           _name='to')
strat = cerebro.run(maxcpus=1)[0]
sr = strat.analyzers.sharpe.get_analysis().get('sharperatio')
dd = strat.analyzers.dd.get_analysis()['max']['drawdown']
ret_daily = pd.Series(strat.analyzers.tr.get_analysis())  # index: datetime
to_series = pd.Series(dict(strat.analyzers.to.get_analysis()['turnover']))
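sr_txt = 'n/a' if sr is None else f'{sr:.3f}'  # Sharpe is None when there are too few return periods
print(f'Sharpe: {sr_txt} | MaxDD: {dd:.2f}%')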
# Compound daily returns into monthly returns (pandas >= 2.2 prefers the alias 'ME' over 'M')
ret_m_code = ret_daily.resample('M').apply(lambda x: (1 + x).prod() - 1)
ret_m_excel = (pd.read_csv('./output/monthly_returns_table.csv', parse_dates=['date'])
                 .set_index('date')['ret_m'])
ret_m_excel = ret_m_excel.reindex(ret_m_code.index).dropna()
ret_m_code  = ret_m_code.reindex(ret_m_excel.index)
corr = ret_m_code.corr(ret_m_excel)
print('Monthly return correlation =', corr)
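Correlation alone can look perfect even when the two series differ in level, so it may help to compare the monthly numbers directly as well. The sketch below uses made-up daily returns and made-up spreadsheet values. It compounds by calendar month with `to_period('M')` and reports the largest absolute gap.

```python
import pandas as pd

# Made-up daily returns spanning two months, purely for illustration.
idx = pd.to_datetime(['2020-01-30', '2020-01-31', '2020-02-03', '2020-02-04'])
ret_daily = pd.Series([0.01, -0.02, 0.03, 0.00], index=idx)

# Compound within each calendar month: (1 + r1)(1 + r2)... - 1
ret_m = (1 + ret_daily).groupby(ret_daily.index.to_period('M')).prod() - 1

# Hypothetical spreadsheet results for the same two months.
ret_m_sheet = pd.Series([-0.0102, 0.03], index=ret_m.index)

diff = (ret_m - ret_m_sheet).abs()
print(ret_m.round(6).to_dict())
print('max abs diff:', diff.max())
```

January compounds to 1.01 × 0.98 − 1 = −0.0102, matching the assumed spreadsheet value. A tolerance such as 1e-6 on the maximum gap is one possible pass criterion; the right threshold depends on how the spreadsheet rounds.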
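- `RebalDate` is the month-end trading day; map it to the next trading day before executing
- Place orders in `next_open()` so that they fill at the opening price; enable `set_coo(True)` on the broker
- Filter out negligible weights (below `1e-12` in absolute value) to avoid meaningless orders

# Replicating the FF5 Multi-Factor Strategy with Backtrader (2 class hours)

- Audience: students in a general elective course with no finance or programming background
- Builds on Lecture 2: the FF5 mini case completed in Excel/WPS (BM, ROE, AG_pos)
- Goal of this lecture: migrate the same "spreadsheet" strategy to a Backtrader code backtest and verify that the two agree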
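---

### Pre-class preparation and environment

- Installation
  - `pip install backtrader pandas numpy matplotlib`
- Data organization (two paths; path A is preferred in class)
  - Path A: `weights.csv` (date, ticker, weight) holds month-end weights: the signal at the end of month t becomes the holding for month t+1
  - Path B: `scores.csv` (date, ticker, score or bucket); weights are generated in class with pandas
  - Daily prices: `data/*_daily.csv`, one file per instrument, with columns `date,open,high,low,close,volume`
- Alignment notes
  - The universe and date range match Lecture 2; codes and file names are consistent
  - Calendar month-ends must be mapped to trading month-ends; the execution convention is "signal at month-end, fill at the next day's open"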
Reference: manual.pdf, sections 4–11 (workflow essentials)
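The calendar-to-trading month-end mapping mentioned in the alignment notes can be sketched with pandas alone. The trading days below are hypothetical, and `to_trading_eom` is a helper name introduced for this example. The idea is to take the last trading day within each calendar month and look it up by month.

```python
import pandas as pd

# Hypothetical trading days; 2020-02-29 is a Saturday, so February's last session is the 28th.
trading_days = pd.DatetimeIndex(
    ['2020-01-30', '2020-01-31', '2020-02-27', '2020-02-28', '2020-03-02'])

# Last trading day of each calendar month, indexed by monthly period.
s = pd.Series(trading_days, index=trading_days)
trading_eom = s.groupby(trading_days.to_period('M')).max()

def to_trading_eom(calendar_eom):
    """Map a calendar month-end to the last trading day of the same month."""
    return trading_eom.loc[pd.Timestamp(calendar_eom).to_period('M')]

print(to_trading_eom('2020-02-29'))
```

This assumes the calendar covers each month completely. For a partial month at the end of the sample, the "last trading day" found this way is only the last one observed so far.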
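---

### Python 0→1 (Backtrader-related only)

- Basic types: `int/float/str/bool`; containers: `list/dict/tuple`
- Control flow: `if/for`; functions: `def`; module imports: `import`
- File I/O: `pandas.read_csv`, `DataFrame.to_csv`

```python
import pandas as pd

df = pd.read_csv('weights.csv', parse_dates=['date'])
print(df.head())

def topn(df, n=10):
    # Expects a frame with a 'score' column (e.g. one read from scores.csv)
    return df.sort_values('score', ascending=False).head(n)
```

- Relevance to backtesting: read and write `weights.csv`/`scores.csv`, build the weight dictionary, configure parameters, and export results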
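As an illustration of "building the weight dictionary", the sketch below turns a long-format table (date, ticker, weight) into a nested dictionary keyed by rebalance date. The inline CSV text stands in for `weights.csv`, and its values are invented. Reading `ticker` as a string is an assumption made here so that codes with leading zeros survive.

```python
import io
import pandas as pd

# Inline stand-in for weights.csv (date, ticker, weight); values are invented.
csv_text = """date,ticker,weight
2020-01-31,000001,0.6
2020-01-31,000002,0.4
2020-02-28,000001,1.0
"""

# dtype=str on ticker keeps leading zeros in stock codes
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'], dtype={'ticker': str})

# {rebalance date -> {ticker -> weight}}
weights = {
    dt.date(): dict(zip(g['ticker'], g['weight']))
    for dt, g in df.groupby('date')
}
print(weights)
```

A dictionary keyed by date makes the per-bar lookup inside a strategy a single `weights.get(today)` call, which returns `None` on non-rebalance days.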