Learning by teaching: reinforcement learning and regime-aware option strategy selection

Contributors: Ribeiro, David António Poceiro do; Carmo, David António Poceiro do
Dates: 2026-02-19; 2025-11-12
Handle: http://hdl.handle.net/10362/200507

Abstract: This project applies Reinforcement Learning to options trading, framing the environment as a systematic trading decision process. A Proximal Policy Optimization agent selects among risk-defined option strategies (Vertical Spreads, Straddles, Strangles, Condors, and No-Trade) based on market features such as Greeks, implied volatility, moneyness, and macro indicators (VIX3M/VIX, yield-curve slope, SPX moving averages). Trained on SPY weekly options with a constant five-week Friday-to-Friday maturity from 2020 to 2024, the model captures about 88% of the oracle benchmark and converges to two behavioural modes: short-volatility exploitation and defensive long-volatility positioning. This demonstrates that it can internalize financially coherent, regime-aware trading logic.

Language: eng
Keywords: Options trading; Reinforcement learning; Proximal policy optimization; Regime awareness; Quantitative strategies
Type: master thesis
Identifier: 204134420
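
The abstract's comparison against an oracle benchmark can be illustrated with a minimal sketch. The record does not state how the thesis defines its oracle, so this sketch assumes a hindsight oracle that picks the best-paying strategy in every period and measures the agent's total payoff as a fraction of the oracle's. The names `Strategy` and `oracle_capture`, and all payoff numbers, are hypothetical and purely illustrative; they are not taken from the thesis.

```python
from enum import Enum


class Strategy(Enum):
    """Discrete action set named in the abstract (encoding is an assumption)."""
    VERTICAL_SPREAD = 0
    STRADDLE = 1
    STRANGLE = 2
    CONDOR = 3
    NO_TRADE = 4


def oracle_capture(payoffs, chosen):
    """Return agent payoff divided by hindsight-oracle payoff.

    payoffs: list of dicts mapping each Strategy to its realized payoff
             in one trading period.
    chosen:  list of the Strategy the agent picked in each period.
    Assumption: the oracle takes the best payoff in every period.
    """
    if len(payoffs) != len(chosen):
        raise ValueError("payoffs and chosen must have the same length")
    agent_total = sum(period[action] for period, action in zip(payoffs, chosen))
    oracle_total = sum(max(period.values()) for period in payoffs)
    if oracle_total <= 0:
        raise ValueError("capture ratio is undefined for a non-positive oracle payoff")
    return agent_total / oracle_total


# Made-up payoffs for two periods (illustrative only, not thesis data).
example_payoffs = [
    {Strategy.VERTICAL_SPREAD: 1.0, Strategy.STRADDLE: -2.0,
     Strategy.STRANGLE: -1.0, Strategy.CONDOR: 2.0, Strategy.NO_TRADE: 0.0},
    {Strategy.VERTICAL_SPREAD: -1.0, Strategy.STRADDLE: 3.0,
     Strategy.STRANGLE: 2.0, Strategy.CONDOR: -4.0, Strategy.NO_TRADE: 0.0},
]
example_choices = [Strategy.CONDOR, Strategy.STRANGLE]

capture = oracle_capture(example_payoffs, example_choices)
print(f"capture ratio: {capture:.2f}")
```

In this toy case the agent matches the oracle in the first period and picks the second-best strategy in the second, so its capture ratio falls below 1. Including `NO_TRADE` with a zero payoff keeps the oracle's per-period payoff non-negative, which is one reason a capture ratio of this kind stays well defined.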