#optimisation

Articles tagged with #optimisation

Trust Region Methods: From REINFORCE to TRPO to PPO
In the REINFORCE post, we built a policy gradient agent from scratch in NumPy and watched it learn CartPole. It worked — eventually. But the reward curve looked like a seismograph. One batch of unluck
Jun 17, 2026 · 21 min read
Changepoint Detection: Finding Regime Shifts in Financial Data
Markets do not stay in one regime. The S&P 500 can cruise at 10% annualised volatility for months, then a crisis hits and volatility doubles overnight. Any model trained on the calm period is useless
Jun 9, 2026 · 14 min read
Value Iteration vs Q-Learning: Dynamic Programming Meets RL
You have a map of the frozen lake. Every crack in the ice, every slippery patch, every hole is marked. You can sit at your desk and plan the perfect route before stepping foot on the ice. That is valu
May 4, 2026 · 14 min read
Solving CartPole Without Gradients: Simulated Annealing
In the previous post, we solved CartPole using the Cross-Entropy Method: sample 200 candidate policies, keep the best 40, refit a Gaussian, repeat. It worked beautifully, reaching a perfect score of 5
Apr 23, 2026 · 17 min read
The Cross-Entropy Method: Solving RL Without Gradients
Reinforcement learning has accumulated layers of complexity over the years: value functions, policy gradients, replay buffers, target networks. The Cross-Entropy Method predates all of it. Rubinstein
Apr 21, 2026 · 14 min read
AI Experts Are Dead. Long Live the AI Experts.
Last month, my eight-year-old built a Flappy Bird clone from scratch. He can't really type yet. He certainly can't write Python. What he can do is talk to Claude while I whisper in his ear what to say
Apr 15, 2026 · 16 min read

#optimisation - Sesen AI