Computational Modeling of Human Learning in Reversal Bandit Tasks (real + simulated data)
Developed and compared two reinforcement learning models of human decision adaptation in a probabilistic two-armed bandit task with a mid-task reward reversal. Implemented the Rescorla-Wagner model (RWM, fixed learning rate) and a Bayesian reinforcement model (BRM, hazard-rate-based belief updating) to simulate and fit participant behavior (11 participants × 100 trials) via maximum likelihood estimation. Assessed model fit with the Bayesian Information Criterion (BIC) and ran parameter recovery analyses to evaluate parameter identifiability.
Technical Contributions:
Implemented the RWM and BRM with softmax decision rules; the BRM tracked uncertainty adaptively via Beta distributions and a hazard rate (see the model sketch after this list)
Simulated agent behavior and reversal learning dynamics, with realistic mid-task reward contingency shifts (simulation sketch below)
Fit models to behavioral data using grid search over α (learning rate), β (softmax inverse temperature), and λ; visualized likelihood surfaces and learning curves (fitting sketch below)
Found that the RWM provided better explanatory power (lower BIC), suggesting that a fixed learning rate captured participant behavior more effectively than adaptive belief updating (BIC helper in the fitting sketch below)
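The two update rules can be summarized in a minimal sketch. The RW rule is the standard delta rule; the BRM body is an assumption about how a hazard-rate Beta-Bernoulli update might look (the project's exact rule may differ), and all names are illustrative:

```python
import numpy as np

def softmax(q, beta):
    """Choice probabilities from action values; beta is the inverse temperature."""
    z = beta * (q - np.max(q))        # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def rw_update(q, choice, reward, alpha):
    """Rescorla-Wagner: nudge the chosen arm's value toward the outcome by a fixed step alpha."""
    q = q.copy()
    q[choice] += alpha * (reward - q[choice])
    return q

def brm_update(a, b, choice, reward, hazard):
    """Illustrative hazard-rate Beta-Bernoulli update (an assumed form, not the
    project's exact rule): with probability `hazard` the contingencies may have
    reversed, so the Beta counts decay toward the flat Beta(1, 1) prior before
    the new outcome is counted."""
    a, b = a.copy(), b.copy()
    a = (1 - hazard) * a + hazard * 1.0
    b = (1 - hazard) * b + hazard * 1.0
    a[choice] += reward        # success count for the chosen arm
    b[choice] += 1 - reward    # failure count for the chosen arm
    return a, b                # posterior mean reward estimate: a / (a + b)
```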
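A sketch of the reversal simulation, assuming a single reversal at the midpoint; reward probabilities and agent parameters are illustrative values, not the project's actual settings:

```python
import numpy as np

def simulate_reversal_task(n_trials=100, p_good=0.8, reversal_trial=50,
                           alpha=0.3, beta=3.0, seed=0):
    """Simulate a fixed-learning-rate agent on a two-armed bandit whose
    reward probabilities swap at `reversal_trial`."""
    rng = np.random.default_rng(seed)
    q = np.zeros(2)
    choices = np.empty(n_trials, dtype=int)
    rewards = np.empty(n_trials)
    for t in range(n_trials):
        # Arm reward probabilities swap at the reversal point
        p_reward = (p_good, 1 - p_good) if t < reversal_trial else (1 - p_good, p_good)
        p_choice = np.exp(beta * q) / np.exp(beta * q).sum()   # softmax choice rule
        choices[t] = rng.choice(2, p=p_choice)
        rewards[t] = float(rng.random() < p_reward[choices[t]])
        q[choices[t]] += alpha * (rewards[t] - q[choices[t]])  # RW update
    return choices, rewards
```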
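A sketch of the grid-search maximum likelihood fit and the BIC comparison, shown for the RW model; the BRM fit would add λ as a third grid dimension. Grid ranges and resolutions are assumptions:

```python
import numpy as np

def rw_neg_log_lik(choices, rewards, alpha, beta):
    """Negative log-likelihood of one participant's choices under RW + softmax."""
    q, nll = np.zeros(2), 0.0
    for c, r in zip(choices, rewards):
        p = np.exp(beta * q) / np.exp(beta * q).sum()
        nll -= np.log(p[c] + 1e-12)       # small constant guards against log(0)
        q[c] += alpha * (r - q[c])
    return nll

def fit_rw_grid(choices, rewards,
                alphas=np.linspace(0.01, 1.0, 25),
                betas=np.linspace(0.1, 10.0, 25)):
    """Exhaustive grid search over (alpha, beta); the returned NLL surface can
    be drawn with matplotlib (e.g. plt.contourf) as a likelihood surface."""
    surface = np.array([[rw_neg_log_lik(choices, rewards, a, b)
                         for b in betas] for a in alphas])
    i, j = np.unravel_index(surface.argmin(), surface.shape)
    return alphas[i], betas[j], surface

def bic(nll, n_params, n_trials):
    """BIC = 2*NLL + k*ln(n); the model with the lower BIC is preferred."""
    return 2.0 * nll + n_params * np.log(n_trials)
```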
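The parameter recovery analyses mentioned in the summary can be illustrated by closing the loop between the two sketches above: simulate with known parameters, refit, and compare. The true values and seed here are arbitrary:

```python
# Recover parameters from synthetic data generated with known values
# (uses simulate_reversal_task and fit_rw_grid from the sketches above).
true_alpha, true_beta = 0.3, 3.0
choices, rewards = simulate_reversal_task(alpha=true_alpha, beta=true_beta, seed=42)
alpha_hat, beta_hat, _ = fit_rw_grid(choices, rewards)
print(f"true: alpha={true_alpha}, beta={true_beta}")
print(f"fit:  alpha={alpha_hat:.2f}, beta={beta_hat:.2f}")
# Repeating this across a grid of true values and seeds gives the
# identifiability picture described in the summary.
```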
Tools: Python, NumPy, SciPy, Pandas, Matplotlib, Seaborn
Joint project with David Raul Carranza Navarrete, Julian Calvin Rill, Franka Bockmann, and Ewa Godlewska.