Bandit-v0

class neurogym.envs.bandit.Bandit(dt=100, n=2, p=(0.5, 0.5), rewards=None, timing=None)

Multi-armed bandit task.

On each trial, the agent is presented with multiple choices (arms). Each arm delivers a reward of a fixed magnitude with a fixed probability; both are set per arm.

Parameters:

n – int, the number of choices (arms)
p – tuple of length n, the probability that each arm leads to a reward
rewards – tuple of length n, the reward magnitude of each arm when it is rewarded
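
The reward rule implied by these parameters can be illustrated with a small standalone sketch. It does not use neurogym and is not the library's implementation: `pull_arm` and `estimate_mean_reward` are hypothetical helpers, and treating `rewards=None` as a unit reward for every arm is an assumption made here for illustration.

```python
import random


def pull_arm(choice, p, rewards=None, rng=random):
    """Return the reward for pulling arm `choice`.

    Hypothetical helper for illustration only, not part of neurogym.
    """
    if rewards is None:
        # Assumption: every arm pays a magnitude of 1 when none is given.
        rewards = (1.0,) * len(p)
    if len(rewards) != len(p):
        raise ValueError("p and rewards must have the same length")
    # The arm pays its magnitude with probability p[choice], otherwise 0.
    return rewards[choice] if rng.random() < p[choice] else 0.0


def estimate_mean_reward(choice, p, rewards=None, n_trials=10_000, seed=0):
    """Average the reward of one arm over many simulated pulls."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    total = sum(pull_arm(choice, p, rewards, rng) for _ in range(n_trials))
    return total / n_trials


if __name__ == "__main__":
    # With the default p=(0.5, 0.5), both arms have the same expected reward.
    for arm in range(2):
        print(arm, estimate_mean_reward(arm, p=(0.5, 0.5)))
```

Under these assumptions, the expected reward of arm i is p[i] * rewards[i], so an agent can only do better than chance when the arms differ in probability or magnitude.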

Reference paper: Prefrontal cortex as a meta-reinforcement learning system
Tags: n-alternative
Reinforcement learning and analysis of this task: [Open in colab] [Jupyter notebook Source]
Sample run