Bandit-v0

class neurogym.envs.bandit.Bandit(dt=100, n=2, p=(0.5, 0.5), rewards=None, timing=None)

Multi-armed bandit task.

On each trial, the agent chooses among multiple options (arms). Each arm is rewarded with its own fixed probability, and when rewarded it pays its own fixed reward magnitude.

Parameters:
  • n – int, the number of choices (arms)

  • p – tuple of length n, the probability that each arm is rewarded

  • rewards – tuple of length n, the reward magnitude of each arm when it is rewarded
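The reward rule described by these parameters can be sketched in plain Python. This is a minimal, standalone illustration rather than neurogym's implementation: `make_bandit` and `expected_rewards` are hypothetical helper names, and the assumption that every arm pays 1.0 when `rewards` is None is ours, not something this page states.

```python
import random
from typing import Callable, List, Optional, Sequence


def make_bandit(n: int = 2,
                p: Sequence[float] = (0.5, 0.5),
                rewards: Optional[Sequence[float]] = None) -> Callable[[int, random.Random], float]:
    """Return a hypothetical pull(arm, rng) function implementing the per-arm reward rule.

    Assumption: if rewards is None, every arm pays 1.0 when it is rewarded.
    """
    if len(p) != n:
        raise ValueError("p must have length n")
    mags = [1.0] * n if rewards is None else list(rewards)  # assumed unit default
    if len(mags) != n:
        raise ValueError("rewards must have length n")

    def pull(arm: int, rng: random.Random) -> float:
        # Arm `arm` is rewarded with probability p[arm]; the payout is then mags[arm].
        return mags[arm] if rng.random() < p[arm] else 0.0

    return pull


def expected_rewards(p: Sequence[float], rewards: Sequence[float]) -> List[float]:
    # The expected payout of arm i is p[i] * rewards[i].
    return [pi * ri for pi, ri in zip(p, rewards)]


if __name__ == "__main__":
    p, r = (0.2, 0.5, 0.8), (5.0, 1.0, 1.0)
    pull = make_bandit(n=3, p=p, rewards=r)
    rng = random.Random(0)
    samples = [pull(0, rng) for _ in range(10_000)]
    print(sum(samples) / len(samples))  # should be close to 0.2 * 5.0 = 1.0
    print(expected_rewards(p, r))       # [1.0, 0.5, 0.8]
```

In this example arm 0 has the lowest reward probability but the highest expected payout, so neither `p` nor `rewards` alone identifies the best arm; their product does.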

Reference paper

Prefrontal cortex as a meta-reinforcement learning system

Tags

n-alternative

Reinforcement learning and analysis of this task

See the accompanying Jupyter notebook (it can also be opened in Colab).
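The notebook's training setup is not reproduced here. As a hedged, minimal illustration of learning on this kind of task, the sketch below swaps in a standard epsilon-greedy agent with incremental sample-mean value estimates. It is a textbook method chosen for illustration, not the meta-reinforcement-learning approach of the reference paper and not necessarily what the notebook does. `run_epsilon_greedy` is a hypothetical name, and the unit reward used when `rewards` is None is an assumption.

```python
import random
from typing import List, Optional, Sequence


def run_epsilon_greedy(p: Sequence[float],
                       rewards: Optional[Sequence[float]] = None,
                       n_trials: int = 5000,
                       epsilon: float = 0.1,
                       seed: int = 0) -> List[float]:
    """Estimate each arm's expected reward with an epsilon-greedy agent (illustrative only)."""
    n = len(p)
    mags = [1.0] * n if rewards is None else list(rewards)  # assumed unit default
    rng = random.Random(seed)
    q = [0.0] * n     # running estimate of each arm's expected reward
    counts = [0] * n  # number of times each arm has been chosen
    for _ in range(n_trials):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                   # explore: uniform random arm
        else:
            arm = max(range(n), key=lambda i: q[i])  # exploit; ties go to the lowest index
        r = mags[arm] if rng.random() < p[arm] else 0.0
        counts[arm] += 1
        q[arm] += (r - q[arm]) / counts[arm]         # incremental sample mean
    return q


if __name__ == "__main__":
    q = run_epsilon_greedy(p=(0.2, 0.8))
    print(q)  # estimates should approach the true expected rewards 0.2 and 0.8
```

A fixed epsilon keeps exploring forever, which suits a stationary bandit like this one but wastes some trials once the best arm is known.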

Sample run
[Figure: sample run of Bandit-v0]