Bandit-v0
- class neurogym.envs.bandit.Bandit(dt=100, n=2, p=(0.5, 0.5), rewards=None, timing=None)
Multi-arm bandit task.
On each trial, the agent chooses among several options (arms). Each arm yields a reward of a set magnitude with a set probability.
- Parameters:
n – int, the number of choices (arms)
p – tuple of length n, the probability that each arm yields a reward
rewards – tuple of length n, the reward magnitude of each arm when it is rewarded
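
To make the roles of n, p, and rewards concrete, the following is a minimal standalone sketch of the reward rule described above. It uses only the Python standard library and is not neurogym's implementation. The helper names `pull_arm`, `expected_rewards`, and `simulate` are hypothetical, and explicit reward magnitudes are passed in because the behavior of `rewards=None` is not specified here.

```python
import random


def pull_arm(arm, p, rewards, rng):
    """Return the reward for one pull of `arm` (hypothetical helper)."""
    # The arm pays its magnitude with probability p[arm], otherwise nothing.
    return rewards[arm] if rng.random() < p[arm] else 0.0


def expected_rewards(p, rewards):
    """Expected reward per pull of each arm: probability times magnitude."""
    return tuple(pi * ri for pi, ri in zip(p, rewards))


def simulate(n_trials, p, rewards, seed=0):
    """Pull arms uniformly at random; return the mean observed reward per arm."""
    rng = random.Random(seed)
    n = len(p)  # number of arms, matching the length of p and rewards
    totals = [0.0] * n
    counts = [0] * n
    for _ in range(n_trials):
        arm = rng.randrange(n)
        totals[arm] += pull_arm(arm, p, rewards, rng)
        counts[arm] += 1
    # Arms that were never pulled report a mean of 0.0.
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


if __name__ == "__main__":
    p = (0.2, 0.8)
    rewards = (1.0, 1.0)
    print("expected:", expected_rewards(p, rewards))
    print("observed:", simulate(20000, p, rewards))
```

With enough trials, the observed mean reward of each arm should approach its probability times its magnitude, which is the quantity an agent needs to estimate in order to prefer the better arm.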