Trainers and Training Runners¶

This page contains the reference documentation for trainers and training runners:

Overview

General
Trainers
Utilities

General ¶

These are general interfaces, classes and utility functions for trainers and training runners:

`Trainer`	Interface for trainers.
`TrainingRunner`	Base class for training runner implementations.
`TrainConfig`	Top-level configuration structure.
`ModelConfig`	Model configuration structure.
`AlgorithmConfig`	Base class for all specific algorithm configurations.
`ModelSelectionBase`	Base class for model selection strategies.
`BestModelSelection`	Best model selection strategy.
`Evaluator`	Abstract interface for policy evaluation.
`MultiEvaluator`	Evaluates the given policy using multiple different evaluators (ran in sequence).
`RolloutEvaluator`	Evaluates a given policy by rolling it out and collecting the mean reward.
`ValueTransform`	Value transformation (e.g.
`ReduceScaleValueTransform`	Scale reduction value transform according to Pohlen et al (2018).
`support_to_scalar`	Convert support vector to scalar by probability weighted interpolation.
`scalar_to_support`	Converts tensor of scalars into probability support vectors corresponding to the provided range.
`BaseReplayBuffer`	Abstract interface for all replay buffer implementations.
`UniformReplayBuffer`	Replay buffer for off policy learning.

Trainers ¶

These are interfaces, classes and utility functions for built-in trainers:

Actor-Critics (AC)¶

`ACRunner`	Abstract baseclass of AC runners.
`ACDevRunner`	Runner for single-threaded training, based on SequentialVectorEnv.
`ACLocalRunner`	Runner for locally distributed training, based on SubprocVectorEnv.
`ActorCritic`	Base class for actor critic trainers.
`ActorCriticEvents`	Event interface, defining statistics emitted by the A2CTrainer.
`A2C`	Advantage Actor Critic.
`A2CAlgorithmConfig`	Algorithm parameters for multi-step A2C model.
`PPO`	Proximal Policy Optimization trainer.
`PPOAlgorithmConfig`	Algorithm parameters for multi-step PPO model.
`IMPALA`	Multi step advantage actor critic.
`ImpalaAlgorithmConfig`	Algorithm parameters for Impala.
`ImpalaEvents`	Events specific for the impala algorithm, in order to record and analyse it’s behaviour in more detail
`ImpalaRunner`	Common superclass for IMPALA runners, implementing the main training controls.
`ImpalaDevRunner`	Runner for single-threaded training, based on SequentialVectorEnv.
`ImpalaLocalRunner`	Runner for locally distributed training, based on SubprocVectorEnv.
`log_probs_from_logits_and_actions_and_spaces`	Computes action log-probs from policy logits, actions and acton_spaces.
`from_logits`	V-trace for softmax policies.
`from_importance_weights`	V-trace from log importance weights.
`get_log_rhos`	With the selected log_probs for multi-discrete actions of behavior and target policies we compute the log_rhos for calculating the vtrace.
`SAC`	Multi step soft actor critic.
`SACAlgorithmConfig`	Algorithm parameters for SAC.
`SACEvents`	Events specific for the SAC algorithm, in order to record and analyse it’s behaviour in more detail
`SACRunner`	Common superclass for SAC runners, implementing the main training controls.
`SACDevRunner`	Runner for single-threaded training, based on SequentialVectorEnv.

Evolutionary Strategies (ES)¶

`ESTrainer`	Trainer class for OpenAI Evolution Strategies.
`ESAlgorithmConfig`	Algorithm parameters for evolution strategies model.
`ESEvents`	Event interface, defining statistics emitted by the ESTrainer.
`ESMasterRunner`	Baseclass of ES training master runners (serves as basis for dev and other runners).
`ESDevRunner`	Runner config for single-threaded training, based on ESDummyDistributedRollouts.
`SharedNoiseTable`	A fixed length vector of deterministically generated pseudo-random floats.
`Optimizer`	Abstract baseclass of an optimizer to be used with ES.
`SGD`	Stochastic gradient descent with momentum
`Adam`	Adam optimizer
`ESRolloutResult`	Result structure for distributed rollouts.
`ESDummyDistributedRollouts`	Implementation of the ES distribution by running the rollouts synchronously in the same process.
`ESDistributedRollouts`	Abstract base class of ES rollout distribution.
`ESAbortException`	This exception is raised if the current rollout is intentionally aborted.
`ESRolloutWorkerWrapper`	The rollout generation is bound to a single worker environment by implementing it as a Wrapper class.
`get_flat_parameters`	Get the parameters of all sub-policies as a single flat vector.
`set_flat_parameters`	Overwrite the parameters of all sub-policies by a single flat vector.

Imitation Learning (IL) and Learning from Demonstrations (LfD)¶

`ImitationEvents`	Event interface defining statistics emitted by the imitation learning trainers.
`BCRunner`	Dev runner for imitation learning.
`BCTrainer`	Trainer for behavioral cloning learning.
`BCAlgorithmConfig`	Algorithm parameters for behavioral cloning.
`BCValidationEvaluator`	Evaluates a given policy on validation data.
`BCLoss`	Loss function for behavioral cloning.

Utilities ¶

`stack_numpy_dict_list`	Stack list of dictionaries holding numpy arrays as values.
`unstack_numpy_list_dict`	Inverse of `stack_numpy_dict_list()`.
`compute_gradient_norm`	Computes the cumulative gradient norm of all provided parameters.
`stack_torch_dict_list`	Stack list of dictionaries holding torch tensors as values.

Read the Docs v: stable

Versions: latest; stable

Downloads: html; epub

On Read the Docs: Project Home; Builds