maze.train.trainers.common.model_selection.model_selection_base.ModelSelectionBase
Base class for model selection strategies.
update
Receives a new evaluation result from the model. Should be called only once per epoch.
reward – mean evaluation reward.
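
The contract described above (one update call per epoch, passing the mean evaluation reward) can be illustrated with a minimal sketch. The code does not import from maze: ModelSelectionSketch is a hypothetical stand-in for the base class, and KeepBestReward, best_reward, best_epoch and epoch are names invented for illustration only. It assumes that update takes a single float and returns nothing.

```python
from abc import ABC, abstractmethod


class ModelSelectionSketch(ABC):
    """Hypothetical stand-in for the base class (not imported from maze)."""

    @abstractmethod
    def update(self, reward: float) -> None:
        """Receive the mean evaluation reward of the current epoch."""


class KeepBestReward(ModelSelectionSketch):
    """Illustrative strategy: remember the best mean evaluation reward seen."""

    def __init__(self) -> None:
        self.best_reward = float("-inf")  # no evaluation seen yet
        self.best_epoch = -1              # epoch index of the best result
        self.epoch = -1                   # index of the last processed epoch

    def update(self, reward: float) -> None:
        # Called once per epoch, so each call advances the epoch counter.
        self.epoch += 1
        if reward > self.best_reward:
            self.best_reward = reward
            self.best_epoch = self.epoch


# Usage: feed one mean evaluation reward per epoch.
selector = KeepBestReward()
for mean_reward in [1.0, 3.5, 2.0]:
    selector.update(mean_reward)
```

A concrete strategy would typically also act on an improvement, for example by saving a checkpoint. That step is left out here because the source does not describe it.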