

RL4CRN.environments.abstract_multi_environments

Multi-environment utilities for CRN reinforcement learning.

This module defines AbstractMultiEnvironments, a lightweight manager for running multiple CRN environments in parallel (synchronously) and providing common operations such as reset, stepping, observation/tensorization, and rich rendering/logging of the best-performing environments.

Key features
  • Batch stepping: applies one action per environment.
  • Observation pipeline: uses an observer and tensorizer to produce a batch tensor suitable for an agent.
  • Logging-oriented rendering: selects top-performing environments and logs plots/images to a provided logger.
  • Hall of fame (optional): keeps a buffer of top environments and renders them alongside the current batch.
Notes

This class is intentionally "abstract" in the sense that it does not enforce a specific environment implementation beyond the methods accessed in the code (reset, step, render, and state access). It can be used directly if the provided environments follow the expected interface.
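The expected interface can be illustrated with a minimal stand-in environment. Everything inside `DummyEnv` below is a toy assumption; only the method names (`reset`, `step`, `render`) and the `state` attribute come from the interface described above.

```python
# Minimal sketch of an environment satisfying the expected interface.
# The scalar state and the termination rule are illustrative only.

class DummyEnv:
    def __init__(self):
        self.state = 0  # stands in for an IOCRN-like state object

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action, stepper, raw_action=None):
        self.state += action
        done = self.state >= 10
        return self.state, done

    def render(self, mode=None, ID=None):
        return f"env {ID}: state={self.state}"

envs = [DummyEnv() for _ in range(4)]
initial_states = [env.reset() for env in envs]  # what reset() gathers
```

Any list of objects shaped like this can be handed to the manager directly.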

AbstractMultiEnvironments

Synchronous manager for multiple CRN environments.

PARAMETER DESCRIPTION
envs

List of environment instances. Each environment is expected to expose:
  • reset()
  • step(action, stepper, raw_action=None) (raw_action optional)
  • render(mode=..., ID=...)
  • a state attribute (IOCRN-like), used by observers and plotting code

hall_of_fame_size

Maximum number of environments stored in the hall of fame. If 0, hall of fame is disabled.

DEFAULT: 10

logger

Optional logger for metrics and figures/images. Expected to provide methods such as log_metric, log_figure, and log_image.

DEFAULT: None

ATTRIBUTE DESCRIPTION
envs

List of managed environments.

hall_of_fame

Optional RL4CRN.utils.hall_of_fame.HallOfFame storing best-performing environments.

rendering_iteration

Counter used to label logged figures/images.

reset()

Reset all environments.

RETURNS DESCRIPTION

List of initial states for each environment (as returned by each environment's reset() method).

gather()

Gather the current state from all environments.

RETURNS DESCRIPTION

List of current environment states (typically IOCRN objects), one per environment.

step(actions, stepper, raw_actions=None)

Step all environments forward by one action.

PARAMETER DESCRIPTION
actions

List of environment actions to apply, one per environment.

stepper

Stepper object passed to each environment's step(...).

raw_actions

Optional list of raw policy actions aligned with actions. If provided, each environment is called as env.step(action, stepper, raw_action=raw_action).

DEFAULT: None

RETURNS DESCRIPTION

List of per-environment step outputs. The structure depends on the underlying environment's step method (commonly (state, done) or similar).

Side Effects

Logs a timing metric 'Timing: Step' if a logger is available.
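The synchronous batch step amounts to zipping one action (and optionally one raw action) with each environment. A minimal sketch, assuming environments follow the interface above (`_Env` and `step_all` are illustrative names, not part of the RL4CRN API):

```python
# Sketch of synchronous batch stepping: one action per environment.
# `stepper` is whatever object the underlying environments expect
# and is treated as opaque here.

def step_all(envs, actions, stepper, raw_actions=None):
    if raw_actions is None:
        return [env.step(a, stepper) for env, a in zip(envs, actions)]
    return [
        env.step(a, stepper, raw_action=r)
        for env, a, r in zip(envs, actions, raw_actions)
    ]

class _Env:
    def __init__(self):
        self.state = 0.0

    def step(self, action, stepper, raw_action=None):
        self.state += action
        return self.state, False

envs = [_Env() for _ in range(3)]
results = step_all(envs, [1.0, 2.0, 3.0], stepper=None)
states = [state for state, done in results]
```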

observe(observer, tensorizer)

Observe all environments and tensorize the resulting observations.

PARAMETER DESCRIPTION
observer

Observer exposing observe(state), which returns one observation object per environment state.

tensorizer

Tensorizer exposing tensorize(observation), which returns a tensor per observation; the resulting tensors must be stackable into a batch.

RETURNS DESCRIPTION

Tensor of stacked observations with shape (N, ...), where N is the number of environments.
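The observe/tensorize pipeline can be sketched as follows. The `Observer` and `Tensorizer` classes here are illustrative stand-ins (a toy observation of a scalar state), not the RL4CRN implementations; NumPy substitutes for whatever tensor library the agent uses.

```python
import numpy as np

# Illustrative observer: maps a scalar state to a small feature vector.
class Observer:
    def observe(self, state):
        return [state, state ** 2]

# Illustrative tensorizer: converts an observation to a float32 array.
class Tensorizer:
    def tensorize(self, observation):
        return np.asarray(observation, dtype=np.float32)

def observe_all(states, observer, tensorizer):
    tensors = [tensorizer.tensorize(observer.observe(s)) for s in states]
    return np.stack(tensors)  # shape (N, ...), N = number of environments

batch = observe_all([1.0, 2.0, 3.0], Observer(), Tensorizer())
```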

render(rewards, n_best=1, disregarded_percentage=0.9, mode={'style': 'logger', 'task': 'transients', 'format': 'figure'})

Render and/or log diagnostics for the current batch of environments.

This method is primarily designed for logging workflows. It selects a set of top environments according to rewards and logs a variety of plots, depending on the requested mode.

Selection of top environments

Rewards are interpreted as losses (smaller is better), and the top subset is selected via:

\[k = \left\lfloor N (1 - p) \right\rfloor,\]

where \(p\) is disregarded_percentage and \(N\) is the number of environments.

PARAMETER DESCRIPTION
rewards

Sequence of per-environment scalar scores. Smaller values are treated as better when selecting top environments.

n_best

Number of best environments to render individually via each environment's render(...).

DEFAULT: 1

disregarded_percentage

Fraction of environments to discard when forming the "top-k" subset for aggregate plots. For example, with disregarded_percentage=0.9, only the best 10% are considered.

DEFAULT: 0.9

mode

Dictionary describing the rendering behavior. Expected keys:
  • style: 'logger' (supported here) or 'human' (not implemented in this method).
  • task: controls which diagnostics to log. Supported values (depending on code paths) include: 'transients', 'transients + dose-response', 'phase_plot', 'rank', 'transients + frequency content', 'transients + logic', 'SSA_transients'.
  • format: 'figure' to log matplotlib figures directly, or 'image' to log PNG buffers.
Additional optional keys may be used by specific tasks: bounds, bounds_freq, scale, t0, topology.

DEFAULT: {'style': 'logger', 'task': 'transients', 'format': 'figure'}

RETURNS DESCRIPTION

None.

Side Effects
  • Calls env.render(...) on selected environments.
  • Logs figures/images/metrics via self.logger (if present).
  • May create and close matplotlib figures.
  • Increments self.rendering_iteration on each call in logger mode.