RL4CRN.environments.serial_environments

Serial multi-environment evaluation.

This module defines SerialEnvironments, a subclass of RL4CRN.environments.abstract_multi_environments.AbstractMultiEnvironments that evaluates rewards for multiple environments sequentially (no parallelism).

This is useful for:

  • debugging and deterministic profiling,
  • small batch sizes where multiprocessing overhead dominates.

SerialEnvironments

Bases: AbstractMultiEnvironments

Multi-environment manager with serial reward evaluation.

__init__(envs, hall_of_fame_size, logger=None)

Initialize the serial environments wrapper.

PARAMETER DESCRIPTION
envs

List of CRN environment instances.

hall_of_fame_size

Maximum number of environments stored in the hall of fame (0 disables hall of fame).

logger

Optional logger for metrics.

DEFAULT: None
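As a usage sketch of the documented constructor: the stub classes below are hypothetical stand-ins so the snippet runs without RL4CRN installed; with the real library you would import SerialEnvironments from RL4CRN.environments.serial_environments and pass actual CRN environment instances.

```python
class StubEnv:
    """Minimal placeholder for a CRN environment instance (hypothetical)."""
    def __init__(self, seed):
        self.seed = seed

class StubSerialEnvironments:
    """Mirrors the documented __init__(envs, hall_of_fame_size, logger=None)."""
    def __init__(self, envs, hall_of_fame_size, logger=None):
        self.envs = list(envs)
        self.hall_of_fame_size = hall_of_fame_size  # 0 disables the hall of fame
        self.logger = logger  # optional metrics logger

multi_env = StubSerialEnvironments(
    [StubEnv(seed=i) for i in range(8)],
    hall_of_fame_size=5,
)
```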

get_reward(routine)

Evaluate rewards for all environments sequentially.

The provided routine is applied to each environment's state in a Python loop. The routine is expected to return a tuple (reward, task_info) for each state.

Note

Unlike RL4CRN.environments.parallel_environments.ParallelEnvironments, this method does not explicitly write task_info back into env.state.last_task_info; by default, the routine itself is expected to handle this.

PARAMETER DESCRIPTION
routine

Callable taking an environment state and returning (reward, task_info).

RETURNS DESCRIPTION

Sequence of reward values, one per environment.

Side Effects
  • Logs 'Reward Time' if a logger is available.
  • Adds all environments to the hall of fame (if enabled).
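The serial loop described above can be sketched as follows. The stub types and the toy routine are illustrative stand-ins (not RL4CRN's actual classes); what the sketch preserves is the documented contract: the routine receives each environment's state and returns (reward, task_info).

```python
class StubState:
    def __init__(self, value):
        self.value = value
        self.last_task_info = None

class StubEnv:
    def __init__(self, value):
        self.state = StubState(value)

def serial_get_reward(envs, routine):
    """Apply `routine` to each env.state in a plain Python loop."""
    rewards = []
    for env in envs:
        reward, task_info = routine(env.state)
        # NOTE: unlike ParallelEnvironments, task_info is not written back
        # to env.state.last_task_info here; the routine handles that.
        rewards.append(reward)
    return rewards

def routine(state):
    # Toy routine: reward is the squared state value.
    return state.value ** 2, {"value": state.value}

envs = [StubEnv(v) for v in (1.0, 2.0, 3.0)]
rewards = serial_get_reward(envs, routine)
# rewards == [1.0, 4.0, 9.0]
```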

reset()

Reset all environments.

RETURNS DESCRIPTION

List of initial states for each environment (as returned by each environment's reset() method).

gather()

Gather the current state from all environments.

RETURNS DESCRIPTION

List of current environment states (typically IOCRN objects), one per environment.

step(actions, stepper, raw_actions=None)

Step all environments forward by one action.

PARAMETER DESCRIPTION
actions

List of environment actions to apply, one per environment.

stepper

Stepper object passed to each environment's step(...).

raw_actions

Optional list of raw policy actions aligned with actions. If provided, each environment is called as env.step(action, stepper, raw_action=raw_action).

DEFAULT: None

RETURNS DESCRIPTION

List of per-environment step outputs. The structure depends on the underlying environment's step method (commonly (state, done) or similar).

Side Effects

Logs a timing metric 'Timing: Step' if a logger is available.
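The per-environment stepping pattern can be sketched as below. The stub environment and its toy dynamics are assumptions for illustration; only the call shape env.step(action, stepper, raw_action=...) and the one-output-per-environment return follow the documentation above.

```python
class StubEnv:
    def __init__(self):
        self.t = 0

    def step(self, action, stepper, raw_action=None):
        self.t += stepper  # toy dynamics: stepper acts as a time increment
        done = self.t >= 3
        return (self.t + action, done)  # a (state, done)-style output

def serial_step(envs, actions, stepper, raw_actions=None):
    """Step each environment once, forwarding raw actions when given."""
    raw_actions = raw_actions or [None] * len(envs)
    return [
        env.step(a, stepper, raw_action=ra)
        for env, a, ra in zip(envs, actions, raw_actions)
    ]

envs = [StubEnv(), StubEnv()]
outputs = serial_step(envs, actions=[0, 1], stepper=1)
# outputs == [(1, False), (2, False)]
```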

observe(observer, tensorizer)

Observe all environments and tensorize the resulting observations.

PARAMETER DESCRIPTION
observer

Observer providing observe(state) and returning an observation object per environment.

tensorizer

Tensorizer providing tensorize(observation), returning one tensor per environment (or objects that can be stacked into a single tensor).

RETURNS DESCRIPTION

Tensor of stacked observations with shape (N, ...), where N is the number of environments.
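The observe-then-tensorize flow can be sketched with plain lists standing in for real tensors (the actual tensorizer presumably returns framework tensors; the observer and tensorizer below are toy stand-ins):

```python
class StubObserver:
    def observe(self, state):
        return {"x": state}  # toy observation object

class StubTensorizer:
    def tensorize(self, observation):
        return [observation["x"], observation["x"] ** 2]  # toy feature row

def serial_observe(states, observer, tensorizer):
    """Observe every state, tensorize each observation, stack the rows."""
    observations = [observer.observe(s) for s in states]
    rows = [tensorizer.tensorize(o) for o in observations]
    # Stacked result has "shape" (N, ...), N = number of environments.
    return rows

batch = serial_observe([1.0, 2.0], StubObserver(), StubTensorizer())
# batch == [[1.0, 1.0], [2.0, 4.0]]
```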

render(rewards, n_best=1, disregarded_percentage=0.9, mode={'style': 'logger', 'task': 'transients', 'format': 'figure'})

Render and/or log diagnostics for the current batch of environments.

This method is primarily designed for logging workflows. It selects a set of top environments according to rewards and logs a variety of plots, depending on the requested mode.

Selection of top environments

Rewards are interpreted as losses (smaller is better), and the top subset is selected via:

\[k = \left\lfloor N (1 - p) \right\rfloor,\]

where \(p\) is disregarded_percentage and \(N\) is the number of environments.
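A worked example of this selection rule, treating rewards as losses (smaller is better) as stated above; the helper name is ours, not RL4CRN's:

```python
import math

def select_top(rewards, disregarded_percentage):
    """Return indices of the k best (lowest-loss) environments,
    with k = floor(N * (1 - p))."""
    k = math.floor(len(rewards) * (1 - disregarded_percentage))
    order = sorted(range(len(rewards)), key=lambda i: rewards[i])
    return order[:k]

best = select_top([0.9, 0.1, 0.5, 0.3], disregarded_percentage=0.5)
# k = floor(4 * 0.5) = 2; best == [1, 3]
```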

PARAMETER DESCRIPTION
rewards

Sequence of per-environment scalar scores. Smaller values are treated as better when selecting top environments.

n_best

Number of best environments to render individually via each environment's render(...).

DEFAULT: 1

disregarded_percentage

Fraction of environments to discard when forming the "top-k" subset for aggregate plots. For example, with disregarded_percentage=0.9, only the best 10% are considered.

DEFAULT: 0.9

mode

Dictionary describing the rendering behavior. Expected keys:

  • style: 'logger' (supported here) or 'human' (not implemented in this method).
  • task: controls what diagnostics to log. Supported values in this method include (depending on code paths): 'transients', 'transients + dose-response', 'phase_plot', 'rank', 'transients + frequency content', 'transients + logic', 'SSA_transients'.
  • format: 'figure' to log matplotlib figures directly, or 'image' to log PNG buffers.

Additional optional keys may be used by specific tasks: bounds, bounds_freq, scale, t0, topology.

DEFAULT: {'style': 'logger', 'task': 'transients', 'format': 'figure'}
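For illustration, mode dictionaries combining the documented keys can be written as plain dict literals; the optional task-specific keys are left commented out because their expected value types are not specified here:

```python
# Default-like mode: log transient plots as live matplotlib figures.
transient_mode = {
    "style": "logger",   # 'human' is not implemented in this method
    "task": "transients + dose-response",
    "format": "image",   # log PNG buffers instead of figure objects
}

phase_mode = {
    "style": "logger",
    "task": "phase_plot",
    "format": "figure",
    # Optional, task-specific keys (names from the list above):
    # "bounds": ..., "bounds_freq": ..., "scale": ..., "t0": ..., "topology": ...,
}
```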

RETURNS DESCRIPTION

None.

Side Effects
  • Calls env.render(...) on selected environments.
  • Logs figures/images/metrics via self.logger (if present).
  • May create and close matplotlib figures.
  • Increments self.rendering_iteration on each call in logger mode.