parallel_environments

RL4CRN.environments.parallel_environments

Parallel multi-environment evaluation.

This module defines ParallelEnvironments, a subclass of RL4CRN.environments.abstract_multi_environments.AbstractMultiEnvironments that evaluates rewards for multiple environments in parallel using joblib.

Only the reward computation is parallelized here (via get_reward). Stepping and observation remain synchronous and are inherited from the base class.

ParallelEnvironments

Bases: AbstractMultiEnvironments

Multi-environment manager with parallel reward evaluation.

PARAMETER DESCRIPTION
envs

List of CRN environment instances.

hall_of_fame_size

Maximum number of environments stored in the hall of fame (0 disables hall of fame).

N_CPUs

Number of worker processes for parallel computation. Defaults to os.cpu_count().

DEFAULT: cpu_count()

logger

Optional logger for metrics.

DEFAULT: None

ATTRIBUTE DESCRIPTION
N_CPUs

Number of parallel workers used for reward computation.

get_reward(routine)

Evaluate rewards for all environments in parallel.

The provided routine is applied to each environment's state using joblib.Parallel. The routine is expected to return a tuple (reward, task_info) for a given state.

Because evaluation happens outside the environment objects, this method explicitly writes task_info back into each env.state.last_task_info.

PARAMETER DESCRIPTION
routine

Callable taking an environment state and returning (reward, task_info).

RETURNS DESCRIPTION

List of reward values, one per environment.

Side Effects
  • Sets env.state.last_task_info for each environment.
  • Logs 'Timing: Rewards' if a logger is available.
  • Adds all environments to the hall of fame (if enabled).
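The contract above can be sketched in a few lines. The snippet below is a minimal, self-contained illustration, not the actual implementation: `_State`, `_Env`, and the toy `routine` are hypothetical stand-ins for the real RL4CRN objects, and a stdlib thread pool stands in for `joblib.Parallel`. What it preserves is the documented flow: apply the routine to each state concurrently, then write `task_info` back into each `env.state.last_task_info`.

```python
from concurrent.futures import ThreadPoolExecutor

class _State:
    """Hypothetical minimal stand-in for an environment state."""
    def __init__(self, value):
        self.value = value
        self.last_task_info = None

class _Env:
    """Hypothetical stand-in for a CRN environment."""
    def __init__(self, value):
        self.state = _State(value)

def routine(state):
    # Toy routine returning (reward, task_info), per the documented contract.
    return state.value ** 2, {"input": state.value}

envs = [_Env(v) for v in (1.0, 2.0, 3.0)]

# Evaluate all states concurrently (the real class uses joblib.Parallel).
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(routine, (env.state for env in envs)))

# Because evaluation happens outside the environment objects, task_info
# must be written back explicitly, as get_reward does.
rewards = []
for env, (reward, task_info) in zip(envs, results):
    env.state.last_task_info = task_info
    rewards.append(reward)
```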

reset()

Reset all environments.

RETURNS DESCRIPTION

List of initial states for each environment (as returned by each environment's reset() method).

gather()

Gather the current state from all environments.

RETURNS DESCRIPTION

List of current environment states (typically IOCRN objects), one per environment.

step(actions, stepper, raw_actions=None)

Advance all environments by one step, applying one action per environment.

PARAMETER DESCRIPTION
actions

List of environment actions to apply, one per environment.

stepper

Stepper object passed to each environment's step(...).

raw_actions

Optional list of raw policy actions aligned with actions. If provided, each environment is called as env.step(action, stepper, raw_action=raw_action).

DEFAULT: None

RETURNS DESCRIPTION

List of per-environment step outputs. The structure depends on the underlying environment's step method (commonly (state, done) or similar).

Side Effects

Logs a timing metric 'Timing: Step' if a logger is available.
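The dispatch on raw_actions can be sketched as follows. `_ToyEnv` and `step_all` are hypothetical illustrations, assuming the commonly documented `(state, done)` return convention; the real class inherits its stepping loop from the base class.

```python
class _ToyEnv:
    """Hypothetical toy environment with the documented step signature."""
    def __init__(self):
        self.t = 0

    def step(self, action, stepper, raw_action=None):
        # Accumulate the action; return (state, done), one common convention.
        self.t += action
        return self.t, self.t >= 3

def step_all(envs, actions, stepper, raw_actions=None):
    # Mirror the documented dispatch: pass raw_action only when provided.
    if raw_actions is None:
        return [env.step(a, stepper) for env, a in zip(envs, actions)]
    return [env.step(a, stepper, raw_action=r)
            for env, a, r in zip(envs, actions, raw_actions)]

envs = [_ToyEnv(), _ToyEnv()]
outputs = step_all(envs, [1, 3], stepper=None)
```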

observe(observer, tensorizer)

Observe all environments and tensorize the resulting observations.

PARAMETER DESCRIPTION
observer

Observer whose observe(state) method returns an observation object for each environment.

tensorizer

Tensorizer whose tensorize(observation) method returns a tensor per environment (or at least tensors that can be stacked).

RETURNS DESCRIPTION

Tensor of stacked observations with shape (N, ...), where N is the number of environments.
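The observe-then-tensorize-then-stack pattern can be illustrated with NumPy. `_Observer` and `_Tensorizer` below are hypothetical stand-ins that only satisfy the documented interfaces; the point is the stacking, which yields shape (N, ...) with N the number of environments.

```python
import numpy as np

class _Observer:
    """Hypothetical observer: extracts a feature vector from a state."""
    def observe(self, state):
        return [state["x"], state["y"]]

class _Tensorizer:
    """Hypothetical tensorizer: converts an observation to an array."""
    def tensorize(self, observation):
        return np.asarray(observation, dtype=np.float64)

states = [{"x": 0.0, "y": 1.0}, {"x": 2.0, "y": 3.0}]
observer, tensorizer = _Observer(), _Tensorizer()

# Observe each environment, tensorize, then stack along a new leading
# axis, giving one (N, ...) tensor for the whole batch.
stacked = np.stack(
    [tensorizer.tensorize(observer.observe(s)) for s in states]
)
```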

render(rewards, n_best=1, disregarded_percentage=0.9, mode={'style': 'logger', 'task': 'transients', 'format': 'figure'})

Render and/or log diagnostics for the current batch of environments.

This method is primarily designed for logging workflows. It selects a set of top environments according to rewards and logs a variety of plots, depending on the requested mode.

Selection of top environments

Rewards are interpreted as losses (smaller is better), and the top subset is selected via:

\[k = \left\lfloor N (1 - p) \right\rfloor,\]

where \(p\) is disregarded_percentage and \(N\) is the number of environments.
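The selection can be worked through concretely. The helper below is a sketch of the documented formula (not the library's own code): it computes k and then picks the k smallest rewards, since rewards are treated as losses.

```python
import math

def top_k(n_envs, disregarded_percentage):
    # k = floor(N * (1 - p)): environments kept after discarding fraction p.
    return math.floor(n_envs * (1 - disregarded_percentage))

# Rewards are treated as losses: smaller is better.
rewards = [0.3, 0.1, 0.7, 0.2, 0.9, 0.5, 0.4, 0.8]
k = top_k(len(rewards), 0.5)  # keep the best half -> k = 4

# Indices of the k best (lowest-reward) environments, best first.
best = sorted(range(len(rewards)), key=rewards.__getitem__)[:k]
```

Note that with N = 8 and p = 0.5 this keeps the 4 best environments; the default p = 0.9 keeps only the best 10% of the batch.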

PARAMETER DESCRIPTION
rewards

Sequence of per-environment scalar scores. Smaller values are treated as better when selecting top environments.

n_best

Number of best environments to render individually via each environment's render(...).

DEFAULT: 1

disregarded_percentage

Fraction of environments to discard when forming the "top-k" subset for aggregate plots. For example, with disregarded_percentage=0.9, only the best 10% are considered.

DEFAULT: 0.9

mode

Dictionary describing the rendering behavior. Expected keys:
  • style: 'logger' (supported here) or 'human' (not implemented in this method).
  • task: controls what diagnostics to log. Supported values in this method include (depending on code paths): 'transients', 'transients + dose-response', 'phase_plot', 'rank', 'transients + frequency content', 'transients + logic', 'SSA_transients'.
  • format: 'figure' to log matplotlib figures directly, or 'image' to log PNG buffers.

Additional optional keys may be used by specific tasks: bounds, bounds_freq, scale, t0, topology.

DEFAULT: {'style': 'logger', 'task': 'transients', 'format': 'figure'}

RETURNS DESCRIPTION

None.

Side Effects
  • Calls env.render(...) on selected environments.
  • Logs figures/images/metrics via self.logger (if present).
  • May create and close matplotlib figures.
  • Increments self.rendering_iteration on each call in logger mode.