RL4CRN.environments.serial_environments

Serial multi-environment evaluation.

This module defines SerialEnvironments, a subclass of RL4CRN.environments.abstract_multi_environments.AbstractMultiEnvironments that evaluates rewards for multiple environments sequentially (no parallelism).

This is useful for:

  • debugging and deterministic profiling,
  • small batch sizes where multiprocessing overhead dominates.

SerialEnvironments

Bases: AbstractMultiEnvironments

Multi-environment manager with serial reward evaluation.

__init__(envs, hall_of_fame_size, logger=None)

Initialize the serial environments wrapper.

PARAMETER DESCRIPTION
envs

List of CRN environment instances.

hall_of_fame_size

Maximum number of environments stored in the hall of fame (0 disables hall of fame).

logger

Optional logger for metrics.

DEFAULT: None
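As a usage sketch of the documented constructor: the stub classes below are hypothetical stand-ins so the snippet runs without RL4CRN installed; with the real library you would import SerialEnvironments from RL4CRN.environments.serial_environments and pass actual CRN environment instances.

```python
class StubEnv:
    """Minimal placeholder for a CRN environment instance (hypothetical)."""
    def __init__(self, seed):
        self.seed = seed

class StubSerialEnvironments:
    """Mirrors the documented __init__(envs, hall_of_fame_size, logger=None)."""
    def __init__(self, envs, hall_of_fame_size, logger=None):
        self.envs = list(envs)
        self.hall_of_fame_size = hall_of_fame_size  # 0 disables the hall of fame
        self.logger = logger  # optional metrics logger

multi_env = StubSerialEnvironments(
    [StubEnv(seed=i) for i in range(8)],
    hall_of_fame_size=5,
)
```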

get_reward(routine)

Evaluate rewards for all environments sequentially.

The provided routine is applied to each environment's state in a Python loop. The routine is expected to return a tuple (reward, task_info) for each state.

Note

Unlike RL4CRN.environments.parallel_environments.ParallelEnvironments, this method does not explicitly write task_info back into env.state.last_task_info; by default, the routine itself is expected to handle this.

PARAMETER DESCRIPTION
routine

Callable taking an environment state and returning (reward, task_info).

RETURNS DESCRIPTION

Sequence of reward values, one per environment.

Side Effects
  • Logs 'Reward Time' if a logger is available.
  • Adds all environments to the hall of fame (if enabled).
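The serial loop described above can be sketched as follows. The stub types and the toy routine are illustrative stand-ins (not RL4CRN's actual classes); what the sketch preserves is the documented contract: the routine receives each environment's state and returns (reward, task_info).

```python
class StubState:
    def __init__(self, value):
        self.value = value
        self.last_task_info = None

class StubEnv:
    def __init__(self, value):
        self.state = StubState(value)

def serial_get_reward(envs, routine):
    """Apply `routine` to each env.state in a plain Python loop."""
    rewards = []
    for env in envs:
        reward, task_info = routine(env.state)
        # NOTE: unlike ParallelEnvironments, task_info is not written back
        # to env.state.last_task_info here; the routine handles that.
        rewards.append(reward)
    return rewards

def routine(state):
    # Toy routine: reward is the squared state value.
    return state.value ** 2, {"value": state.value}

envs = [StubEnv(v) for v in (1.0, 2.0, 3.0)]
rewards = serial_get_reward(envs, routine)
# rewards == [1.0, 4.0, 9.0]
```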

reset()

Reset all environments.

RETURNS DESCRIPTION

List of initial states for each environment (as returned by each environment's reset() method).

gather()

Gather the current state from all environments.

RETURNS DESCRIPTION

List of current environment states (typically IOCRN objects), one per environment.

step(actions, stepper, raw_actions=None)

Step all environments forward by one action.

PARAMETER DESCRIPTION
actions

List of environment actions to apply, one per environment.

stepper

Stepper object passed to each environment's step(...).

raw_actions

Optional list of raw policy actions aligned with actions. If provided, each environment is called as env.step(action, stepper, raw_action=raw_action).

DEFAULT: None

RETURNS DESCRIPTION

List of per-environment step outputs. The structure depends on the underlying environment's step method (commonly (state, done) or similar).

Side Effects

Logs a timing metric 'Timing: Step' if a logger is available.
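The per-environment stepping pattern can be sketched as below. The stub environment and its toy dynamics are assumptions for illustration; only the call shape env.step(action, stepper, raw_action=...) and the one-output-per-environment return follow the documentation above.

```python
class StubEnv:
    def __init__(self):
        self.t = 0

    def step(self, action, stepper, raw_action=None):
        self.t += stepper  # toy dynamics: stepper acts as a time increment
        done = self.t >= 3
        return (self.t + action, done)  # a (state, done)-style output

def serial_step(envs, actions, stepper, raw_actions=None):
    """Step each environment once, forwarding raw actions when given."""
    raw_actions = raw_actions or [None] * len(envs)
    return [
        env.step(a, stepper, raw_action=ra)
        for env, a, ra in zip(envs, actions, raw_actions)
    ]

envs = [StubEnv(), StubEnv()]
outputs = serial_step(envs, actions=[0, 1], stepper=1)
# outputs == [(1, False), (2, False)]
```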

observe(observer, tensorizer)

Observe all environments and tensorize the resulting observations.

PARAMETER DESCRIPTION
observer

Observer providing observe(state) and returning an observation object per environment.

tensorizer

Tensorizer providing tensorize(observation), returning one tensor per environment (or objects that can be stacked into a single tensor).

RETURNS DESCRIPTION

Tensor of stacked observations with shape (N, ...), where N is the number of environments.
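The observe-then-tensorize flow can be sketched with plain lists standing in for real tensors (the actual tensorizer presumably returns framework tensors; the observer and tensorizer below are toy stand-ins):

```python
class StubObserver:
    def observe(self, state):
        return {"x": state}  # toy observation object

class StubTensorizer:
    def tensorize(self, observation):
        return [observation["x"], observation["x"] ** 2]  # toy feature row

def serial_observe(states, observer, tensorizer):
    """Observe every state, tensorize each observation, stack the rows."""
    observations = [observer.observe(s) for s in states]
    rows = [tensorizer.tensorize(o) for o in observations]
    # Stacked result has "shape" (N, ...), N = number of environments.
    return rows

batch = serial_observe([1.0, 2.0], StubObserver(), StubTensorizer())
# batch == [[1.0, 1.0], [2.0, 4.0]]
```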

render(rewards, n_best=1, disregarded_percentage=0.9, mode={'style': 'logger', 'task': 'transients', 'format': 'figure'})

Render and/or log diagnostics for the current batch of environments.

This method is primarily designed for logging workflows. It selects a set of top environments according to rewards and logs a variety of plots, depending on the requested mode.

Selection of top environments

Rewards are interpreted as losses (smaller is better), and the top subset is selected via:

\[k = \left\lfloor N (1 - p) \right\rfloor,\]

where \(p\) is disregarded_percentage and \(N\) is the number of environments.
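A worked example of this selection rule, treating rewards as losses (smaller is better) as stated above; the helper name is ours, not RL4CRN's:

```python
import math

def select_top(rewards, disregarded_percentage):
    """Return indices of the k best (lowest-loss) environments,
    with k = floor(N * (1 - p))."""
    k = math.floor(len(rewards) * (1 - disregarded_percentage))
    order = sorted(range(len(rewards)), key=lambda i: rewards[i])
    return order[:k]

best = select_top([0.9, 0.1, 0.5, 0.3], disregarded_percentage=0.5)
# k = floor(4 * 0.5) = 2; best == [1, 3]
```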

PARAMETER DESCRIPTION
rewards

Sequence of per-environment scalar scores. Smaller values are treated as better when selecting top environments.

n_best

Number of best environments to render individually via each environment's render(...).

DEFAULT: 1

disregarded_percentage

Fraction of environments to discard when forming the "top-k" subset for aggregate plots. For example, with disregarded_percentage=0.9, only the best 10% are considered.

DEFAULT: 0.9

mode

Dictionary describing the rendering behavior. Expected keys:

  • style: 'logger' (supported here) or 'human' (not implemented in this method).
  • task: controls what diagnostics to log. Supported values in this method include (depending on code paths): 'transients', 'transients + dose-response', 'phase_plot', 'rank', 'transients + frequency content', 'transients + logic', 'SSA_transients'.
  • format: 'figure' to log matplotlib figures directly, or 'image' to log PNG buffers.

Additional optional keys may be used by specific tasks: bounds, bounds_freq, scale, t0, topology.

DEFAULT: {'style': 'logger', 'task': 'transients', 'format': 'figure'}
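For illustration, mode dictionaries combining the documented keys can be written as plain dict literals; the optional task-specific keys are left commented out because their expected value types are not specified here:

```python
# Default-like mode: log transient plots as live matplotlib figures.
transient_mode = {
    "style": "logger",   # 'human' is not implemented in this method
    "task": "transients + dose-response",
    "format": "image",   # log PNG buffers instead of figure objects
}

phase_mode = {
    "style": "logger",
    "task": "phase_plot",
    "format": "figure",
    # Optional, task-specific keys (names from the list above):
    # "bounds": ..., "bounds_freq": ..., "scale": ..., "t0": ..., "topology": ...,
}
```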

RETURNS DESCRIPTION

None.

Side Effects
  • Calls env.render(...) on selected environments.
  • Logs figures/images/metrics via self.logger (if present).
  • May create and close matplotlib figures.
  • Increments self.rendering_iteration on each call in logger mode.