parallel_environments

RL4CRN.environments.parallel_environments

Parallel multi-environment evaluation.

This module defines ParallelEnvironments, a subclass of RL4CRN.environments.abstract_multi_environments.AbstractMultiEnvironments that evaluates rewards for multiple environments in parallel using joblib.

Only the reward computation is parallelized here (via get_reward). Stepping and observation remain synchronous and are inherited from the base class.

ParallelEnvironments

Bases: AbstractMultiEnvironments

Multi-environment manager with parallel reward evaluation.

PARAMETER DESCRIPTION
envs

List of CRN environment instances.

hall_of_fame_size

Maximum number of environments stored in the hall of fame (0 disables hall of fame).

N_CPUs

Number of worker processes for parallel computation. Defaults to os.cpu_count().

DEFAULT: cpu_count()

logger

Optional logger for metrics.

DEFAULT: None

ATTRIBUTE DESCRIPTION
N_CPUs

Number of parallel workers used for reward computation.

get_reward(routine)

Evaluate rewards for all environments in parallel.

The provided routine is applied to each environment's state using joblib.Parallel. The routine is expected to return a tuple (reward, task_info) for a given state.

Because evaluation happens outside the environment objects, this method explicitly writes task_info back into each env.state.last_task_info.

PARAMETER DESCRIPTION
routine

Callable taking an environment state and returning (reward, task_info).

RETURNS DESCRIPTION

List of reward values, one per environment.

Side Effects
  • Sets env.state.last_task_info for each environment.
  • Logs 'Timing: Rewards' if a logger is available.
  • Adds all environments to the hall of fame (if enabled).
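The contract above can be sketched in a few lines. The snippet below is a minimal, self-contained illustration, not the actual implementation: `_State`, `_Env`, and the toy `routine` are hypothetical stand-ins for the real RL4CRN objects, and a stdlib thread pool stands in for `joblib.Parallel`. What it preserves is the documented flow: apply the routine to each state concurrently, then write `task_info` back into each `env.state.last_task_info`.

```python
from concurrent.futures import ThreadPoolExecutor

class _State:
    """Hypothetical minimal stand-in for an environment state."""
    def __init__(self, value):
        self.value = value
        self.last_task_info = None

class _Env:
    """Hypothetical stand-in for a CRN environment."""
    def __init__(self, value):
        self.state = _State(value)

def routine(state):
    # Toy routine returning (reward, task_info), per the documented contract.
    return state.value ** 2, {"input": state.value}

envs = [_Env(v) for v in (1.0, 2.0, 3.0)]

# Evaluate all states concurrently (the real class uses joblib.Parallel).
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(routine, (env.state for env in envs)))

# Because evaluation happens outside the environment objects, task_info
# must be written back explicitly, as get_reward does.
rewards = []
for env, (reward, task_info) in zip(envs, results):
    env.state.last_task_info = task_info
    rewards.append(reward)
```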

reset()

Reset all environments.

RETURNS DESCRIPTION

List of initial states for each environment (as returned by each environment's reset() method).

gather()

Gather the current state from all environments.

RETURNS DESCRIPTION

List of current environment states (typically IOCRN objects), one per environment.

step(actions, stepper, raw_actions=None)

Advance all environments by one step, applying one action per environment.

PARAMETER DESCRIPTION
actions

List of environment actions to apply, one per environment.

stepper

Stepper object passed to each environment's step(...).

raw_actions

Optional list of raw policy actions aligned with actions. If provided, each environment is called as env.step(action, stepper, raw_action=raw_action).

DEFAULT: None

RETURNS DESCRIPTION

List of per-environment step outputs. The structure depends on the underlying environment's step method (commonly (state, done) or similar).

Side Effects

Logs a timing metric 'Timing: Step' if a logger is available.
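The dispatch on raw_actions can be sketched as follows. `_ToyEnv` and `step_all` are hypothetical illustrations, assuming the commonly documented `(state, done)` return convention; the real class inherits its stepping loop from the base class.

```python
class _ToyEnv:
    """Hypothetical toy environment with the documented step signature."""
    def __init__(self):
        self.t = 0

    def step(self, action, stepper, raw_action=None):
        # Accumulate the action; return (state, done), one common convention.
        self.t += action
        return self.t, self.t >= 3

def step_all(envs, actions, stepper, raw_actions=None):
    # Mirror the documented dispatch: pass raw_action only when provided.
    if raw_actions is None:
        return [env.step(a, stepper) for env, a in zip(envs, actions)]
    return [env.step(a, stepper, raw_action=r)
            for env, a, r in zip(envs, actions, raw_actions)]

envs = [_ToyEnv(), _ToyEnv()]
outputs = step_all(envs, [1, 3], stepper=None)
```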

observe(observer, tensorizer)

Observe all environments and tensorize the resulting observations.

PARAMETER DESCRIPTION
observer

Observer whose observe(state) method returns an observation object for each environment.

tensorizer

Tensorizer whose tensorize(observation) method returns a tensor per environment (or at least tensors that can be stacked).

RETURNS DESCRIPTION

Tensor of stacked observations with shape (N, ...), where N is the number of environments.
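The observe-then-tensorize-then-stack pattern can be illustrated with NumPy. `_Observer` and `_Tensorizer` below are hypothetical stand-ins that only satisfy the documented interfaces; the point is the stacking, which yields shape (N, ...) with N the number of environments.

```python
import numpy as np

class _Observer:
    """Hypothetical observer: extracts a feature vector from a state."""
    def observe(self, state):
        return [state["x"], state["y"]]

class _Tensorizer:
    """Hypothetical tensorizer: converts an observation to an array."""
    def tensorize(self, observation):
        return np.asarray(observation, dtype=np.float64)

states = [{"x": 0.0, "y": 1.0}, {"x": 2.0, "y": 3.0}]
observer, tensorizer = _Observer(), _Tensorizer()

# Observe each environment, tensorize, then stack along a new leading
# axis, giving one (N, ...) tensor for the whole batch.
stacked = np.stack(
    [tensorizer.tensorize(observer.observe(s)) for s in states]
)
```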

render(rewards, n_best=1, disregarded_percentage=0.9, mode={'style': 'logger', 'task': 'transients', 'format': 'figure'})

Render and/or log diagnostics for the current batch of environments.

This method is primarily designed for logging workflows. It selects a set of top environments according to rewards and logs a variety of plots, depending on the requested mode.

Selection of top environments

Rewards are interpreted as losses (smaller is better), and the top subset is selected via:

\[k = \left\lfloor N (1 - p) \right\rfloor,\]

where \(p\) is disregarded_percentage and \(N\) is the number of environments.
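The selection can be worked through concretely. The helper below is a sketch of the documented formula (not the library's own code): it computes k and then picks the k smallest rewards, since rewards are treated as losses.

```python
import math

def top_k(n_envs, disregarded_percentage):
    # k = floor(N * (1 - p)): environments kept after discarding fraction p.
    return math.floor(n_envs * (1 - disregarded_percentage))

# Rewards are treated as losses: smaller is better.
rewards = [0.3, 0.1, 0.7, 0.2, 0.9, 0.5, 0.4, 0.8]
k = top_k(len(rewards), 0.5)  # keep the best half -> k = 4

# Indices of the k best (lowest-reward) environments, best first.
best = sorted(range(len(rewards)), key=rewards.__getitem__)[:k]
```

Note that with N = 8 and p = 0.5 this keeps the 4 best environments; the default p = 0.9 keeps only the best 10% of the batch.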

PARAMETER DESCRIPTION
rewards

Sequence of per-environment scalar scores. Smaller values are treated as better when selecting top environments.

n_best

Number of best environments to render individually via each environment's render(...).

DEFAULT: 1

disregarded_percentage

Fraction of environments to discard when forming the "top-k" subset for aggregate plots. For example, with disregarded_percentage=0.9, only the best 10% are considered.

DEFAULT: 0.9

mode

Dictionary describing the rendering behavior. Expected keys:
  • style: 'logger' (supported here) or 'human' (not implemented in this method).
  • task: controls what diagnostics to log. Supported values in this method include (depending on code paths): 'transients', 'transients + dose-response', 'phase_plot', 'rank', 'transients + frequency content', 'transients + logic', 'SSA_transients'.
  • format: 'figure' to log matplotlib figures directly, or 'image' to log PNG buffers.

Additional optional keys may be used by specific tasks: bounds, bounds_freq, scale, t0, topology.

DEFAULT: {'style': 'logger', 'task': 'transients', 'format': 'figure'}

RETURNS DESCRIPTION

None.

Side Effects
  • Calls env.render(...) on selected environments.
  • Logs figures/images/metrics via self.logger (if present).
  • May create and close matplotlib figures.
  • Increments self.rendering_iteration on each call in logger mode.