parallel_environments
RL4CRN.environments.parallel_environments
Parallel multi-environment evaluation.
This module defines `ParallelEnvironments`, a subclass of
`RL4CRN.environments.abstract_multi_environments.AbstractMultiEnvironments`
that evaluates rewards for multiple environments in parallel using joblib.
Only the reward computation is parallelized here (via `get_reward`);
stepping and observation remain synchronous and are inherited from the base class.
ParallelEnvironments
Bases: AbstractMultiEnvironments
Multi-environment manager with parallel reward evaluation.
| PARAMETER | DESCRIPTION |
|---|---|
| `envs` | List of CRN environment instances. |
| `hall_of_fame_size` | Maximum number of environments stored in the hall of fame (0 disables the hall of fame). |
| `N_CPUs` | Number of worker processes for parallel reward computation. |
| `logger` | Optional logger for metrics. |
| ATTRIBUTE | DESCRIPTION |
|---|---|
| `N_CPUs` | Number of parallel workers used for reward computation. |
get_reward(routine)
Evaluate rewards for all environments in parallel.
The provided routine is applied to each environment's state using
`joblib.Parallel`. The routine is expected to return a tuple
`(reward, task_info)` for a given state.
Because evaluation happens outside the environment objects, this method
explicitly writes `task_info` back into each `env.state.last_task_info`.
| PARAMETER | DESCRIPTION |
|---|---|
| `routine` | Callable taking an environment state and returning a `(reward, task_info)` tuple. |

| RETURNS | DESCRIPTION |
|---|---|
| | List of reward values, one per environment. |

Side Effects

- Sets `env.state.last_task_info` for each environment.
- Logs `'Timing: Rewards'` if a logger is available.
- Adds all environments to the hall of fame (if enabled).
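The evaluation contract can be sketched with a standard-library thread pool standing in for `joblib.Parallel`. The helper name `get_reward_sketch` and the toy environments below are illustrative assumptions, not part of the library:

```python
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace

def get_reward_sketch(envs, routine, n_workers=2):
    """Apply routine(state) -> (reward, task_info) to every environment's
    state in parallel, then write task_info back into each state."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(routine, (env.state for env in envs)))
    rewards = []
    for env, (reward, task_info) in zip(envs, results):
        # Evaluation happened outside the env object, so the write-back
        # into last_task_info is explicit.
        env.state.last_task_info = task_info
        rewards.append(reward)
    return rewards

# Toy stand-ins for CRN environments.
envs = [SimpleNamespace(state=SimpleNamespace(value=v, last_task_info=None))
        for v in (1.0, 2.0, 3.0)]
rewards = get_reward_sketch(envs, lambda s: (s.value ** 2, {"value": s.value}))
# rewards == [1.0, 4.0, 9.0]; each state's last_task_info is now populated
```

`ThreadPoolExecutor.map` preserves input order, so rewards stay aligned with `envs`, mirroring the one-reward-per-environment return value described above.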
reset()
Reset all environments.
| RETURNS | DESCRIPTION |
|---|---|
| | List of initial states, one per environment (as returned by each environment's `reset()` method). |
gather()
Gather the current state from all environments.
| RETURNS | DESCRIPTION |
|---|---|
| | List of current environment states (typically `IOCRN` objects), one per environment. |
step(actions, stepper, raw_actions=None)
Step all environments forward by one action.
| PARAMETER | DESCRIPTION |
|---|---|
| `actions` | List of environment actions to apply, one per environment. |
| `stepper` | Stepper object passed to each environment's `step()` method. |
| `raw_actions` | Optional list of raw policy actions aligned with `actions`. DEFAULT: `None` |
| RETURNS | DESCRIPTION |
|---|---|
| | List of per-environment step outputs. The structure depends on the underlying environment's `step()` implementation. |
Side Effects
Logs a timing metric 'Timing: Step' if a logger is available.
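The synchronous batch-stepping loop can be sketched as follows. `ToyEnv`, the assumed `env.step(action, stepper, raw_action)` signature, and the `log_metric` logger method are illustrative assumptions:

```python
import time

class ToyEnv:
    # Stand-in for a CRN environment; the real step() signature may differ.
    def step(self, action, stepper, raw_action=None):
        return stepper(action)

def step_all_sketch(envs, actions, stepper, raw_actions=None, logger=None):
    """Apply one action per environment and time the whole batch."""
    raw = raw_actions if raw_actions is not None else [None] * len(envs)
    start = time.perf_counter()
    outputs = [env.step(a, stepper, r) for env, a, r in zip(envs, actions, raw)]
    if logger is not None:
        # Hypothetical logger interface for the 'Timing: Step' metric.
        logger.log_metric("Timing: Step", time.perf_counter() - start)
    return outputs

outputs = step_all_sketch([ToyEnv(), ToyEnv()], [1, 2], stepper=lambda a: a * 10)
# outputs == [10, 20]
```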
observe(observer, tensorizer)
Observe all environments and tensorize the resulting observations.
| PARAMETER | DESCRIPTION |
|---|---|
| `observer` | Observer used to produce an observation from each environment's state. |
| `tensorizer` | Tensorizer used to convert the observations into a stacked tensor. |
| RETURNS | DESCRIPTION |
|---|---|
| | Tensor of stacked observations, one per environment. |
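The observe-then-tensorize flow can be sketched as below; the `observe` and `stack` method names are assumptions about the observer and tensorizer interfaces, and plain lists stand in for real tensors:

```python
from types import SimpleNamespace

def observe_all_sketch(envs, observer, tensorizer):
    """Observe every environment, then stack the tensorized observations
    along a leading per-environment dimension."""
    observations = [observer.observe(env.state) for env in envs]
    return tensorizer.stack(observations)

# Toy observer/tensorizer using plain lists in place of real tensors.
observer = SimpleNamespace(observe=lambda state: [state.a, state.b])
tensorizer = SimpleNamespace(stack=lambda obs: obs)  # a real one would build a tensor
envs = [SimpleNamespace(state=SimpleNamespace(a=i, b=-i)) for i in range(3)]
batch = observe_all_sketch(envs, observer, tensorizer)
# batch == [[0, 0], [1, -1], [2, -2]]
```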
render(rewards, n_best=1, disregarded_percentage=0.9, mode={'style': 'logger', 'task': 'transients', 'format': 'figure'})
Render and/or log diagnostics for the current batch of environments.
This method is primarily designed for logging workflows. It selects a set
of top environments according to rewards and logs a variety of plots,
depending on the requested mode.
Selection of top environments

Rewards are interpreted as losses (smaller is better), and the size of the top subset is computed as

\[ k = \lceil (1 - p)\, N \rceil \]

where \(p\) is `disregarded_percentage` and \(N\) is the number
of environments.
| PARAMETER | DESCRIPTION |
|---|---|
| `rewards` | Sequence of per-environment scalar scores. Smaller values are treated as better when selecting top environments. |
| `n_best` | Number of best environments to render individually via each environment's `render()` method. DEFAULT: `1` |
| `disregarded_percentage` | Fraction of environments to discard when forming the "top-k" subset for aggregate plots. DEFAULT: `0.9` |
| `mode` | Dictionary describing the rendering behavior. Expected keys: `'style'`, `'task'`, `'format'`. DEFAULT: `{'style': 'logger', 'task': 'transients', 'format': 'figure'}` |
| RETURNS | DESCRIPTION |
|---|---|
| | None. |

Side Effects

- Calls `env.render(...)` on selected environments.
- Logs figures/images/metrics via `self.logger` (if present).
- May create and close matplotlib figures.
- Increments `self.rendering_iteration` on each call in logger mode.
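The loss-style top-k selection described above can be sketched as follows; the `select_top_k` helper and the ceiling rounding rule are assumptions for illustration:

```python
import math

def select_top_k(rewards, disregarded_percentage=0.9):
    """Return indices of the best environments, treating rewards as losses
    (smaller is better) and keeping k = ceil((1 - p) * N) of them."""
    n = len(rewards)
    k = math.ceil((1 - disregarded_percentage) * n)
    order = sorted(range(n), key=lambda i: rewards[i])  # ascending: best first
    return order[:k]

select_top_k([0.5, 0.1, 0.9, 0.3], disregarded_percentage=0.5)
# -> [1, 3]: indices of the two smallest (best) rewards
```

With the default `disregarded_percentage=0.9` and ten environments, only the single best environment survives into the aggregate plots.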