serial_environments
RL4CRN.environments.serial_environments
Serial multi-environment evaluation.
This module defines SerialEnvironments, a subclass of
RL4CRN.environments.abstract_multi_environments.AbstractMultiEnvironments
that evaluates rewards for multiple environments sequentially (no parallelism).
This is useful for:
- debugging and deterministic profiling,
- small batch sizes where multiprocessing overhead dominates.
SerialEnvironments
Bases: AbstractMultiEnvironments
Multi-environment manager with serial reward evaluation.
__init__(envs, hall_of_fame_size, logger=None)
Initialize the serial environments wrapper.
| PARAMETER | DESCRIPTION |
|---|---|
| `envs` | List of CRN environment instances. |
| `hall_of_fame_size` | Maximum number of environments stored in the hall of fame (0 disables the hall of fame). |
| `logger` | Optional logger for metrics. DEFAULT: `None` |
get_reward(routine)
Evaluate rewards for all environments sequentially.
The provided routine is applied to each environment's state in a Python
loop. The routine is expected to return a tuple (reward, task_info) for
each state.
Note
Unlike RL4CRN.environments.parallel_environments.ParallelEnvironments,
this method does not explicitly write `task_info` back into
`env.state.last_task_info`; by default, this is expected to be handled by the routine itself.
| PARAMETER | DESCRIPTION |
|---|---|
| `routine` | Callable taking an environment state and returning a `(reward, task_info)` tuple. |

| RETURNS | DESCRIPTION |
|---|---|
| | Sequence of reward values, one per environment. |
Side Effects
- Logs `'Reward Time'` if a logger is available.
- Adds all environments to the hall of fame (if enabled).
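The serial loop described above can be sketched as follows; `DummyEnv`, the example `routine`, and the logger callable are illustrative stand-ins, not the actual RL4CRN API:

```python
import time

class DummyEnv:
    """Stand-in for a CRN environment; only exposes a .state attribute."""
    def __init__(self, state):
        self.state = state

def get_reward_serial(envs, routine, logger=None):
    """Apply `routine` to each environment's state in a plain Python loop.

    `routine` is expected to return a (reward, task_info) tuple per state;
    only the rewards are collected and returned.
    """
    start = time.perf_counter()
    rewards = []
    for env in envs:
        reward, _task_info = routine(env.state)
        rewards.append(reward)
    if logger is not None:
        logger("Reward Time", time.perf_counter() - start)
    return rewards

# Example routine: reward is the squared state, task_info is a dict.
envs = [DummyEnv(s) for s in (1.0, -2.0, 3.0)]
rewards = get_reward_serial(envs, lambda s: (s * s, {"state": s}))
# rewards == [1.0, 4.0, 9.0]
```

Because the loop is plain Python, execution order is deterministic, which is what makes this class convenient for debugging and profiling.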
reset()
Reset all environments.
| RETURNS | DESCRIPTION |
|---|---|
| | List of initial states for each environment (as returned by each environment's `reset()` method). |
gather()
Gather the current state from all environments.
| RETURNS | DESCRIPTION |
|---|---|
| | List of current environment states (typically `IOCRN` objects), one per environment. |
step(actions, stepper, raw_actions=None)
Step all environments forward by one action.
| PARAMETER | DESCRIPTION |
|---|---|
| `actions` | List of environment actions to apply, one per environment. |
| `stepper` | Stepper object passed to each environment's `step` method. |
| `raw_actions` | Optional list of raw policy actions aligned with `actions`. DEFAULT: `None` |

| RETURNS | DESCRIPTION |
|---|---|
| | List of per-environment step outputs. The structure depends on the underlying environment's `step` implementation. |
Side Effects
Logs a timing metric 'Timing: Step' if a logger is available.
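The stepping loop can be sketched as below; the `env.step(action, stepper, raw_action)` call signature and `EchoEnv` are assumptions for illustration, not the real environment interface:

```python
import time

class EchoEnv:
    """Stand-in environment whose step just applies the stepper to the action."""
    def step(self, action, stepper, raw_action=None):
        return stepper(action)

def step_serial(envs, actions, stepper, raw_actions=None, logger=None):
    """Step each environment with its matching action, one at a time."""
    start = time.perf_counter()
    if raw_actions is None:
        raw_actions = [None] * len(actions)
    # raw_actions is assumed to be aligned index-by-index with actions.
    outputs = [
        env.step(action, stepper, raw_action)
        for env, action, raw_action in zip(envs, actions, raw_actions)
    ]
    if logger is not None:
        logger("Timing: Step", time.perf_counter() - start)
    return outputs

outputs = step_serial([EchoEnv(), EchoEnv()], [1, 2], lambda a: a * 10)
# outputs == [10, 20]
```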
observe(observer, tensorizer)
Observe all environments and tensorize the resulting observations.
| PARAMETER | DESCRIPTION |
|---|---|
| `observer` | Observer used to produce an observation from each environment's state. |
| `tensorizer` | Tensorizer used to convert the per-environment observations into a stacked tensor. |

| RETURNS | DESCRIPTION |
|---|---|
| | Tensor of stacked observations, one per environment. |
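The observe-then-tensorize flow can be sketched as below; treating `observer` and `tensorizer` as plain callables is a simplification of the real interfaces:

```python
def observe_all(envs, observer, tensorizer):
    """Observe each environment's state, then stack via the tensorizer."""
    observations = [observer(env.state) for env in envs]
    # In practice the tensorizer would return e.g. a stacked torch tensor;
    # here it is any callable that combines the per-environment observations.
    return tensorizer(observations)

class StateEnv:
    """Stand-in environment exposing only a .state attribute."""
    def __init__(self, state):
        self.state = state

envs = [StateEnv(1), StateEnv(2)]
stacked = observe_all(envs, lambda s: [s, s + 1], list)
# stacked == [[1, 2], [2, 3]]
```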
render(rewards, n_best=1, disregarded_percentage=0.9, mode={'style': 'logger', 'task': 'transients', 'format': 'figure'})
Render and/or log diagnostics for the current batch of environments.
This method is primarily designed for logging workflows. It selects a set
of top environments according to rewards and logs a variety of plots,
depending on the requested mode.
Selection of top environments

Rewards are interpreted as losses (smaller is better), and the top subset is selected via

\[ k = \lceil (1 - p)\, N \rceil \]

where \(p\) is `disregarded_percentage` and \(N\) is the number of environments.
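Assuming the kept count rounds up, i.e. \(k = \lceil (1 - p) N \rceil\), the selection can be sketched in pure Python (smaller rewards rank better):

```python
import math

def select_top(rewards, disregarded_percentage=0.9):
    """Return indices of the best environments (rewards are losses).

    Keeps k = ceil((1 - p) * N) environments, where p is the
    disregarded fraction and N the number of environments.
    """
    n = len(rewards)
    k = math.ceil((1.0 - disregarded_percentage) * n)
    # Sort indices by reward ascending; smaller is better.
    return sorted(range(n), key=rewards.__getitem__)[:k]

# With p = 0.9 and 20 environments, the best 2 are kept.
best = select_top([float(i) for i in range(20)], 0.9)
# best == [0, 1]
```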
| PARAMETER | DESCRIPTION |
|---|---|
| `rewards` | Sequence of per-environment scalar scores. Smaller values are treated as better when selecting top environments. |
| `n_best` | Number of best environments to render individually via each environment's `render` method. DEFAULT: `1` |
| `disregarded_percentage` | Fraction of environments to discard when forming the "top-k" subset for aggregate plots. For example, with the default of `0.9`, only the best 10% of environments are kept. DEFAULT: `0.9` |
| `mode` | Dictionary describing the rendering behavior. Expected keys: `'style'`, `'task'`, and `'format'`. DEFAULT: `{'style': 'logger', 'task': 'transients', 'format': 'figure'}` |
| RETURNS | DESCRIPTION |
|---|---|
| | None. |
Side Effects
- Calls `env.render(...)` on selected environments.
- Logs figures/images/metrics via `self.logger` (if present).
- May create and close matplotlib figures.
- Increments `self.rendering_iteration` on each call in logger mode.