abstract_multi_environments
RL4CRN.environments.abstract_multi_environments
Multi-environment utilities for CRN reinforcement learning.
This module defines AbstractMultiEnvironments, a lightweight manager for
running multiple CRN environments in parallel (synchronously) and providing
common operations such as reset, stepping, observation/tensorization, and rich
rendering/logging of the best-performing environments.
Key features
- Batch stepping: applies one action per environment.
- Observation pipeline: uses an observer and tensorizer to produce a batch tensor suitable for an agent.
- Logging-oriented rendering: selects top-performing environments and logs plots/images to a provided logger.
- Hall of fame (optional): keeps a buffer of top environments and renders them alongside the current batch.
Notes
This class is intentionally "abstract" in the sense that it does not enforce a
specific environment implementation beyond the methods accessed in the code
(reset, step, render, and state access). It can be used directly if the
provided environments follow the expected interface.
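The expected interface can be illustrated with a duck-typing check. This is a minimal sketch, not the library's definition: `ToyEnv`, `REQUIRED_METHODS`, `follows_interface`, the attribute name `state`, and all method signatures are assumptions made for illustration. The source only states that reset, step, render, and state access are used.

```python
class ToyEnv:
    """Toy stand-in for a CRN environment; not part of RL4CRN."""

    def __init__(self):
        self.state = 0  # assumed attribute name for "state access"

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action, stepper=None):
        # The toy dynamics simply accumulate the action.
        self.state += action
        return self.state

    def render(self, *args, **kwargs):
        return f"state={self.state}"


REQUIRED_METHODS = ("reset", "step", "render")


def follows_interface(env):
    """Return True if env has the required callables and a `state` attribute."""
    has_methods = all(callable(getattr(env, name, None)) for name in REQUIRED_METHODS)
    return has_methods and hasattr(env, "state")
```

A check like this can be run once over the list of environments before handing it to the manager, so that a missing method surfaces early rather than in the middle of a rollout.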
AbstractMultiEnvironments
Synchronous manager for multiple CRN environments.
| PARAMETER | DESCRIPTION |
|---|---|
| envs | List of environment instances. Each environment is expected to expose the methods this class accesses: reset, step, render, and state access. |
| hall_of_fame_size | Optional. Maximum number of environments stored in the hall of fame. If 0, the hall of fame is disabled. |
| logger | Optional logger for metrics and figures/images; it is expected to provide the corresponding logging methods. |
| ATTRIBUTE | DESCRIPTION |
|---|---|
| envs | List of managed environments. |
| hall_of_fame | Optional buffer of top environments (disabled when hall_of_fame_size is 0). |
| rendering_iteration | Counter used to label logged figures/images. |
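A bounded hall of fame can be sketched with a heap. This is a minimal sketch under assumptions, not the class's actual implementation: the `HallOfFame` name and its `add`/`best` methods are hypothetical, and it assumes scores are losses (smaller is better), as in the rendering section of this page.

```python
import heapq
import itertools


class HallOfFame:
    """Hypothetical bounded buffer keeping the `size` lowest-loss entries."""

    def __init__(self, size):
        self.size = size
        # Min-heap on negated loss, so the root is the worst entry still kept.
        self._heap = []
        # Tie-breaker so stored items are never compared with each other.
        self._counter = itertools.count()

    def add(self, loss, item):
        if self.size <= 0:  # a size of 0 disables the hall of fame
            return
        entry = (-loss, next(self._counter), item)
        if len(self._heap) < self.size:
            heapq.heappush(self._heap, entry)
        elif -loss > self._heap[0][0]:  # strictly better than the worst kept entry
            heapq.heapreplace(self._heap, entry)

    def best(self):
        """Return (loss, item) pairs sorted from lowest to highest loss."""
        pairs = [(-neg_loss, item) for neg_loss, _, item in self._heap]
        return sorted(pairs, key=lambda pair: pair[0])
```

The heap keeps insertion at O(log k) for a buffer of size k, which matters little for small halls of fame but avoids re-sorting the whole buffer on every update.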
reset()
Reset all environments.
| RETURNS | DESCRIPTION |
|---|---|
| | List of initial states, one per environment (as returned by each environment's reset method). |
gather()
Gather the current state from all environments.
| RETURNS | DESCRIPTION |
|---|---|
| | List of current environment states (typically IOCRN objects), one per environment. |
step(actions, stepper, raw_actions=None)
Step all environments forward by one action.
| PARAMETER | DESCRIPTION |
|---|---|
| actions | List of environment actions to apply, one per environment. |
| stepper | Stepper object passed to each environment's step method. |
| raw_actions | Optional list of raw policy actions aligned with actions. DEFAULT: None |
| RETURNS | DESCRIPTION |
|---|---|
| | List of per-environment step outputs. The structure depends on the underlying environment's step method. |
Side Effects
Logs a timing metric 'Timing: Step' if a logger is available.
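The batch step with its timing side effect can be sketched as follows. This is a minimal sketch under assumptions: `step_all`, `CounterEnv`, `ListLogger`, and the logger method `log_metric` are hypothetical names, the per-environment call `env.step(action, stepper)` is an assumed signature, and the handling of raw_actions is omitted. Only the metric name 'Timing: Step' comes from the text above.

```python
import time


class CounterEnv:
    """Toy environment whose state is the running sum of its actions."""

    def __init__(self):
        self.state = 0

    def step(self, action, stepper=None):
        self.state += action
        return self.state


class ListLogger:
    """Toy logger that stores (name, value) pairs in memory."""

    def __init__(self):
        self.records = []

    def log_metric(self, name, value):  # hypothetical method name
        self.records.append((name, value))


def step_all(envs, actions, stepper, logger=None):
    """Apply one action per environment and log the elapsed wall-clock time."""
    if len(actions) != len(envs):
        raise ValueError("expected one action per environment")
    start = time.perf_counter()
    results = [env.step(action, stepper) for env, action in zip(envs, actions)]
    elapsed = time.perf_counter() - start
    if logger is not None:
        logger.log_metric("Timing: Step", elapsed)
    return results
```

Checking the lengths before stepping avoids the silent truncation that `zip` would otherwise apply when the action list is shorter than the environment list.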
observe(observer, tensorizer)
Observe all environments and tensorize the resulting observations.
| PARAMETER | DESCRIPTION |
|---|---|
| observer | Observer used to observe each environment's state. |
| tensorizer | Tensorizer used to convert the resulting observations into tensors. |
| RETURNS | DESCRIPTION |
|---|---|
| | Batch tensor of stacked observations, one per environment. |
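The observe-then-tensorize pipeline can be sketched in plain Python. This is a minimal sketch under assumptions: the method names `observe` and `tensorize`, the toy classes, and the function `observe_all` are hypothetical, and nested lists stand in for the batch tensor that a tensor library would produce by stacking.

```python
class CountObserver:
    """Toy observer: wraps a state (a sequence of counts) in a dict."""

    def observe(self, state):  # hypothetical method name
        return {"counts": list(state)}


class ListTensorizer:
    """Toy tensorizer: turns an observation into a flat list of floats."""

    def tensorize(self, observation):  # hypothetical method name
        return [float(x) for x in observation["counts"]]


def observe_all(states, observer, tensorizer):
    """Observe each state, tensorize it, and stack the rows into a batch."""
    rows = [tensorizer.tensorize(observer.observe(state)) for state in states]
    # Stacking only makes sense if every row has the same length.
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all observations must have the same length to be stacked")
    return rows
```

In this sketch the leading dimension of the result equals the number of environments, which is what lets an agent consume the whole batch in a single forward pass.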
render(rewards, n_best=1, disregarded_percentage=0.9, mode={'style': 'logger', 'task': 'transients', 'format': 'figure'})
Render and/or log diagnostics for the current batch of environments.
This method is primarily designed for logging workflows. It selects a set
of top environments according to rewards and logs a variety of plots,
depending on the requested mode.
Selection of top environments
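Rewards are interpreted as losses (smaller is better). The top subset is formed by discarding a fraction \(p\) of the environments and keeping the best-scoring remainder, roughly \((1 - p)\,N\) of them, where \(p\) is disregarded_percentage and \(N\) is the number
of environments.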
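The selection rule can be sketched as follows. This is a minimal sketch under assumptions: the function name `select_top` is hypothetical, and the rounding (nearest integer, with at least one environment kept) is an assumption; the rounding used by the library may differ.

```python
def select_top(rewards, disregarded_percentage=0.9):
    """Return indices of the kept environments, best (lowest score) first."""
    n = len(rewards)
    if n == 0:
        return []
    # Assumed rounding: nearest integer to (1 - p) * N, never fewer than one.
    k = max(1, int(round((1.0 - disregarded_percentage) * n)))
    # Scores are treated as losses, so sort in ascending order.
    order = sorted(range(n), key=lambda i: rewards[i])
    return order[:k]
```

For example, with four scores and a disregarded fraction of 0.5, the two lowest-scoring environments are kept; with the default of 0.9 and only a few environments, the floor of one ensures that the single best environment is still selected.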
| PARAMETER | DESCRIPTION |
|---|---|
| rewards | Sequence of per-environment scalar scores. Smaller values are treated as better when selecting top environments. |
| n_best | Number of best environments to render individually via each environment's render method. DEFAULT: 1 |
| disregarded_percentage | Fraction of environments to discard when forming the "top-k" subset for aggregate plots. For example, with the default of 0.9, roughly the best 10% of environments are kept. DEFAULT: 0.9 |
| mode | Dictionary describing the rendering behavior. Expected keys, as in the default: style, task, and format. DEFAULT: {'style': 'logger', 'task': 'transients', 'format': 'figure'} |
| RETURNS | DESCRIPTION |
|---|---|
| | None. |
Side Effects
- Calls env.render(...) on selected environments.
- Logs figures/images/metrics via self.logger (if present).
- May create and close matplotlib figures.
- Increments self.rendering_iteration on each call in logger mode.