environment
RL4CRN.environments.environment
Single-environment wrapper for CRN design via reinforcement learning.
This module defines Environment, a lightweight environment wrapper with
a Gym-like interface tailored to chemical reaction networks (CRNs). The
environment maintains a mutable CRN state initialized from a template and allows
an agent to add reactions up to a fixed budget.
Core loop
- reset clones the CRN template into the current state.
- step applies an action to the current state via a provided stepper and increments the reaction budget counter.
- The environment returns (state, done), where done indicates whether the maximum number of added reactions has been reached.
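The loop above can be sketched with a few stand-in classes. This is a minimal illustration, not the library's implementation: ToyCRN, AppendStepper, and ToyEnvironment are hypothetical names, and the only behavior assumed is what this page describes (clone on reset, delegate to the stepper, count the added reactions, return (state, done)).

```python
import copy


class ToyCRN:
    """Hypothetical stand-in for a CRN object; only clone() is assumed."""

    def __init__(self, reactions=None):
        self.reactions = list(reactions or [])

    def clone(self):
        return copy.deepcopy(self)


class AppendStepper:
    """Hypothetical stepper: treats each action as a reaction and appends it."""

    def step(self, state, action):
        state.reactions.append(action)  # mutates the state in place


class ToyEnvironment:
    """Illustrative re-creation of the documented loop, not the real class."""

    def __init__(self, crn_template, max_added_reactions):
        self.crn_template = crn_template
        self.max_added_reactions = max_added_reactions
        self.reset()

    def reset(self):
        # Start from a fresh clone of the template and clear all counters.
        self.state = self.crn_template.clone()
        self.num_added_reactions = 0
        self.actions_taken = []
        self.raw_actions_taken = []
        return self.state

    def step(self, action, stepper, raw_action=None):
        stepper.step(self.state, action)  # the stepper interprets the action
        self.num_added_reactions += 1
        self.actions_taken.append(action)
        if raw_action is not None:  # assumption: stored only when provided
            self.raw_actions_taken.append(raw_action)
        done = self.num_added_reactions >= self.max_added_reactions
        return self.state, done


env = ToyEnvironment(ToyCRN(["A -> B"]), max_added_reactions=2)
stepper = AppendStepper()
proposals = iter(["B -> C", "C -> A", "A -> C"])

state, done = env.reset(), False
while not done:
    state, done = env.step(next(proposals), stepper)

print(state.reactions, env.num_added_reactions)
```

With a budget of two, the loop stops after the second proposal, and the template itself is left untouched because the state is a clone.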
Action semantics
The environment itself does not interpret actions. Instead, it delegates
state updates to a stepper object with a step(state, action) method
(see RL4CRN.agent2env_interface.abstract_stepper.AbstractStepper).
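To illustrate the delegation, here is a small sketch of two steppers that interpret the same action differently. StepperSketch is a hypothetical stand-in for AbstractStepper (only the step(state, action) method is taken from this page), and the dictionary used as a state is likewise an assumption made for brevity.

```python
from abc import ABC, abstractmethod


class StepperSketch(ABC):
    """Hypothetical stand-in for AbstractStepper: one step(state, action) method."""

    @abstractmethod
    def step(self, state, action):
        """Mutate state according to action."""


class AddReactionStepper(StepperSketch):
    """Always appends the action as a new reaction."""

    def step(self, state, action):
        state["reactions"].append(action)


class DedupStepper(StepperSketch):
    """Appends the action only if that reaction is not already present."""

    def step(self, state, action):
        if action not in state["reactions"]:
            state["reactions"].append(action)


# Toy states: plain dicts holding reaction strings (not real CRN objects).
state_a = {"reactions": ["A -> B"]}
state_b = {"reactions": ["A -> B"]}

AddReactionStepper().step(state_a, "A -> B")
DedupStepper().step(state_b, "A -> B")

print(state_a["reactions"])
print(state_b["reactions"])
```

Because the environment never inspects the action, swapping the stepper changes what an action means without touching the environment code.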
Logging and rendering
render supports a number of plotting/logging tasks driven by a
mode dictionary. In 'logger' mode, plots are logged via the provided
logger as either figures or PNG images.
Environment
Gym-like CRN environment based on adding reactions to a template.
| PARAMETER | DESCRIPTION |
|---|---|
| crn_template | CRN object used as the initial template. Must provide |
| max_added_reactions | Maximum number of reactions that can be added before the environment signals termination ( |
| logger | Optional logger used by DEFAULT: |
| logger_schedule | Frequency of logging updates (stored for downstream use; not actively enforced in the current implementation). DEFAULT: |
| ATTRIBUTE | DESCRIPTION |
|---|---|
| state | Current CRN state (a clone of |
| num_added_reactions | Number of actions applied since last reset. |
| actions_taken | List of environment actions applied via |
| raw_actions_taken | Optional list of raw policy actions (if provided to |
clone()
Create a deep copy of the environment.
The clone includes
- a clone of the template and current state,
- the current reaction count,
- copies of actions_taken and raw_actions_taken.
| RETURNS | DESCRIPTION |
|---|---|
| | A new |
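A typical use of a deep copy is to branch an episode and explore an alternative without disturbing the original. The sketch below re-creates the documented clone contents with hypothetical stand-ins (ToyCRN, ToyEnvironment, and the apply helper are not library names).

```python
import copy


class ToyCRN:
    """Hypothetical stand-in for a CRN object; only clone() is assumed."""

    def __init__(self, reactions):
        self.reactions = list(reactions)

    def clone(self):
        return copy.deepcopy(self)


class ToyEnvironment:
    """Illustrative stand-in holding the attributes listed on this page."""

    def __init__(self, crn_template, max_added_reactions):
        self.crn_template = crn_template
        self.max_added_reactions = max_added_reactions
        self.state = crn_template.clone()
        self.num_added_reactions = 0
        self.actions_taken = []
        self.raw_actions_taken = []

    def clone(self):
        twin = ToyEnvironment(self.crn_template.clone(), self.max_added_reactions)
        twin.state = self.state.clone()
        twin.num_added_reactions = self.num_added_reactions
        twin.actions_taken = copy.deepcopy(self.actions_taken)
        twin.raw_actions_taken = copy.deepcopy(self.raw_actions_taken)
        return twin


def apply(env, reaction):
    """Hypothetical helper that mimics one step without a stepper."""
    env.state.reactions.append(reaction)
    env.actions_taken.append(reaction)
    env.num_added_reactions += 1


env = ToyEnvironment(ToyCRN(["A -> B"]), max_added_reactions=3)
apply(env, "B -> C")

branch = env.clone()      # snapshot of the episode so far
apply(branch, "C -> A")   # explore one more reaction on the copy only

print(env.state.reactions, env.num_added_reactions)
print(branch.state.reactions, branch.num_added_reactions)
```

The original environment keeps its two reactions and a count of one, while the branch carries the extra reaction and a count of two.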
reset()
Reset the environment state to the template.
This clears the reaction counter and stored action histories.
| RETURNS | DESCRIPTION |
|---|---|
| | The reset CRN state (clone of the template). |
step(action, stepper, raw_action=None)
Apply an action to the CRN state via a stepper.
The stepper is responsible for mutating the current state given the
action. After applying the action, the environment increments
num_added_reactions and returns a termination flag indicating whether
the reaction budget is exhausted.
| PARAMETER | DESCRIPTION |
|---|---|
| action | Environment action to apply (typically a reaction or a reaction-like object). |
| stepper | Object providing |
| raw_action | Optional raw policy action (stored in DEFAULT: None |

| RETURNS | DESCRIPTION |
|---|---|
| | Tuple |
get_action(index)
Return the environment action taken at a given step index.
| PARAMETER | DESCRIPTION |
|---|---|
| index | Index into |

| RETURNS | DESCRIPTION |
|---|---|
| | The action stored at the specified index. |

| RAISES | DESCRIPTION |
|---|---|
| IndexError | If |
get_raw_action(index)
Return the raw policy action stored at a given step index.
| PARAMETER | DESCRIPTION |
|---|---|
| index | Index into |

| RETURNS | DESCRIPTION |
|---|---|
| | The raw action stored at the specified index. |

| RAISES | DESCRIPTION |
|---|---|
| IndexError | If |
get_reward(routine)
Compute a reward (or loss) for the current CRN state.
The routine is expected to evaluate the current state and return a
structure whose first element is a tuple (reward, last_task_info).
This method extracts the scalar reward component and returns it.
| PARAMETER | DESCRIPTION |
|---|---|
| routine | Callable taking the current state and returning an iterable whose first element is |

| RETURNS | DESCRIPTION |
|---|---|
| | Scalar reward value (as produced by |
Notes
The last_task_info returned by the routine is not propagated by this
method. In most workflows it is expected to be stored inside
self.state.last_task_info by the routine/state implementation.
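The extraction described here can be sketched as follows. toy_routine and extract_reward are hypothetical names, the state is a plain namespace rather than a real CRN, and the scoring rule (one unit of loss per reaction) is invented purely for illustration.

```python
from types import SimpleNamespace


def toy_routine(state):
    """Hypothetical routine: scores a state and records task info on it."""
    loss = float(len(state.reactions))
    info = {"n_reactions": len(state.reactions)}
    state.last_task_info = info   # side channel for the task info
    return [(loss, info)]         # first element is (reward, last_task_info)


def extract_reward(state, routine):
    """Mirror of the documented behavior: keep only the scalar reward."""
    first = next(iter(routine(state)))  # works for any iterable
    reward, _last_task_info = first     # the info is not propagated
    return reward


state = SimpleNamespace(reactions=["A -> B", "B -> C"], last_task_info=None)
reward = extract_reward(state, toy_routine)

print(reward)
print(state.last_task_info)
```

The caller receives only the scalar; anything else the routine computed has to be read back from the state, which is why the routine stores it there.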
render(mode={'style': 'human', 'task': 'transients', 'format': 'figure'}, ID=None)
Render or log diagnostics for the current CRN state.
Rendering behavior is controlled by a mode dictionary. Supported cases
include interactive plotting ('human') and logging plots to a logger
('logger').
| PARAMETER | DESCRIPTION |
|---|---|
| mode | Dictionary describing rendering behavior. Typical keys: DEFAULT: {'style': 'human', 'task': 'transients', 'format': 'figure'} |
| ID | Optional identifier used when naming logged artifacts. DEFAULT: None |

| RETURNS | DESCRIPTION |
|---|---|
| | None. |
Side Effects
- In 'human' mode, opens matplotlib windows (depending on backend).
- In 'logger' mode, logs text/figures/images via self.logger.
- Creates and closes matplotlib figures.
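As a small sketch, a mode dictionary for logger output can be derived from the default shown in the signature. Only the keys and values that appear on this page are used; DEFAULT_MODE and logger_mode are illustrative names, and the call to render is left as a comment because no environment is constructed here.

```python
# Default taken from the render() signature above.
DEFAULT_MODE = {'style': 'human', 'task': 'transients', 'format': 'figure'}

# Build a fresh dictionary per call instead of editing the default in place:
# a dict used as a default argument is shared between calls in Python.
logger_mode = {**DEFAULT_MODE, 'style': 'logger'}

# env.render(mode=logger_mode)  # env would be an Environment with a logger set

print(logger_mode)
```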