RL4CRN.environments.environment

Single-environment wrapper for CRN design via reinforcement learning.

This module defines Environment, a lightweight environment wrapper with a Gym-like interface tailored to chemical reaction networks (CRNs). The environment maintains a mutable CRN state initialized from a template and allows an agent to add reactions up to a fixed budget.

Core loop
  • reset clones the CRN template into the current state.
  • step applies an action to the current state via a provided stepper and increments the reaction budget counter.
  • The environment returns (state, done) where done indicates whether the maximum number of added reactions has been reached.
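The core loop above can be sketched with toy stand-ins that mirror the documented semantics. `ToyCRN`, `ToyStepper`, and `ToyEnvironment` below are illustrative only, not part of RL4CRN; the real Environment also tracks raw actions and supports logging and rendering.

```python
import copy

class ToyCRN:
    """Stand-in for a CRN template; a real CRN exposes clone() and plotting."""
    def __init__(self):
        self.reactions = []
    def clone(self):
        return copy.deepcopy(self)

class ToyStepper:
    """Stand-in stepper: step(state, action) mutates the state in place."""
    def step(self, state, action):
        state.reactions.append(action)

class ToyEnvironment:
    """Mirrors the documented reset/step/budget behavior (illustrative)."""
    def __init__(self, crn_template, max_added_reactions):
        self.crn_template = crn_template
        self.max_added_reactions = max_added_reactions
        self.reset()
    def reset(self):
        # Clone the template into the current state; clear counters/histories.
        self.state = self.crn_template.clone()
        self.num_added_reactions = 0
        self.actions_taken = []
        return self.state
    def step(self, action, stepper, raw_action=None):
        # Delegate the state update to the stepper, then update the budget.
        stepper.step(self.state, action)
        self.actions_taken.append(action)
        self.num_added_reactions += 1
        done = self.num_added_reactions >= self.max_added_reactions
        return self.state, done

env = ToyEnvironment(ToyCRN(), max_added_reactions=3)
state = env.reset()
done = False
while not done:
    state, done = env.step("A + B -> C", ToyStepper())
```

After the loop, exactly `max_added_reactions` reactions have been added and the environment signals termination.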
Action semantics

The environment itself does not interpret actions. Instead, it delegates state updates to a stepper object with a step(state, action) method (see RL4CRN.agent2env_interface.abstract_stepper.AbstractStepper).
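A conforming stepper only needs a `step(state, action)` method that mutates the state in place. The sketch below assumes a hypothetical `add_reaction` method on the state; a real stepper would subclass `RL4CRN.agent2env_interface.abstract_stepper.AbstractStepper`.

```python
class AppendReactionStepper:
    """Illustrative stepper following the step(state, action) contract."""
    def step(self, state, action):
        # Interpret the action as a reaction and mutate the state in place.
        # add_reaction is an assumed method name for this sketch.
        state.add_reaction(action)

class _ToyState:
    """Minimal state exposing the assumed add_reaction method."""
    def __init__(self):
        self.reactions = []
    def add_reaction(self, reaction):
        self.reactions.append(reaction)

s = _ToyState()
AppendReactionStepper().step(s, "X -> Y")
```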

Logging and rendering

render supports a number of plotting/logging tasks driven by a mode dictionary. In 'logger' mode, plots are logged via the provided logger as either figures or PNG images.

Environment

Gym-like CRN environment based on adding reactions to a template.

PARAMETER DESCRIPTION
crn_template

CRN object used as the initial template. Must provide clone() and should expose plotting methods used by render (e.g., plot_transient_response, plot_phase_portrait, etc.).

max_added_reactions

Maximum number of reactions that can be added before the environment signals termination (done=True).

logger

Optional logger used by render in 'logger' mode. Expected to provide methods such as log_text, log_figure, and log_image.

DEFAULT: None

logger_schedule

Frequency of logging updates (stored for downstream use; not actively enforced in the current implementation).

DEFAULT: 1

ATTRIBUTE DESCRIPTION
state

Current CRN state (a clone of crn_template, then mutated).

num_added_reactions

Number of actions applied since the last reset.

actions_taken

List of environment actions applied via step.

raw_actions_taken

Optional list of raw policy actions (if provided to step).

clone()

Create a deep copy of the environment.

The clone includes
  • a clone of the template and current state,
  • the current reaction count,
  • copies of actions_taken and raw_actions_taken.
RETURNS DESCRIPTION

A new Environment instance with duplicated internal state.

reset()

Reset the environment state to the template.

This clears the reaction counter and stored action histories.

RETURNS DESCRIPTION

The reset CRN state (clone of the template).

step(action, stepper, raw_action=None)

Apply an action to the CRN state via a stepper.

The stepper is responsible for mutating the current state given the action. After applying the action, the environment increments num_added_reactions and returns a termination flag indicating whether the reaction budget is exhausted.

PARAMETER DESCRIPTION
action

Environment action to apply (typically a reaction or a reaction-like object).

stepper

Object providing step(state, action) that mutates the state in-place.

raw_action

Optional raw policy action (stored in raw_actions_taken if provided). Useful for algorithms that require access to the policy outputs (e.g., self-imitation learning).

DEFAULT: None

RETURNS DESCRIPTION

Tuple (state, done) where:
  • state is the updated CRN state.
  • done is True if num_added_reactions >= max_added_reactions.

get_action(index)

Return the environment action taken at a given step index.

PARAMETER DESCRIPTION
index

Index into actions_taken.

RETURNS DESCRIPTION

The action stored at the specified index.

RAISES DESCRIPTION
IndexError

If index is out of range.

get_raw_action(index)

Return the raw policy action stored at a given step index.

PARAMETER DESCRIPTION
index

Index into raw_actions_taken.

RETURNS DESCRIPTION

The raw action stored at the specified index.

RAISES DESCRIPTION
IndexError

If index is out of range.

get_reward(routine)

Compute a reward (or loss) for the current CRN state.

The routine is expected to evaluate the current state and return a structure whose first element is a tuple (reward, last_task_info). This method extracts the scalar reward component and returns it.

PARAMETER DESCRIPTION
routine

Callable taking the current state and returning an iterable whose first element is (reward, last_task_info).

RETURNS DESCRIPTION

Scalar reward value (as produced by routine).

Notes

The last_task_info returned by the routine is not propagated by this method. In most workflows it is expected to be stored inside self.state.last_task_info by the routine/state implementation.
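A routine compatible with this contract returns an iterable whose first element is (reward, last_task_info); get_reward then reads the scalar at position [0][0]. The routine below is a hypothetical example (the reward definition and field names are illustrative):

```python
def count_reactions_routine(state):
    """Hypothetical routine: penalize each added reaction by -1."""
    reward = -len(state.reactions)
    last_task_info = {"num_reactions": len(state.reactions)}
    # First element must be the (reward, last_task_info) tuple.
    return [(reward, last_task_info)]

class _ToyState:
    reactions = ["A -> B", "B -> C"]

result = count_reactions_routine(_ToyState())
reward = result[0][0]  # this is the scalar get_reward would return
```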

render(mode={'style': 'human', 'task': 'transients', 'format': 'figure'}, ID=None)

Render or log diagnostics for the current CRN state.

Rendering behavior is controlled by a mode dictionary. Supported cases include interactive plotting ('human') and logging plots to a logger ('logger').

PARAMETER DESCRIPTION
mode

Dictionary describing rendering behavior. Typical keys:

  • style: 'human' or 'logger'.
  • task: diagnostic task. Supported values in this implementation include 'transients', 'phase_plot', 'rank', 'transients + dose-response', 'transients + frequency content', 'transients + logic', and 'SSA_transients'.
  • format: 'figure' to log matplotlib figures directly, or 'image' to log PNG buffers.

Some tasks also consume optional keys such as t0, bounds_freq, or scale.

DEFAULT: {'style': 'human', 'task': 'transients', 'format': 'figure'}

ID

Optional identifier used when naming logged artifacts.

DEFAULT: None

RETURNS DESCRIPTION

None.

Side Effects
  • In 'human' mode, opens matplotlib windows (depending on backend).
  • In 'logger' mode, logs text/figures/images via self.logger.
  • Creates and closes matplotlib figures.
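Two example mode dictionaries, built from the keys documented above (the commented render call and the ID value are illustrative):

```python
# Interactive plotting of transient responses (the default mode).
human_mode = {"style": "human", "task": "transients", "format": "figure"}

# Log phase-portrait plots to the logger as PNG images.
logger_mode = {"style": "logger", "task": "phase_plot", "format": "image"}

# env.render(mode=logger_mode, ID="episode_42")  # illustrative call
```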