RL4CRN.environments.environment

Single-environment wrapper for CRN design via reinforcement learning.

This module defines Environment, a lightweight environment wrapper with a Gym-like interface tailored to chemical reaction networks (CRNs). The environment maintains a mutable CRN state initialized from a template and allows an agent to add reactions up to a fixed budget.

Core loop
  • reset clones the CRN template into the current state.
  • step applies an action to the current state via a provided stepper and increments the reaction budget counter.
  • The environment returns (state, done) where done indicates whether the maximum number of added reactions has been reached.
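The core loop above can be sketched with toy stand-ins that mirror the documented semantics. `ToyCRN`, `ToyStepper`, and `ToyEnvironment` below are illustrative only, not part of RL4CRN; the real Environment also tracks raw actions and supports logging and rendering.

```python
import copy

class ToyCRN:
    """Stand-in for a CRN template; a real CRN exposes clone() and plotting."""
    def __init__(self):
        self.reactions = []
    def clone(self):
        return copy.deepcopy(self)

class ToyStepper:
    """Stand-in stepper: step(state, action) mutates the state in place."""
    def step(self, state, action):
        state.reactions.append(action)

class ToyEnvironment:
    """Mirrors the documented reset/step/budget behavior (illustrative)."""
    def __init__(self, crn_template, max_added_reactions):
        self.crn_template = crn_template
        self.max_added_reactions = max_added_reactions
        self.reset()
    def reset(self):
        # Clone the template into the current state; clear counters/histories.
        self.state = self.crn_template.clone()
        self.num_added_reactions = 0
        self.actions_taken = []
        return self.state
    def step(self, action, stepper, raw_action=None):
        # Delegate the state update to the stepper, then update the budget.
        stepper.step(self.state, action)
        self.actions_taken.append(action)
        self.num_added_reactions += 1
        done = self.num_added_reactions >= self.max_added_reactions
        return self.state, done

env = ToyEnvironment(ToyCRN(), max_added_reactions=3)
state = env.reset()
done = False
while not done:
    state, done = env.step("A + B -> C", ToyStepper())
```

After the loop, exactly `max_added_reactions` reactions have been added and the environment signals termination.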
Action semantics

The environment itself does not interpret actions. Instead, it delegates state updates to a stepper object with a step(state, action) method (see RL4CRN.agent2env_interface.abstract_stepper.AbstractStepper).
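A conforming stepper only needs a `step(state, action)` method that mutates the state in place. The sketch below assumes a hypothetical `add_reaction` method on the state; a real stepper would subclass `RL4CRN.agent2env_interface.abstract_stepper.AbstractStepper`.

```python
class AppendReactionStepper:
    """Illustrative stepper following the step(state, action) contract."""
    def step(self, state, action):
        # Interpret the action as a reaction and mutate the state in place.
        # add_reaction is an assumed method name for this sketch.
        state.add_reaction(action)

class _ToyState:
    """Minimal state exposing the assumed add_reaction method."""
    def __init__(self):
        self.reactions = []
    def add_reaction(self, reaction):
        self.reactions.append(reaction)

s = _ToyState()
AppendReactionStepper().step(s, "X -> Y")
```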

Logging and rendering

render supports a number of plotting/logging tasks driven by a mode dictionary. In 'logger' mode, plots are logged via the provided logger as either figures or PNG images.

Environment

Gym-like CRN environment based on adding reactions to a template.

PARAMETER DESCRIPTION
crn_template

CRN object used as the initial template. Must provide clone() and should expose plotting methods used by render (e.g., plot_transient_response, plot_phase_portrait, etc.).

max_added_reactions

Maximum number of reactions that can be added before the environment signals termination (done=True).

logger

Optional logger used by render in 'logger' mode. Expected to provide methods such as log_text, log_figure, and log_image.

DEFAULT: None

logger_schedule

Frequency of logging updates (stored for downstream use; not actively enforced in the current implementation).

DEFAULT: 1

ATTRIBUTE DESCRIPTION
state

Current CRN state (a clone of crn_template, then mutated).

num_added_reactions

Number of actions applied since the last reset.

actions_taken

List of environment actions applied via step.

raw_actions_taken

Optional list of raw policy actions (if provided to step).

clone()

Create a deep copy of the environment.

The clone includes
  • a clone of the template and current state,
  • the current reaction count,
  • copies of actions_taken and raw_actions_taken.
RETURNS DESCRIPTION

A new Environment instance with duplicated internal state.

reset()

Reset the environment state to the template.

This clears the reaction counter and stored action histories.

RETURNS DESCRIPTION

The reset CRN state (clone of the template).

step(action, stepper, raw_action=None)

Apply an action to the CRN state via a stepper.

The stepper is responsible for mutating the current state given the action. After applying the action, the environment increments num_added_reactions and returns a termination flag indicating whether the reaction budget is exhausted.

PARAMETER DESCRIPTION
action

Environment action to apply (typically a reaction or a reaction-like object).

stepper

Object providing step(state, action) that mutates the state in-place.

raw_action

Optional raw policy action (stored in raw_actions_taken if provided). Useful for algorithms that require access to the policy outputs (e.g., self-imitation learning).

DEFAULT: None

RETURNS DESCRIPTION

Tuple (state, done) where:
  • state is the updated CRN state.
  • done is True if num_added_reactions >= max_added_reactions.

get_action(index)

Return the environment action taken at a given step index.

PARAMETER DESCRIPTION
index

Index into actions_taken.

RETURNS DESCRIPTION

The action stored at the specified index.

RAISES DESCRIPTION
IndexError

If index is out of range.

get_raw_action(index)

Return the raw policy action stored at a given step index.

PARAMETER DESCRIPTION
index

Index into raw_actions_taken.

RETURNS DESCRIPTION

The raw action stored at the specified index.

RAISES DESCRIPTION
IndexError

If index is out of range.

get_reward(routine)

Compute a reward (or loss) for the current CRN state.

The routine is expected to evaluate the current state and return a structure whose first element is a tuple (reward, last_task_info). This method extracts the scalar reward component and returns it.

PARAMETER DESCRIPTION
routine

Callable taking the current state and returning an iterable whose first element is (reward, last_task_info).

RETURNS DESCRIPTION

Scalar reward value (as produced by routine).

Notes

The last_task_info returned by the routine is not propagated by this method. In most workflows it is expected to be stored inside self.state.last_task_info by the routine/state implementation.
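A routine compatible with this contract returns an iterable whose first element is (reward, last_task_info); get_reward then reads the scalar at position [0][0]. The routine below is a hypothetical example (the reward definition and field names are illustrative):

```python
def count_reactions_routine(state):
    """Hypothetical routine: penalize each added reaction by -1."""
    reward = -len(state.reactions)
    last_task_info = {"num_reactions": len(state.reactions)}
    # First element must be the (reward, last_task_info) tuple.
    return [(reward, last_task_info)]

class _ToyState:
    reactions = ["A -> B", "B -> C"]

result = count_reactions_routine(_ToyState())
reward = result[0][0]  # this is the scalar get_reward would return
```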

render(mode={'style': 'human', 'task': 'transients', 'format': 'figure'}, ID=None)

Render or log diagnostics for the current CRN state.

Rendering behavior is controlled by a mode dictionary. Supported cases include interactive plotting ('human') and logging plots to a logger ('logger').

PARAMETER DESCRIPTION
mode

Dictionary describing rendering behavior. Typical keys:

  • style: 'human' or 'logger'.
  • task: diagnostic task. Supported values in this implementation include 'transients', 'phase_plot', 'rank', 'transients + dose-response', 'transients + frequency content', 'transients + logic', and 'SSA_transients'.
  • format: 'figure' to log matplotlib figures directly, or 'image' to log PNG buffers.

Some tasks also consume optional keys such as t0, bounds_freq, or scale.

DEFAULT: {'style': 'human', 'task': 'transients', 'format': 'figure'}

ID

Optional identifier used when naming logged artifacts.

DEFAULT: None

RETURNS DESCRIPTION

None.

Side Effects
  • In 'human' mode, opens matplotlib windows (depending on backend).
  • In 'logger' mode, logs text/figures/images via self.logger.
  • Creates and closes matplotlib figures.
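Two example mode dictionaries, built from the keys documented above (the commented render call and the ID value are illustrative):

```python
# Interactive plotting of transient responses (the default mode).
human_mode = {"style": "human", "task": "transients", "format": "figure"}

# Log phase-portrait plots to the logger as PNG images.
logger_mode = {"style": "logger", "task": "phase_plot", "format": "image"}

# env.render(mode=logger_mode, ID="episode_42")  # illustrative call
```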