Skip to content

deterministic

RL4CRN.rewards.deterministic

RL4CRN.rewards

Reward / cost functions for evaluating IOCRN behaviors under different design tasks.

This module provides task-specific wrappers around an IOCRN's simulation interface (e.g., IOCRN.transient_response and IOCRN.transient_response_piecewise) and converts the resulting trajectories into scalar performance measures that can be used as RL rewards (or costs).

Included objectives:

  • Dynamic tracking (continuous-valued): weighted L1/L2 tracking error to a reference trajectory or setpoint across multiple scenarios.
  • Piecewise tracking: same as above, but with piecewise-constant inputs and segmented time horizons (useful for protocols / sequences).
  • Oscillation shaping: penalizes deviations from desired oscillatory features such as frequency, mean level, damping, and peak ratios using oscillation_metrics.
  • Logic circuit scoring: evaluates steady-state binary behavior (via BCE or thresholded mismatch) for combinational circuits and piecewise logic protocols (e.g., latches).
  • Custom relationship tracking: evaluates arbitrary algebraic constraints between species trajectories defined by a user-supplied function (targeting zero error).

All functions return a scalar performance (interpretable as cost unless you negate it) and update crn.last_task_info with metadata such as the reward value, task type, and the simulation settings that produced it.

dynamic_tracking_error(crn, u_list, x0_list, time_horizon, r_list, w, norm=1, relative=False, LARGE_NUMBER=10000.0)

Compute a dynamic tracking cost for an IOCRN over a batch of scenarios.

The function simulates the CRN for each scenario in the Cartesian product of u_list and x0_list (as implemented by crn.transient_response) and evaluates how well the output trajectories track the provided references.

PARAMETER DESCRIPTION
crn

IOCRN An IOCRN-like object implementing transient_response(u_list, x0_list, time_horizon, LARGE_NUMBER=...). The simulation is expected to return (t, x_list, y_list, last_task_info), where y_list is a list of output trajectories.

u_list

list[np.ndarray] List of constant input vectors. Each element has shape (p,), where p is the number of CRN inputs.

x0_list

list[np.ndarray] List of initial state vectors. Each element has shape (n,), where n is the number of CRN species.

time_horizon

np.ndarray 1D array of evaluation times with shape (T,).

r_list

list[np.ndarray] List of reference signals/targets for each scenario. The expected shape and interpretation depend on performance_metric. Common usage is a list of arrays with shape (q, T) or (q,) (setpoints), where q is the number of outputs.

w

np.ndarray Weights for the tracking error. Typically shape (q, T) so each output and time point can be weighted differently. (Exact expectations follow performance_metric.)

norm

int, default=1 Norm used in the tracking error. Passed to performance_metric. Common options are 1 (L1) or 2 (L2 / squared error).

relative

bool, default=False If True, compute a relative error (as supported by performance_metric).

LARGE_NUMBER

float, default=1e4 Divergence penalty passed to transient_response. If the integrator fails or becomes unstable, trajectories may be filled with LARGE_NUMBER, which typically yields a large cost.

RETURNS DESCRIPTION
performance

float Scalar tracking cost aggregated across scenarios, outputs, and time.

last_task_info

dict Updated crn.last_task_info dictionary, augmented with:

  • 'reward': performance
  • 'setpoint': r_list
  • 'initial_conditions': x0_list
  • 'reward type': 'dynamic_tracking_error'

habituation_error_piecewise(crn, u_nested_list, x0_list, nested_time_horizon, w, LARGE_NUMBER=10000.0, min_peak=0.1, max_peak=2.0)

Compute a habituation cost for a piecewise protocol using peak ratios.

This function simulates the CRN under a piecewise-constant input protocol and evaluates "habituation" as a change in peak response across repeated stimulus windows.

Convention used here
  • nested_time_horizon defines K segments with time grids t_0, t_1, ..., t_{K-1}.
  • We treat even-indexed segments (0, 2, 4, ...) as "stimulus" windows. Peaks are measured in those windows for each scenario output trajectory.
  • A habituation score is computed from ratios of consecutive stimulus peaks: ratio_k = peak_{k+1} / peak_k (Lower ratios indicate stronger habituation.)

The function also enforces peak bounds. If any measured peak is outside [min_peak, max_peak], it returns LARGE_NUMBER.

PARAMETER DESCRIPTION
crn

IOCRN IOCRN-like object implementing: transient_response_piecewise(u_nested_list, x0_list, nested_time_horizon, LARGE_NUMBER=...).

u_nested_list

list[list[np.ndarray]] List of input protocols. Each protocol is a list of input vectors (p,) applied segment-wise. The inner list length must match len(nested_time_horizon).

x0_list

list[np.ndarray] List of initial conditions (n,).

nested_time_horizon

list[np.ndarray] List of time grids, one per segment. These will be stitched by the simulator.

w

Union[float, Sequence[float], np.ndarray] Weights applied to each peak ratio term. If a scalar, the same weight is applied to all ratios. If a sequence, it should have length equal to the number of ratios (n_peaks - 1).

LARGE_NUMBER

float, default=1e4 Penalty returned if simulation diverges or peak constraints fail.

TYPE: float DEFAULT: 10000.0

min_peak

float, default=0.1 Minimum acceptable peak value. Peaks below this are considered invalid.

TYPE: float DEFAULT: 0.1

max_peak

float, default=2.0 Maximum acceptable peak value. Peaks above this are considered invalid.

TYPE: float DEFAULT: 2.0

RETURNS DESCRIPTION

Tuple[float, dict]: - performance: Scalar habituation cost (lower is better). - last_task_info: crn.last_task_info updated with metadata: - 'reward': performance - 'min_peak', 'max_peak' - 'initial_conditions' - 'reward type': 'habituation_error_piecewise'

check_off_ss_invariant(crn, t, x, off_intervals, u_off, ss_tol_abs=0.0001, ss_tol_rel=0.001, xtol=1e-09, maxfev=2000)

x: full state trajectory, shape (n, T) off_intervals: list of (start,end) absolute u_off: input vector for OFF

habituation_metric(intervals, t, y_list, w, min_peak=0.1, max_peak=2.0, base_valley=0.0, LARGE_NUMBER=10000.0)

Compute a habituation cost from output peaks across repeated stimulus windows.

This metric extracts peak amplitudes from specified time intervals and computes ratios of consecutive stimulus peaks. By default, it assumes even-indexed intervals (0, 2, 4, ...) correspond to repeated stimulus windows.

Steps

1) For each scenario output trajectory in y_list, compute peaks in stimulus windows (even intervals). 2) Enforce peak bounds: if any peak is outside [min_peak, max_peak], return LARGE_NUMBER. 3) Compute peak ratios: ratio_k = peak_{k+1} / peak_k. 4) Return weighted mean of ratios (lower implies stronger habituation).

Notes
  • If there are fewer than 2 stimulus windows, this metric cannot form a ratio and returns LARGE_NUMBER.
  • If any peak is zero or extremely small, division can blow up; we guard with a small epsilon.
PARAMETER DESCRIPTION
intervals

Sequence[Tuple[float, float]] Time intervals (start, end) defining protocol segments.

TYPE: Sequence[Tuple[float, float]]

t

np.ndarray Stitched time vector from the simulator, shape (T,).

TYPE: ndarray

y_list

Sequence[np.ndarray] List of output trajectories, one per scenario. Each element is typically shape (q, T) (q outputs).

TYPE: Sequence[ndarray]

w

Union[float, Sequence[float], np.ndarray] Weights for each ratio term. If scalar, repeated. If sequence, must match number of ratios (n_peaks - 1) or will be broadcast/clipped.

TYPE: Union[float, Sequence[float], ndarray]

min_peak

float, default=0.1 Minimum acceptable peak amplitude.

TYPE: float DEFAULT: 0.1

max_peak

float, default=2.0 Maximum acceptable peak amplitude.

TYPE: float DEFAULT: 2.0

LARGE_NUMBER

float, default=1e4 Penalty returned if constraints fail or insufficient windows exist.

TYPE: float DEFAULT: 10000.0

RETURNS DESCRIPTION
float

Scalar habituation cost. Lower is better.

TYPE: float

dynamic_tracking_error_piecewise(crn, u_nested_list, x0_list, nested_time_horizon, r_list, w, norm=1, relative=False, LARGE_NUMBER=10000.0)

Compute a dynamic tracking cost for piecewise-constant input protocols.

This is the piecewise analogue of dynamic_tracking_error. Instead of constant inputs over a single horizon, each scenario specifies a sequence of inputs applied over segmented time horizons.

PARAMETER DESCRIPTION
crn

IOCRN An IOCRN-like object implementing transient_response_piecewise(u_nested_list, x0_list, nested_time_horizon, LARGE_NUMBER=...).

u_nested_list

list[list[np.ndarray]] List of input protocols. Each element is a sequence [u_0, u_1, ..., u_K], where each u_k has shape (p,). The inner list length must match len(nested_time_horizon).

x0_list

list[np.ndarray] List of initial state vectors, each of shape (n,).

nested_time_horizon

list[np.ndarray] List of time grids [t_0, t_1, ..., t_K], one per protocol segment. Each t_k is a 1D array of times for that segment. The CRN simulator is responsible for stitching the full trajectory.

r_list

list[np.ndarray] Reference signals/targets for each scenario (see performance_metric).

w

np.ndarray Weights for the tracking error (typically shape (q, T_full)), where T_full matches the concatenated time grid used in the simulation.

norm

int, default=1 Norm used in the tracking error (passed to performance_metric).

relative

bool, default=False If True, compute a relative error (as supported by performance_metric).

LARGE_NUMBER

float, default=1e4 Divergence penalty passed to the simulator.

RETURNS DESCRIPTION
performance

float Scalar tracking cost.

last_task_info

dict Updated crn.last_task_info with reward metadata (same keys as dynamic_tracking_error).

oscillation_error(crn, u_list, x0_list, time_horizon, f_list=None, mean_list=None, w=[1 / 4, 1 / 4, 1 / 4, 1 / 4], t0=0, LARGE_NUMBER=10000.0)

Compute an oscillation-shaping cost based on output time-series metrics.

The CRN is simulated (as in transient_response), then oscillatory features are extracted via oscillation_metrics. The returned scalar cost is a weighted sum of several error components:

  • mean error (if mean_list is provided)
  • frequency error (if f_list is provided)
  • damping deviation from 1
  • peak ratio r1 deviation from 1
PARAMETER DESCRIPTION
crn

IOCRN IOCRN-like object implementing transient_response.

u_list

list[np.ndarray] List of constant input vectors, each of shape (p,).

x0_list

list[np.ndarray] List of initial states, each of shape (n,).

time_horizon

np.ndarray 1D array of evaluation times with shape (T,).

f_list

list[np.ndarray] or None, default=None Desired oscillation frequencies per scenario and output. Expected format follows oscillation_metrics. If None, frequency error is not included.

mean_list

list[np.ndarray] or None, default=None Desired mean values per scenario and output (format follows oscillation_metrics). If None, mean error is not included.

w

list[float], default=[1/4, 1/4, 1/4, 1/4] Weights [mean_error, frequency_error, damping_error, r1_error].

t0

float, default=0 Time threshold after which oscillation metrics are evaluated (to ignore transients).

LARGE_NUMBER

float, default=1e4 Divergence penalty passed to the simulator.

RETURNS DESCRIPTION
performance

float Scalar oscillation cost.

last_task_info

dict Updated crn.last_task_info, augmented with:

  • 'reward': performance
  • 'frequency': f_list
  • 'reward type': 'oscillation_error'

logic_circuit_reward(crn, u_list, x0_list, time_horizon, r_list, w, norm=1, relative=False, LARGE_NUMBER=10000.0)

Compute a steady-state logic circuit cost using binary cross-entropy (BCE).

The CRN is simulated for each scenario. For each output trace, the final time point is treated as the steady-state output y_ss and compared against the target logic value r using BCE:

\[\text{BCE}(r, y_ss) = - [ r \log(y_ss) + (1-r) \log(1-y_ss) ].\]
Notes
  • This function currently ignores w, norm, and relative (kept for API compatibility with tracking rewards).
  • Outputs are clipped to [1e-6, 1-1e-6] to avoid log(0).
PARAMETER DESCRIPTION
crn

IOCRN IOCRN-like object implementing transient_response.

u_list

list[np.ndarray] List of constant inputs, each shape (p,).

x0_list

list[np.ndarray] List of initial states, each shape (n,).

time_horizon

np.ndarray 1D array of evaluation times with shape (T,).

r_list

list[np.ndarray] List of desired binary targets per scenario. Each r is expected to have shape (q,) (one target per output).

w

np.ndarray Unused (present for signature compatibility).

norm

int Unused.

relative

bool Unused.

LARGE_NUMBER

float, default=1e4 Divergence penalty passed to the simulator.

RETURNS DESCRIPTION
performance

float Mean BCE across scenarios and outputs (lower is better).

last_task_info

dict Updated crn.last_task_info, augmented with:

  • 'reward': performance
  • 'setpoint': r_list
  • 'initial_conditions': x0_list
  • 'reward type': 'dynamic_tracking_error' (kept as-is; consider renaming to 'logic_circuit_reward')

dynamic_tracking_error_piecewise_logic(crn, u_nested_list, x0_list, nested_time_horizon, r_list, w, norm=1, relative=False, LARGE_NUMBER=10000.0)

Compute a piecewise logic tracking cost using thresholded mismatch.

This is intended for sequential / protocol-driven logic tasks (e.g. latches), where targets are specified as binary values and outputs are evaluated using a 0.5 threshold across the entire time horizon (not only at steady state).

Internally, the CRN is simulated with transient_response_piecewise, then the score is computed by performance_metric_logic: \(\text{mean}(|1[r>0.5] - 1[y>0.5]|)\) over scenarios, outputs, and time.

PARAMETER DESCRIPTION
crn

IOCRN IOCRN-like object implementing transient_response_piecewise.

u_nested_list

list[list[np.ndarray]] List of input protocols (see dynamic_tracking_error_piecewise).

x0_list

list[np.ndarray] Initial states, each shape (n,).

nested_time_horizon

list[np.ndarray] List of time grids per segment.

r_list

list[np.ndarray] Target logic values per scenario, each expected shape (q,) or broadcastable to outputs (used as constant targets across time by performance_metric_logic).

w

np.ndarray Unused (present for signature compatibility).

norm

int Unused.

relative

bool Unused.

LARGE_NUMBER

float, default=1e4 Divergence penalty passed to the simulator.

RETURNS DESCRIPTION
performance

float Mean thresholded mismatch across scenarios, outputs, and time.

last_task_info

dict Updated crn.last_task_info with reward metadata.

performance_metric_logic(r_list, y_list)

Compute a binary (thresholded) mismatch score between targets and outputs.

Targets r_list are treated as desired binary outputs (thresholded at 0.5). Outputs y_list are thresholded at 0.5 across all time points. The returned score is the mean absolute mismatch: \(\text{mean}(|1[r>0.5] - 1[y>0.5]|)\) averaged over scenarios, outputs, and time.

PARAMETER DESCRIPTION
r_list

list[np.ndarray] List of reference logic targets, typically each of shape (q,) where q is the number of outputs.

y_list

list[np.ndarray] List of output trajectories, each of shape (q, T).

RETURNS DESCRIPTION

float Mean mismatch rate in [0, 1], where 0 indicates perfect logic behavior.

track_relationship(crn, u_list, x0_list, time_horizon, w, species_names, relationship_func, norm=1, LARGE_NUMBER=10000.0)

Compute a cost for enforcing an algebraic relationship between species trajectories.

This utility is for tasks where the objective is not tracking a pre-specified reference trajectory, but rather satisfying a constraint among species, e.g.

\[A(t) - B(t) = 0,\]
\[A(t) + B(t) - C(t) = 0,\]
\[A(t)B(t) - C(t) = 0,\]

etc.

The user supplies relationship_func, which is called on the requested species trajectories and should return an error signal that is zero when the desired relationship holds. The function then aggregates the error into a scalar cost using a weighted L1 or L2 norm across time (and across scenarios).

PARAMETER DESCRIPTION
crn

IOCRN IOCRN-like object implementing transient_response.

u_list

list[np.ndarray] List of constant input vectors, each shape (p,).

x0_list

list[np.ndarray] List of initial conditions, each shape (n,).

time_horizon

np.ndarray 1D time grid of shape (T,).

w

np.ndarray Weight array for the relationship error over time. Typically shape (q_rel, T) where q_rel is the output dimension of relationship_func (often 1). If w is 1D (T,), it is treated as (1, T).

species_names

list[str] Names of species to feed into relationship_func, in the same order as the function arguments.

relationship_func

callable Function mapping species trajectories to an error signal. It will be called as: relationship_func(traj_1, traj_2, ..., traj_N) where each traj_i has shape (T,). The function should return either:

  • a 1D array (T,) (interpreted as a single error channel), or
  • a 2D array (q_rel, T) for multiple error channels.

norm

int, default=1 Norm for aggregation:

  1. mean weighted absolute error
  2. mean weighted squared error

LARGE_NUMBER

float, default=1e4 Divergence penalty passed to transient_response. If trajectories contain values near LARGE_NUMBER, the relationship error is set to LARGE_NUMBER to strongly penalize divergence.

RETURNS DESCRIPTION
performance

float Scalar relationship-tracking cost.

last_task_info

dict Updated crn.last_task_info, augmented with:

  • 'reward': performance
  • 'initial_conditions': x0_list
  • 'reward type': 'relationship_tracking'
  • 'tracked_species': species_names

habituation_metric_with_gap(*, intervals, t, y_list, w, n_repeats_pre, n_repeats_post, gap_weight=5.0, recovery_tol=0.05, dishabituate_rho=1.0, min_peak=0.1, max_peak=2.0, LARGE_NUMBER=10000.0, sensitization=False)

Returns: habituation loss + gap consistency penalty. Keeps your "log(max ratio)" style for habituation.