add_reaction_by_index
RL4CRN.policies.add_reaction_by_index
Neural-network policies for adding reactions to an IOCRN.
This module contains policy networks that map a tensorized IOCRN observation to a distribution over actions that extend the CRN by adding one reaction.
In the default “add reaction by index” formulation, an action is a dictionary:
- reaction index (int): which library reaction to add next
- continuous parameters (list[float]): sampled continuous parameters (masked per reaction)
- discrete parameters (list[int] | None): sampled discrete parameters, if any
- parameters (array-like): concatenation of continuous + discrete (if any)
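For concreteness, a sampled action might look as follows (a hypothetical example; the key names are taken from the description above, and the reaction index and parameter values are made up):

```python
# Hypothetical sampled action for one batch element. Here the chosen
# reaction has two continuous parameters and one discrete parameter.
action = {
    "reaction index": 3,                   # which library reaction to add next
    "continuous parameters": [0.52, 1.8],  # masked per reaction
    "discrete parameters": [1],            # None when the reaction has no discrete params
    "parameters": [0.52, 1.8, 1],          # concatenation of continuous + discrete
}
```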
The policy factorizes the joint action distribution into a structure term and (optional) parameter terms:

π(a|s) = π_struct(r|s) · π_cont(θ_c|r,s) · π_disc(θ_d|r,s)

Log-probabilities and entropies returned by the policy correspond to this factorization:

log π(a|s) = log π_struct + log π_cont + log π_disc,  H = w_s H_struct + w_c H_cont + w_d H_disc
Masking is used to:

- forbid selecting reactions already present in the IOCRN (structure logits masked to -∞),
- forbid sampling parameters that do not exist for the chosen reaction (dimension masks),
- forbid invalid discrete-category combinations when using a flattened logit space (logit masks).
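The flattened-logit-space mask in the last bullet can be sketched as follows (shapes and the validity pattern are assumed for illustration):

```python
import torch

# Toy flattened logit space over all discrete-category combinations for
# one reaction; `valid` marks the combinations that actually exist.
flat_logits = torch.zeros(6)
valid = torch.tensor([True, True, False, True, False, False])

# Invalid combinations are masked to -inf before forming the distribution,
# so they receive exactly zero probability.
masked = flat_logits.masked_fill(~valid, float("-inf"))
probs = torch.softmax(masked, dim=-1)
```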
Temperature scaling can be applied to the structure logits to control exploration:

z_T = z / T

where T may be adapted online to target a desired entropy ratio.
AddReactionByIndex
Bases: Module
Policy network that samples one reaction addition for each element of a batch of IOCRNs.
The policy has an encoder + multiple “heads”:

- Encoder: maps the observation vector `state` to a learned embedding `h`.
- Structure head: produces logits over `M` library reactions, then samples a reaction index.
- Continuous parameter generator (optional): samples continuous parameters for the chosen reaction.
- Discrete parameter generator (optional): samples discrete parameters for the chosen reaction.
The action distribution factorizes as:

\[ \pi(a \mid s) = \pi(r \mid s)\,\pi(\theta_c \mid r, s)\,\pi(\theta_d \mid r, s) \]
where:
- \(r\) is the reaction index (0..M-1),
- \(\theta_c\) are continuous parameters (e.g. LogNormal),
- \(\theta_d\) are discrete parameters (e.g. Categorical).
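The factorization can be sketched with `torch.distributions` for a single sample (toy distribution parameters; the actual heads condition these on the embedding):

```python
import torch
from torch.distributions import Categorical, LogNormal

# One head per factor: structure over M=3 reactions, LogNormal continuous
# parameters, Categorical discrete parameters (toy parameters throughout).
structure = Categorical(logits=torch.tensor([0.1, 0.4, -0.2]))
continuous = LogNormal(torch.zeros(2), torch.ones(2))
discrete = Categorical(logits=torch.zeros(2))

r = structure.sample()
theta_c = continuous.sample()
theta_d = discrete.sample()

# The joint log-probability is the sum of the head log-probabilities.
log_prob = (structure.log_prob(r)
            + continuous.log_prob(theta_c).sum()
            + discrete.log_prob(theta_d))
```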
Notes: State layout (no input-influence observation): state ∈ R^{N×(M+K)}
- state[:, :M] : multi-hot “reactions present” indicator
- state[:, M:] : flattened parameter vector (0 where not present)
If allow_input_influence=True, the expected state layout is larger
(includes additional per-input parameter influence features). This path is
partially scaffolded but not implemented end-to-end in the current code.
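The default (no input-influence) layout can be sketched as (toy values, N=1):

```python
import torch

# Observation layout for M=3 reactions and a flattened parameter vector
# of size K=4: multi-hot "present" block followed by the parameter block.
M, K = 3, 4
present = torch.tensor([[1., 0., 1.]])          # reactions 0 and 2 are present
params = torch.tensor([[0.5, 0.0, 0.0, 2.1]])   # 0 where not present
state = torch.cat([present, params], dim=-1)    # shape (1, M + K)
```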
Returns from forward:

- sampled action dictionaries (unless `action` is provided),
- log-probabilities (per batch element),
- entropies (per batch element, weighted per head).
__init__(num_reactions, num_parameters, num_inputs, encoder_attributes, deep_layer_size, structure_head_attributes, parameter_head_attributes, input_influence_head_attributes, masks=None, zero_reaction_idx=None, stop_flag=False, continuous_distribution={'type': 'lognormal'}, discrete_distribution={'type': 'categorical', 'categories': torch.tensor([1, 2])}, entropy_weights_per_head=None, structure_head_temperature={'target_entropy_ratio_to_max': 1.0, 'initial_temperature': 1.0, 'rate': 0.0, 'current_temperature': 1.0}, allow_input_influence=False, device=None)
Initialize the AddReactionByIndex policy.
| PARAMETER | DESCRIPTION |
|---|---|
| num_reactions | int. Number of candidate reactions in the library (denoted M). |
| num_parameters | int. Size of the flattened global parameter vector across the library (denoted K). This corresponds to the “explicit” parameterization used by observers/tensorizers. |
| num_inputs | int. Number of IO inputs (denoted p). Only relevant when using input-influence features. |
| encoder_attributes | dict. Configuration for the encoder MLP. |
| deep_layer_size | int. Dimensionality of the encoder output embedding h(s). |
| structure_head_attributes | dict. Configuration for the structure head MLP. |
| parameter_head_attributes | dict. Configuration for parameter generator backbones. |
| input_influence_head_attributes | dict. Reserved for a future input-influence head (currently not implemented). |
| masks | dict or None. Optional masks derived from the reaction library: 'continuous', float mask of shape (M, max_num_continuous_params); 'discrete', float mask of shape (M, max_num_discrete_params); 'logit', bool mask of shape (M, total_num_discrete_combinations). These masks ensure that only existing parameters/logits are used for each reaction. |
| zero_reaction_idx | int or None. If provided, the policy is allowed to resample the “zero reaction” more than once. |
| stop_flag | bool. If True, the policy stops adding reactions when the “zero reaction” is selected. |
| continuous_distribution | dict. Continuous parameter distribution spec passed to ParameterGeneratorFromDistribution (e.g. {"type": "lognormal", ...}). The policy sets … |
| discrete_distribution | dict. Discrete parameter distribution spec (e.g. {"type": "categorical", "categories": ...}). The policy sets … |
| entropy_weights_per_head | dict or None. Entropy weights for each head. Keys: {'structure','continuous','discrete','input_influence'}. Used to form a weighted entropy signal: H_total = Σ_i w_i H_i. |
| structure_head_temperature | dict. Temperature schedule state for the structure head. Expected keys: target_entropy_ratio_to_max, initial_temperature, rate, current_temperature. The logits are scaled as z/T before constructing the Categorical distribution. |
| allow_input_influence | bool. If True, the observation and architecture include additional features/heads for input influence. (Currently not implemented.) |
| device | torch.device or None. Device where parameters and tensors should live. |
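The weighted entropy signal formed from `entropy_weights_per_head` can be sketched as follows (hypothetical weights and per-head entropies for a batch of N=2):

```python
import torch

# H_total = sum_i w_i H_i, combined per batch element (toy values).
weights = {"structure": 1.0, "continuous": 0.1, "discrete": 0.1}
entropies = {"structure": torch.tensor([1.2, 0.8]),
             "continuous": torch.tensor([0.5, 0.5]),
             "discrete": torch.tensor([0.3, 0.1])}
H_total = sum(weights[k] * entropies[k] for k in weights)  # shape (2,)
```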
forward(state, mode='full', action=None, structure_temp=None)
Sample actions (or score provided actions) for a batch of IOCRN observations.
| PARAMETER | DESCRIPTION |
|---|---|
| state | torch.Tensor. Batched observation tensor of shape (N, D). For … |
| mode | {"full", "partial"} |
| action | list[dict] or None. If provided, the policy does not sample; it computes log π(action\|state) for the given batch of actions (used e.g. in SIL replay / scoring). Each dict must include at least … |
| structure_temp | float or None. If provided, overrides the structure-head temperature. |
- If `action` is None:
    - actions: list[dict]. Sampled actions, one per batch element.
    - log_probabilities: torch.Tensor. Log-probabilities per batch element, shape (N,), computed as the sum of head log-probabilities: log π(a|s) = log π_struct + log π_cont + log π_disc (+ log π_input_influence).
    - entropies: torch.Tensor. Weighted entropy per batch element, shape (N,): H = w_s H_struct + w_c H_cont + w_d H_disc.
- If `action` is not None:
    - log_probabilities: torch.Tensor. Log-probabilities of the provided actions, shape (N,).
Implementation details
Structure sampling with masking. Let z(s) be the structure logits (N×M). Reactions already present are masked:

z_masked = z(s), with z_masked[r_present] = -∞

Then temperature scaling is applied:

z_T = z_masked / T

and a Categorical distribution is formed:

r ~ Categorical(logits=z_T)
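The mask → scale → sample pipeline above can be sketched with toy logits (N=1, M=4):

```python
import torch
from torch.distributions import Categorical

z = torch.tensor([[1.0, 0.5, -0.3, 0.2]])             # structure logits z(s)
present = torch.tensor([[True, False, False, True]])  # reactions already in the CRN

z_masked = z.masked_fill(present, float("-inf"))      # forbid existing reactions
T = 2.0
z_T = z_masked / T                                    # temperature scaling
dist = Categorical(logits=z_T)
r = dist.sample()                                     # sampled reaction index
```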
Adaptive temperature (training only) When sampling (action is None) in training mode, the current temperature is nudged based on the observed mean structure entropy relative to the maximum entropy log(M).
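One way such a nudge can work is a multiplicative update on T driven by the gap between the observed entropy ratio and the target (a hypothetical update rule for illustration; the actual schedule is stored in `structure_head_temperature`):

```python
import math
import torch
from torch.distributions import Categorical

logits = torch.arange(8.0)           # fixed toy structure logits, M=8
target_ratio, rate, T = 0.5, 0.1, 1.0

for _ in range(300):
    # Entropy of the tempered distribution, relative to the maximum log(M).
    ratio = Categorical(logits=logits / T).entropy().item() / math.log(8)
    # Raise T when entropy is below target, lower it when above.
    T *= math.exp(rate * (target_ratio - ratio))
```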
Parameter generation
Continuous and discrete parameters are generated conditionally using
ParameterGeneratorFromDistribution, and are masked so that nonexistent parameters
are zeroed out and/or omitted from the returned per-sample lists.
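The per-reaction parameter masking can be sketched as follows (assumed shapes: M=3 reactions with up to 3 continuous parameter slots each):

```python
import torch

# Float mask marking which continuous parameter slots exist per reaction,
# matching the 'continuous' mask shape (M, max_num_continuous_params).
cont_mask = torch.tensor([[1., 1., 0.],   # reaction 0: two continuous params
                          [1., 0., 0.],   # reaction 1: one continuous param
                          [0., 0., 0.]])  # reaction 2: none

r = 0                                 # chosen reaction index
raw = torch.tensor([0.7, 2.3, 0.9])   # generator output before masking
masked = raw * cont_mask[r]           # nonexistent slots zeroed out
per_sample = raw[cont_mask[r].bool()].tolist()  # only existing params kept
```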