add_reaction_by_ordered_index
RL4CRN.policies.add_reaction_by_ordered_index
AddReactionByOrderedIndex
Bases: AddReactionByIndex
Extension of AddReactionByIndex that enforces an ordered reaction-selection scheme.
The base policy samples a reaction index from the library (excluding already-present reactions) and then samples its parameters. This subclass adds two extra structural constraints:
1) Template-aware ordering
At the first call in an episode/batch, the current IOCRN reaction multi-hot vector is
snapshotted as a template (template_mask). Only reactions added after this snapshot
are considered "added by the agent". Ordering constraints are applied only to these
added reactions, so template reactions do not affect the allowed index range.
2) Sequentiality constraint
Once the agent has added at least one reaction, each subsequent reaction must have an index
strictly greater than the maximum index among the agent-added reactions so far. Concretely:
r_next > max(added_indices)
This is enforced with either:
- a soft penalty (finite constraint_strength), or
- a hard mask (constraint_strength = inf), making violations impossible.
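The difference between the soft penalty and the hard mask can be illustrated with a minimal sketch. The function name `apply_ordering_constraint` and its arguments are hypothetical and chosen for illustration; this is not the class's actual code, only one plausible way to express the rule r_next > max(added_indices) on a batch of logits.

```python
import torch

def apply_ordering_constraint(logits, max_added_index, has_added,
                              constraint_strength=float("inf")):
    """Penalize or mask indices <= max_added_index (hypothetical helper).

    logits:          (N, M) float structure logits
    max_added_index: (N,) float, largest agent-added index per batch element
    has_added:       (N,) bool, True where the agent has added >= 1 reaction
    """
    M = logits.shape[1]
    library_indices = torch.arange(M, dtype=logits.dtype)
    # Out-of-order entries: only in rows where something was already added.
    out_of_order = has_added[:, None] & (
        library_indices[None, :] <= max_added_index[:, None]
    )
    if constraint_strength == float("inf"):
        # Hard mask. masked_fill avoids the NaN that inf * 0 would produce.
        return logits.masked_fill(out_of_order, float("-inf"))
    # Soft constraint: subtract a finite penalty from out-of-order logits.
    return logits - constraint_strength * out_of_order.to(logits.dtype)

logits = torch.zeros(2, 5)
max_added_index = torch.tensor([2.0, 0.0])
has_added = torch.tensor([True, False])  # second row: nothing added yet

hard = apply_ordering_constraint(logits, max_added_index, has_added)
soft = apply_ordering_constraint(logits, max_added_index, has_added,
                                 constraint_strength=3.0)
```

In the first row, indices 0 to 2 are excluded (hard) or penalized by 3 (soft); the second row is left untouched because the agent has not added anything yet.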
Additionally, an optional combinatorial bias term can be added to the structure logits to shape the policy toward a uniform distribution over unordered sets of a target size (rather than uniform over ordered action sequences).
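To see why a completion-count bias leads to a uniform distribution over sets, consider the following small worked example. It assumes an empty template, a library of size M, and strictly increasing picks; the function `set_probability` is a hypothetical helper written for this illustration and uses exact fractions rather than logits.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def set_probability(chosen, M):
    """Probability of building the sorted tuple `chosen` when each step samples
    index i with weight C(M - 1 - i, remaining - 1), i.e. the number of ways to
    complete the set using only indices above i."""
    prob = Fraction(1)
    last = -1
    K = len(chosen)
    for step, i in enumerate(chosen):
        remaining = K - step  # picks still to make, including this one
        weights = {j: comb(M - 1 - j, remaining - 1) for j in range(last + 1, M)}
        prob *= Fraction(weights[i], sum(weights.values()))
        last = i
    return prob

M, K = 6, 3
probs = {s: set_probability(s, M) for s in combinations(range(M), K)}
# Every one of the C(6, 3) = 20 sets has the same probability, 1/20.
```

Taking the logarithm of these weights gives a per-index bias of the kind described above. Without it, a policy with flat logits under the ordering constraint would favor sets whose early picks have small indices.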
Compared to the base class, the parameter heads/generators are unchanged; only the structure-sampling logits are modified before the categorical distribution is constructed.
__init__(num_reactions, num_parameters, num_inputs, encoder_attributes, deep_layer_size, structure_head_attributes, parameter_head_attributes, input_influence_head_attributes, target_set_size, masks=None, continuous_distribution={'type': 'lognormal'}, discrete_distribution={'type': 'categorical', 'categories': torch.tensor([1, 2])}, entropy_weights_per_head=None, structure_head_temperature={'target_entropy_ratio_to_max': 1.0, 'initial_temperature': 1.0, 'rate': 0.0, 'current_temperature': 1.0}, allow_input_influence=False, device=None, combinatorial_bias_enabled=True, constraint_strength=float('inf'))
Initialize the ordered-index reaction-addition policy.
All parameters from AddReactionByIndex are supported. Additional parameters:
| PARAMETER | DESCRIPTION |
|---|---|
| target_set_size | int. Desired total number of reactions in the final CRN (including template reactions). Used to compute the combinatorial prior so that, under an uninformative policy, the probability of arriving at a particular final set is approximately uniform: P(set) ∝ 1 / C(M, K), where M is library size and K is |
| combinatorial_bias_enabled | bool, default=True. If True, adds a combinatorial bias term to the structure logits that accounts for how many completions remain if a given index is chosen next. |
| constraint_strength | float, default=inf. Strength of the ordering constraint. If finite: applies a subtractive penalty to out-of-order logits (soft constraint). If infinite: treats out-of-order choices as impossible (hard mask). |
Internal state
- template_mask (torch.Tensor or None): Snapshot of the initial reaction multi-hot vector for the current episode/batch. Shape (N, M). Set on the first forward call after reset_template().
- library_indices (torch.Tensor): Float tensor [0, 1, ..., M-1] used to compute max indices efficiently.
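As an illustration of how an index tensor like library_indices can yield the per-row maximum added index without a Python loop, here is a minimal sketch. The example mask and the choice of -1 as the "nothing added" sentinel are assumptions made for this illustration, not details confirmed by the source.

```python
import torch

M = 6
library_indices = torch.arange(M, dtype=torch.float32)  # [0., 1., ..., 5.]

# Hypothetical (N=2, M=6) mask of agent-added reactions.
added_reactions_mask = torch.tensor(
    [[0, 1, 0, 1, 0, 0],
     [0, 0, 0, 0, 0, 0]], dtype=torch.bool
)

# Replace non-added positions with -1 so rows without additions yield -1.
masked_indices = torch.where(
    added_reactions_mask, library_indices, torch.tensor(-1.0)
)
max_added_index = masked_indices.max(dim=1).values  # per-row maximum
num_added_by_agent = added_reactions_mask.sum(dim=1)
```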
reset_template()
Reset the internal template snapshot.
Call this at the start of a new episode (or whenever the “template CRN” changes) so that
the next call to forward captures the current reaction multi-hot vector as template_mask.
Why this matters
The ordering constraint is designed to apply only to reactions added by the agent.
Resetting the template ensures that pre-existing/template reactions do not influence
the computed max_added_index and therefore do not restrict future choices.
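The snapshot lifecycle can be sketched with a small stand-in class. `TemplateTracker` and its `added_mask` method are hypothetical names used only for this illustration; the real policy performs the snapshot inside forward.

```python
import torch

class TemplateTracker:
    """Hypothetical stand-in for the policy's template bookkeeping."""

    def __init__(self, num_reactions):
        self.num_reactions = num_reactions
        self.template_mask = None

    def reset_template(self):
        # Forget the snapshot; the next call to added_mask takes a new one.
        self.template_mask = None

    def added_mask(self, state):
        present = state[:, : self.num_reactions]
        if self.template_mask is None:
            # First call after a reset: snapshot the current multi-hot vector.
            self.template_mask = present.detach().clone()
        return (present - self.template_mask) > 0.5

tracker = TemplateTracker(num_reactions=4)
s0 = torch.tensor([[1.0, 0.0, 0.0, 1.0]])  # template holds reactions 0 and 3
s1 = torch.tensor([[1.0, 1.0, 0.0, 1.0]])  # agent has added reaction 1

first = tracker.added_mask(s0)    # snapshot taken, nothing counted as added
second = tracker.added_mask(s1)   # only reaction 1 counts as agent-added
tracker.reset_template()
third = tracker.added_mask(s1)    # s1 becomes the new template
```

Note that the template's reaction 3 never appears in the added mask, so it does not raise max_added_index even though its index is larger than that of the agent-added reaction.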
forward(state, mode='full', action=None, structure_temp=None)
Sample or score actions under ordered-index and combinatorial constraints.
This method mirrors AddReactionByIndex.forward but modifies the structure logits
before sampling/scoring the reaction index.
| PARAMETER | DESCRIPTION |
|---|---|
| state | torch.Tensor. Batched observation tensor (N, D). The first M entries must be the reaction multi-hot vector indicating reactions present in the current IOCRN. |
| mode | {"full", "partial"}. "full": sample structure + parameters (supported). "partial": not implemented. |
| action | list[dict] or None. If provided, the method computes log π(action|state) for the given actions instead of sampling. The action dictionaries must include a "reaction index" and parameter fields consistent with the configured generators (same as base class). |
| structure_temp | float or None. Optional temperature override for the structure head logits. |
| RETURNS | DESCRIPTION |
|---|---|
|
|
|
Ordering logic
- Template snapshot (first call only): If template_mask is not set, store state[:, :M] as the template.
- Determine agent-added reactions: added_reactions_mask = (state[:, :M] - template_mask) > 0.5, and compute num_added_by_agent and total_existing_counts.
- Sequentiality mask: Let max_added_index be the maximum library index among added reactions. If the agent has added at least one reaction, indices <= max_added_index are penalized or masked (depending on constraint_strength).
- Combinatorial bias (optional): A bias term is added to each candidate reaction index i representing the log-count of ways to complete the remaining set after choosing i, accounting for template-fixed items. Invalid completions yield -inf and are hard-masked.
- Hard vs soft masks:
  - Hard mask: template reactions (cannot re-select fixed/template entries) and impossible completions from the combinatorial bias (-inf).
  - Soft mask: out-of-order indices (sequentiality violations), optionally penalized.
- Emergency valve: If all logits become -inf for any batch element, the logit at the last index is set to 0 to avoid crashing the categorical distribution construction.
- Sampling / scoring: Build a Categorical over masked logits (with temperature) and sample or evaluate the provided indices.
- Parameter generation: Delegates to the same continuous/discrete parameter generators as the base class.
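The emergency valve and the sampling/scoring step can be sketched as follows. The helper name `safe_categorical` is hypothetical, and the example logits are made up; the sketch only assumes that fully masked rows are repaired by resetting the last logit before a torch.distributions.Categorical is built.

```python
import torch
from torch.distributions import Categorical

def safe_categorical(masked_logits, temperature=1.0):
    """Build a Categorical from masked logits (hypothetical helper).

    Rows that are entirely -inf would normalize to NaN, so their last
    logit is reset to 0 first, leaving one valid choice.
    """
    all_masked = (masked_logits == float("-inf")).all(dim=1)
    logits = masked_logits.clone()
    logits[all_masked, -1] = 0.0
    return Categorical(logits=logits / temperature)

masked_logits = torch.tensor([
    [float("-inf"), 0.5, 1.5],   # index 0 is hard-masked
    [float("-inf")] * 3,         # everything masked: valve applies
])
dist = safe_categorical(masked_logits, temperature=2.0)

index = dist.sample()             # sampling path
log_prob = dist.log_prob(index)   # scoring path, also usable for given actions
```

The second row can only ever return the last index, which keeps the batch shape intact at the cost of one forced choice.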
Notes
- This class does not change the parameterization; it only constrains structure sampling.
- The “entropy correction” term when combinatorial bias is enabled modifies the structure entropy signal by adding E_p[bias], which corresponds to optimizing toward the biased prior (i.e., minimizing KL(p || exp(bias)) up to a constant).
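The KL interpretation in the note above can be made explicit with a short derivation. Here b_i denotes the bias added to index i, p is the structure distribution, and q is the normalized prior induced by the bias. The symbols b, q, and Z are notation introduced for this sketch rather than names taken from the code, and sums run over indices with finite bias.

```latex
\begin{aligned}
H(p) + \mathbb{E}_p[b]
  &= -\sum_i p_i \log p_i + \sum_i p_i b_i \\
  &= -\sum_i p_i \log \frac{p_i}{e^{b_i}} \\
  &= -\sum_i p_i \log \frac{p_i}{q_i} + \log Z \\
  &= -\mathrm{KL}(p \,\|\, q) + \log Z,
\qquad q_i = \frac{e^{b_i}}{Z}, \quad Z = \sum_j e^{b_j}.
\end{aligned}
```

Since log Z does not depend on p, maximizing the corrected entropy is the same as minimizing KL(p || q), so the bonus pulls the policy toward the biased prior instead of toward the flat distribution over indices.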
log_combinations(n, k)
Compute the logarithm of the binomial coefficient, log C(n, k), in a numerically stable way.
This helper is used to build combinatorial priors/biases over remaining action choices.
It supports tensor-valued inputs and returns -inf for invalid pairs (k < 0 or k > n),
which is convenient when treating invalid combinations as impossible events.
| PARAMETER | DESCRIPTION |
|---|---|
| n | torch.Tensor. Number of items available (can be broadcasted). |
| k | torch.Tensor. Number of items to choose (can be broadcasted). |
| RETURNS | DESCRIPTION |
|---|---|
| torch.Tensor | log C(n, k), with -inf for invalid pairs (k < 0 or k > n). |
Notes
Uses the identity:
log C(n, k) = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
and clamps intermediate values to avoid NaNs when masking invalid inputs.
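A common way to compute log C(n, k) stably is through the log-gamma function, and the sketch below follows that approach. It is an illustration under that assumption, not the library's implementation; the name `log_combinations_sketch` and the exact clamping strategy are choices made for this example.

```python
import torch

def log_combinations_sketch(n, k):
    """Log binomial coefficient via lgamma, with -inf for invalid (n, k)."""
    n = torch.as_tensor(n, dtype=torch.float32)
    k = torch.as_tensor(k, dtype=torch.float32)
    n, k = torch.broadcast_tensors(n, k)
    valid = (k >= 0) & (k <= n)
    # Clamp so lgamma never receives a non-positive argument on invalid
    # entries; those entries are overwritten with -inf afterwards.
    n_safe = torch.clamp(n, min=0.0)
    k_safe = torch.minimum(torch.clamp(k, min=0.0), n_safe)
    log_c = (
        torch.lgamma(n_safe + 1)
        - torch.lgamma(k_safe + 1)
        - torch.lgamma(n_safe - k_safe + 1)
    )
    return torch.where(valid, log_c, torch.full_like(log_c, float("-inf")))

out = log_combinations_sketch(
    torch.tensor([5.0, 5.0, 3.0]),
    torch.tensor([2.0, 6.0, -1.0]),
)
# out[0] is log C(5, 2) = log 10; the other two pairs are invalid.
```

Clamping before calling lgamma matters because torch.where evaluates both branches, so a NaN or inf computed on an invalid entry could otherwise leak into gradients.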