
hall_of_fame

RL4CRN.utils.hall_of_fame

Hall-of-Fame utilities for reinforcement-learning over CRN environments.

This module implements a small, efficient Hall of Fame (HoF) container that keeps the best-performing environment snapshots seen so far during training. Items are ranked by a scalar objective, typically the latest task loss, which is read from crn_env.state.last_task_info['reward'] (despite the key name, the stored value is treated as a loss to minimize). The HoF supports:

  • Bounded capacity via a heap-backed structure (fast add/replace of the worst item).
  • Deduplication via a signature map (keeps only the best version per signature).
  • Fast random sampling for replay-style training.
  • Ranked iteration / indexing (best → worst) using a lazily rebuilt sorted cache.

Conventions
  • The HoF is designed to keep low-loss (or equivalently high-quality) entries.
  • Internally, entries store score = -loss so that higher score means better.
  • Environment snapshots are cloned on insertion to avoid later mutation.
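
The cloning convention can be illustrated with a minimal sketch. TinyEnv and its reactions attribute are hypothetical stand-ins, and clone() is assumed here to behave like copy.deepcopy; the real environment's clone() may copy less.

```python
import copy


class TinyEnv:
    """Hypothetical environment with mutable state and a clone() method."""

    def __init__(self, reactions):
        self.reactions = list(reactions)

    def clone(self):
        # Assumption: a deep copy is a safe approximation of clone().
        return copy.deepcopy(self)


live_env = TinyEnv(["A -> B"])

stored_by_reference = live_env      # what would happen without cloning
stored_snapshot = live_env.clone()  # what the HoF convention asks for

# Training continues and mutates the live environment.
live_env.reactions.append("B -> C")

print(stored_by_reference.reactions)  # follows the mutation
print(stored_snapshot.reactions)      # still the state at insertion time
```

Only the cloned snapshot still reflects the state that earned the recorded loss.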

HoFItem

Container for a single Hall-of-Fame entry.

Instances are ordered so they can be stored in a min-heap (heapq), where the heap root represents the worst entry currently kept (highest loss / lowest quality). Ties are broken by timestamp to ensure deterministic heap behavior when scores match.

PARAMETER DESCRIPTION
loss

Objective value to minimize (lower is better).

TYPE: float

signature

Hashable identifier for the environment structure/state. This implementation stores signature.tobytes() to use as a dict key.

timestamp

Time of insertion/update (e.g., time.time()).

TYPE: float

env

Snapshot of the environment to store (should be clone-safe).

__lt__(other)

Heap ordering: worst entries compare as "smaller".

The primary comparison is by score (= -loss): a smaller score means a worse entry. On ties, the entry with the older timestamp compares as smaller.
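
The ordering described above might look like the following sketch. HoFItemSketch is a hypothetical stand-in whose field names follow the parameter list; it is not the module's actual class. It shows that heapq only needs __lt__, and that the heap root ends up being the worst entry.

```python
import heapq
from dataclasses import dataclass
from typing import Any


@dataclass(eq=False)
class HoFItemSketch:
    """Hypothetical stand-in for HoFItem."""

    loss: float
    signature: bytes
    timestamp: float
    env: Any = None

    @property
    def score(self) -> float:
        # Higher score means better, so score is the negated loss.
        return -self.loss

    def __lt__(self, other: "HoFItemSketch") -> bool:
        # Worse entries compare as smaller; older entries lose ties.
        if self.score != other.score:
            return self.score < other.score
        return self.timestamp < other.timestamp


heap = []
for loss, ts in [(0.5, 1.0), (2.0, 2.0), (0.1, 3.0), (2.0, 1.5)]:
    heapq.heappush(heap, HoFItemSketch(loss, b"", ts))

worst = heap[0]  # root of the min-heap: highest loss, oldest on ties
print(worst.loss, worst.timestamp)
```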

assign(other)

In-place update of this entry's contents from another HoFItem.

This is used to refresh an existing signature with a better score without creating a new object (helps keep signature_map references valid).
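
A minimal sketch of why an in-place update keeps references valid is shown below. EntrySketch is hypothetical, and the heapify call after the update is an assumption about how heap order could be restored; the module may use a different mechanism.

```python
import heapq


class EntrySketch:
    """Hypothetical entry type with an in-place assign()."""

    def __init__(self, loss, signature, timestamp, env):
        self.loss = loss
        self.signature = signature
        self.timestamp = timestamp
        self.env = env

    def __lt__(self, other):
        return (-self.loss, self.timestamp) < (-other.loss, other.timestamp)

    def assign(self, other):
        # Copy the fields over; the object identity stays the same.
        self.loss = other.loss
        self.signature = other.signature
        self.timestamp = other.timestamp
        self.env = other.env


entry_a = EntrySketch(3.0, b"a", 1.0, "env-a-v1")
entry_b = EntrySketch(2.0, b"b", 2.0, "env-b")
heap = [entry_a, entry_b]
heapq.heapify(heap)
signature_map = {b"a": entry_a, b"b": entry_b}

# A better version of signature b"a" arrives: update the existing object.
entry_a.assign(EntrySketch(0.5, b"a", 3.0, "env-a-v2"))
# Assumption: the score changed, so heap order is restored explicitly.
heapq.heapify(heap)

# The dict and the heap still point at the same (now updated) object.
print(signature_map[b"a"] is entry_a, heap[0] is entry_b)
```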

HallOfFame

Fixed-capacity Hall-of-Fame for environment snapshots.

Maintains up to max_size unique entries keyed by a state/environment signature. When adding:

  • If the signature already exists, the entry is updated only if it is better.
  • If the HoF is full, the new entry replaces the current worst entry only if it is better.

The internal heap is optimized for fast worst-item access/replacement. Ranked access (best→worst) is provided via a lazily rebuilt sorted cache.

PARAMETER DESCRIPTION
max_size

Maximum number of entries to keep.

TYPE: int
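
The two insertion rules above can be sketched with a heap plus a dict. This is an illustrative reconstruction, not the module's code: BoundedHoFSketch, its attribute names, and the add(loss, key, env) signature are all hypothetical, and max_size is assumed to be at least 1.

```python
import heapq
import time


class _Entry:
    def __init__(self, loss, key, env):
        self.loss, self.key, self.env = loss, key, env
        self.timestamp = time.time()

    def __lt__(self, other):
        # Worst entry (highest loss) compares as smallest.
        return (-self.loss, self.timestamp) < (-other.loss, other.timestamp)


class BoundedHoFSketch:
    """Hypothetical fixed-capacity container keyed by signature."""

    def __init__(self, max_size):
        self.max_size = max_size  # assumed >= 1
        self.heap = []            # min-heap: root is the worst entry
        self.by_key = {}          # signature -> entry, for deduplication

    def add(self, loss, key, env=None):
        existing = self.by_key.get(key)
        if existing is not None:
            if loss >= existing.loss:
                return False      # known signature, not an improvement
            existing.loss, existing.env = loss, env
            existing.timestamp = time.time()
            heapq.heapify(self.heap)  # restore order after in-place change
            return True
        entry = _Entry(loss, key, env)
        if len(self.heap) < self.max_size:
            heapq.heappush(self.heap, entry)
        elif loss < self.heap[0].loss:
            evicted = heapq.heapreplace(self.heap, entry)
            del self.by_key[evicted.key]
        else:
            return False          # full, and no better than the worst
        self.by_key[key] = entry
        return True


hof = BoundedHoFSketch(max_size=2)
results = [
    hof.add(3.0, b"a"),  # stored
    hof.add(1.0, b"b"),  # stored
    hof.add(2.0, b"c"),  # evicts b"a" (loss 3.0)
    hof.add(5.0, b"d"),  # rejected: worse than current worst
    hof.add(0.5, b"c"),  # same signature, better loss: updated in place
]
print(results, sorted(e.loss for e in hof.heap))
```

Calling heapify after an in-place update costs O(n); it keeps the sketch short, and a production implementation might prefer a targeted sift.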

add(crn_env)

Add a CRN environment snapshot to the Hall of Fame.

The entry's loss is read from crn_env.state.last_task_info['reward'] (the value under this key is interpreted as a loss, so lower is better) and the deduplication key is taken from crn_env.state.get_bool_signature().

Notes
  • The environment is cloned before storage to prevent later mutation.
  • If a matching signature already exists, it is updated in-place only if the new candidate is better (lower loss / higher score).

PARAMETER DESCRIPTION
crn_env

Environment instance expected to provide:

  • state.last_task_info['reward'] (float-like loss)
  • state.get_bool_signature() (array-like signature with .tobytes())
  • clone() (deep-ish copy used for snapshotting)

RAISES DESCRIPTION
ValueError

If the environment does not expose the expected reward field.
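
As a rough sketch of the interface add() expects, the helper below extracts the loss and the deduplication key and raises ValueError when the reward field is missing. read_loss_and_key is a hypothetical name, the exact validation logic is an assumption, and array.array from the standard library stands in for whatever array type get_bool_signature() really returns.

```python
from array import array
from types import SimpleNamespace


def read_loss_and_key(crn_env):
    """Hypothetical helper: pull (loss, dict key) out of an environment."""
    info = getattr(crn_env.state, "last_task_info", None)
    if not isinstance(info, dict) or "reward" not in info:
        raise ValueError("environment has no state.last_task_info['reward']")
    loss = float(info["reward"])
    # Raw bytes are hashable, so they can serve as a dict key.
    key = crn_env.state.get_bool_signature().tobytes()
    return loss, key


# Stub environments built only for this illustration.
good_env = SimpleNamespace(state=SimpleNamespace(
    last_task_info={"reward": 0.25},
    get_bool_signature=lambda: array("b", [1, 0, 1]),
))
bad_env = SimpleNamespace(state=SimpleNamespace(
    last_task_info={},
    get_bool_signature=lambda: array("b", [0]),
))

loss, key = read_loss_and_key(good_env)
print(loss, key)

try:
    read_loss_and_key(bad_env)
except ValueError as exc:
    print("rejected:", exc)
```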

add_all(crn_envs)

Add a collection of environments to the Hall of Fame.

PARAMETER DESCRIPTION
crn_envs

Iterable of environments compatible with add().

TYPE: iterable

sample(batch_size)

Uniformly sample stored environments (unordered).

Sampling does not require sorting and is therefore fast.

PARAMETER DESCRIPTION
batch_size

Number of samples to draw.

TYPE: int

RETURNS DESCRIPTION
list

A list of sampled environment snapshots (length ≤ batch_size).
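
One way to get this behavior is sketched below. The "length ≤ batch_size" note suggests sampling without replacement, capped at the number of stored entries; that reading is an assumption, as are the function name sample_envs and the (loss, env) pair layout.

```python
import random


def sample_envs(items, batch_size, rng=random):
    """Hypothetical sampler over (loss, env) pairs in arbitrary order."""
    # Cap the request so a small HoF returns fewer than batch_size items.
    k = min(batch_size, len(items))
    # random.sample draws without replacement and needs no sorting.
    return [env for _, env in rng.sample(items, k)]


stored = [(0.3, "env-0"), (1.2, "env-1"), (0.7, "env-2")]
rng = random.Random(0)  # seeded only to make the illustration repeatable

small_batch = sample_envs(stored, 2, rng)
capped_batch = sample_envs(stored, 10, rng)
print(len(small_batch), len(capped_batch))
```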

__iter__()

Iterate over stored environments ranked from best to worst.

YIELDS DESCRIPTION
env

Environment snapshots ordered by increasing loss (best first).

__getitem__(index)

Get the environment snapshot by rank.

PARAMETER DESCRIPTION
index

Rank index where 0 is best, 1 is second-best, etc.

TYPE: int

RETURNS DESCRIPTION
env

The environment snapshot at the requested rank.
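
Ranked iteration and indexing over a lazily rebuilt sorted cache could be sketched as follows. RankedViewSketch and its attributes are hypothetical; the rebuilds counter exists only to show when sorting actually happens.

```python
class RankedViewSketch:
    """Hypothetical container illustrating a lazily rebuilt sorted cache."""

    def __init__(self):
        self._items = []     # (loss, env) pairs in arbitrary order
        self._sorted = None  # None marks the cache as stale
        self.rebuilds = 0    # instrumentation for this example only

    def add(self, loss, env):
        self._items.append((loss, env))
        self._sorted = None  # any mutation invalidates the cache

    def _ranked(self):
        if self._sorted is None:
            # Sort by increasing loss, so rank 0 is the best entry.
            self._sorted = sorted(self._items, key=lambda pair: pair[0])
            self.rebuilds += 1
        return self._sorted

    def __iter__(self):
        return (env for _, env in self._ranked())

    def __getitem__(self, index):
        return self._ranked()[index][1]

    def __len__(self):
        return len(self._items)


view = RankedViewSketch()
for loss, env in [(0.9, "c"), (0.1, "a"), (0.4, "b")]:
    view.add(loss, env)

ranked = list(view)                  # first read triggers one sort
best = view[0]                       # reuses the cache
rebuilds_after_reads = view.rebuilds

view.add(0.05, "z")                  # invalidates the cache
new_best = view[0]                   # triggers a second sort
print(ranked, best, new_best, view.rebuilds)
```

Sorting is deferred until a ranked read occurs, so a run of consecutive insertions pays for at most one sort.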

__len__()

Return the number of entries currently stored in the Hall of Fame.