Neuromodulation + Eligibility Traces (Three-Factor Learning)

This skill should be used when the user asks to "implement eligibility traces", "add three-factor learning", "implement neuromodulation", "add DA/ACh/NE/5-HT modulators", "implement STDP eligibility", "add synaptic plasticity", "implement online plasticity", "add reward-modulated learning", "implement trace dynamics", "add Dutch traces", "implement accumulating traces", "add replacing traces", "implement neuromodulatory gate", "add plasticity gain", "implement delayed reward association", "add eligibility decay", "implement spike-based traces", "add rate-based traces", "implement three-factor weight update", "add fast memory adapter", "implement bioplausible learning", or mentions eligibility traces, three-factor learning rules, neuromodulatory signals, STDP-based eligibility, reward-modulated plasticity, DA/ACh/NE/5-HT computation, or online synaptic updates in the cognitive pipeline.

sovr610 0 Updated 4mo ago

Install

npx skillscat add sovr610/refffiy/neuromodulation-eligibility-traces-three-factor-learning

Install via the SkillsCat registry.

SKILL.md

Purpose

This skill standardizes the "biologically inspired online plasticity" path: eligibility traces
accumulate local pre/post correlations (spike-based or rate-based), then a delayed third-factor
neuromodulatory signal gates when and how weights change. This is the canonical three-factor
learning rule: synaptic changes require a modulatory signal in addition to pre/post activity.
The non-negotiable goals are correct gating semantics (zero modulator = zero update) and
deterministic trace dynamics.

Key Files

Target Module	Template Asset	Purpose
`brain_ai/meta/eligibility_traces.py`	`assets/eligibility_traces_template.py`	EligibilityTraceModule: trace dynamics, STDP kernels, decay, reset/carry
`brain_ai/meta/neuromodulation.py`	`assets/neuromodulation_template.py`	NeuromodulatoryGate: DA/ACh/NE/5-HT computation, plasticity gain
`brain_ai/meta/three_factor_update.py`	`assets/three_factor_update_template.py`	ThreeFactorUpdate: weight update rules, online/hybrid integration
`brain_ai/meta/plasticity_diagnostics.py`	`assets/plasticity_diagnostics_template.py`	PlasticityTrace: logging, modulator stats, trace inspection
`brain_ai/config.py` (extend)	`assets/neuromod_config_template.py`	EligibilityConfig, NeuromodConfig, ThreeFactorConfig, PlasticityConfig

Public Contract

# EligibilityTraceModule
reset(batch_size, device) -> None
update(pre, post, dt=None) -> e  # eligibility tensor
apply_update(weights, mod_signal, *, lr, clamp) -> updated_weights

# NeuromodulatoryGate
forward(signals, state=None) -> modulators, global_plasticity
update_state(reward, novelty, urgency, patience) -> state

pre and post are activations or spike trains with shapes (B, N_pre) and (B, N_post). e is the
eligibility trace, shaped (B, N_post, N_pre), or (B, N) for diagonal variants. mod_signal is the
third-factor signal that gates the update, given as a scalar or a (B,) tensor.
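
To make the shapes in the contract concrete, here is a minimal sketch of the trace half of the API. It uses NumPy in place of the project's tensor library, so the `device` argument is dropped. The class name `EligibilityTraceSketch`, the batch-mean reduction in `apply_update`, and the choice to clamp the resulting weights rather than the update are all assumptions made for illustration; the template asset may differ.

```python
import numpy as np


class EligibilityTraceSketch:
    """Hypothetical stand-in for EligibilityTraceModule (rate kernel, accumulating trace)."""

    def __init__(self, n_pre, n_post, tau_e=20.0):
        self.n_pre, self.n_post, self.tau_e = n_pre, n_post, tau_e
        self.e = None

    def reset(self, batch_size):
        # One independent float32 trace matrix per batch item: (B, N_post, N_pre).
        self.e = np.zeros((batch_size, self.n_post, self.n_pre), dtype=np.float32)

    def update(self, pre, post, dt=1.0):
        # pre: (B, N_pre), post: (B, N_post); the outer product is taken per batch item.
        corr = np.einsum("bi,bj->bij", post, pre).astype(np.float32)
        self.e = (1.0 - dt / self.tau_e) * self.e + corr
        return self.e

    def apply_update(self, weights, mod_signal, *, lr, clamp):
        # mod_signal is a scalar or (B,); averaging over the batch is an assumption.
        mod = np.broadcast_to(np.asarray(mod_signal, dtype=np.float32), (self.e.shape[0],))
        delta_w = lr * (mod[:, None, None] * self.e).mean(axis=0)
        # Assumption: clamp bounds the resulting weights, not the update itself.
        return np.clip(weights + delta_w, clamp[0], clamp[1])


traces = EligibilityTraceSketch(n_pre=3, n_post=2)
traces.reset(batch_size=4)
e = traces.update(np.ones((4, 3), dtype=np.float32), np.ones((4, 2), dtype=np.float32))
w0 = np.zeros((2, 3), dtype=np.float32)
w_gated_off = traces.apply_update(w0, 0.0, lr=0.001, clamp=(-1.0, 1.0))
w_rewarded = traces.apply_update(w0, np.ones(4), lr=0.001, clamp=(-1.0, 1.0))
```

With a zero modulator the weights come back unchanged; with a unit modulator every weight moves by `lr` times the trace value.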

Core Output Contract

Field	Shape / Type	Description
`e`	`(B, N_post, N_pre)`	Eligibility trace matrix
`modulators`	`Dict[str, Tensor]`	`{DA, ACh, NE, 5HT}` each `(B,)` or `(B,1)`
`global_plasticity`	`(B,)`	Combined scalar plasticity gain
`delta_w`	same as weights	Weight update tensor
`trace_log`	`Optional[PlasticityTrace]`	Per-step diagnostics when `return_details=True`

Hard invariants:

If third-factor modulator is zero, delta_w is exactly zero regardless of pre/post activity.
Eligibility traces have no cross-batch leakage (batch dimension is independent).
reset() clears traces to zero; carry mode preserves traces across calls.
All trace computations run in fp32 for numerical stability under AMP.
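
The first two invariants can be checked in a few lines. The sketch below assumes a rate-based accumulating trace and NumPy arrays; `trace_step` and `delta_w` are illustrative helper names, not functions from the templates.

```python
import numpy as np


def trace_step(e, pre, post, dt=1.0, tau_e=20.0):
    # Decay the old trace, then add the per-item outer product post x pre.
    return (1.0 - dt / tau_e) * e + np.einsum("bi,bj->bij", post, pre)


def delta_w(e, mod_signal, lr=0.001):
    # Per-item update before any batch reduction: (B, N_post, N_pre).
    return lr * np.asarray(mod_signal)[:, None, None] * e


rng = np.random.default_rng(0)
pre = rng.random((2, 3), dtype=np.float32)
post = rng.random((2, 4), dtype=np.float32)
e0 = np.zeros((2, 4, 3), dtype=np.float32)
e = trace_step(e0, pre, post)

# Invariant: a zero modulator yields an exactly zero update despite non-zero traces.
zero_update = delta_w(e, np.zeros(2, dtype=np.float32))

# Invariant: perturbing batch item 1 must leave batch item 0's trace untouched.
pre_perturbed = pre.copy()
pre_perturbed[1] += 10.0
e_perturbed = trace_step(e0, pre_perturbed, post)
```

Comparing `e[0]` with `e_perturbed[0]` for exact equality is the batch-leakage check; any shared state or batch-wide normalization inside the trace update would break it.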

Eligibility Trace Dynamics

Three trace types with configurable STDP kernel:

Trace Type	Update Rule	Use Case
Accumulating	`e += f(pre, post)`	Standard eligibility accumulation
Replacing	`e = max(e, f(pre, post))`	Event-driven, prevents unbounded growth
Dutch	`e = (1 - α)*e + f(pre, post)`	Hybrid decay + event, balanced dynamics

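All types share the decay step: e(t+1) = (1 - dt/tau_e) * e(t) + trace_update(t). The decay factor stays in [0, 1) only when 0 < dt <= tau_e, so tau_e and dt must use the same units.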
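
The three update rules can be sketched as one function. This assumes the decay is applied first and the type-specific rule second; the source does not spell out that ordering, so treat it as one plausible reading. `step_trace` is a hypothetical name.

```python
import numpy as np


def step_trace(e, f, trace_type, dt=1.0, tau_e=20.0, dutch_alpha=0.1):
    """One trace step: decay first, then the type-specific update (ordering is an assumption)."""
    decayed = (1.0 - dt / tau_e) * e
    if trace_type == "accumulating":
        return decayed + f
    if trace_type == "replacing":
        # Element-wise max keeps the trace bounded by the largest recent event.
        return np.maximum(decayed, f)
    if trace_type == "dutch":
        return (1.0 - dutch_alpha) * decayed + f
    raise ValueError(f"unknown trace_type: {trace_type!r}")


e_prev = np.array([1.0])
f_now = np.array([0.5])
acc = step_trace(e_prev, f_now, "accumulating")
rep = step_trace(e_prev, f_now, "replacing")
dut = step_trace(e_prev, f_now, "dutch")
```

With the default `tau_e=20` and `dt=1`, the old trace decays to 0.95, so the three types give 1.45, 0.95, and 1.355 respectively.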

The pre/post correlation function f supports:

Spike-based: pair-based STDP with timing-dependent kernel f(Δt)
Rate-based: f(pre, post) = post ⊗ pre (outer product, shape (B, N_post, N_pre)) for full traces, or pre * post (element-wise, shape (B, N)) for diagonal variants

See references/eligibility-dynamics.md for trace mathematics, STDP kernels, decay analysis, and
rate vs spike implementation details.
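
As a rough illustration of the two kernel families, the sketch below implements the rate-based product and the standard exponential pair-based STDP window. The amplitudes `a_plus` and `a_minus`, the sign convention `delta_t = t_post - t_pre`, and returning zero at `delta_t == 0` are assumptions; only the time constants correspond to configuration fields in this document.

```python
import numpy as np


def rate_kernel(pre, post, diagonal=False):
    if diagonal:
        # Element-wise product, (B, N); requires N_pre == N_post.
        return pre * post
    # Per-item outer product, (B, N_post, N_pre).
    return np.einsum("bi,bj->bij", post, pre)


def stdp_pair_kernel(delta_t, tau_plus=20.0, tau_minus=20.0, a_plus=1.0, a_minus=1.0):
    # delta_t = t_post - t_pre: pre-before-post potentiates, post-before-pre depresses.
    delta_t = np.asarray(delta_t, dtype=np.float64)
    potentiation = a_plus * np.exp(-delta_t / tau_plus)
    depression = -a_minus * np.exp(delta_t / tau_minus)
    return np.where(delta_t > 0, potentiation, np.where(delta_t < 0, depression, 0.0))


full = rate_kernel(np.ones((2, 3)), np.ones((2, 4)))
diag = rate_kernel(np.ones((2, 3)), np.ones((2, 3)), diagonal=True)
window = stdp_pair_kernel([-20.0, 0.0, 20.0])
```

A spike pair separated by exactly one time constant contributes `exp(-1)` of the peak amplitude, with the sign set by the spike order.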

Neuromodulator Computation

Four modulators computed deterministically from observable signals:

Modulator	Input Signal	Biological Analog	Output Range
DA (Dopamine)	Reward / TD error proxy	Reward prediction error	[-1, 1]
ACh (Acetylcholine)	Novelty / uncertainty / entropy	Attention / learning gate	[0, 1]
NE (Norepinephrine)	Urgency / surprise magnitude	Arousal / exploration	[0, 1]
5-HT (Serotonin)	Patience / long-horizon value	Discounting / exploitation	[0, 1]

Global plasticity gain: g = f(DA, ACh, NE, 5HT) — configurable combination function
(weighted sum, gated product, or learned MLP).

See references/neuromodulator-signals.md for signal mapping, computation details, and bounded
output guarantees.
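
A possible shape for the modulator computation is sketched below. The squashing functions (`tanh` for DA, clipping for the others) and the exact form of the gated product are assumptions chosen only to satisfy the output ranges in the table; the `"mlp"` option is omitted. The weights are passed in explicitly so they can come from configuration.

```python
import numpy as np


def compute_modulators(reward, novelty, urgency, patience):
    # Assumed squashing: tanh bounds DA to [-1, 1]; clipping bounds the rest to [0, 1].
    return {
        "DA": np.tanh(np.asarray(reward, dtype=np.float32)),
        "ACh": np.clip(np.asarray(novelty, dtype=np.float32), 0.0, 1.0),
        "NE": np.clip(np.asarray(urgency, dtype=np.float32), 0.0, 1.0),
        "5HT": np.clip(np.asarray(patience, dtype=np.float32), 0.0, 1.0),
    }


def global_plasticity(mods, weights, combination_fn="weighted_sum"):
    if combination_fn == "weighted_sum":
        return sum(weights[name] * value for name, value in mods.items())
    if combination_fn == "gated_product":
        # Assumed form: DA carries the sign, the other three act as multiplicative gates.
        return mods["DA"] * mods["ACh"] * mods["NE"] * mods["5HT"]
    raise ValueError(f"unsupported combination_fn: {combination_fn!r}")


mods = compute_modulators(
    reward=[0.0, 5.0], novelty=[0.5, 2.0], urgency=[0.0, 0.3], patience=[1.0, -1.0]
)
equal_weights = {"DA": 0.25, "ACh": 0.25, "NE": 0.25, "5HT": 0.25}
g_sum = global_plasticity(mods, equal_weights, "weighted_sum")
g_prod = global_plasticity(mods, equal_weights, "gated_product")
```

Note the design difference: under the gated product any single zero modulator silences plasticity, while under the weighted sum a zero DA can still leave a non-zero gain.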

Weight Update Integration

Two integration modes:

Mode	Mechanism	Use Case
Online plasticity	`Δw = lr * mod_signal * e`; applied directly to designated layers	Bioplausible fast adaptation, streaming inference
Hybrid training	Three-factor update as auxiliary loss/regularizer alongside backprop	Few-shot improvement, nonstationary streams

Online mode targets "fast memory" adapters or designated SNN synapses. Hybrid mode computes
eligibility as an auxiliary signal while the rest of the model trains normally via backprop.

See references/three-factor-rules.md for update rule mathematics, clamping, and integration patterns.
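
One way to connect the two modes: the online rule is exactly a gradient-descent step on the surrogate loss `-mod_signal * <e, w>`, averaged over the batch. The sketch below checks that identity numerically. Using this surrogate as the hybrid-mode auxiliary loss is an assumption for illustration, not something the source specifies; `aux_loss` and `aux_grad` are hypothetical names.

```python
import numpy as np


def aux_loss(w, e, mod_signal):
    # Batch mean of -mod_b * <e_b, w>; e: (B, N_post, N_pre), w: (N_post, N_pre).
    return float(np.mean(-mod_signal * np.einsum("bij,ij->b", e, w)))


def aux_grad(e, mod_signal):
    # Analytic gradient of aux_loss with respect to w.
    return -(mod_signal[:, None, None] * e).mean(axis=0)


rng = np.random.default_rng(0)
e = rng.normal(size=(4, 2, 3))
mod_signal = rng.uniform(-1.0, 1.0, size=4)
w = rng.normal(size=(2, 3))
lr = 0.001

# A plain gradient-descent step on the surrogate loss ...
w_sgd = w - lr * aux_grad(e, mod_signal)
# ... matches the batch-averaged online rule delta_w = lr * mod_signal * e.
w_three_factor = w + lr * (mod_signal[:, None, None] * e).mean(axis=0)
```

Because the surrogate is linear in `w`, an autograd framework would produce the same update when this term is added to the task loss, which is what lets the three-factor signal ride alongside backprop.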

Integration Points

Module	Integration	Purpose
SNN core (`core/`)	Spike-based traces on SNN synapses	Bioplausible SNN plasticity
Workspace (`workspace/`)	ACh-gated attention modulation	Novelty-driven workspace competition
Reasoning (`reasoning/`)	DA-gated confidence updates	Reward-modulated System 2 refinement
Meta (`meta/`)	Eligibility-augmented inner loop	Online adaptation complement to MAML

See references/integration-hooks.md for per-module integration patterns and adapter design.

Configuration Surface

EligibilityConfig

Field	Default	Purpose
`trace_type`	`"accumulating"`	`"accumulating"`, `"replacing"`, `"dutch"`
`tau_e`	20.0	Eligibility decay time constant, in the same units as `dt` (ms or steps)
`kernel`	`"rate"`	`"rate"`, `"stdp_pair"`, `"stdp_symmetric"`
`stdp_tau_plus`	20.0	STDP potentiation time constant
`stdp_tau_minus`	20.0	STDP depression time constant
`dutch_alpha`	0.1	Dutch trace replacement rate
`clamp_range`	`[-5.0, 5.0]`	Trace value clamp bounds

NeuromodConfig

Field	Default	Purpose
`da_source`	`"reward"`	DA input signal mapping
`ach_source`	`"novelty"`	ACh input signal mapping
`ne_source`	`"urgency"`	NE input signal mapping
`sht_source`	`"patience"`	5-HT input signal mapping
`combination_fn`	`"weighted_sum"`	`"weighted_sum"`, `"gated_product"`, `"mlp"`
`modulator_hidden_dim`	64	Hidden dim for MLP combination

ThreeFactorConfig

Field	Default	Purpose
`mode`	`"online"`	`"online"`, `"hybrid"`, `"auxiliary_loss"`
`lr`	0.001	Plasticity learning rate
`weight_clamp`	`[-1.0, 1.0]`	Weight update clamp range
`target_layers`	`"all_eligible"`	Which layers receive three-factor updates
`update_frequency`	1	Steps between weight updates

Presets: PlasticityFullConfig.minimal(), .dev(), .production().
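
The three tables above map directly onto dataclasses. The field names and defaults below are copied from the tables; the `__post_init__` validation is an added assumption, and the presets are not reproduced because their contents are not given here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class EligibilityConfig:
    trace_type: str = "accumulating"
    tau_e: float = 20.0
    kernel: str = "rate"
    stdp_tau_plus: float = 20.0
    stdp_tau_minus: float = 20.0
    dutch_alpha: float = 0.1
    clamp_range: Tuple[float, float] = (-5.0, 5.0)

    def __post_init__(self):
        # Validation is illustrative; the template may check more or less than this.
        if self.trace_type not in ("accumulating", "replacing", "dutch"):
            raise ValueError(f"unknown trace_type: {self.trace_type!r}")
        if self.tau_e <= 0:
            raise ValueError("tau_e must be positive")


@dataclass
class NeuromodConfig:
    da_source: str = "reward"
    ach_source: str = "novelty"
    ne_source: str = "urgency"
    sht_source: str = "patience"
    combination_fn: str = "weighted_sum"
    modulator_hidden_dim: int = 64


@dataclass
class ThreeFactorConfig:
    mode: str = "online"
    lr: float = 0.001
    weight_clamp: Tuple[float, float] = (-1.0, 1.0)
    target_layers: str = "all_eligible"
    update_frequency: int = 1


dutch_cfg = EligibilityConfig(trace_type="dutch", dutch_alpha=0.2)
```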

Done-When Gates

Gate	Test	Threshold
(a) Third-factor gating	Set mod_signal=0; assert delta_w == 0 with strong pre/post activity; set mod_signal != 0; assert delta_w != 0	Exact zero / non-zero
(b) Deterministic traces	Fixed pre/post sequences + dt; assert identical e(t) across 10 runs	Exact match
(c) Delayed reward association	Toy task: reward arrives after delay; eligibility traces enable learning despite gap; loss decreases	Loss decreases, accuracy improves
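
The mechanism behind gate (c) can be seen on a single synapse. In this deliberately tiny sketch, pre/post activity coincides at step 0 and the reward arrives `delay` steps later; the weight changes only if the trace has survived the gap. `run_trial` is a hypothetical helper and is not the toy task from the test matrix.

```python
def run_trial(tau_e, delay, lr=0.1, dt=1.0):
    """Single-synapse trial: correlated activity at t=0, unit reward at t=delay."""
    e, w = 0.0, 0.0
    for t in range(delay + 1):
        f = 1.0 if t == 0 else 0.0        # pre/post correlation only at the first step
        e = (1.0 - dt / tau_e) * e + f    # accumulating trace with decay
        mod = 1.0 if t == delay else 0.0  # third factor arrives after the delay
        w += lr * mod * e                 # three-factor update
    return w


w_with_trace = run_trial(tau_e=20.0, delay=10)
w_without_trace = run_trial(tau_e=1.0, delay=10)
```

With `tau_e=20` the trace has decayed to `0.95**10` of its initial value when the reward lands, so the weight still moves; with `tau_e=1` the trace is wiped after one step and the reward has nothing to act on.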

Common Failure Modes

Symptom	Cause	Fix
Updates occur with zero modulator	Gating not applied correctly	Compute `delta_w = lr * mod_signal * e` and assert `delta_w == 0` whenever `mod_signal == 0`
Traces explode	No decay or clamp	Verify tau_e > 0, enable clamp_range
Cross-batch leakage	Shared trace state	Ensure batch dim is independent, reset between episodes
NaN under AMP	fp16 trace accumulation	Force fp32 for all trace/modulator computation
DA always saturated	Reward signal not normalized	Normalize reward to [-1,1] before DA computation
ACh always zero	Novelty source not connected	Verify HTM anomaly or entropy signal is flowing
No learning despite high eligibility	Modulator timing mismatch	Check delay between activity and reward signal arrival
Traces identical for all inputs	Kernel function collapsed	Verify pre/post differ across inputs, check kernel

Anti-Patterns

Skipping the third factor -- the whole point is gated plasticity; without it, this is just Hebbian learning
fp16 trace computation -- eligibility accumulation needs fp32 precision
Shared traces across batch items -- each batch item has independent trace state
Unbounded trace accumulation -- always apply decay and/or clamp
Hardcoded modulator weights -- use NeuromodConfig, not magic numbers
Online updates on all layers -- designate specific "eligible" layers via config
No reset between episodes -- eligibility from the previous episode contaminates the current one
Testing without delayed reward -- the delayed association test is the core validation

Additional Resources

Reference Files

references/eligibility-dynamics.md -- Trace types, STDP kernels, decay mathematics, rate vs spike, carry/reset semantics
references/neuromodulator-signals.md -- DA/ACh/NE/5-HT computation, signal mapping, bounded outputs, combination functions
references/three-factor-rules.md -- Weight update rules, online vs hybrid mode, clamping, convergence properties
references/integration-hooks.md -- SNN/workspace/reasoning integration, fast memory adapters, per-module patterns
references/testing-matrix.md -- All test cases: gating, determinism, delayed reward, stability, integration

Asset Templates

assets/eligibility_traces_template.py -- EligibilityTraceModule, trace types, STDP kernels, decay, self-test
assets/neuromodulation_template.py -- NeuromodulatoryGate, modulator computation, plasticity gain, self-test
assets/three_factor_update_template.py -- ThreeFactorUpdate, weight update rules, online/hybrid, self-test
assets/plasticity_diagnostics_template.py -- PlasticityTrace, logging, modulator stats, inspection, self-test
assets/neuromod_config_template.py -- All configs, presets, serialization, self-test

Scripts

scripts/validate_neuromod.py -- Runtime contract validation (third-factor gating, trace determinism, delayed reward)
scripts/gen_neuromod_tests.py -- Generates tests/test_neuromodulation.py (80+ test cases)
scripts/plasticity_benchmark.py -- Benchmark trace update throughput, modulator computation, weight update speed
