Active Inference Agent (Generative Model + EFE + Empowerment)

This skill should be used when the user asks to "implement active inference", "add EFE computation", "implement expected free energy", "add empowerment estimation", "implement generative model", "add latent state encoder", "implement transition model", "add preference model", "implement planning rollouts", "add CEM planner", "implement amortized policy", "add pymdp backend", "implement offline RL", "add Minari integration", "implement pragmatic value", "add epistemic value", "implement instrumental value", "add action selection", "implement belief updating", "add world model training", "implement POMDP planning", "add rollout engine", "implement latent imagination", "add horizon normalization", "implement cross-entropy method planning", "add preference learning", "implement variational empowerment", or mentions active inference, expected free energy decomposition, POMDP planning, empowerment estimation, latent imagination, or decision-as-inference in the cognitive pipeline.

sovr610 0 Updated 4mo ago

Install

npx skillscat add sovr610/refffiy/active-inference-agent-generative-model-efe-empowerment

Install via the SkillsCat registry.

Purpose

This skill standardizes the "decision as inference" layer (Phase 5): given an observation
(or workspace state), infer a latent state, roll out candidate action sequences through a
learned generative model, score each policy by Expected Free Energy (EFE), and choose (or
amortize) the best (lowest-EFE) action. A discrete POMDP backend built on pymdp can
optionally be swapped in for regression testing.

Key Files

Target Module	Template Asset	Purpose
`brain_ai/decision/active_inference.py`	`assets/active_inference_template.py`	Main agent: reset, infer_state, plan, act
`brain_ai/decision/generative_model.py`	`assets/generative_model_template.py`	LatentEncoder `q(s\|o)`, LikelihoodDecoder, TransitionModel, Preferences
`brain_ai/decision/efe.py`	`assets/efe_template.py`	Pure EFE functions, 3-term decomposition, sum invariant
`brain_ai/decision/planners.py`	`assets/planners_template.py`	RolloutEngine, RandomShooting, CEM planner
`brain_ai/decision/amortized_policy.py`	`assets/amortized_policy_template.py`	Amortized policy distillation from planner
`brain_ai/decision/pymdp_backend.py`	`assets/pymdp_backend_template.py`	Optional discrete POMDP via pymdp
`brain_ai/config.py` (extend)	`assets/active_inference_config_template.py`	ActiveInferenceConfig, EFEConfig, PlannerConfig, etc.

Public Contract

reset(batch_size: int, device: torch.device) -> AgentState
infer_state(o_t, ctx=None, state=None) -> (q_params, s_sample, updated_state)
plan(o_t, ctx=None, state=None, learn=False) -> ActionOutput
act(o_t, ctx=None, state=None, learn=False) -> ActionOutput

Input o_t has shape (B, obs_dim) and comes from the workspace. The optional ctx carries workspace slots or WM state.

ActionOutput Contract

Field	Shape / Type	Description
`action`	`(B, action_dim)` or `(B,)`	Continuous action vector or discrete action ids
`efe_total`	`(B,)`	Total Expected Free Energy (sum of terms)
`efe_terms`	`Dict[str, Tensor]`	`pragmatic`, `epistemic`, `instrumental` (each `(B,)`)
`horizon`	`int`	Planning horizon H used
`num_rollouts`	`int`	Number of candidate sequences evaluated
`planner_type`	`str`	`"random_shooting"`, `"cem"`, or `"amortized"`
`seed`	`Optional[int]`	RNG seed for reproducibility
`debug`	`Optional[Dict]`	Trajectories summary, uncertainty stats, preference match

Hard invariant: |sum(efe_terms.values()) - efe_total| < 1e-5 for every batch element.
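
The contract and its hard invariant can be illustrated with a dependency-free sketch. Plain Python lists stand in for `(B,)` tensors here, and `check_sum_invariant` is a hypothetical helper name, not something the templates are known to define.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Plain lists of floats stand in for (B,) tensors in this sketch.
Batch = List[float]


@dataclass
class ActionOutput:
    action: List[int]              # (B,) discrete action ids in this sketch
    efe_total: Batch               # (B,)
    efe_terms: Dict[str, Batch]    # "pragmatic", "epistemic", "instrumental"
    horizon: int
    num_rollouts: int
    planner_type: str
    seed: Optional[int] = None
    debug: Optional[dict] = None


def check_sum_invariant(out: ActionOutput, tol: float = 1e-5) -> bool:
    """Return True if the terms sum to efe_total within tol for every batch element."""
    for b, total in enumerate(out.efe_total):
        summed = sum(term[b] for term in out.efe_terms.values())
        if abs(summed - total) >= tol:
            return False
    return True
```

A check like this could run at the end of `plan()` in debug builds, so a violated invariant surfaces before the output reaches downstream consumers.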

Generative Model Stack

Four learned components, each independently testable:

Component	Notation	Input	Output
Latent encoder	`q(s\|o)`	observation `o_t` (+ optional `ctx`)	posterior params `(mu, log_var)` or categorical logits
Likelihood	`P(o\|s)`	latent state `s`	predicted observation distribution params
Transition	`P(s'\|s,a)`	latent `s`, action `a`	next-state distribution params (ensemble for uncertainty)
Preferences	`C`	—	target observation distribution (fixed, learned, or reward-derived)

Sampling: reparameterization trick for continuous; Gumbel-softmax for differentiable discrete.
Transition stability: ensemble of models or mean+variance prediction with log-variance clamping.
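
A minimal sketch of the two sampling paths and of log-variance clamping, written in plain Python so it runs without torch. The clamp bounds and all function names are illustrative assumptions; the real templates may use different values and tensor operations.

```python
import math
import random

# Hypothetical clamp bounds; the actual values are a tuning choice.
LOG_VAR_MIN, LOG_VAR_MAX = -10.0, 2.0


def clamp_log_var(log_var):
    """Keep predicted log-variances in a safe range so exp() cannot blow up."""
    return [min(max(lv, LOG_VAR_MIN), LOG_VAR_MAX) for lv in log_var]


def reparameterize(mu, log_var, rng):
    """Continuous sample s = mu + sigma * eps, with eps ~ N(0, 1)."""
    log_var = clamp_log_var(log_var)
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def gumbel_softmax(logits, temperature, rng):
    """Relaxed categorical sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)           # avoid log(0)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    peak = max(noisy)                           # subtract max for numerical stability
    exps = [math.exp(x - peak) for x in noisy]
    norm = sum(exps)
    return [e / norm for e in exps]
```

Without the clamp, a log-variance of 1000 would overflow in `exp`; with it, the standard deviation is capped at `exp(0.5 * LOG_VAR_MAX)`.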

See references/generative-model.md for detailed architecture and implementation.

EFE Decomposition

Three terms computed by pure functions (no side effects, unit-testable in isolation):

G(pi) = Sum_{t=1..H} [ pragmatic(t) + epistemic(t) + instrumental(t) ]

Term	Formula	Intuition
Pragmatic	`E_q(o_t\|pi)[ -log p_pref(o_t) ]`	Penalize outcomes violating preferences
Epistemic	`-E[ KL( q(s_t\|o_t,pi) \|\| q(s_t\|pi) ) ]`	Reward uncertainty reduction (negated expected information gain, so lower is better)
Instrumental	`-I(A_{t:t+K}; S_{t+K})` (approx.)	Reward keeping options open (negated empowerment)

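The instrumental term is stored as negative empowerment (non-positive) in efe_terms["instrumental"], so that lower is better for every term and the stored terms can be summed directly. The total is the weighted sum efe_total = w_p*pragmatic + w_e*epistemic + w_i*instrumental; for the hard sum invariant on ActionOutput to hold, efe_terms must therefore hold each term after its weight has been applied. Telemetry may log the absolute empowerment value separately.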
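
As an illustration of the pure-function style, here is a one-step sketch for discrete distributions. It assumes that the epistemic term, like the instrumental one, enters as a negated quantity so that lower G is better, and that empowerment arrives as a precomputed scalar. All function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pragmatic(q_o: List[float], p_pref: List[float]) -> float:
    """E_{q(o)}[-log p_pref(o)]: cost of predicted outcomes under the preferences."""
    return -sum(q * math.log(p) for q, p in zip(q_o, p_pref) if q > 0.0)


def epistemic(q_s: List[float], likelihood: List[List[float]]) -> float:
    """Negated expected information gain, -E_{q(o)}[KL(q(s|o) || q(s))].

    likelihood[o][s] = P(o | s). The expected KL equals the mutual
    information I(S; O) under the joint q(s) * P(o | s).
    """
    info_gain = 0.0
    for row in likelihood:                       # one row per observation o
        q_o = sum(p * qs for p, qs in zip(row, q_s))
        for p, qs in zip(row, q_s):
            joint = p * qs
            if joint > 0.0:
                info_gain += joint * math.log(joint / (q_o * qs))
    return -info_gain


def efe_step(q_o, p_pref, q_s, likelihood, empowerment,
             w_p=1.0, w_e=1.0, w_i=0.1) -> Tuple[float, Dict[str, float]]:
    """Return (total, weighted terms); the total is the sum of the stored terms."""
    terms = {
        "pragmatic": w_p * pragmatic(q_o, p_pref),
        "epistemic": w_e * epistemic(q_s, likelihood),
        "instrumental": w_i * -empowerment,      # stored as negative empowerment
    }
    return sum(terms.values()), terms
```

Because the total is computed from the same dictionary that is returned, the sum invariant holds by construction in this sketch.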

See references/efe-decomposition.md for derivations, pure function signatures, and sum invariant testing.

Planning and Rollouts

Mode	Description	Use Case
Random shooting	Sample N action sequences, evaluate EFE, pick best	Baseline, debugging
CEM	Iterative: sample -> select elite -> refit distribution -> resample	Better quality, production
Amortized	Trained policy pi_theta(a\|o,ctx) approximating planner	Fast inference
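
The CEM row can be sketched as a short loop. This version assumes one scalar action per step and a caller-supplied `score` function that returns the EFE of a sequence (lower is better); `cem_plan` and its argument names are illustrative, with defaults borrowed from the configuration tables.

```python
import random
from typing import Callable, List, Sequence


def cem_plan(score: Callable[[Sequence[float]], float], horizon: int,
             num_rollouts: int = 128, iterations: int = 5,
             elite_fraction: float = 0.1, seed: int = 0) -> List[float]:
    """Minimise score(actions) over action sequences of length `horizon`."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [1.0] * horizon
    n_elite = max(2, int(num_rollouts * elite_fraction))
    for _ in range(iterations):
        # Sample candidate sequences from the current per-step Gaussian.
        candidates = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                      for _ in range(num_rollouts)]
        candidates.sort(key=score)               # lowest EFE first
        elite = candidates[:n_elite]
        # Refit the sampling distribution to the elite set.
        for t in range(horizon):
            column = [seq[t] for seq in elite]
            mean[t] = sum(column) / len(column)
            var = sum((x - mean[t]) ** 2 for x in column) / len(column)
            std[t] = max(var ** 0.5, 1e-3)       # floor keeps sampling alive
    return mean
```

Random shooting corresponds to a single iteration that returns the best candidate rather than the refit mean.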

Horizon stability: normalize EFE by horizon (mean per step) or use discount factor gamma.
Candidate pool size and preference/epistemic scale must remain consistent across horizon values.
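
One way to combine the two options is a discount-weighted mean over steps. How the templates actually combine `normalize_by_horizon` and `discount_factor` is not stated here, so treat the following as an assumption; `aggregate_efe` is a hypothetical name.

```python
from typing import Sequence


def aggregate_efe(per_step: Sequence[float], normalize_by_horizon: bool = True,
                  discount_factor: float = 1.0) -> float:
    """Collapse per-step EFE values G_1..G_H (H >= 1) into one policy score."""
    weights = [discount_factor ** t for t in range(len(per_step))]
    total = sum(w * g for w, g in zip(weights, per_step))
    if normalize_by_horizon:
        return total / sum(weights)      # weighted mean per step
    return total
```

With normalization on, a policy that costs the same at every step gets the same score at H=3 and H=7; with it off, the score grows with H.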

See references/planning-rollouts.md for rollout engine, CEM iterations, and amortized distillation.

Optional pymdp Backend

When pymdp is installed, provide a discrete POMDP backend building A/B/C/D arrays and using
pymdp planning routines. Valuable for:

Regression tests (neural EFE matches discrete reference on toy problems)
Debugging decomposition correctness without neural approximation noise
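
For orientation, a toy set of A/B/C/D arrays for a single-factor problem with two states, two observations, and two actions is sketched below using nested lists. The index layout is an assumption about the conventional discrete layout; the pymdp calls themselves are left out because they cannot be verified in this sketch, and the helper names are hypothetical.

```python
def column_stochastic(matrix, tol=1e-9):
    """True if every column of a nested-list matrix sums to 1."""
    return all(abs(sum(col) - 1.0) < tol for col in zip(*matrix))


# A[o][s] = P(o | s): observation likelihood, shape (num_obs, num_states).
A = [[0.9, 0.2],
     [0.1, 0.8]]

# B[s_next][s][a] = P(s_next | s, a): action 0 leads to state 0, action 1 to state 1.
B = [[[1.0, 0.0], [1.0, 0.0]],
     [[0.0, 1.0], [0.0, 1.0]]]

C = [0.0, 2.0]   # relative preference per observation (prefers o = 1)
D = [0.5, 0.5]   # prior over initial states


def predict_obs(A, B, D, action):
    """One-step predicted observation distribution q(o | action)."""
    n_states = len(D)
    q_next = [sum(B[s2][s][action] * D[s] for s in range(n_states))
              for s2 in range(n_states)]
    return [sum(row[s2] * q_next[s2] for s2 in range(n_states)) for row in A]
```

Checking column normalization before handing arrays to any discrete backend catches most construction mistakes early.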

See references/pymdp-integration.md for array construction, policy evaluation, and regression tests.

Offline RL via Minari

Phase 5 offline mode uses Minari for Gymnasium-aligned dataset loading:

Load dataset -> extract episodes (o_t, a_t, o_{t+1}, done, reward)
Train world model components via supervised sequence prediction in latent space
Define preferences (fixed from task goal or learned from reward)
Evaluate: offline planning on held-out trajectories
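
The extraction step can be sketched independently of the dataset loader. The sketch assumes each episode provides T+1 observations and T actions, rewards, terminations, and truncations, which is how Minari episode data is understood to be laid out here; verify against the installed version. `Transition` and `episode_to_transitions` are hypothetical names.

```python
from typing import List, NamedTuple, Sequence


class Transition(NamedTuple):
    obs: Sequence[float]
    action: Sequence[float]
    next_obs: Sequence[float]
    done: bool
    reward: float


def episode_to_transitions(observations, actions, rewards,
                           terminations, truncations) -> List[Transition]:
    """Turn one episode into (o_t, a_t, o_{t+1}, done, reward) tuples.

    Assumes len(observations) == len(actions) + 1 (trailing final observation).
    """
    if len(observations) != len(actions) + 1:
        raise ValueError("expected one more observation than actions")
    return [
        Transition(observations[t], actions[t], observations[t + 1],
                   bool(terminations[t] or truncations[t]), float(rewards[t]))
        for t in range(len(actions))
    ]
```

A length mismatch raises immediately, which is usually the first visible symptom of a wrong environment wrapper.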

See references/offline-rl-minari.md for dataset loading, training loops, and evaluation protocols.

Configuration Surface

ActiveInferenceConfig

Field	Default	Purpose
`obs_dim`	4096	Workspace observation dimension
`state_dim`	256	Latent state dimension
`action_dim`	128	Action space size
`hidden_dim`	512	Hidden layer width
`planning_horizon`	8	Rollout depth H
`num_rollouts`	128	Candidate action sequences N

EFEConfig

Field	Default	Purpose
`pragmatic_weight`	1.0	Goal-directedness weight
`epistemic_weight`	1.0	Information gain weight
`instrumental_weight`	0.1	Empowerment weight
`num_samples`	32	Monte Carlo samples for EFE
`discount_factor`	0.99	Temporal discount gamma
`normalize_by_horizon`	True	Mean-per-step EFE normalization

PlannerConfig

Field	Default	Purpose
`planner_type`	`"cem"`	`"random_shooting"`, `"cem"`, `"amortized"`, `"mppi"`
`cem_iterations`	5	CEM refinement rounds
`cem_elite_fraction`	0.1	Top fraction for CEM refit
`cem_temperature`	1.0	Sampling temperature
`action_temperature`	1.0	Final action selection temperature

PreferenceConfig

Field	Default	Purpose
`mode`	`"learned"`	`"fixed"`, `"learned"`, `"reward_derived"`
`preference_dim`	256	Preference embedding dimension
`learn_preferences`	True	Allow gradient updates to preferences
`prior_strength`	0.1	KL regularization toward prior

Presets: ActiveInferenceFullConfig.minimal(), .dev(), .production_1b(), .production_3b(), .production_7b().
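
Two of the config tables could be expressed as frozen dataclasses along these lines. Field names and defaults are taken from the tables above; the validation rules are assumptions added for illustration and may not match the template.

```python
from dataclasses import asdict, dataclass

# Planner names as listed in the PlannerConfig table.
ALLOWED_PLANNERS = ("random_shooting", "cem", "amortized", "mppi")


@dataclass(frozen=True)
class EFEConfig:
    pragmatic_weight: float = 1.0
    epistemic_weight: float = 1.0
    instrumental_weight: float = 0.1
    num_samples: int = 32
    discount_factor: float = 0.99
    normalize_by_horizon: bool = True

    def __post_init__(self):
        # Assumed constraints: gamma in (0, 1], at least one Monte Carlo sample.
        if not 0.0 < self.discount_factor <= 1.0:
            raise ValueError("discount_factor must be in (0, 1]")
        if self.num_samples < 1:
            raise ValueError("num_samples must be >= 1")


@dataclass(frozen=True)
class PlannerConfig:
    planner_type: str = "cem"
    cem_iterations: int = 5
    cem_elite_fraction: float = 0.1
    cem_temperature: float = 1.0
    action_temperature: float = 1.0

    def __post_init__(self):
        if self.planner_type not in ALLOWED_PLANNERS:
            raise ValueError(f"unknown planner_type: {self.planner_type!r}")
        if not 0.0 < self.cem_elite_fraction <= 1.0:
            raise ValueError("cem_elite_fraction must be in (0, 1]")
```

`asdict(PlannerConfig())` gives a plain dictionary suitable for JSON serialization.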

Done-When Gates

Gate	Test	Threshold
(a) EFE sum invariant	Feed synthetic distributions with closed-form EFE; assert `\|sum(terms) - total\| < 1e-5`	Within 1e-5 for every batch element
(b) Horizon stability	With a fixed seed and normalization enabled, compare actions for H=3, 5, 7; per-step EFE should stay consistent	Action stability
(c) Offline RL end-to-end	Load Minari dataset, train world model, run planner, produce actions + EFE logs	No crashes, metrics JSON

Common Failure Modes

Symptom	Cause	Fix
EFE terms don't sum to total	Side effects in term computation	Use pure functions, test in isolation
Actions oscillate with horizon	Unnormalized EFE scaling	Enable `normalize_by_horizon`, fix discount
Empowerment collapses to 0	Source/planning networks collapsed	Add entropy bonus, check gradient flow
Transition model predicts mean	Ensemble/variance not used	Enable ensemble or stochastic output
Preferences dominate everything	Pragmatic weight too high	Balance weights, log all terms
CEM converges to local optima	Too few samples or iterations	Increase `num_rollouts`, `cem_iterations`
Amortized policy diverges from planner	Stale training data	Online distillation, periodic refresh
pymdp regression fails	Neural EFE scale mismatch	Normalize both before comparison
Minari dataset shape mismatch	Wrong environment wrapper	Check obs/action space alignment

Anti-Patterns

Non-pure EFE functions — EFE term computation must be side-effect free for testing
Summing EFE terms with different scales — normalize each term before weighting
Hardcoded preference distributions — always use Preferences module, even for fixed prefs
Skipping ensemble for transition model — single deterministic model hides uncertainty
Training amortized policy on stale data — distill from current planner, not cached actions
Ignoring log-variance clamping — transition model variance can diverge
Using fp16 for EFE computation — sum invariant needs fp32 precision
Branching on planner type outside planners module — use unified Planner.plan() interface
No discount or horizon normalization — EFE magnitude grows with H, causing instability
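
The fp16 anti-pattern can be demonstrated with the standard library alone. The sketch rounds three made-up term values to half and single precision and measures how far their sum drifts from the double-precision reference; `round_to` and `sum_drift` are hypothetical helper names.

```python
import struct


def round_to(fmt: str, x: float) -> float:
    """Round a Python float to IEEE half ('<e') or single ('<f') precision."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def sum_drift(terms, fmt: str) -> float:
    """Absolute gap between the reduced-precision sum and the float64 sum."""
    reference = sum(terms)
    reduced = sum(round_to(fmt, t) for t in terms)
    return abs(reduced - reference)


# Illustrative values for (pragmatic, epistemic, instrumental).
terms = [2.0003, -0.7001, -0.0501]
fp16_drift = sum_drift(terms, "<e")
fp32_drift = sum_drift(terms, "<f")
```

For these values the half-precision drift exceeds the 1e-5 tolerance while the single-precision drift stays well below it, which is why the invariant is checked in fp32.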

Additional Resources

Reference Files

references/generative-model.md — Full spec: latent encoder, likelihood, transition, preferences, sampling, stability
references/efe-decomposition.md — EFE derivation, pure functions, sum invariant, term normalization
references/planning-rollouts.md — Rollout engine, random shooting, CEM, amortized policy, horizon stability
references/pymdp-integration.md — Discrete POMDP backend, A/B/C/D arrays, regression testing
references/offline-rl-minari.md — Minari dataset loading, world model training, evaluation protocol
references/testing-matrix.md — All test cases: EFE invariant, horizon stability, offline RL, pymdp regression

Asset Templates

assets/active_inference_template.py — ActiveInferenceAgent: reset, infer_state, plan, act, self-test
assets/generative_model_template.py — LatentEncoder, LikelihoodDecoder, TransitionModel, Preferences, self-test
assets/efe_template.py — Pure EFE functions, pragmatic/epistemic/instrumental, sum invariant, self-test
assets/planners_template.py — RolloutEngine, RandomShootingPlanner, CEMPlanner, self-test
assets/amortized_policy_template.py — AmortizedPolicy, PlannerDistiller, online/offline training, self-test
assets/pymdp_backend_template.py — PyMDPBackend, array construction, EFE regression, self-test
assets/active_inference_config_template.py — All configs, presets, serialization, self-test

Scripts

scripts/validate_active_inference.py — Runtime contract validation (EFE sum, planner consistency, state management)
scripts/gen_active_inference_tests.py — Generates tests/test_active_inference.py (80+ test cases)
scripts/toy_benchmark.py — Deterministic toy benchmark harness (MiniGrid online + Minari offline, <2 min CPU)
