"End-to-end IsaacLab RL workflow in moleworks_ext: run with isaaclab.sh -p, local smoke tests, cluster submit/monitor/debug on Euler, sync results, benchmark pulled policies, and generate temporal training plots."
Install via the SkillsCat registry:

```bash
npx skillscat add idate96/codex-skills/rl-isaaclab
```
RL + IsaacLab Workflow (moleworks_ext)
Single-skill workflow for IsaacLab-based RL: local smoke test → cluster submit → monitor/debug → sync → playback.
1) Always Run IsaacLab Scripts via Wrapper
IMPORTANT: Any script that touches Isaac Sim/Lab must be run through the IsaacLab wrapper:

```bash
/workspace/isaaclab/isaaclab.sh -p <script.py> [args]
```

Use this for:
- Isaac Sim/Lab imports
- USD/PhysX
- GPU/RTX
- training/testing scripts
pxr Import Errors
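If you hit `ImportError: No module named 'pxr'`, start the simulation app inside the file before importing `pxr`:

```python
from isaacsim import SimulationApp

# Launch Isaac Sim first; pxr (USD) is only importable once the app is running.
simulation_app = SimulationApp({"headless": True})

from pxr import Usd, UsdGeom

# ... USD work here ...
simulation_app.close()
```

2) Local Smoke Test (ALWAYS first)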
Disable W&B for local smoke tests to avoid junk runs.
```bash
export WANDB_MODE=disabled
/workspace/isaaclab/isaaclab.sh -p scripts/rsl_rl/train.py \
  --task <TASK> --num_envs 4 --max_iterations 3 --headless
unset WANDB_MODE
```

Example (excavation3d_w_cabin):
```bash
export WANDB_MODE=disabled
/workspace/isaaclab/isaaclab.sh -p scripts/rsl_rl/train.py \
  --task Moleworks-Isaac-m445-digging-3D-w-cabin --num_envs 4 --max_iterations 3 --headless
unset WANDB_MODE
```

3) Submit to Euler Cluster
```bash
JOB_TIME=30m NUM_GPUS=2 GPU_TYPE=rtx_3090 ./docker/cluster/cluster_interface.sh job \
  --task <TASK> --num_envs 64000 --max_iterations 10000
```

Example (excavation3d_w_cabin, 24h, 1 GPU):

```bash
JOB_TIME=24h NUM_GPUS=1 GPU_TYPE=rtx_3090 ./docker/cluster/cluster_interface.sh job \
  --task Moleworks-Isaac-m445-digging-3D-w-cabin --num_envs 64000 --max_iterations 10000
```

Custom entrypoint (optional):

```bash
CLUSTER_EXECUTABLE=scripts/mole_environments/grading/train_grading.py \
JOB_TIME=2h NUM_GPUS=2 GPU_TYPE=rtx_3090 ./docker/cluster/cluster_interface.sh job \
  --task Moleworks-Isaac-m445-grading --num_envs 64000 --max_iterations 10000
```

Important semantics:
- In multi-GPU mode, `--num_envs` is interpreted as the total number of envs and split across GPUs (e.g., `32000` with `NUM_GPUS=2` becomes `16000` per GPU).
- With `--resume`, `--max_iterations` is treated as additional learning iterations beyond the loaded checkpoint iteration (not a strict total cap).
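Both rules reduce to simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes an even split. It raises on a remainder instead of guessing how `cluster_interface.sh` would handle one.

```python
def per_gpu_envs(total_envs: int, num_gpus: int) -> int:
    """Envs simulated on each GPU when --num_envs is a total split across GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    per_gpu, remainder = divmod(total_envs, num_gpus)
    if remainder:
        # Assumption: refuse uneven splits rather than guess the launcher's behavior.
        raise ValueError(f"{total_envs} envs do not split evenly across {num_gpus} GPUs")
    return per_gpu


def final_iteration_after_resume(loaded_iteration: int, max_iterations: int) -> int:
    """With --resume, --max_iterations counts extra iterations past the checkpoint."""
    return loaded_iteration + max_iterations


print(per_gpu_envs(32000, 2))                      # 16000
print(final_iteration_after_resume(10000, 10000))  # 20000, not capped at 10000
```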
4) Monitor Jobs (Euler)
```bash
ssh euler 'squeue -u $USER'
```

Job states:
- RUNNING: executing
- PENDING: waiting on resources
- FAILED/NODE_FAIL: crashed
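For scripted monitoring, a hypothetical parser for `squeue -h -o '%i %T %j'` output can map each job onto the states listed above. In that format string, `-h` drops the header and `%i`, `%T`, `%j` are the job id, full state name, and job name. The output could be fetched locally with `subprocess.run(["ssh", "euler", "squeue -u $USER -h -o '%i %T %j'"], capture_output=True, text=True).stdout`. The sample text below is made up.

```python
from typing import Dict, Tuple

# Meanings from the job-state list above; anything else maps to "other".
STATE_MEANING = {
    "RUNNING": "executing",
    "PENDING": "waiting on resources",
    "FAILED": "crashed",
    "NODE_FAIL": "crashed",
}


def parse_squeue(text: str) -> Dict[str, Tuple[str, str]]:
    """Parse `squeue -h -o '%i %T %j'` lines into {job_id: (state, job_name)}."""
    jobs: Dict[str, Tuple[str, str]] = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=2)  # job name comes last, so it may contain spaces
        if len(parts) == 3:
            job_id, state, name = parts
            jobs[job_id] = (state, name)
    return jobs


sample = "1001 RUNNING train_a\n1002 PENDING train_b\n"  # made-up output
for job_id, (state, name) in parse_squeue(sample).items():
    print(job_id, name, state, STATE_MEANING.get(state, "other"))
```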
5) Debug Failed Jobs
```bash
# SLURM stdout logs modified in the last 60 minutes
ssh euler 'find /cluster/scratch/$USER -name "slurm-*.out" -mmin -60'
ssh euler 'tail -100 /cluster/scratch/<user>/moleworks_ext_<timestamp>/slurm-<jobid>.out'
ssh euler 'cat /cluster/scratch/<user>/moleworks_ext_<timestamp>/slurm-<jobid>.err'
ssh euler 'tail -50 /cluster/scratch/<user>/moleworks_ext_<timestamp>/slurm-<jobid>.err'
ssh euler 'grep -n "bind mount detected" /cluster/scratch/<user>/moleworks_ext_<timestamp>/slurm-<jobid>.out | head'
```

Note: Python tracebacks are written to the `.err` file.
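Because tracebacks land in `.err`, a small helper can pull out just the last one. This is a hypothetical sketch, not part of the repo. It assumes the standard CPython header line `Traceback (most recent call last):` and keeps everything from the last occurrence to the end of the file, which usually ends with the exception message.

```python
def last_traceback(err_text: str) -> str:
    """Return the last Python traceback in a SLURM .err log, or '' if there is none."""
    marker = "Traceback (most recent call last):"
    start = err_text.rfind(marker)  # last occurrence, i.e. the final raised exception
    return err_text[start:].rstrip() if start != -1 else ""


# Hypothetical usage after copying the .err file locally:
# print(last_traceback(open("slurm-<jobid>.err").read()))
```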
Detect Time-Limit Kills Quickly
Use sacct to verify if jobs were cancelled by partition wall time:
```bash
ssh euler 'sacct -j <jobid> --format=JobID,State,Elapsed,Timelimit,Partition%12,ExitCode -P'
```

Typical symptom:
- the top-level job state is `TIMEOUT`
- the `batch` step state is `CANCELLED` with exit code `0:15` (return code 0, terminated by signal 15, SIGTERM)

If that happens, rerun on a longer partition/time (for example `PARTITION=gpuhe.24h JOB_TIME=24h`).
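The symptom check can be scripted. `sacct -P` prints pipe-separated fields with a header row, so the output maps directly onto dicts. This is a sketch with hypothetical function names and made-up sample output. It uses `startswith` because SLURM may print the state as `CANCELLED by <uid>`.

```python
from typing import Dict, List

# Made-up output of `sacct -j 1001 --format=JobID,State,Elapsed,Timelimit,Partition%12,ExitCode -P`.
SAMPLE = """JobID|State|Elapsed|Timelimit|Partition|ExitCode
1001|TIMEOUT|04:00:21|04:00:00|gpu|0:0
1001.batch|CANCELLED|04:00:23|||0:15
"""


def parse_sacct(text: str) -> List[Dict[str, str]]:
    """Turn pipe-separated sacct output (header on the first line) into one dict per row."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split("|")
    return [dict(zip(header, ln.split("|"))) for ln in lines[1:]]


def killed_by_time_limit(rows: List[Dict[str, str]]) -> bool:
    """True when the job is TIMEOUT and its batch step is CANCELLED with exit code 0:15."""
    job_timed_out = any(
        "." not in r.get("JobID", "") and r.get("State") == "TIMEOUT" for r in rows
    )
    batch_cancelled = any(
        r.get("JobID", "").endswith(".batch")
        and r.get("State", "").startswith("CANCELLED")
        and r.get("ExitCode") == "0:15"
        for r in rows
    )
    return job_timed_out and batch_cancelled


print(killed_by_time_limit(parse_sacct(SAMPLE)))  # True for this sample
```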
6) Sync Results
```bash
./docker/cluster/sync_experiments.sh
# or, to save space:
./docker/cluster/sync_experiments.sh --remove
```

Preferred: Live Report + Targeted Sync
Use the local helper to avoid full log sync when you only need active run diagnostics:
```bash
# Report active runs with run_name/task/wandb/timeout/full/close + resolved run_dir
scripts/utils/cluster_run_report.sh
# Restrict to specific jobs
scripts/utils/cluster_run_report.sh --job-ids 58383883 58334585
# Report + sync only params/ for those runs
scripts/utils/cluster_run_report.sh --job-ids 58383883 58334585 --sync-params
```

Then compare configs directly from synced runs:
```bash
python3 scripts/utils/compare_run_configs.py \
  --run-a <run_name_or_run_dir_A> \
  --run-b <run_name_or_run_dir_B>
```

Filter to keys you care about:

```bash
python3 scripts/utils/compare_run_configs.py \
  --run-a fresh24h_32k_ros_limit_margins_s43 \
  --run-b 4gpu64k_24h_it10k_decim3_rtx3090 \
  --contains decimation \
  --contains action_noise \
  --contains curriculum_end_height_above_soil
```

Use this tool first for "why do runs A and B differ?" questions, before manual YAML inspection.
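To make the filtering idea concrete, here is a hypothetical stand-alone sketch of a flattened config diff. It is not `compare_run_configs.py`'s implementation. It assumes `--contains` behaves like a substring match on dotted key paths, and the two config dicts are made up. In practice they would come from parsing each run's `params/env.yaml` or `params/agent.yaml`.

```python
from typing import Any, Dict, Iterable, List, Tuple


def flatten(cfg: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {'sim': {'dt': 0.01}} -> {'sim.dt': 0.01}."""
    flat: Dict[str, Any] = {}
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(a: Dict[str, Any], b: Dict[str, Any],
                 contains: Iterable[str] = ()) -> List[Tuple[str, Any, Any]]:
    """Return (key, value_a, value_b) for differing keys, optionally filtered by substring."""
    fa, fb = flatten(a), flatten(b)
    needles = list(contains)
    missing = "<missing>"
    rows = []
    for key in sorted(set(fa) | set(fb)):
        if needles and not any(n in key for n in needles):
            continue  # key does not match any --contains-style filter
        va, vb = fa.get(key, missing), fb.get(key, missing)
        if va != vb:
            rows.append((key, va, vb))
    return rows


run_a = {"decimation": 4, "actions": {"action_noise": 0.1}, "sim": {"dt": 0.01}}
run_b = {"decimation": 3, "actions": {"action_noise": 0.1}, "sim": {"dt": 0.02}}
for key, va, vb in diff_configs(run_a, run_b, contains=["decimation", "action_noise"]):
    print(f"{key}: {va} -> {vb}")  # prints: decimation: 4 -> 3
```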
6.1) Policy Pull Protocol (Mandatory)
When new policies are pulled from cluster logs, do all of the following in order:
- Identify active cluster runs and map them to experiment/run directories.
- Sync logs/checkpoints with `sync_experiments.sh`.
- Benchmark the pulled checkpoints with fixed benchmark settings.
- Plot temporal training progression for core metrics before recommending a checkpoint.
Do not recommend a new checkpoint from sync alone; always include benchmark evidence.
6.2) Experiment Tracking Docs (Mandatory)
Maintain two repo docs in moleworks_ext/docs:
- `EXPERIMENTS_ONGOING.md`: only live `RUNNING`/`PENDING` experiments
- `EXPERIMENTS_RUN.md`: archive of completed/stopped/benchmarked experiments
Hard rule:
- Every time you launch a new training/benchmark run (local or cluster), update `EXPERIMENTS_ONGOING.md` in the same work session.
- Do not launch additional runs until the docs are updated (name, run_name, date UTC, status, intention, path/job id).
- When a run finishes, stops, or is benchmarked, move it to `EXPERIMENTS_RUN.md` before reporting final conclusions.
- Reconcile `EXPERIMENTS_ONGOING.md` against live cluster state (`squeue`) before each monitoring report.
- If a row exists in `EXPERIMENTS_ONGOING.md` but its job id is no longer in `squeue`, treat it as finished by default: benchmark it, archive it in `EXPERIMENTS_RUN.md`, and remove it from `EXPERIMENTS_ONGOING.md` in the same session.
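The default-finished rule is a set difference between the job ids tracked in `EXPERIMENTS_ONGOING.md` and the job ids currently in `squeue`. Below is a minimal sketch with a hypothetical function name. It assumes plain numeric SLURM job ids, so array ids such as `123_4` and local runs without a job id are skipped.

```python
from typing import Iterable, List, Set


def rows_to_archive(ongoing_job_ids: Iterable[str], squeue_job_ids: Set[str]) -> List[str]:
    """Job ids tracked as ongoing that squeue no longer lists (finished by default).

    Each returned id should be benchmarked, archived in EXPERIMENTS_RUN.md,
    and removed from EXPERIMENTS_ONGOING.md in the same session.
    """
    return [
        job_id
        for job_id in ongoing_job_ids
        if job_id.isdigit() and job_id not in squeue_job_ids  # skip non-numeric/local ids
    ]


print(rows_to_archive(["1001", "1002", "local-run"], {"1001"}))  # ['1002']
```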
For every submit:
- Immediately add a row to `EXPERIMENTS_ONGOING.md` with:
  - name
  - training `run_name`
  - W&B run reference (`wandb_run` and URL, or `NA` if unavailable)
  - environment
  - date (UTC)
  - short note / intention
  - job id + workdir/log path if available
- During monitoring, update status and notes in place.
- When the run is benchmarked or stopped/completed, move the row to `EXPERIMENTS_RUN.md`.
- In `EXPERIMENTS_RUN.md`, keep at least:
  - name
  - environment
  - date (UTC)
  - short note / intention
  - concise result summary
  - artifact path(s) (run dir / report / checkpoint)
- For cluster runs, include a benchmark summary before archiving:
  - checkpoint used
  - success/close/full or termination breakdown
  - dominant failure mode (if any)
Rules:
- Keep timestamps absolute (UTC), not relative.
- If job args are unknown while pending, mark them `TBD` and resolve them once stdout exists.
- If duplicate submissions happen, track each job id separately and note the duplication explicitly.
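The following hypothetical helper for the submit step renders a markdown table row with an absolute UTC date and the `NA`/`TBD` fallbacks from the rules above. The column order is an assumption, so match whatever header the docs already use. `when` should be timezone-aware, because Python treats a naive datetime as local time when converting.

```python
from datetime import datetime, timezone
from typing import Optional


def ongoing_row(name: str, run_name: str, environment: str, note: str,
                status: str = "PENDING", wandb_run: Optional[str] = None,
                wandb_url: Optional[str] = None, job_ref: Optional[str] = None,
                when: Optional[datetime] = None) -> str:
    """Render one EXPERIMENTS_ONGOING.md table row (hypothetical column order)."""
    when = when or datetime.now(timezone.utc)
    # Absolute UTC timestamp, never a relative "2h ago".
    date_utc = when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    cells = [
        name,
        run_name,
        wandb_run or "NA",  # explicit W&B fallback
        wandb_url or "NA",
        environment,
        date_utc,
        status,
        note,
        job_ref or "TBD",   # job id + workdir/log path, filled in once stdout exists
    ]
    return "| " + " | ".join(cells) + " |"


print(ongoing_row("digging baseline", "example_run",
                  "Moleworks-Isaac-m445-digging-3D-w-cabin", "reward smoke test"))
```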
6.3) Reporting Format: Run Name + W&B (Mandatory)
When reporting active runs in chat or docs, always include:
- `run_name`
- `wandb_run` (preferred: W&B run id/name)
- `wandb_url`

If W&B is not initialized or not printed in the logs, report an explicit fallback:
- `wandb_run=NA`
- `wandb_url=NA`
Timeout-Dominance Snapshot Format (Mandatory)
When reporting termination-health snapshots, always include these fields per run:
- `job_id`
- `run_name`
- `task` (environment id)
- `wandb_run`
- `wandb_url`
- `last_timeout`
- `last_full`
- `last_close`

Preferred one-line format:

```text
<job_id> | <run_name> | <task> | wandb_run=<...> | wandb_url=<...> | timeout=<...> | full=<...> | close=<...>
```

Never report only raw job ids with metrics; include the run name and task on every line.
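Here is a minimal formatter for the preferred line. It assumes the metrics are fractions printed with three decimals, which is an arbitrary precision choice. The formatter fills `NA` for missing W&B fields and refuses to emit a line without `run_name` and `task`. The function name and all values in the example are hypothetical.

```python
from typing import Optional


def snapshot_line(job_id: str, run_name: str, task: str,
                  timeout: float, full: float, close: float,
                  wandb_run: Optional[str] = None, wandb_url: Optional[str] = None) -> str:
    """Render one termination-health line in the preferred format, with NA fallbacks."""
    if not run_name or not task:
        raise ValueError("run_name and task are required on every line")
    return (
        f"{job_id} | {run_name} | {task} | "
        f"wandb_run={wandb_run or 'NA'} | wandb_url={wandb_url or 'NA'} | "
        f"timeout={timeout:.3f} | full={full:.3f} | close={close:.3f}"
    )


print(snapshot_line("1001", "example_run", "Moleworks-Isaac-m445-digging-3D-w-cabin",
                    timeout=0.41, full=0.22, close=0.17))
```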
Quick extraction from SLURM logs:
```bash
# run_name
grep -m1 -oE -- '--run_name [^ ]+' <slurm.out> | awk '{print $2}'
# W&B run URL (if present in stdout or stderr); -h drops the filename prefix grep adds when given several files
grep -hEo 'https://wandb.ai/[^ ]+/runs/[^ ]+' <slurm.out> <slurm.err> | tail -n1
```

6.4) What Is Logged Per Run (Current Ground Truth)
From scripts/rsl_rl/train.py, every run directory stores:
- `params/env.yaml`
- `params/agent.yaml`
- `params/env.pkl`
- `params/agent.pkl`
Notes:
- Prefer `env.yaml`/`agent.yaml` for fast diffs (`compare_run_configs.py`).
- `env.pkl`/`agent.pkl` are best when exact Python object reconstruction is needed.
- Git snapshot logging is currently disabled in the train script (`runner.add_git_repo_to_log` is commented out).
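After a targeted `--sync-params` pull, it can help to confirm that a run directory actually contains the four files before diffing. Here is a hypothetical check that uses only `pathlib`:

```python
from pathlib import Path
from typing import List

# Files that train.py writes into every run directory, per the list above.
EXPECTED_PARAMS = (
    "params/env.yaml",
    "params/agent.yaml",
    "params/env.pkl",
    "params/agent.pkl",
)


def missing_params(run_dir: str) -> List[str]:
    """Return the expected params files absent from run_dir; an empty list means complete."""
    root = Path(run_dir)
    return [rel for rel in EXPECTED_PARAMS if not (root / rel).is_file()]


# Hypothetical usage: missing_params("logs/rsl_rl/<exp>/<run_dir>")
```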
7) Play Policy Locally
```bash
/workspace/isaaclab/isaaclab.sh -p scripts/rsl_rl/play.py \
  --task <TASK> \
  --checkpoint logs/rsl_rl/<exp>/<run_dir>/model_150.pt \
  --num_envs 1
```

Benchmark Latest Checkpoints (Excavation3D / w-cabin)
Prefer the benchmark script for quantitative comparisons:
```bash
/workspace/isaaclab/isaaclab.sh -p scripts/mole_environments/excavation3D/benchmark_excavation.py \
  --task Moleworks-Isaac-m445-digging-3D-w-cabin \
  --checkpoint logs/rsl_rl/<exp>/<run_dir>/model_<N>.pt \
  --num_envs 2048 \
  --benchmark_steps 300
```

Notes:
- Keep `--auto_task_from_checkpoint` and `--sync_eval_from_checkpoint` enabled (the defaults) to avoid task/config mismatches.
- Reports are written under `<run_dir>/play_model_<N>/<timestamp>_benchmarking/benchmark_report_*.txt`.
- For fair cross-run comparison, keep `num_envs` and `benchmark_steps` fixed.
Benchmark Multiple Checkpoints (sweep)
Use a fixed benchmark setup (same num_envs, benchmark_steps, seed) when comparing models:
```bash
RUN_DIR=logs/rsl_rl/<exp>/<run_dir>
# Checkpoints 500, 750, ..., 4500 (every 250 iterations)
for m in $(seq 500 250 4500); do
  CKPT="${RUN_DIR}/model_${m}.pt"
  [ -f "$CKPT" ] || { echo "skipping missing $CKPT"; continue; }
  /workspace/isaaclab/isaaclab.sh -p scripts/mole_environments/excavation3D/benchmark_excavation.py \
    --task Moleworks-Isaac-m445-digging-3D-w-cabin \
    --checkpoint "$CKPT" \
    --num_envs 1024 \
    --benchmark_steps 400 \
    --seed 0
done
```

Use the generated `benchmark_report_*.txt` files under `play_model_<N>/...` for ranking.
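To rank a sweep, you first need to locate the report for each checkpoint. The sketch below is hypothetical and not a repo script. It globs the report layout described in the benchmark notes and keeps the newest report for each checkpoint number. It assumes `<timestamp>` directory names sort chronologically as strings. It does not parse report contents, because their format is not described here.

```python
import re
from pathlib import Path
from typing import Dict

# Layout from the notes: <run_dir>/play_model_<N>/<timestamp>_benchmarking/benchmark_report_*.txt
REPORT_GLOB = "play_model_*/*_benchmarking/benchmark_report_*.txt"


def latest_reports(run_dir: str) -> Dict[int, Path]:
    """Map checkpoint number N -> most recent benchmark report for that checkpoint."""
    latest: Dict[int, Path] = {}
    for report in sorted(Path(run_dir).glob(REPORT_GLOB)):
        match = re.fullmatch(r"play_model_(\d+)", report.parent.parent.name)
        if match:
            # Sorted order puts later timestamps last, so they overwrite earlier ones.
            latest[int(match.group(1))] = report
    return latest


# Hypothetical usage:
# for n, path in sorted(latest_reports("logs/rsl_rl/<exp>/<run_dir>").items()):
#     print(n, path)
```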
8) Temporal Plots (Mandatory after Pull)
After benchmarking pulled policies, generate temporal plots from TensorBoard event files:
- core termination progression: `full`, `close`, `partial`, `timeout`, `negative`
- stability progression: `Train/mean_reward`, `Train/mean_episode_length`
Use:
```bash
python3 scripts/plot_tb_scalars.py \
  --run-dir logs/rsl_rl/<exp>/<run_dir> \
  --plot-core \
  --out-dir outputs/analysis/training_curves
```

This writes:
- `core_progression.png` (fractions + counts + reward/episode length)
- `core_progression.csv` (step-wise fractions/counts/reward)
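The CSV stores both counts and fractions. Below is a sketch of one plausible normalization. It assumes a fraction is a term's count divided by the sum over the five core terms at that step. `plot_tb_scalars.py` may define it differently, so treat this only as a sanity check when reading the CSV. The counts in the example are made up.

```python
from typing import Dict

CORE_TERMS = ("full", "close", "partial", "timeout", "negative")


def termination_fractions(counts: Dict[str, float]) -> Dict[str, float]:
    """Normalize one step's core termination counts into fractions that sum to 1."""
    total = sum(counts.get(term, 0.0) for term in CORE_TERMS)
    if total == 0:
        return {term: 0.0 for term in CORE_TERMS}  # no terminations at this step
    return {term: counts.get(term, 0.0) / total for term in CORE_TERMS}


print(termination_fractions({"full": 30, "close": 20, "partial": 10, "timeout": 35, "negative": 5}))
```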
On-Demand Plotting (Custom Terms)
To inspect arbitrary temporal metrics (for example specific termination modes), first list tags:
```bash
python3 scripts/plot_tb_scalars.py --run-dir logs/rsl_rl/<exp>/<run_dir> --list-tags
```

Then plot exact tags:

```bash
python3 scripts/plot_tb_scalars.py \
  --run-dir logs/rsl_rl/<exp>/<run_dir> \
  --tags Episode_Termination/goal_reached_full Episode_Termination/goal_reached_close Episode_Termination/time_out
```

Or use regex selection:

```bash
python3 scripts/plot_tb_scalars.py \
  --run-dir logs/rsl_rl/<exp>/<run_dir> \
  --regex '^Episode_Termination/(goal_reached_full|goal_reached_close|time_out|bucket_velocity)$'
```

Temporal progression plots are the default diagnostic source when close/full benchmark numbers look inconsistent with W&B summaries.
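Because the `--regex` pattern is anchored with `^...$`, it selects only the exact tag names in the alternation. The Python check below illustrates that behavior. Whether the script uses `re.search` or `re.match` is an assumption, but with both anchors the result is the same. The tag list is illustrative.

```python
import re
from typing import List

# Same pattern as the --regex example; ^...$ anchors require a full-tag match.
PATTERN = re.compile(
    r"^Episode_Termination/(goal_reached_full|goal_reached_close|time_out|bucket_velocity)$"
)


def select_tags(tags: List[str]) -> List[str]:
    return [tag for tag in tags if PATTERN.search(tag)]


tags = [
    "Episode_Termination/goal_reached_full",
    "Episode_Termination/time_out",
    "Episode_Termination/time_out_extra",  # hypothetical tag, excluded by the $ anchor
    "Train/mean_reward",
]
print(select_tags(tags))  # ['Episode_Termination/goal_reached_full', 'Episode_Termination/time_out']
```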
RL Debugging Tips
- Init order matters: buffers must exist before `ObservationManager` uses them.
- `CurriculumManager` uses `compute()` (`EventManager` uses `apply()`).
- Keep rewards bounded: for a non-negative penalty, `exp(-penalty)` stays in (0, 1].
- Reference `excavation3d_w_cabin` for established API patterns.
- If behavior is odd, check for reward farming, the velocity distribution, and speed penalties.
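Here is the bounded-reward tip as a minimal sketch. `scale` is a hypothetical temperature parameter, not an IsaacLab argument. Penalties are clamped at zero because `exp(-penalty)` stays within (0, 1] only for non-negative penalties. For very large penalties, the float result underflows to `0.0`.

```python
import math


def bounded_reward(penalty: float, scale: float = 1.0) -> float:
    """Return exp(-max(penalty, 0) / scale), in (0, 1] for moderate penalties."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    # Clamp so a negative penalty cannot push the reward above 1.
    return math.exp(-max(penalty, 0.0) / scale)


for p in (0.0, 0.5, 2.0):
    print(p, round(bounded_reward(p), 4))
```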
tmux Requirement
When running IsaacLab scripts or ROS commands, open a new tmux window so output persists:
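```bash
# Create a named window, run the command inside it, then read its output back.
tmux new-window -n rl-isaaclab
tmux send-keys -t rl-isaaclab '/workspace/isaaclab/isaaclab.sh -p scripts/rsl_rl/train.py --task <TASK> --num_envs 4' Enter
tmux capture-pane -p -t rl-isaaclab
```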