Q-Value Augmented Vector Retrieval for learned memory ranking. Tracks which memories are most useful over time and prioritizes them in retrieval.
Install
npx skillscat add kimasplund/clawdbot-skills-pack/qavr-memory Install via the SkillsCat registry.
SKILL.md
QAVR - Q-Value Augmented Vector Retrieval
Memory system that learns which information is most useful over time.
Concept
Standard vector retrieval returns results by semantic similarity alone. QAVR adds learned utility scoring based on actual usage outcomes:
Final Score = (1 - α) × Semantic Similarity + α × Q-ValueWhere:
- Semantic Similarity: How relevant the memory is to the query
- Q-Value: Learned utility score (0.0 - 1.0) based on past usefulness
- α: Blending factor (increases as more data collected)
How It Works
Cold Context (< 100 interactions)
- Pure semantic similarity (α = 0)
- Q-values being collected but not used
- Learning phase
Warm Context (≥ 100 interactions)
- Q-value re-ranking active (α = 0.3)
- Memories that led to successful outcomes ranked higher
- Continuous learning from feedback
Q-Value Updates
After each interaction:
# Positive outcome (task succeeded, user satisfied)
q_new = q_old + learning_rate * (reward - q_old)
reward = 1.0 for success, 0.0 for failure
# Temporal decay (unused memories fade)
q_decayed = q_old * decay_factor # e.g., 0.99 per dayImplementation
Storage Format
{
"memories": {
"memory_id_1": {
"q_value": 0.75,
"access_count": 12,
"last_accessed": "2026-01-26",
"success_count": 9,
"failure_count": 3
}
},
"contexts": {
"debugging": {"interactions": 82, "mode": "cold"},
"coding": {"interactions": 156, "mode": "warm"}
},
"config": {
"learning_rate": 0.1,
"decay_factor": 0.99,
"warm_threshold": 100
}
}Integration with Vector DB
def qavr_query(query_text, collection, n_results=5):
# Get semantic results
results = collection.query(
query_texts=[query_text],
n_results=n_results * 2 # Over-fetch for re-ranking
)
# Apply Q-value re-ranking if warm
if context_is_warm():
results = rerank_by_qvalue(results, alpha=0.3)
return results[:n_results]Feedback Signals
QAVR learns from implicit signals:
| Signal | Interpretation | Reward |
|---|---|---|
| Memory used in successful task | Highly useful | +1.0 |
| Memory retrieved but not used | Somewhat relevant | +0.1 |
| Memory retrieved, task failed | Possibly misleading | -0.2 |
| Memory not retrieved for days | Decaying relevance | decay |
Benefits
- Personalization: Learns YOUR usage patterns
- Noise Reduction: Unhelpful memories sink to bottom
- Efficiency: Most useful info surfaces first
- Adaptation: Adjusts as your needs change
Configuration
{
"qavr": {
"enabled": true,
"learning_rate": 0.1,
"decay_factor": 0.99,
"warm_threshold": 100,
"alpha_warm": 0.3,
"contexts": ["debugging", "coding", "research"]
}
}Monitoring
Check QAVR status:
- Total memories tracked
- Context modes (cold/warm)
- Top Q-value memories
- Learning progress