Coowoolf

Constitutional AI

105 Product Management Skills extracted from Lenny's Podcast - For use with Claude Code / Cursor / Windsurf

Coowoolf 2 Updated 4mo ago
Install

npx skillscat add coowoolf/insighthunt-skills/constitutional-ai

Install via the SkillsCat registry.

SKILL.md

Constitutional AI

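Overview

A method for aligning AI models that uses AI feedback (RLAIF) to train a model to follow a set of natural-language principles (the "constitution"), rather than relying solely on human labeling.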

Source

  • Guest: Benjamin Mann
  • Role: Co-founder
  • Company: Anthropic

Core Steps

  1. Define Constitution (Values)
  2. Model Generates Response
  3. Model Self-Critiques against Constitution
  4. Model Rewrites Response
  5. Fine-Tune on Revised Data
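
The five steps above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not Anthropic's implementation: `ModelFn`, `critique_and_revise`, the prompt templates, the example principles, and the `toy_model` stub are all hypothetical names introduced here. The sketch applies every principle in turn, whereas a real pipeline would call an actual LLM, might first select which principles apply, and would then fine-tune on the collected records.

```python
from typing import Callable, Dict, List

# Hypothetical interface: any function that maps a prompt string to a completion.
ModelFn = Callable[[str], str]

# Step 1: the constitution is a list of natural-language principles (examples only).
CONSTITUTION: List[str] = [
    "Be helpful to the user.",
    "Avoid harmful or dangerous content.",
    "Be honest; do not state things you believe are false.",
]


def critique_and_revise(model: ModelFn, prompt: str, principles: List[str]) -> Dict[str, str]:
    """Run steps 2-4 for one prompt and return a record usable as step 5 input."""
    response = model(prompt)  # Step 2: initial response
    for principle in principles:
        # Step 3: the model critiques its own response against one principle.
        critique = model(
            f"Principle: {principle}\nResponse: {response}\n"
            "Critique the response in light of the principle."
        )
        # Step 4: the model rewrites the response using that critique.
        response = model(
            f"Principle: {principle}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response so it follows the principle."
        )
    # Step 5 (not shown): fine-tune on many such prompt/completion records.
    return {"prompt": prompt, "completion": response}


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM so the sketch runs offline; it only pattern-matches."""
    text = prompt.rstrip()
    if text.endswith("Critique the response in light of the principle."):
        return "The response could follow the principle more closely."
    if text.endswith("Rewrite the response so it follows the principle."):
        return "Revised answer."
    return "Draft answer."


if __name__ == "__main__":
    print(critique_and_revise(toy_model, "Explain our refund policy.", CONSTITUTION))
```

Passing the model in as a plain callable keeps the loop independent of any particular vendor SDK; swapping `toy_model` for a real client wrapper is the only change needed to try it against an actual model.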

Core Principles

  • Define Principles: Establish a constitution of values (e.g., helpfulness, harmlessness, honesty, respect for human rights).
  • Generate & Critique: The model generates a response, then critiques itself based on the constitution.
  • Recursive Revision: If the response violates principles, the model rewrites it.
  • Supervised Learning: The model is fine-tuned on these revised, compliant outputs.
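
The last two principles imply a data-building step: responses that violate a principle are rewritten, and the compliant results become supervised fine-tuning examples. The sketch below is one possible reading, with several assumptions: `build_sft_records`, `toy_violates`, and `toy_revise` are hypothetical names, the string rules stand in for what would be model-generated critiques and rewrites, the `max_rounds` cap and the choice to drop responses that never become compliant are design choices made here, and the prompt/completion JSON Lines layout is a common convention rather than a format the source specifies.

```python
import json
from typing import Callable, Dict, Iterable, List, Tuple


def build_sft_records(
    pairs: Iterable[Tuple[str, str]],
    violates: Callable[[str], bool],
    revise: Callable[[str], str],
    max_rounds: int = 3,
) -> List[Dict[str, str]]:
    """Revise each response until it passes the check, then keep it for fine-tuning."""
    records: List[Dict[str, str]] = []
    for prompt, response in pairs:
        rounds = 0
        # Recursive revision: rewrite only while the response still violates a principle.
        while violates(response) and rounds < max_rounds:
            response = revise(response)
            rounds += 1
        # Assumption: responses that are still non-compliant are dropped, not trained on.
        if not violates(response):
            records.append({"prompt": prompt, "completion": response})
    return records


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(record) for record in records)


def toy_violates(response: str) -> bool:
    """Hypothetical critic: flags responses carrying an '[unsafe]' marker."""
    return "[unsafe]" in response


def toy_revise(response: str) -> str:
    """Hypothetical reviser: strips the marker instead of asking a model to rewrite."""
    return response.replace("[unsafe]", "").strip()


if __name__ == "__main__":
    drafts = [("q1", "Safe answer."), ("q2", "[unsafe] Risky answer.")]
    print(to_jsonl(build_sft_records(drafts, toy_violates, toy_revise)))
```

Capping the number of rounds keeps a response that the reviser cannot fix from looping forever.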

When to Use

When training large language models (LLMs), to ensure they follow complex human values and safety guidelines.

Common Mistakes

Over-relying on simple user feedback (RLHF), which can make the model sycophantic; failing to define explicit value principles.

Real-World Example

Anthropic uses this method to train Claude, drawing on principles from the UN Universal Declaration of Human Rights and other sources.

Quote

"First we figure out which ones might apply... then we ask the model itself to critique itself and rewrite its own response in light of the principle."