llm-test

Guidelines for writing tests for LLM-related functionality

elie222 11,685 1,447 Updated 5mo ago


Install

npx skillscat add elie222/inbox-zero/llm-test

Install via the SkillsCat registry.

SKILL.md

LLM Testing Guidelines

Tests for LLM-related functionality should follow these guidelines to ensure consistency and reliability.

Test File Structure

Place all LLM-related tests in apps/web/__tests__/:

apps/web/__tests__/
├── your-feature.test.ts
├── another-feature.test.ts
└── ...

Basic test file template:

import { describe, expect, test, vi, beforeEach } from "vitest";
import { yourFunction } from "@/utils/ai/your-feature";

// Run with: pnpm test-ai your-feature

vi.mock("server-only", () => ({}));

const TIMEOUT = 15_000;

// Skip tests unless explicitly running AI tests
const isAiTest = process.env.RUN_AI_TESTS === "true";

describe.runIf(isAiTest)("yourFunction", () => {
  beforeEach(() => {
    vi.clearAllMocks();
  });

  test("test case description", async () => {
    // Test implementation
  });
}, TIMEOUT);
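
The `isAiTest` guard is a strict string comparison, so only `RUN_AI_TESTS=true` enables the suite. Here is a minimal sketch that pulls the check into a hypothetical `shouldRunAiTests` helper (not part of the repo) so the behaviour is explicit and can be checked on its own:

```typescript
// Hypothetical helper: the template inlines this check as `isAiTest`.
// Only the exact string "true" enables the suite; "1", "TRUE", or an
// unset variable all leave the AI tests skipped.
type Env = Record<string, string | undefined>;

function shouldRunAiTests(env: Env): boolean {
  return env.RUN_AI_TESTS === "true";
}

// In a test file this would be used as:
//   describe.runIf(shouldRunAiTests(process.env))("yourFunction", () => { ... });
```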

Helper Functions

Create helper functions for common test data when the shared helpers in @/__tests__/helpers.ts don't already cover it:

function getUser() {
  return {
    email: "user@test.com",
    aiModel: null,
    aiProvider: null,
    aiApiKey: null,
    about: null,
  };
}

function getTestData(overrides: Record<string, unknown> = {}) {
  return {
    // Default test data
    ...overrides,
  };
}
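
Both helpers follow the same "defaults plus overrides" pattern. A minimal typed sketch of that pattern, assuming a hypothetical generic `makeFactory` (the repo's helpers are written by hand), shows how overrides replace only the fields a test cares about:

```typescript
// Hypothetical generic factory, not from the repo: it only sketches the
// "defaults + overrides" pattern used by getUser and getTestData.
function makeFactory<T extends object>(defaults: T) {
  // Each call returns a fresh top-level object, so one test cannot
  // mutate the data another test receives.
  return (overrides: Partial<T> = {}): T => ({ ...defaults, ...overrides });
}

type TestUser = {
  email: string;
  aiModel: string | null;
  aiProvider: string | null;
  aiApiKey: string | null;
  about: string | null;
};

const getUser = makeFactory<TestUser>({
  email: "user@test.com",
  aiModel: null,
  aiProvider: null,
  aiApiKey: null,
  about: null,
});

// Override only the field this scenario depends on.
const userWithAbout = getUser({ about: "I run a small bakery." });
```

Typing the overrides as `Partial<T>` lets the compiler reject misspelled fields, which an untyped `overrides = {}` cannot do.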

Test Cases

Include these standard test cases:
- Happy path with expected input
- Error handling
- Edge cases (empty input, null values)
- Different user configurations
- Various input formats
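
Several of these cases can live in one table. A sketch, assuming a hypothetical `prepareInput` pre-processing step (not from the repo) that runs before any model call. Inside Vitest the table would feed `test.each(cases)("$name", ...)`; a plain loop keeps the sketch runnable on its own:

```typescript
// Hypothetical pre-processing step for an LLM feature, not from the repo.
// It normalizes input before any model call and rejects empty content.
type PrepareResult = { ok: true; text: string } | { ok: false; error: string };

function prepareInput(raw: string | null | undefined): PrepareResult {
  if (raw == null) return { ok: false, error: "missing input" };
  const text = raw.trim();
  if (text.length === 0) return { ok: false, error: "empty input" };
  return { ok: true, text };
}

// Rows covering the happy path and the edge cases listed above.
const cases: Array<{ name: string; input: string | null | undefined; ok: boolean }> = [
  { name: "happy path", input: "Please reschedule our meeting.", ok: true },
  { name: "empty string", input: "", ok: false },
  { name: "whitespace only", input: "   ", ok: false },
  { name: "null value", input: null, ok: false },
  { name: "surrounding whitespace", input: "  hi  ", ok: true },
];

// Collect the names of any rows whose outcome differs from the expectation.
const failures = cases
  .filter((c) => prepareInput(c.input).ok !== c.ok)
  .map((c) => c.name);

console.log(failures.length === 0 ? "all cases pass" : `failed: ${failures.join(", ")}`);
```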

Example test structure:

test("successfully processes valid input", async () => {
  const result = await yourFunction({
    input: getTestData(),
    user: getUser(),
  });
  // Replace with assertions on the expected output shape
  expect(result).toBeDefined();
});

test("handles errors gracefully", async () => {
  const result = await yourFunction({
    input: getTestData({ invalid: true }),
    user: getUser(),
  });
  expect(result.error).toBeDefined();
});
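
The error test above assumes `yourFunction` returns an `error` field instead of throwing. If a function throws, either assert with Vitest's `await expect(promise).rejects.toThrow()` or wrap the call. A minimal sketch of such a wrapper, using a hypothetical `safeCall` helper (not part of the repo):

```typescript
// Hypothetical wrapper: converts a thrown error from a model call into a
// value the test can assert on, matching the `result.error` check above.
type Outcome<T> =
  | { data: T; error?: undefined }
  | { data?: undefined; error: string };

async function safeCall<T>(fn: () => Promise<T>): Promise<Outcome<T>> {
  try {
    return { data: await fn() };
  } catch (err) {
    // Keep only the message so assertions stay simple.
    return { error: err instanceof Error ? err.message : String(err) };
  }
}

// Usage inside a test:
//   const result = await safeCall(() => yourFunction({ input, user }));
//   expect(result.error).toBeDefined();
```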

Best Practices

Set appropriate timeouts for LLM calls:

const TIMEOUT = 15_000;
test("handles long-running LLM operations", async () => {
  // ...
}, TIMEOUT);
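
The test-level timeout fails the whole test without saying which call hung. To cap an individual LLM call with a clearer message, a sketch using `Promise.race` is shown below; `withTimeout` is a hypothetical helper, and its limit should stay below `TIMEOUT`:

```typescript
// Hypothetical helper: caps a single LLM call so a hung request fails with
// a clear message instead of only tripping the test-level TIMEOUT.
function withTimeout<T>(promise: Promise<T>, ms: number, label = "LLM call"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Whichever promise settles first wins; always clear the timer so it
  // cannot keep the process alive after the call finishes.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage inside a test:
//   const result = await withTimeout(yourFunction({ input, user }), 10_000);
```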

Log generated content with console.debug and a descriptive label so it can be inspected in the test output:

console.debug("Generated content:\n", result.content);
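
When the model returns structured output rather than plain text, pretty-printing keeps the log readable. A small sketch with a hypothetical `formatGenerated` helper (not part of the repo):

```typescript
// Hypothetical helper: labels generated output and pretty-prints objects
// so multi-field LLM responses stay readable in the test log.
function formatGenerated(label: string, content: unknown): string {
  const body = typeof content === "string" ? content : JSON.stringify(content, null, 2);
  return `${label}:\n${body}`;
}

console.debug(formatGenerated("Generated reply", "Thanks, Friday works for me."));
console.debug(formatGenerated("Generated labels", { labels: ["Newsletter"], confidence: 0.8 }));
```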

Do not mock the LLM call. We want to call the actual LLM in these tests.
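
Because the real model is called, its wording changes from run to run, so assert on structure and invariants rather than exact text. Inside a test, Vitest's `toMatchObject` with `expect.any(String)` covers simple cases. A standalone sketch of the same idea as a type guard, assuming a hypothetical `DraftReply` output shape:

```typescript
// Hypothetical output shape for an LLM feature that drafts a reply.
// Real model output varies between runs, so check structure and
// invariants (fields present, body not blank) instead of exact wording.
type DraftReply = { subject: string; body: string };

function isDraftReply(value: unknown): value is DraftReply {
  if (typeof value !== "object" || value === null) return false;
  const { subject, body } = value as Record<string, unknown>;
  return typeof subject === "string" && typeof body === "string" && body.trim().length > 0;
}

// Usage inside a test:
//   const result = await yourFunction({ input, user });
//   expect(isDraftReply(result)).toBe(true);
```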

Test both AI and non-AI paths:

test("returns unchanged when no AI processing needed", async () => {
  const input = getTestData({ requiresAi: false });
  const result = await yourFunction({ input, user: getUser() });
  expect(result).toEqual(input);
});
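
One way to make both paths testable is to short-circuit before the model call. A sketch, assuming a hypothetical `processInput` that receives the model call as a parameter (a design choice for illustration, not how the repo wires it). The counting function only verifies that the non-AI path never reaches the model; the AI path in real tests should still call the actual LLM:

```typescript
// Hypothetical feature: only calls the model when the input needs it.
type Input = { text: string; requiresAi: boolean };
type CallLlm = (prompt: string) => Promise<string>;

async function processInput(input: Input, callLlm: CallLlm): Promise<Input> {
  // Non-AI path: return the input untouched and skip the model entirely.
  if (!input.requiresAi) return input;
  const text = await callLlm(input.text);
  return { ...input, text };
}

// Counts invocations so a check can confirm whether the model was reached.
let calls = 0;
const countingLlm: CallLlm = async (prompt) => {
  calls += 1;
  return `rewritten: ${prompt}`;
};
```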

Use existing helpers from @/__tests__/helpers.ts:

- getEmailAccount(overrides?): creates EmailAccountWithAI objects
- getEmail(overrides?): creates EmailForLLM objects
- getRule(instructions, actions?): creates rule objects
- getMockMessage(options?): creates mock message objects
- getMockExecutedRule(options?): creates executed rule objects

Always prefer using existing helpers over creating custom ones.

Running Tests

Run AI tests with:

pnpm test-ai your-feature


Repository Stars

The star count shown is for the parent repository (elie222/inbox-zero), not this specific skill.

.claude/skills/llm-test
