elie222

llm-test

Guidelines for writing tests for LLM-related functionality

elie222 11,092 1,363 Updated 3mo ago

Install

npx skillscat add elie222/inbox-zero/llm-test

Install via the SkillsCat registry.

SKILL.md

LLM Testing Guidelines

Tests for LLM-related functionality should follow these guidelines to ensure consistency and reliability.

Test File Structure

  1. Place all LLM-related tests in apps/web/__tests__/:

    apps/web/__tests__/
    ├── your-feature.test.ts
    ├── another-feature.test.ts
    └── ...
  2. Basic test file template:

    import { describe, expect, test, vi, beforeEach } from "vitest";
    import { yourFunction } from "@/utils/ai/your-feature";
    
    // Run with: pnpm test-ai your-feature
    
    vi.mock("server-only", () => ({}));
    
    const TIMEOUT = 15_000;
    
    // Skip tests unless explicitly running AI tests
    const isAiTest = process.env.RUN_AI_TESTS === "true";
    
    describe.runIf(isAiTest)("yourFunction", () => {
      beforeEach(() => {
        vi.clearAllMocks();
      });
    
      test("test case description", async () => {
        // Test implementation
      });
    }, TIMEOUT);
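
The gate in the template comes down to a strict string comparison on one environment variable. A minimal sketch of that check in isolation, assuming a hypothetical helper named `shouldRunAiTests` (it is not part of the repository) and taking the environment as a plain object so it can be exercised without `process.env`:

```typescript
// Hypothetical helper: mirrors the `isAiTest` check from the template above.
type Env = Record<string, string | undefined>;

function shouldRunAiTests(env: Env): boolean {
  // Only the exact string "true" enables the suite;
  // "1", "TRUE", or an unset variable leave it disabled.
  return env.RUN_AI_TESTS === "true";
}

console.log(shouldRunAiTests({ RUN_AI_TESTS: "true" })); // true
console.log(shouldRunAiTests({ RUN_AI_TESTS: "1" })); // false
console.log(shouldRunAiTests({})); // false
```

When the check is false, `describe.runIf(...)` causes Vitest to report the suite as skipped rather than failed, so regular test runs never call the LLM.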

Helper Functions

  1. Create helper functions for common test data that the shared helpers do not already cover:

    function getUser() {
      return {
        email: "user@test.com",
        aiModel: null,
        aiProvider: null,
        aiApiKey: null,
        about: null,
      };
    }
    
    function getTestData(overrides = {}) {
      return {
        // Default test data
        ...overrides,
      };
    }
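
The `overrides` pattern above can be typed so that a test only spells out the fields it cares about. A minimal sketch, assuming a hypothetical `TestData` shape (the real fields depend on the feature under test):

```typescript
// Hypothetical shape for illustration only.
interface TestData {
  subject: string;
  content: string;
  requiresAi: boolean;
}

function getTestData(overrides: Partial<TestData> = {}): TestData {
  return {
    subject: "Quarterly report",
    content: "Please review the attached report by Friday.",
    requiresAi: true,
    // Spread last so values passed by the caller win over the defaults
    ...overrides,
  };
}

// An edge case changes one field and inherits the rest
const edgeCase = getTestData({ content: "" });
console.log(edgeCase.subject); // "Quarterly report"
console.log(edgeCase.content === ""); // true
```

Typing `overrides` as `Partial<TestData>` lets the compiler reject misspelled field names, which an untyped `overrides = {}` would accept silently.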

Test Cases

  1. Include these standard test cases:

    • Happy path with expected input
    • Error handling
    • Edge cases (empty input, null values)
    • Different user configurations
    • Various input formats
  2. Example test structure:

    test("successfully processes valid input", async () => {
      const result = await yourFunction({
        input: getTestData(),
        user: getUser(),
      });
      // Placeholder assertion: replace with checks on the fields your function returns
      expect(result).toBeDefined();
    });
    
    test("handles errors gracefully", async () => {
      const result = await yourFunction({
        input: getTestData({ invalid: true }),
        user: getUser(),
      });
      expect(result.error).toBeDefined();
    });
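
Because LLM wording varies between runs, happy-path assertions tend to be more stable when they check the structure of the result instead of exact text. A minimal sketch of such a check, assuming a hypothetical result shape with optional `content` and `error` fields (inferred from the examples in this guide, not taken from the repository):

```typescript
// Hypothetical result shape for illustration only.
interface LlmResult {
  content?: string;
  error?: string;
}

// Checks structure only: non-empty string content and no error field.
function hasExpectedFormat(
  value: unknown,
): value is LlmResult & { content: string } {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.content === "string" &&
    candidate.content.trim().length > 0 &&
    candidate.error === undefined
  );
}

console.log(hasExpectedFormat({ content: "Draft reply: thanks!" })); // true
console.log(hasExpectedFormat({ content: "" })); // false
console.log(hasExpectedFormat({ error: "rate limited" })); // false
console.log(hasExpectedFormat(null)); // false
```

Inside a Vitest test, a check like this could be used as `expect(hasExpectedFormat(result)).toBe(true)`.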

Best Practices

  1. Set appropriate timeouts for LLM calls:

    const TIMEOUT = 15_000;
    test("handles long-running LLM operations", async () => {
      // ...
    }, TIMEOUT);
  2. Log generated content with a descriptive console.debug call:

    console.debug("Generated content:\n", result.content);
  3. Do not mock the LLM call. We want to call the actual LLM in these tests.

  4. Test both AI and non-AI paths:

    test("returns unchanged when no AI processing needed", async () => {
      const input = getTestData({ requiresAi: false });
      // Same call shape as the other examples: { input, user }
      const result = await yourFunction({ input, user: getUser() });
      expect(result).toEqual(input);
    });
  5. Use existing helpers from @/__tests__/helpers.ts:

    • getEmailAccount(overrides?) - Creates EmailAccountWithAI objects
    • getEmail(overrides?) - Creates EmailForLLM objects
    • getRule(instructions, actions?) - Creates rule objects
    • getMockMessage(options?) - Creates mock message objects
    • getMockExecutedRule(options?) - Creates executed rule objects

Always prefer using existing helpers over creating custom ones.

Running Tests

Run AI tests with:

    pnpm test-ai your-feature