neil1taylor

IBM VPC File Pool CSI Driver — Build Skill

neil1taylor 0 Updated 3mo ago

Install

npx skillscat add neil1taylor/ibm-vpc-file-pool-csi

Install via the SkillsCat registry.

SKILL.md

Overview

You are building ibm-vpc-file-pool-csi, a Kubernetes CSI driver for IBM Cloud VPC that provisions multiple PVCs as subdirectories within shared VPC file shares. This is fundamentally different from IBM's stock ibm-vpc-file-csi-driver, which creates one VPC file share per PVC.

The analogy: Think VMware — one NFS datastore holds many VMDKs. Here, one large VPC file share holds many PVC subdirectories.

Before You Start Any Task

  1. Read this file completely.
  2. Read the reference docs in this directory based on what you're working on:
    • ARCHITECTURE.md — system design, component diagram, data flow
    • CRD-SPEC.md — FileSharePool and SubVolume CRD definitions
    • CSI-INTERFACE.md — CSI gRPC method implementations with pool-aware logic
    • IBM-VPC-API.md — IBM Cloud VPC file share API usage and client wrapper
    • CODING-GUIDELINES.md — Go conventions, error handling, testing patterns
    • TESTING.md — testing strategy, fakes/mocks, test cases, coverage targets
    • API-KEY-SETUP.md — IBM Cloud API key creation, IAM permissions, rotation, security
    • INSTALL.md — build, deploy, Helm chart, verification steps
    • USER-GUIDE.md — end-user guide for pools, StorageClasses, PVCs, monitoring
  3. Check the existing codebase before writing new code. Don't duplicate what exists.

Project Structure

ibm-vpc-file-pool-csi/
├── cmd/
│   └── main.go                        # Entrypoint: parse flags, start gRPC server
├── pkg/
│   ├── driver/
│   │   ├── driver.go                  # Driver struct, gRPC server lifecycle
│   │   ├── identity.go                # CSI Identity service (GetPluginInfo, Probe)
│   │   ├── controller.go              # CSI Controller service (CreateVolume, DeleteVolume, etc.)
│   │   ├── controller_test.go         # Controller unit tests
│   │   ├── node.go                    # CSI Node service (NodePublishVolume, mount/bind)
│   │   └── node_test.go              # Node unit tests
│   ├── pool/
│   │   ├── manager.go                 # Pool manager: allocation, share selection, capacity tracking
│   │   ├── manager_test.go            # Unit tests for pool manager (87 tests)
│   │   ├── share.go                   # VPC file share lifecycle wrapper
│   │   ├── subvolume.go               # Subdirectory operations (mkdir, rm, quota)
│   │   ├── nfs.go                     # NFS operations interface
│   │   ├── reconciler.go             # Controller-runtime reconciler for FileSharePool
│   │   ├── reconciler_test.go        # Reconciler tests (26 tests)
│   │   ├── clone_worker.go           # Async clone operation handler
│   │   ├── clone_worker_test.go      # Clone worker tests (12 tests)
│   │   ├── replication_controller.go # Cross-region replication controller
│   │   └── replication_controller_test.go # Replication tests (23 tests)
│   ├── ibmcloud/
│   │   ├── client.go                  # IBM VPC client interface
│   │   ├── helpers.go                 # VPC API helper functions
│   │   ├── vpc_client.go             # IBM VPC SDK wrapper (file share CRUD)
│   │   ├── vpc_client_test.go        # Unit tests with mocked VPC API
│   │   └── fake/
│   │       ├── fake_client.go         # Fake client for testing without IBM Cloud
│   │       └── fake_client_test.go
│   ├── k8s/
│   │   ├── client.go                  # Kubernetes client interface for CRDs
│   │   ├── real_client.go            # Real Kubernetes client implementation
│   │   └── real_client_test.go
│   ├── metrics/
│   │   ├── metrics.go                 # Prometheus metric definitions
│   │   └── metrics_test.go
│   ├── migrate/
│   │   ├── executor.go               # Migration execution logic
│   │   ├── executor_test.go
│   │   ├── planner.go                # Migration planning
│   │   ├── planner_test.go
│   │   ├── pod.go                    # Pod management for migrations
│   │   └── pod_test.go
│   └── util/
│       ├── mount.go                   # NFS mount helpers, mount cache
│       ├── mount_test.go
│       ├── path.go                    # Path validation, directory traversal prevention
│       ├── path_test.go
│       ├── volume_id.go              # Volume ID parsing utilities
│       └── volume_id_test.go
├── api/
│   └── v1alpha1/
│       ├── doc.go                     # Package documentation
│       ├── groupversion_info.go       # GV registration
│       ├── filesharepool_types.go    # FileSharePool CRD Go types
│       ├── filesharepool_types_test.go
│       ├── subvolume_types.go         # SubVolume CRD Go types
│       ├── snapshot_types.go          # Snapshot CRD Go types
│       ├── volumegroupsnapshot_types.go # VolumeGroupSnapshot CRD Go types
│       ├── replicationpolicy_types.go # ReplicationPolicy CRD Go types
│       └── zz_generated.deepcopy.go   # Generated by controller-gen
├── config/
│   ├── crd/
│   │   ├── storage.ibmcloud.io_filesharepools.yaml
│   │   ├── storage.ibmcloud.io_subvolumes.yaml
│   │   ├── storage.ibmcloud.io_snapshots.yaml
│   │   ├── storage.ibmcloud.io_volumegroupsnapshots.yaml
│   │   └── storage.ibmcloud.io_replicationpolicies.yaml
│   ├── rbac/
│   │   ├── clusterrole.yaml
│   │   └── serviceaccount.yaml
│   └── deploy/
│       ├── controller.yaml            # Controller Deployment
│       ├── node.yaml                  # Node DaemonSet
│       ├── csidriver.yaml             # CSIDriver object
│       └── storageclass.yaml          # Example StorageClasses
├── charts/
│   └── ibm-vpc-file-pool-csi/
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
├── hack/
│   ├── update-codegen.sh             # CRD code generation
│   └── verify-codegen.sh
├── test/
│   ├── e2e/                          # End-to-end tests (require cluster)
│   └── integration/                  # Integration tests (in-memory fakes, no NFS server)
│       ├── capacity_management_test.go
│       ├── clone_lifecycle_test.go
│       ├── clone_worker_test.go
│       ├── concurrent_allocation_test.go
│       ├── error_recovery_test.go
│       ├── group_snapshot_test.go
│       ├── helpers_test.go
│       ├── pool_lifecycle_test.go
│       └── snapshot_lifecycle_test.go
├── Dockerfile
├── Makefile
├── go.mod
└── go.sum

Key Design Principles

1. One Share, Many PVCs

Every CreateVolume call picks an existing VPC file share from a pool and records a SubVolume CR — it does NOT create a new VPC file share. New shares are only created by the pool manager when capacity runs low.

2. State Lives in CRDs

All state (which PVC is on which share, capacity allocations, pool membership) is stored in Kubernetes CRDs (FileSharePool and SubVolume). No external database, no local files on the controller pod.

3. Node Mounts Are Cached

Each worker node mounts a VPC file share at most once. Individual PVCs are bind-mounted from subdirectories of that single NFS mount. This minimizes NFS connections.

4. Fail Safe, Not Fast

If the pool manager can't find a share with enough room and can't create a new one, CreateVolume should return a retriable gRPC error — never silently overcommit.

5. IBM VPC API Calls Are Expensive

API calls to create/expand shares take 30-90 seconds. The hot path (CreateVolume for a PVC) should almost never need one. API calls belong in the pool manager's background reconciliation loop, not in the CSI gRPC handlers.
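One way to keep slow API calls off the hot path is for the reconciliation loop to provision ahead of demand. The sketch below shows such a low-water-mark check; the PoolStatus fields and the minFreePercent threshold are assumptions made for illustration, not values taken from the project.

```go
package main

import "fmt"

// PoolStatus is a simplified stand-in for FileSharePool status fields.
type PoolStatus struct {
	CapacityGB  int64
	AllocatedGB int64
}

// needsNewShare reports whether free capacity has dropped below
// minFreePercent of total capacity. The threshold is a hypothetical knob.
func needsNewShare(p PoolStatus, minFreePercent int64) bool {
	if p.CapacityGB <= 0 {
		return true // an empty pool has nothing to allocate from
	}
	free := p.CapacityGB - p.AllocatedGB
	// Integer comparison avoids floating-point rounding.
	return free*100 < p.CapacityGB*minFreePercent
}

func main() {
	p := PoolStatus{CapacityGB: 1000, AllocatedGB: 850}
	if needsNewShare(p, 20) {
		fmt.Println("reconciler should create or expand a share in the background")
	}
}
```

With a check like this in the reconciler, CreateVolume usually finds room that was provisioned earlier and returns without waiting on the VPC API.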

Build & Development Commands

# Build
make build                    # Build the binary
make docker-build             # Build container image
make generate                 # Run controller-gen for CRD types

# Test
make test                     # Unit tests
make test-coverage            # Unit tests with coverage
make lint                     # golangci-lint

# Deploy
make install-crds             # Apply CRDs to cluster
make deploy                   # Deploy controller + node agent
make helm-install             # Install via Helm chart

# Development
make run-local                # Run controller locally against a cluster (dry-run mode)
make test-e2e                 # E2E tests (requires live cluster, //go:build e2e tag)

Task-Specific Guidance

When implementing CRD types

  • Read CRD-SPEC.md first
  • Use api/v1alpha1/ directory
  • Include validation markers (+kubebuilder:validation:*)
  • Always add a Status subresource
  • Run make generate after changing types

When implementing CSI Controller methods

  • Read CSI-INTERFACE.md first
  • The controller MUST be idempotent — if called twice with the same volume name, return the same result
  • CreateVolume: call pool manager → create SubVolume CR → return (no mkdir; subdirectory creation is deferred to NodePublishVolume)
  • DeleteVolume: update pool tracking → delete SubVolume CR (no subdir removal; nfsOps is nil in controller mode)
  • Never call IBM VPC API directly from CSI handlers — go through pool manager

When implementing CSI Node methods

  • Read CSI-INTERFACE.md (Node section) first
  • NodeStageVolume: mount the whole NFS share if not already mounted
  • NodePublishVolume: create the subdirectory if it does not exist (with uid/gid/permissions from VolumeContext), then bind-mount it into the pod path
  • Track active mounts in memory with a sync.RWMutex-protected map
  • Always validate mount paths to prevent directory traversal
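A minimal sketch of the in-memory mount tracking described above, assuming a reference count per share. The MountCache type, its method names, and the mount hook are hypothetical; the real helpers live in pkg/util/mount.go and may look different.

```go
package main

import (
	"fmt"
	"sync"
)

// MountCache tracks how many published volumes use each share mount.
type MountCache struct {
	mu    sync.RWMutex
	refs  map[string]int
	mount func(shareID string) error // hypothetical hook that performs the NFS mount
}

func NewMountCache(mount func(string) error) *MountCache {
	return &MountCache{refs: make(map[string]int), mount: mount}
}

// Acquire mounts the share only for its first user on this node.
func (m *MountCache) Acquire(shareID string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.refs[shareID] == 0 {
		if err := m.mount(shareID); err != nil {
			return err
		}
	}
	m.refs[shareID]++
	return nil
}

// Release drops one reference and reports whether the share is now unused,
// in which case the caller may unmount it.
func (m *MountCache) Release(shareID string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.refs[shareID] == 0 {
		return false
	}
	m.refs[shareID]--
	if m.refs[shareID] == 0 {
		delete(m.refs, shareID)
		return true
	}
	return false
}

// IsMounted takes only the read lock, so lookups do not block each other.
func (m *MountCache) IsMounted(shareID string) bool {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.refs[shareID] > 0
}

func main() {
	mounts := 0
	c := NewMountCache(func(string) error { mounts++; return nil })
	_ = c.Acquire("share-a")
	_ = c.Acquire("share-a")
	fmt.Println(mounts, c.IsMounted("share-a"))
}
```

Holding the write lock while mounting keeps the sketch simple but serializes mounts of different shares; a per-share lock would avoid that at the cost of more code. The cache is also lost on restart, so a real node plugin would need to rebuild it from the host mount table.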

When implementing the Pool Manager

  • Read ARCHITECTURE.md (Pool Manager section) first
  • This is the brain of the system
  • It runs as a controller-runtime reconciler watching FileSharePool CRs
  • It also exposes a synchronous Allocate(ctx, poolName, sizeGB) method for the CSI controller
  • Uses optimistic locking on CRD status updates to prevent races

When implementing the IBM Cloud VPC client

  • Read IBM-VPC-API.md first
  • Always use the vpc-go-sdk — never raw HTTP
  • All API calls must have context-based timeouts (2 minutes max)
  • Always implement a fake client for testing

When writing tests

  • Read CODING-GUIDELINES.md (Testing section) first
  • Unit tests go next to the code (_test.go)
  • Use table-driven tests
  • Mock IBM Cloud API with fake client
  • Mock Kubernetes API with fake.NewClientBuilder()
  • The pool manager needs thorough tests: allocation under pressure, concurrent requests, share exhaustion

When writing Kubernetes manifests

  • Controller runs as a Deployment (1-2 replicas, leader election)
  • Node agent runs as a DaemonSet with hostNetwork: false and hostPID: true (hostPID is required so the nsenter mount wrapper can enter the host mount namespace for NFS mounts)
  • Node agent needs /var/lib/kubelet mounted for bind-mounts
  • RBAC must cover: FileSharePool, SubVolume (get/list/watch/create/update/patch/delete, since DeleteVolume deletes the SubVolume CR), PVs, PVCs, Secrets, ConfigMaps, Events, CSINode, CSIDriver

When implementing snapshots (Phase 4a)

  • Read CSI-INTERFACE.md (Snapshot section) and VOLUME-GROUP-SNAPSHOTS.md
  • Snapshots are directory copies under /pvcs/.snapshots/{snap-name}/ using NFSOperations.CopyDir
  • The Snapshot CRD tracks each snapshot; the pool manager creates and deletes them
  • Restore from snapshot uses RestoreSnapshot which creates a new SubVolume from the snapshot data
  • All snapshot operations are synchronous (unlike clones)

When implementing volume cloning (Phase 4b)

  • Read VOLUME-CLONING.md first
  • Clones use PoolManager.CloneVolume(), which copies synchronously for small volumes and asynchronously for large ones
  • The clone worker (pkg/pool/clone_worker.go) handles async clones in the background
  • NodePublishVolume gates pod access on cloneStatus=Complete — pods wait until clone finishes
  • Share selection prefers the source share when it has capacity (same-share clone is faster)
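The clone placement rules above could be sketched like this. The planClone function and the Share type are hypothetical, and asyncThresholdGB is an invented placeholder because the text does not say where the line between small and large volumes falls.

```go
package main

import "fmt"

// Share is a simplified view of a pool member for clone placement.
type Share struct {
	ID     string
	FreeGB int64
}

// asyncThresholdGB is a hypothetical cutoff; the real value is not stated here.
const asyncThresholdGB = 10

// planClone picks the target share and decides whether the copy should run
// asynchronously. The source share wins whenever it has enough room.
func planClone(source *Share, pool []*Share, sizeGB int64) (target *Share, async bool, ok bool) {
	async = sizeGB > asyncThresholdGB
	if source != nil && source.FreeGB >= sizeGB {
		return source, async, true
	}
	for _, s := range pool {
		if s != source && s.FreeGB >= sizeGB {
			return s, async, true
		}
	}
	return nil, async, false
}

func main() {
	src := &Share{ID: "share-a", FreeGB: 50}
	other := &Share{ID: "share-b", FreeGB: 500}
	target, async, ok := planClone(src, []*Share{src, other}, 100)
	fmt.Println(target.ID, async, ok)
}
```

When ok is false the caller should fail the request with a retriable error, in line with the fail-safe principle.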

When implementing group snapshots (Phase 4c)

  • Read VOLUME-GROUP-SNAPSHOTS.md first
  • Group snapshots reuse the Phase 4a single-snapshot infrastructure
  • The CSI controller handles hook orchestration (pre/post quiesce hooks)
  • The pool manager handles only the data plane (creating/deleting snapshot directories)
  • Failure policy: Abort rolls back all completed snapshots; Continue marks as PartialFailure

When implementing replication (Phase 4d)

  • Read CROSS-REGION-DR.md first
  • The replication controller (pkg/pool/replication_controller.go) runs as a separate reconciler
  • It copies SubVolume data between pools using CopyDir, not rsync (simplified from design doc)
  • Uses time.Duration schedule intervals, not cron expressions (simplified from design doc)
  • Destination is specified via DestinationNFSServer IP, not a pool reference (simplified from design doc)

When implementing migration (pkg/migrate/)

  • The planner analyzes existing stock IBM CSI PVCs and generates a migration plan
  • The executor creates SubVolume CRs, spawns data-copy pods, and rebinds PVCs
  • All migration operations are idempotent and can be resumed after failure

Common Mistakes to Avoid

  • Do NOT create a VPC file share in CreateVolume — that's the pool manager's job during reconciliation.
  • Do NOT store state in ConfigMaps — use the CRDs. ConfigMaps have size limits and no status subresource.
  • Do NOT assume subdirectory quotas are enforced — NFS doesn't enforce per-directory quotas natively. Track allocations in SubVolume CRs and report via metrics. Hard enforcement is a future feature.
  • Do NOT mount NFS shares with the hard mount option in production; use soft,timeo=600,retrans=3 so pods don't hang indefinitely on NFS failures.
  • Do NOT use os.RemoveAll for PVC cleanup without checking the path — validate that the path is within the expected share mount point. Directory traversal bugs in a CSI driver are catastrophic.
  • Do NOT skip leader election for the controller — two controllers allocating simultaneously will corrupt pool state.