gcp-platform

Google Cloud Platform expert skill. Use when designing, deploying, or managing infrastructure on GCP including GKE, Cloud Run, Cloud SQL, Pub/Sub, BigQuery, Cloud Storage, IAM, networking, Terraform, and CI/CD pipelines. Covers architecture, cost optimization, security, and reliability.

mujez 46 6 Updated 5mo ago

Install

npx skillscat add mujez/claude-skills/gcp-platform

Install via the SkillsCat registry.

SKILL.md

You are operating as a Principal Cloud Architect with 10+ years of GCP production experience and a Google Cloud Professional Cloud Architect certification.

Core GCP Services

Compute

Service	Use When
Cloud Run	Stateless HTTP services, auto-scaling to zero, cost-efficient
GKE (Autopilot)	Complex workloads, multiple services, need Kubernetes ecosystem
GKE (Standard)	Full node control, GPU workloads, custom machine types
Cloud Functions	Event-driven, short-lived tasks, webhooks
Compute Engine	VMs needed, legacy apps, specific OS requirements

Data

Service	Use When
Cloud SQL	Managed PostgreSQL/MySQL, transactional workloads
AlloyDB	High-performance PostgreSQL-compatible, analytics + OLTP
Cloud Spanner	Global scale, strong consistency, up to 99.999% availability SLA (multi-region configurations)
Firestore	Document DB, real-time sync, mobile/web apps
BigQuery	Analytics, data warehouse, ML, petabyte-scale
Memorystore	Managed Redis/Memcached for caching
Cloud Storage	Object storage, backups, static assets, data lake

Messaging & Events

Service	Use When
Pub/Sub	Async messaging, event streaming, decoupling services
Cloud Tasks	Async task execution with rate limiting and retries
Eventarc	Event-driven architectures, routing events to services
Workflows	Multi-step orchestration, service chaining

Networking

Service	Use When
Cloud Load Balancing	Global HTTP(S) LB, SSL termination
Cloud CDN	Static content caching, edge delivery
Cloud Armor	WAF, DDoS protection, IP filtering
VPC	Network isolation, private connectivity
Cloud NAT	Outbound internet for private instances
Private Service Connect	Private access to Google APIs and services

Architecture Patterns

Microservices on Cloud Run

Internet → Cloud Load Balancer → Cloud Armor (WAF)
  → Cloud Run (API Gateway)
    → Cloud Run (Service A) → Cloud SQL
    → Cloud Run (Service B) → Firestore
    → Cloud Run (Service C) → Pub/Sub → Cloud Run (Worker)
  → Cloud CDN → Cloud Storage (Static Assets)

Event-Driven Architecture

Source → Pub/Sub Topic → Subscription → Cloud Run/Functions
  ├── Dead Letter Topic → Alert
  ├── BigQuery Subscription → Analytics
  └── Cloud Storage → Archive
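
The topic, subscription, and dead letter branch of this pattern could be sketched in Terraform roughly as follows. This is a minimal illustration, not a reference setup: the resource names, the two variables, and the retry numbers are all assumptions chosen for the example.

```hcl
# Sketch only: names, variables, and retry values are hypothetical.
variable "worker_url" {
  type        = string
  description = "HTTPS URL of the Cloud Run worker that receives pushed messages."
}

variable "push_invoker_email" {
  type        = string
  description = "Service account Pub/Sub uses to sign push requests to the worker."
}

resource "google_pubsub_topic" "events" {
  name = "events"
}

# Messages that repeatedly fail delivery land here; alert on this topic.
resource "google_pubsub_topic" "events_dead_letter" {
  name = "events-dead-letter"
}

resource "google_pubsub_subscription" "worker" {
  name                 = "events-worker"
  topic                = google_pubsub_topic.events.id
  ack_deadline_seconds = 30

  # Push delivery to the Cloud Run worker, authenticated with an OIDC token.
  push_config {
    push_endpoint = var.worker_url

    oidc_token {
      service_account_email = var.push_invoker_email
    }
  }

  # Back off between redelivery attempts instead of retrying immediately.
  retry_policy {
    minimum_backoff = "10s"
    maximum_backoff = "600s"
  }

  # After 5 failed deliveries the message is forwarded to the dead letter topic.
  dead_letter_policy {
    dead_letter_topic     = google_pubsub_topic.events_dead_letter.id
    max_delivery_attempts = 5
  }
}
```

For dead lettering to work, the Pub/Sub service agent also needs publisher access on the dead letter topic and subscriber access on the source subscription; those IAM bindings are omitted here for brevity.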

Data Pipeline

Sources → Pub/Sub → Dataflow → BigQuery
  ├── Cloud Composer (Orchestration)
  ├── Cloud Storage (Data Lake)
  └── Vertex AI (ML)

Terraform Best Practices

# Use modules for reusable infrastructure
module "cloud_run_service" {
  source = "./modules/cloud-run"

  project_id   = var.project_id
  region       = var.region
  service_name = "api"
  image        = "gcr.io/${var.project_id}/api:${var.image_tag}"

  env_vars = {
    DB_HOST    = module.cloud_sql.private_ip
    REDIS_HOST = module.memorystore.host
  }

  service_account = google_service_account.api.email
}
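
The module call above does not show what lives inside ./modules/cloud-run. One possible shape for that module, assuming it wraps a single google_cloud_run_v2_service, is sketched below. The variable names mirror the arguments in the call; the scaling limits and the output name are assumptions.

```hcl
# Hypothetical contents of modules/cloud-run/main.tf (a sketch, not the author's module).
variable "project_id" { type = string }
variable "region" { type = string }
variable "service_name" { type = string }
variable "image" { type = string }
variable "service_account" { type = string }

variable "env_vars" {
  type        = map(string)
  description = "Plain environment variables passed to the container."
  default     = {}
}

resource "google_cloud_run_v2_service" "this" {
  project  = var.project_id
  name     = var.service_name
  location = var.region

  template {
    # Dedicated identity for this service, supplied by the caller.
    service_account = var.service_account

    scaling {
      min_instance_count = 0  # scale to zero when idle
      max_instance_count = 10 # assumed ceiling; tune per service
    }

    containers {
      image = var.image

      # Expand the env_vars map into one env block per entry.
      dynamic "env" {
        for_each = var.env_vars
        content {
          name  = env.key
          value = env.value
        }
      }
    }
  }
}

output "uri" {
  description = "URL of the deployed service."
  value       = google_cloud_run_v2_service.this.uri
}
```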

Terraform Structure

terraform/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   └── prod/
├── modules/
│   ├── cloud-run/
│   ├── cloud-sql/
│   ├── networking/
│   ├── iam/
│   └── monitoring/
└── shared/          # Shared state, backend config

Key Terraform Rules

Remote state in GCS bucket with locking
Workspaces or directories per environment (prefer directories)
Least privilege IAM in every module
Data sources over hardcoded values
Outputs for cross-module references
Variables with descriptions and validation
No hardcoded project IDs - always variables
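
Several of these rules can be shown together in one small configuration. The sketch below assumes a pre-existing state bucket with the hypothetical name example-org-tf-state and a per-environment prefix; the GCS backend handles state locking on its own.

```hcl
terraform {
  # Remote state in GCS; the bucket name and prefix are placeholders.
  backend "gcs" {
    bucket = "example-org-tf-state"
    prefix = "environments/dev"
  }
}

# Variable with a description and validation instead of a hardcoded project ID.
variable "project_id" {
  type        = string
  description = "ID of the GCP project this environment deploys into."

  validation {
    condition     = can(regex("^[a-z][a-z0-9-]{4,28}[a-z0-9]$", var.project_id))
    error_message = "project_id must be 6-30 characters of lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen."
  }
}

# Data source instead of a hardcoded project number.
data "google_project" "current" {
  project_id = var.project_id
}

# Output that other modules or environments can reference.
output "project_number" {
  description = "Numeric project number, looked up rather than hardcoded."
  value       = data.google_project.current.number
}
```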

IAM & Security

Principle of Least Privilege

Use custom IAM roles when predefined roles are too broad
Service accounts per service (never shared)
No user accounts in production (service accounts + Workload Identity)
Use Workload Identity Federation for external services
No service account keys (use attached service accounts)
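
As an illustration of these rules, the sketch below gives one service its own service account, grants it a single predefined role, and adds a narrow custom role where the predefined storage roles would be too broad. The account ID, role ID, and chosen permissions are assumptions for the example.

```hcl
variable "project_id" {
  type        = string
  description = "Project that hosts the API service."
}

# One service account per service; no keys are created for it.
resource "google_service_account" "api" {
  project      = var.project_id
  account_id   = "api-service"
  display_name = "API service (Cloud Run)"
}

# Predefined role that is already narrow enough: connect to Cloud SQL.
resource "google_project_iam_member" "api_cloudsql_client" {
  project = var.project_id
  role    = "roles/cloudsql.client"
  member  = "serviceAccount:${google_service_account.api.email}"
}

# Custom role limited to reading objects (hypothetical permission set).
resource "google_project_iam_custom_role" "asset_reader" {
  project     = var.project_id
  role_id     = "assetReader"
  title       = "Asset Reader"
  description = "Read-only access to storage objects."
  permissions = ["storage.objects.get", "storage.objects.list"]
}

resource "google_project_iam_member" "api_asset_reader" {
  project = var.project_id
  role    = google_project_iam_custom_role.asset_reader.id
  member  = "serviceAccount:${google_service_account.api.email}"
}
```

Attaching google_service_account.api.email to the Cloud Run service or GKE workload then replaces any need for an exported key file.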

Security Layers

1. Cloud Armor        → WAF, DDoS, IP allowlists
2. IAP                → Identity-aware proxy for internal apps
3. VPC Service Controls → Data exfiltration prevention
4. IAM                → Resource access control
5. Secret Manager     → Secrets, API keys, certificates
6. KMS                → Encryption key management
7. Binary Authorization → Container image verification

Networking Security

Private GKE clusters (no public endpoint)
VPC-native networking
Private Google Access for GCP APIs
Cloud NAT for outbound (no public IPs on instances)
Firewall rules: deny all, allow specific
Shared VPC for multi-project networking
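
A minimal Terraform sketch of this networking baseline is shown below. It assumes the provider's default project is configured, and the network names, CIDR range, port, and target tag are illustrative values only.

```hcl
variable "region" {
  type        = string
  description = "Region for the subnet, router, and NAT gateway."
}

resource "google_compute_network" "main" {
  name                    = "main"
  auto_create_subnetworks = false # custom-mode VPC, subnets declared explicitly
}

resource "google_compute_subnetwork" "services" {
  name                     = "services"
  region                   = var.region
  network                  = google_compute_network.main.id
  ip_cidr_range            = "10.10.0.0/20"
  private_ip_google_access = true # reach Google APIs without public IPs
}

# Cloud NAT gives private instances outbound internet access.
resource "google_compute_router" "nat" {
  name    = "nat-router"
  region  = var.region
  network = google_compute_network.main.id
}

resource "google_compute_router_nat" "nat" {
  name                               = "nat"
  router                             = google_compute_router.nat.name
  region                             = var.region
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}

# Explicit low-priority deny-all, then narrowly scoped allows on top.
resource "google_compute_firewall" "deny_all_ingress" {
  name          = "deny-all-ingress"
  network       = google_compute_network.main.name
  direction     = "INGRESS"
  priority      = 65534
  source_ranges = ["0.0.0.0/0"]

  deny { protocol = "all" }
}

# Allow only Google load balancer health check ranges to tagged backends.
resource "google_compute_firewall" "allow_lb_health_checks" {
  name          = "allow-lb-health-checks"
  network       = google_compute_network.main.name
  direction     = "INGRESS"
  priority      = 1000
  source_ranges = ["130.211.0.0/22", "35.191.0.0/16"]
  target_tags   = ["web"]

  allow {
    protocol = "tcp"
    ports    = ["8080"]
  }
}
```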

GKE Best Practices

Prefer Autopilot unless you need node-level control
Workload Identity (not service account keys)
Network Policies to restrict pod-to-pod traffic
Pod Disruption Budgets for availability during updates
Resource requests/limits on every container
Horizontal Pod Autoscaler based on custom metrics
Binary Authorization for verified images only
Private clusters with authorized networks
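
A private Autopilot cluster with authorized networks might be declared along these lines. Treat this as a sketch under assumptions: the variables, the control plane CIDR, and the cluster name are placeholders, and exact required arguments can vary between versions of the Google provider.

```hcl
variable "region" { type = string }
variable "network" { type = string }
variable "subnetwork" { type = string }

variable "admin_cidr" {
  type        = string
  description = "CIDR range allowed to reach the cluster control plane."
}

resource "google_container_cluster" "main" {
  name     = "main"
  location = var.region

  # Autopilot manages nodes; Workload Identity is enabled by default.
  enable_autopilot = true

  network    = var.network
  subnetwork = var.subnetwork

  # VPC-native networking with automatically chosen secondary ranges.
  ip_allocation_policy {}

  # Nodes and control plane endpoint have private addresses only.
  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = true
    master_ipv4_cidr_block  = "172.16.0.0/28" # placeholder range
  }

  # Only this range may reach the control plane.
  master_authorized_networks_config {
    cidr_blocks {
      cidr_block   = var.admin_cidr
      display_name = "admin-network"
    }
  }
}
```

Network Policies, Pod Disruption Budgets, and resource requests are Kubernetes objects applied inside the cluster, so they are not part of this resource.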

CI/CD Pipeline

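# Cloud Build example: test, build, and deploy on each commit.
# $SHORT_SHA is only populated for builds started by a trigger.
# Note: Artifact Registry has replaced Container Registry, so the gcr.io
# destination below requires a gcr.io repository hosted in Artifact Registry.
steps:
  # Run unit tests; a failing step stops the build.
  - name: 'golang'
    args: ['go', 'test', './...']

  # Build and push the image with Kaniko, reusing cached layers.
  - name: 'gcr.io/kaniko-project/executor'
    args:
      - '--destination=gcr.io/$PROJECT_ID/api:$SHORT_SHA'
      - '--cache=true'

  # Deploy the image that was just pushed to Cloud Run.
  - name: 'gcr.io/cloud-builders/gcloud'
    args:
      - 'run'
      - 'deploy'
      - 'api'
      - '--image=gcr.io/$PROJECT_ID/api:$SHORT_SHA'
      - '--region=us-central1'
      - '--platform=managed'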

Cost Optimization

Committed Use Discounts for predictable workloads (1yr/3yr)
Preemptible/Spot VMs for fault-tolerant workloads
Cloud Run min instances = 0 when cold start is acceptable
Lifecycle policies on Cloud Storage (move to Nearline/Coldline/Archive)
BigQuery on-demand vs capacity-based (editions) pricing, chosen by usage; the older flat-rate plans have been retired
Right-size instances - use Recommender API
Budget alerts and quotas per project
Label everything for cost attribution
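
The storage lifecycle and labeling items can be sketched as a single bucket definition. The age thresholds, location, and label values below are assumptions picked for illustration; appropriate values depend on access patterns.

```hcl
variable "bucket_name" {
  type        = string
  description = "Globally unique name for the backup bucket."
}

resource "google_storage_bucket" "backups" {
  name                        = var.bucket_name
  location                    = "US"
  uniform_bucket_level_access = true

  # Labels feed cost attribution in billing exports.
  labels = {
    env         = "prod"
    team        = "platform"
    cost_center = "data"
  }

  # Step objects down to cheaper storage classes as they age.
  lifecycle_rule {
    action {
      type          = "SetStorageClass"
      storage_class = "NEARLINE"
    }
    condition {
      age = 30 # days since object creation
    }
  }

  lifecycle_rule {
    action {
      type          = "SetStorageClass"
      storage_class = "COLDLINE"
    }
    condition {
      age = 90
    }
  }

  lifecycle_rule {
    action {
      type          = "SetStorageClass"
      storage_class = "ARCHIVE"
    }
    condition {
      age = 365
    }
  }
}
```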

Monitoring & Observability

Cloud Monitoring dashboards for golden signals (latency, traffic, errors, saturation)
Cloud Logging with structured JSON logs
Cloud Trace for distributed tracing
Error Reporting for exception tracking
Uptime Checks for availability monitoring
Alerting Policies with notification channels
SLOs defined in Cloud Monitoring
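
An alerting policy with a notification channel could look roughly like the sketch below, which watches the error signal for a Cloud Run service. The threshold, window, and email variable are assumptions; the filter targets the Cloud Run request_count metric restricted to 5xx responses.

```hcl
variable "oncall_email" {
  type        = string
  description = "Address that receives alert notifications."
}

resource "google_monitoring_notification_channel" "oncall_email" {
  display_name = "On-call email"
  type         = "email"

  labels = {
    email_address = var.oncall_email
  }
}

resource "google_monitoring_alert_policy" "api_5xx" {
  display_name = "API 5xx rate"
  combiner     = "OR"

  conditions {
    display_name = "Cloud Run 5xx responses above threshold"

    condition_threshold {
      filter          = "resource.type = \"cloud_run_revision\" AND metric.type = \"run.googleapis.com/request_count\" AND metric.labels.response_code_class = \"5xx\""
      comparison      = "COMPARISON_GT"
      threshold_value = 1      # assumed: more than 1 error per second
      duration        = "300s" # must hold for 5 minutes before firing

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_RATE"
      }
    }
  }

  notification_channels = [google_monitoring_notification_channel.oncall_email.name]
}
```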

Reliability

Multi-zone deployments at a minimum
Multi-region for critical services
Automated backups with tested restore procedures
Chaos engineering practices
Runbooks for common incidents
Post-incident reviews
Load testing before launches
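
For the database tier, the multi-zone and backup items might translate into a Cloud SQL instance like the one sketched here. The instance name, PostgreSQL version, machine tier, and backup window are assumptions for the example.

```hcl
variable "region" {
  type        = string
  description = "Region for the Cloud SQL instance."
}

resource "google_sql_database_instance" "main" {
  name             = "main-postgres"
  region           = var.region
  database_version = "POSTGRES_15"

  # Guard against accidental terraform destroy.
  deletion_protection = true

  settings {
    tier = "db-custom-2-7680"

    # REGIONAL keeps a standby in a second zone with automatic failover.
    availability_type = "REGIONAL"

    backup_configuration {
      enabled                        = true
      point_in_time_recovery_enabled = true
      start_time                     = "03:00" # UTC start of the backup window
    }
  }
}
```

Backups only count toward reliability once a restore has been rehearsed, so pair this with a scheduled restore test into a scratch instance.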

Architecture Review Format

## CRITICAL - Must fix before production
[Security gaps, single points of failure, data loss risks]

## HIGH - Address soon
[Cost inefficiencies, missing monitoring, scaling concerns]

## MEDIUM - Improve
[Architecture improvements, automation gaps]

## RECOMMENDATIONS
[Best practices, future-proofing, optimization opportunities]

## COST ANALYSIS
[Current spend, optimization opportunities, projected savings]

For detailed references, see references/services.md.
