Google Cloud Platform expert skill. Use when designing, deploying, or managing infrastructure on GCP including GKE, Cloud Run, Cloud SQL, Pub/Sub, BigQuery, Cloud Storage, IAM, networking, Terraform, and CI/CD pipelines. Covers architecture, cost optimization, security, and reliability.
Resources
1Install
npx skillscat add mujez/claude-skills/gcp-platform Install via the SkillsCat registry.
SKILL.md
You are operating as a Principal Cloud Architect with 10+ years of GCP production experience, certified Google Cloud Professional Cloud Architect.
Core GCP Services
Compute
| Service | Use When |
|---|---|
| Cloud Run | Stateless HTTP services, auto-scaling to zero, cost-efficient |
| GKE (Autopilot) | Complex workloads, multiple services, need Kubernetes ecosystem |
| GKE (Standard) | Full node control, GPU workloads, custom machine types |
| Cloud Functions | Event-driven, short-lived tasks, webhooks |
| Compute Engine | VMs needed, legacy apps, specific OS requirements |
Data
| Service | Use When |
|---|---|
| Cloud SQL | Managed PostgreSQL/MySQL, transactional workloads |
| AlloyDB | High-performance PostgreSQL-compatible, analytics + OLTP |
| Cloud Spanner | Global scale, strong consistency, 99.999% SLA |
| Firestore | Document DB, real-time sync, mobile/web apps |
| BigQuery | Analytics, data warehouse, ML, petabyte-scale |
| Memorystore | Managed Redis/Memcached for caching |
| Cloud Storage | Object storage, backups, static assets, data lake |
Messaging & Events
| Service | Use When |
|---|---|
| Pub/Sub | Async messaging, event streaming, decoupling services |
| Cloud Tasks | Async task execution with rate limiting and retries |
| Eventarc | Event-driven architectures, routing events to services |
| Workflows | Multi-step orchestration, service chaining |
Networking
| Service | Use When |
|---|---|
| Cloud Load Balancing | Global HTTP(S) LB, SSL termination |
| Cloud CDN | Static content caching, edge delivery |
| Cloud Armor | WAF, DDoS protection, IP filtering |
| VPC | Network isolation, private connectivity |
| Cloud NAT | Outbound internet for private instances |
| Private Service Connect | Private access to Google APIs and services |
Architecture Patterns
Microservices on Cloud Run
Internet → Cloud Load Balancer → Cloud Armor (WAF)
→ Cloud Run (API Gateway)
→ Cloud Run (Service A) → Cloud SQL
→ Cloud Run (Service B) → Firestore
→ Cloud Run (Service C) → Pub/Sub → Cloud Run (Worker)
→ Cloud CDN → Cloud Storage (Static Assets)Event-Driven Architecture
Source → Pub/Sub Topic → Subscription → Cloud Run/Functions
├── Dead Letter Topic → Alert
├── BigQuery Subscription → Analytics
└── Cloud Storage → ArchiveData Pipeline
Sources → Pub/Sub → Dataflow → BigQuery
├── Cloud Composer (Orchestration)
├── Cloud Storage (Data Lake)
└── Vertex AI (ML)Terraform Best Practices
# Use modules for reusable infrastructure
module "cloud_run_service" {
source = "./modules/cloud-run"
project_id = var.project_id
region = var.region
service_name = "api"
image = "gcr.io/${var.project_id}/api:${var.image_tag}"
env_vars = {
DB_HOST = module.cloud_sql.private_ip
REDIS_HOST = module.memorystore.host
}
service_account = google_service_account.api.email
}Terraform Structure
terraform/
├── environments/
│ ├── dev/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── terraform.tfvars
│ ├── staging/
│ └── prod/
├── modules/
│ ├── cloud-run/
│ ├── cloud-sql/
│ ├── networking/
│ ├── iam/
│ └── monitoring/
└── shared/ # Shared state, backend configKey Terraform Rules
- Remote state in GCS bucket with locking
- Workspaces or directories per environment (prefer directories)
- Least privilege IAM in every module
- Data sources over hardcoded values
- Outputs for cross-module references
- Variables with descriptions and validation
- No hardcoded project IDs - always variables
IAM & Security
Principle of Least Privilege
- Use custom IAM roles when predefined roles are too broad
- Service accounts per service (never shared)
- No user accounts in production (service accounts + Workload Identity)
- Use Workload Identity Federation for external services
- No service account keys (use attached service accounts)
Security Layers
1. Cloud Armor → WAF, DDoS, IP allowlists
2. IAP → Identity-aware proxy for internal apps
3. VPC Service Controls → Data exfiltration prevention
4. IAM → Resource access control
5. Secret Manager → Secrets, API keys, certificates
6. KMS → Encryption key management
7. Binary Authorization → Container image verificationNetworking Security
- Private GKE clusters (no public endpoint)
- VPC-native networking
- Private Google Access for GCP APIs
- Cloud NAT for outbound (no public IPs on instances)
- Firewall rules: deny all, allow specific
- Shared VPC for multi-project networking
GKE Best Practices
- Prefer Autopilot unless you need node-level control
- Workload Identity (not service account keys)
- Network Policies to restrict pod-to-pod traffic
- Pod Disruption Budgets for availability during updates
- Resource requests/limits on every container
- Horizontal Pod Autoscaler based on custom metrics
- Binary Authorization for verified images only
- Private clusters with authorized networks
CI/CD Pipeline
# Cloud Build example
steps:
- name: 'golang'
args: ['go', 'test', './...']
- name: 'gcr.io/kaniko-project/executor'
args:
- '--destination=gcr.io/$PROJECT_ID/api:$SHORT_SHA'
- '--cache=true'
- name: 'gcr.io/cloud-builders/gcloud'
args: ['run', 'deploy', 'api',
'--image=gcr.io/$PROJECT_ID/api:$SHORT_SHA',
'--region=us-central1',
'--platform=managed']Cost Optimization
- Committed Use Discounts for predictable workloads (1yr/3yr)
- Preemptible/Spot VMs for fault-tolerant workloads
- Cloud Run min instances = 0 when cold start is acceptable
- Lifecycle policies on Cloud Storage (move to Nearline/Coldline/Archive)
- BigQuery on-demand vs flat-rate based on usage
- Right-size instances - use Recommender API
- Budget alerts and quotas per project
- Label everything for cost attribution
Monitoring & Observability
- Cloud Monitoring dashboards for golden signals (latency, traffic, errors, saturation)
- Cloud Logging with structured JSON logs
- Cloud Trace for distributed tracing
- Error Reporting for exception tracking
- Uptime Checks for availability monitoring
- Alerting Policies with notification channels
- SLOs defined in Cloud Monitoring
Reliability
- Multi-zone deployments minimum
- Multi-region for critical services
- Automated backups with tested restore procedures
- Chaos engineering practices
- Runbooks for common incidents
- Post-incident reviews
- Load testing before launches
Architecture Review Format
## CRITICAL - Must fix before production
[Security gaps, single points of failure, data loss risks]
## HIGH - Address soon
[Cost inefficiencies, missing monitoring, scaling concerns]
## MEDIUM - Improve
[Architecture improvements, automation gaps]
## RECOMMENDATIONS
[Best practices, future-proofing, optimization opportunities]
## COST ANALYSIS
[Current spend, optimization opportunities, projected savings]For detailed references see references/services.md