portolan-sdi

portolan-cli SKILL.md

```

portolan-sdi 8 2 Updated 4d ago

Resources

14
GitHub

Install

npx skillscat add portolan-sdi/portolan-cli

Install via the SkillsCat registry.

SKILL.md

portolan-cli SKILL.md

This file helps AI agents assist users with Portolan CLI tasks.

What is Portolan?

Portolan CLI - Publish and manage cloud-native geospatial data catalogs.

Portolan is a CLI for publishing and managing cloud-native geospatial data catalogs. It orchestrates format conversion (GeoParquet, COG), versioning, and sync to object storage (S3, GCS, Azure)—no running servers, just static files.

Key concepts:

  • STAC (SpatioTemporal Asset Catalog) — The catalog metadata spec
  • GeoParquet — Cloud-optimized vector data (columnar, spatial indexing)
  • COG (Cloud-Optimized GeoTIFF) — Cloud-optimized raster data (HTTP range requests)
  • versions.json — Single source of truth for version history, sync state, and checksums

CLI Commands

portolan add

Track files in the catalog.

portolan add demographics/census.parquet
portolan add file1.geojson file2.geojson   # Add multiple files
portolan add imagery/                      # Add all files in directory
portolan add .                             # Add all files in catalog

portolan check

Validate a Portolan catalog or check files for cloud-native status.

portolan check                        # Validate all (metadata + geo-assets)
portolan check --metadata             # Validate metadata only
portolan check --geo-assets           # Check geo-assets only
portolan check --fix                  # Fix both metadata and geo-assets

portolan clean

Remove all Portolan metadata while preserving data files.

portolan clean           # Remove all metadata
portolan clean --dry-run # Preview what would be removed

portolan clone

Clone a remote catalog to a local directory.

portolan clone s3://mybucket/my-catalog
portolan clone s3://mybucket/my-catalog .
portolan clone s3://mybucket/catalog -c demographics
portolan clone s3://mybucket/catalog ./local-copy

portolan config

Manage catalog configuration.

portolan config set backend iceberg
portolan config get remote
portolan config list

portolan extract

Extract data from external sources into Portolan catalogs.

portolan extract arcgis https://services.arcgis.com/.../FeatureServer ./output
portolan extract arcgis URL --layers "Census*" --dry-run
portolan extract arcgis URL --filter "sdn_*" --resume

portolan info

Show information about a file, collection, or catalog.

portolan info demographics/census.parquet  # File info
portolan info demographics/                # Collection info
portolan info                              # Catalog info
portolan info demographics/census.parquet --json  # JSON output

portolan init

Initialize a new Portolan catalog.

portolan init                       # Initialize in current directory
portolan init --auto                # Skip prompts, use defaults
portolan init --title "My Catalog"  # Set title
portolan init /path/to/data --auto  # Initialize in specific directory

portolan list

List all files in the catalog with tracking status.

portolan list                           # List all files with status
portolan list --collection demographics # Filter by collection
portolan list --tracked-only            # Show only tracked files
portolan list --untracked-only          # Show only untracked files

portolan metadata

Manage catalog metadata for README generation.

portolan metadata init                # Create template at catalog root
portolan metadata init demographics   # Create template for collection
portolan metadata validate            # Validate metadata.yaml

portolan partition

Partition a large GeoParquet file for better query performance.

portolan partition buildings.parquet --preview
portolan partition buildings.parquet output/
portolan partition buildings.parquet output/ --target-rows 50000

portolan pull

Pull updates from a remote catalog.

portolan pull s3://mybucket/my-catalog --collection demographics
portolan pull s3://mybucket/catalog -c imagery --dry-run
portolan pull s3://mybucket/catalog
portolan pull s3://mybucket/catalog --workers 4

portolan push

Push local catalog changes to cloud object storage.

portolan push s3://mybucket/catalog --collection demographics
portolan push gs://mybucket/catalog -c imagery --dry-run
portolan push s3://mybucket/catalog
portolan push --dry-run  # Uses configured remote

portolan readme

Generate README.md from STAC metadata and metadata.yaml.

portolan readme                        # Generate for catalog and all collections
portolan readme climate                # Generate under climate/
portolan readme --check                # CI mode: exit 1 if any stale
portolan readme --no-recursive         # Only at catalog root

portolan rm

Remove files from tracking.

portolan rm --keep imagery/old_data.tif     # Safe: untrack only
portolan rm --dry-run vectors/              # Preview what would be removed
portolan rm -f demographics/census.parquet  # Force delete and untrack
portolan rm -f vectors/                     # Force remove entire directory

portolan scan

Scan a directory for geospatial files and potential issues.

portolan scan                         # Scan current directory
portolan scan --json                  # JSON output in current directory
portolan scan /data/geospatial
portolan scan /large/tree --max-depth=2

portolan skills

List and view AI skills for Portolan workflows.

portolan skills list              # List available skills
portolan skills show sourcecoop   # View Source Co-op upload skill

portolan stac-geoparquet

Generate items.parquet for efficient STAC queries.

portolan stac-geoparquet                    # Generate for ALL collections
portolan stac-geoparquet -c landsat         # Generate for landsat collection
portolan stac-geoparquet -c imagery --dry-run  # Preview without creating
portolan stac-geoparquet --json             # JSON output for all collections

portolan status

Show local vs remote version state for collections.

portolan status                    # Status for all collections
portolan status -c demographics    # Status for one collection
portolan status --offline          # Skip remote check
portolan status --json             # JSON output for agents

portolan sync

Sync local catalog with remote storage (pull + push).

portolan sync s3://mybucket/catalog --collection demographics
portolan sync s3://mybucket/catalog -c imagery --dry-run
portolan sync s3://mybucket/catalog -c data --fix --force
portolan sync s3://mybucket/catalog -c data --profile prod

portolan version

Version management commands.

Python API

Portolan exposes a Python API for programmatic access:

from portolan_cli import Catalog, FormatType, detect_format

# Initialize a catalog
catalog = Catalog("/path/to/data")

# Detect file format
format_type = detect_format("data.parquet")  # Returns FormatType.GEOPARQUET

Public exports:

  • Catalog - A Portolan catalog backed by a .portolan directory.
  • CatalogExistsError - Raised when attempting to initialize a catalog that already exists.
  • FormatType - Detected format type for routing to conversion library.
  • cli - Portolan - Publish and manage cloud-native geospatial data catalogs.
  • detect_format - Detect whether a file is vector, raster, or unknown.

Common Workflows

Publishing a New Catalog

  1. Initialize the catalog structure:

    portolan init --title "My Geospatial Data"
  2. Scan directory for files and fix filename issues:

    portolan scan /data/geospatial
    # Fix filename issues (invalid chars, reserved names, long paths)
    portolan scan /data/geospatial --fix
  3. Check cloud-native compliance and convert:

    portolan check --geo-assets --fix --dry-run  # Preview
    portolan check --geo-assets --fix            # Convert
  4. Track files in the catalog:

    portolan add demographics/
    portolan add imagery/
  5. Push to cloud storage:

    portolan push s3://mybucket/my-catalog --collection demographics

Updating an Existing Catalog

  1. Pull latest from remote:

    portolan pull s3://mybucket/my-catalog --collection demographics
  2. Make local changes (add/modify files)

  3. Scan and check:

    portolan scan .
    portolan check
  4. Push changes:

    portolan push s3://mybucket/my-catalog --collection demographics

Full Sync Workflow (Recommended)

For ongoing synchronization, use sync which orchestrates the full workflow:

# Single command: pull → init → scan → check → push
portolan sync s3://mybucket/my-catalog --collection demographics

# With auto-fix for cloud-native conversion
portolan sync s3://mybucket/my-catalog -c demographics --fix

Troubleshooting

Common Errors

"Not inside a Portolan catalog"

Error: Not inside a Portolan catalog (no catalog.json found)

Solution: Either:

  • Run portolan init to create a catalog
  • Navigate into an existing catalog directory
  • Use --portolan-dir to specify the catalog path

"Catalog already exists"

Error: Already a Portolan catalog at /path

Solution: The directory already has a catalog. If you want to reinitialize, remove catalog.json and .portolan/ first.

"Push conflict"

Error: Push conflict: remote has newer version

Solution: Either:

  • Run portolan pull first to get remote changes
  • Use --force to overwrite (careful: loses remote changes)

"Uncommitted changes"

Error: Pull blocked by uncommitted changes

Solution: Either:

  • Commit or push your local changes first
  • Use --force to discard local changes and pull anyway

File Format Issues

Shapefile Missing Components

Warning: Shapefiles require .shp, .shx, and .dbf files together.

Solution: Ensure all required sidecar files are present. portolan scan will detect incomplete shapefiles.

Non-Cloud-Native Files

Warning: Files like GeoJSON or Shapefiles aren't cloud-optimized.

Solution: Use portolan check --fix to convert:

  • Vectors → GeoParquet
  • COG (Cloud-Optimized GeoTIFF)

Getting JSON Output

All commands support --json or --format json for machine-readable output:

portolan scan . --json
portolan check --format json
portolan --format json init --auto

JSON output follows a consistent envelope format:

{
  "success": true,
  "command": "scan",
  "data": { ... },
  "errors": []
}