Capture screenshots and extract text via OCR using MacPilot. Take full-screen, region, or window screenshots, and recognize text in images or screen areas with multi-language support.
Install
npx skillscat add adhikjoshi/macpilot-skills/macpilot-screenshot-ocr
Install via the SkillsCat registry.
SKILL.md
MacPilot Screenshot & OCR
Use MacPilot to capture screenshots of the screen, specific regions, or application windows, and extract text from images or screen regions using Apple's built-in Vision OCR.
When to Use
Use this skill when:
- You need to capture what's currently on screen
- You need to extract text from an image file
- You need to read text from a specific area of the screen
- You need to capture a specific app window
- You need to verify visual state of an application
- You need to record the screen
Screenshot Commands
Full Screen
macpilot screenshot --json # Capture to temp file
macpilot screenshot ~/Desktop/screen.png --json # Capture to specific path
Specific Region
macpilot screenshot --region 100,200,800,600 --json
# Region format: x,y,width,height (from top-left corner)
Specific Window
macpilot screenshot --window "Safari" --json # Capture Safari window
macpilot screenshot --window "Finder" --json # Capture Finder window
All Windows
macpilot screenshot --all-windows --json # Each window separately
Specific Display
macpilot screenshot --display 1 --json # Second display (0-indexed)
Format Options
macpilot screenshot --format png ~/Desktop/shot.png # PNG (default, lossless)
macpilot screenshot --format jpg ~/Desktop/shot.jpg # JPEG (smaller files)
OCR Commands
Extract Text from Image File
macpilot ocr /path/to/image.png --json
macpilot ocr ~/Desktop/screenshot.png --json
Extract Text from Screen Region
macpilot ocr 100 200 800 600 --json
# Arguments: x y width height (captures region then OCRs it)
Multi-Language OCR
macpilot ocr image.png --language en-US --json # English
macpilot ocr image.png --language ja --json # Japanese
macpilot ocr image.png --language zh-Hans --json # Simplified Chinese
macpilot ocr image.png --language de --json # German
macpilot ocr image.png --language fr --json # French
Screen Recording
Record Screen
macpilot screen record start --output ~/Desktop/recording.mov --json
# ... perform actions ...
macpilot screen record stop --json
Display Information
macpilot display-info --json
# Returns: all displays with resolution, position, scale factor
Workflow Patterns
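The patterns in this section chain the commands documented above, so they can be wrapped in small shell functions. The sketch below is one possible wrapper for the capture-then-OCR flow using a throwaway temp file. The function name capture_and_ocr is hypothetical, it uses only the screenshot --region and ocr invocations shown in this document, and it assumes (this is not documented here) that macpilot exits non-zero on failure.

```shell
# Hypothetical helper: capture a screen region to a temp file, then OCR it.
# Usage: capture_and_ocr X Y WIDTH HEIGHT
capture_and_ocr() {
  if [ "$#" -ne 4 ]; then
    echo "usage: capture_and_ocr X Y WIDTH HEIGHT" >&2
    return 2
  fi
  local dir img status
  dir=$(mktemp -d) || return 1
  img="$dir/capture.png"
  status=0
  # screenshot takes the region as comma-separated x,y,width,height.
  # Its own JSON output is discarded so only the OCR result is printed.
  { macpilot screenshot --region "$1,$2,$3,$4" "$img" --json >/dev/null &&
      macpilot ocr "$img" --json; } || status=$?
  rm -rf "$dir"   # remove the temporary capture in every case
  return "$status"
}
```

For example, capture_and_ocr 0 0 1920 1080 would print the OCR output for that region and leave no image file behind.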
Capture and OCR in One Flow
# Take screenshot of specific region
macpilot screenshot --region 0,0,1920,1080 ~/tmp/capture.png --json
# Extract text from it
macpilot ocr ~/tmp/capture.png --json
Quick Screen Region OCR
# Directly OCR a screen region without saving
macpilot ocr 200 100 600 400 --json
Verify UI State
# Screenshot a window to see its current state
macpilot screenshot --window "Safari" ~/tmp/safari.png --json
# OCR the image to verify its content
macpilot ocr ~/tmp/safari.png --json
Record an Automation
macpilot screen record start --output ~/Desktop/demo.mov
macpilot app open Safari
macpilot wait seconds 2
macpilot keyboard key cmd+l
macpilot keyboard type "https://example.com"
macpilot keyboard key enter
macpilot wait seconds 3
macpilot screen record stop
Tips
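For the recorded-automation pattern above, a step that fails midway can leave the recording running. One way to guard against that is to run the steps through a function that always issues the stop command. This is a minimal sketch: record_while is a hypothetical name, it relies only on the screen record start --output and screen record stop commands shown above, and it assumes macpilot exits non-zero on failure.

```shell
# Hypothetical helper: record the screen while running a command, and
# stop the recording whether or not that command succeeds.
# Usage: record_while OUTPUT.mov COMMAND [ARGS...]
record_while() {
  if [ "$#" -lt 2 ]; then
    echo "usage: record_while OUTPUT.mov COMMAND [ARGS...]" >&2
    return 2
  fi
  local output status
  output=$1
  status=0
  shift
  macpilot screen record start --output "$output" || return 1
  "$@" || status=$?                        # run the steps, remember a failure
  macpilot screen record stop || status=$?  # always attempt to stop
  return "$status"
}
```

The automation steps can be grouped in a shell function of your own and passed as the command, for example record_while ~/Desktop/demo.mov my_steps. The sketch does not handle interruption by a signal such as Ctrl-C.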
- Screen Recording permission must be granted to MacPilot.app in System Settings
- PNG format is best for screenshots with text (lossless); JPEG for photos
- OCR works best on high-contrast text; increase screenshot region size if text is small
- Use display-info to get screen dimensions before capturing specific regions
- The coordinate system starts at top-left (0,0) with x increasing right and y increasing down
- On Retina displays, coordinates are in logical points (not physical pixels)
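The coordinate tips above matter most when a region is known as two corner positions, or in physical pixels measured from an image, rather than as x,y,width,height in points. The helpers below are an illustrative sketch of that arithmetic. The names region_from_corners and pixels_to_points are hypothetical, and the scale factor is assumed to be a whole number (such as 2 on a typical Retina display) read from the display-info output.

```shell
# Hypothetical helper: turn two corner points into the x,y,width,height
# string used by `macpilot screenshot --region`.
# Usage: region_from_corners X1 Y1 X2 Y2   (corners may be in either order)
region_from_corners() {
  local x1 y1 x2 y2 t
  x1=$1; y1=$2; x2=$3; y2=$4
  # Normalize so (x1,y1) is the top-left corner.
  if [ "$x1" -gt "$x2" ]; then t=$x1; x1=$x2; x2=$t; fi
  if [ "$y1" -gt "$y2" ]; then t=$y1; y1=$y2; y2=$t; fi
  echo "$x1,$y1,$((x2 - x1)),$((y2 - y1))"
}

# Hypothetical helper: convert a physical-pixel value to logical points.
# Assumes an integer scale factor; integer division rounds down.
# Usage: pixels_to_points PIXELS SCALE
pixels_to_points() {
  echo $(( $1 / $2 ))
}
```

With these, macpilot screenshot --region "$(region_from_corners 100 200 900 800)" --json requests the same 100,200,800,600 region used in the Specific Region example. The ocr command takes the same four values separated by spaces instead of commas.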