apify-scrapers

Social media and web scraping using Apify actors. Use this skill when scraping Twitter/X tweets, Reddit posts, LinkedIn posts, Instagram profiles/posts/reels, Facebook pages/posts/groups, TikTok videos, YouTube content, Google Maps businesses/reviews, contact enrichment (emails/phones from websites), or when auto-detecting URL type to scrape. Triggers on requests to scrape social media, get trending posts, extract business info, find contact details, or extract content from social URLs.

Casper-Studios 11 3 Updated 5mo ago

Resources

GitHub

Install

npx skillscat add casper-studios/casper-marketplace/apify-scrapers

Install via the SkillsCat registry.

SKILL.md

Apify Scrapers

Overview

Scrape content from major social platforms using Apify actors. Each platform has optimized settings for cost and quality.

Quick Decision Tree

What do you want to scrape?
│
├── Social Media Posts
│   ├── Twitter/X → references/twitter.md
│   │   └── Script: scripts/scrape_twitter_ai_trends.py
│   │
│   ├── Reddit → references/reddit.md
│   │   └── Script: scripts/scrape_reddit_ai_tech.py
│   │
│   ├── LinkedIn → references/linkedin.md
│   │   └── Script: scripts/scrape_linkedin_posts.py
│   │
│   ├── Instagram → references/instagram.md
│   │   └── Script: scripts/scrape_instagram.py
│   │   └── Modes: profile, posts, hashtag, reels, comments
│   │
│   ├── Facebook → references/facebook.md
│   │   └── Script: scripts/scrape_facebook.py
│   │   └── Modes: page, posts, reviews, groups, marketplace
│   │
│   ├── TikTok → references/multi-platform.md
│   │   └── Script: scripts/scrape_multi_platform.py
│   │
│   └── YouTube → references/multi-platform.md
│       └── Script: scripts/scrape_multi_platform.py
│
├── Business/Places
│   ├── Google Maps businesses → references/google-maps.md
│   │   └── Script: scripts/scrape_google_maps.py
│   │   └── Modes: search, place, reviews
│   │
│   └── Contact info from websites → references/contact-enrichment.md
│       └── Script: scripts/scrape_contact_info.py
│       └── Extract: emails, phone numbers, social profiles
│
├── Auto-detect URL type → references/url-detect.md
│   └── Script: scripts/scrape_content_by_url.py
│
├── Trend Analysis (NEW)
│   └── Enriched trend analysis → workflows/trend-analysis.md
│       └── Script: scripts/analyze_trends.py
│       └── Features: velocity scoring, lifecycle staging, opportunity scoring
│
└── Workflows (multi-step)
    ├── Lead generation → workflows/lead-generation.md
    ├── Influencer discovery → workflows/influencer-discovery.md
    ├── Competitor analysis → workflows/competitor-intel.md
    ├── Trend analysis → workflows/trend-analysis.md
    └── Competitor Ads Intelligence (NEW) → workflows/competitor-ads.md
        └── Script: scripts/scrape_competitor_ads.py
        └── Platforms: Facebook Ads Library, Google Ads Transparency
        └── Features: Spend estimates, creative analysis, benchmarking

Environment Setup

# Required in .env
APIFY_TOKEN=apify_api_xxxxx

Get your API key: https://console.apify.com/account/integrations

Common Usage Patterns

Scrape Twitter Trends

python scripts/scrape_twitter_ai_trends.py --query "AI agents" --max-tweets 50

Scrape Reddit Discussions

python scripts/scrape_reddit_ai_tech.py --subreddits "MachineLearning,LocalLLaMA" --max-posts 100

Scrape LinkedIn Author

python scripts/scrape_linkedin_posts.py author "https://linkedin.com/in/username" --max-posts 30

Auto-detect and Scrape URL

python scripts/scrape_content_by_url.py "https://x.com/user/status/123456"

Scrape Instagram Profile

python scripts/scrape_instagram.py profile "https://instagram.com/username" --max-posts 20

Scrape Instagram Hashtag

python scripts/scrape_instagram.py hashtag "#artificialintelligence" --max-posts 50

Scrape Instagram Reels

python scripts/scrape_instagram.py reels "https://instagram.com/username" --max-reels 30

Scrape Facebook Page

python scripts/scrape_facebook.py page "https://facebook.com/pagename" --max-posts 50

Scrape Facebook Reviews

python scripts/scrape_facebook.py reviews "https://facebook.com/pagename" --max-reviews 100

Scrape Facebook Marketplace

python scripts/scrape_facebook.py marketplace "laptops in san francisco" --max-items 30

Scrape Google Maps Businesses

python scripts/scrape_google_maps.py search "AI consulting firms in New York" --max-results 50

Scrape Google Maps Reviews

python scripts/scrape_google_maps.py reviews "ChIJN1t_tDeuEmsRUsoyG83frY4" --max-reviews 100

Extract Contact Info from Websites

python scripts/scrape_contact_info.py "https://example.com" --depth 2

Bulk Contact Enrichment

python scripts/scrape_contact_info.py --urls-file companies.txt --output contacts.json

Scrape Competitor Ads (Single Competitor)

python scripts/scrape_competitor_ads.py "Nike" --platforms facebook google --country US --days 30

Compare Multiple Competitors' Ads

python scripts/scrape_competitor_ads.py "Nike" "Adidas" "Puma" --compare --output comparison.json

Discover Advertisers by Keyword

python scripts/scrape_competitor_ads.py --search "running shoes" --country US --max-ads 200

Filter Competitor Ads by Media Type

python scripts/scrape_competitor_ads.py "Netflix" "Disney+" --platforms facebook --media-types video --days 7

Analyze Trends (NEW)

# Analyze specific topic with enrichments
python scripts/analyze_trends.py "artificial intelligence" --sources google instagram tiktok --days 90

# Discover trending topics in category
python scripts/analyze_trends.py --category technology --discover --top 50

# Compare multiple trends
python scripts/analyze_trends.py "AI" "blockchain" "metaverse" --compare

# Export HTML trend report
python scripts/analyze_trends.py "sustainable fashion" --format html --output trend_report.html

Cost Estimates

Platform	Actor	Cost per Item
Twitter	kaitoeasyapi/twitter-x-data-tweet-scraper	~$0.00025
Reddit	trudax/reddit-scraper	~$0.001-0.005
LinkedIn	harvestapi/linkedin-post-search	~$0.01-0.05
YouTube	streamers/youtube-scraper	~$0.01-0.05
TikTok	clockworks/tiktok-scraper	~$0.005
Instagram (profile)	apify/instagram-profile-scraper	~$0.005
Instagram (posts)	apify/instagram-post-scraper	~$0.002-0.005
Instagram (hashtag)	apify/instagram-hashtag-scraper	~$0.002-0.005
Instagram (reels)	apify/instagram-reel-scraper	~$0.005-0.01
Instagram (comments)	apify/instagram-comment-scraper	~$0.001-0.003
Facebook (page)	apify/facebook-pages-scraper	~$0.005-0.01
Facebook (posts)	apify/facebook-posts-scraper	~$0.003-0.005
Facebook (reviews)	apify/facebook-reviews-scraper	~$0.002-0.005
Facebook (groups)	apify/facebook-groups-scraper	~$0.005-0.01
Facebook (marketplace)	apify/facebook-marketplace-scraper	~$0.005-0.01
Google Maps (search)	compass/crawler-google-places	~$0.01-0.02
Google Maps (place)	compass/google-maps-business-scraper	~$0.01
Google Maps (reviews)	compass/google-maps-reviews-scraper	~$0.003-0.005
Contact Enrichment	lukaskrivka/contact-info-scraper	~$0.01-0.03
Google Trends	apify/google-trends-scraper	~$0.01
Trend Analysis (multi)	Multiple actors	~$0.50-1.50/run
Facebook Ads Library	apify/facebook-ads-scraper	~$0.75/1K ads
Facebook Ads (alt)	curious_coder/facebook-ads-library-scraper	~$0.50/1K ads
Google Ads Transparency	lexis-solutions/google-ads-scraper	~$1.00/1K ads
Google Ads (alt)	xtech/google-ad-transparency-scraper	~$0.80/1K ads

Output Location

All scraped data saves to .tmp/ with timestamped filenames:

.tmp/twitter_ai_trends_YYYYMMDD.json
.tmp/reddit_ai_tech_YYYYMMDD.json
.tmp/linkedin_posts_YYYYMMDD_HHMMSS.json

Security Notes

Credential Handling

Store APIFY_TOKEN in .env file (never commit to git)
Rotate API tokens periodically via Apify Console
Never log or print API tokens in script output
Use environment variables, not hardcoded values

Data Privacy

Scraped data contains only publicly available content
Social media posts may include PII (names, handles, profile info)
Data is stored locally in .tmp/ directory
No data is retained by Apify after actor run completes
Consider data minimization - only scrape what you need

Access Scopes

Apify tokens have full account access (no granular scopes)
Use separate Apify accounts for different projects if needed
Monitor usage via Apify Console dashboard

Compliance Considerations

Terms of Service: Respect each platform's ToS (Twitter, Reddit, LinkedIn)
Rate Limiting: Actors have built-in rate limiting to avoid bans
Robots.txt: Some actors may bypass robots.txt - use responsibly
GDPR: Scraped PII may be subject to GDPR if EU residents
Ethical Use: Only scrape public data; never bypass authentication
Proxy Ethics: Residential proxies should be used ethically

Troubleshooting

Common Issues

Issue: Actor run failed

Symptoms: Script terminates with "Actor run failed" or timeout error
Cause: Invalid actor ID, insufficient proxy credits, or actor configuration issue
Solution:

Verify the actor ID is correct in the script
Check Apify Console for actor run logs
Ensure proxy settings match actor requirements
Try running with default proxy settings first

Issue: Empty results returned

Symptoms: Script completes but returns 0 items
Cause: Content blocked by platform, invalid query, or proxy being detected
Solution:

Try a different proxy type (residential vs datacenter)
Simplify the search query
Reduce the number of results requested
Check if the platform is blocking scraping attempts

Issue: Rate limited by platform

Symptoms: Script fails with 429 errors or "rate limited" messages
Cause: Too many requests in a short time period
Solution:

Add delays between requests (actor settings)
Reduce concurrent requests
Use proxy rotation
Wait and retry after a cooldown period

Issue: Invalid API token

Symptoms: Authentication error or "invalid token" message
Cause: Token expired, revoked, or incorrectly set
Solution:

Regenerate API token in Apify Console
Verify token is correctly set in .env file
Check for leading/trailing whitespace in token
Ensure APIFY_TOKEN environment variable is loaded

Issue: Proxy connection errors

Symptoms: Connection timeout or proxy errors
Cause: Proxy pool exhausted or geo-restriction issues
Solution:

Switch proxy type (basic, residential, or datacenter)
Verify proxy credit balance in Apify Console
Try a different proxy country/region
Disable proxy to test if that's the root cause

Resources

Platform References

references/twitter.md - Twitter/X scraping details
references/reddit.md - Reddit scraping with subreddit targeting
references/linkedin.md - LinkedIn post scraping (author or search mode)
references/instagram.md - Instagram profile, posts, hashtag, reels, and comments scraping
references/facebook.md - Facebook page, posts, reviews, groups, and marketplace scraping
references/multi-platform.md - TikTok and YouTube scraping
references/url-detect.md - Auto-detect URL type and scrape

Business/Places References

references/google-maps.md - Google Maps business search, place details, and reviews
references/contact-enrichment.md - Extract emails, phone numbers, and social profiles from websites

Workflow References

workflows/lead-generation.md - Multi-step lead generation workflow
workflows/influencer-discovery.md - Find and analyze influencers across platforms
workflows/competitor-intel.md - Competitive intelligence gathering workflow
workflows/trend-analysis.md - Enriched multi-platform trend analysis with scoring

Integration Patterns

Scrape and Enrich

Skills: apify-scrapers → parallel-research
Use case: Scrape social media posts, then enrich with deep research
Flow:

Scrape Twitter/Reddit for mentions of a topic
Extract company names or URLs from posts
Use parallel-research to get detailed info on each company

Scrape and Summarize

Skills: apify-scrapers → content-generation
Use case: Create newsletter content from social media trends
Flow:

Scrape trending AI posts from Twitter
Pass scraped data to content-generation summarize
Generate a formatted newsletter section

Scrape and Archive

Skills: apify-scrapers → google-workspace
Use case: Save scraped data to Google Drive for team access
Flow:

Scrape LinkedIn posts from target accounts
Format data as CSV or JSON
Upload to Google Drive client folder via google-workspace

Trend Analysis + Content Strategy

Skills: apify-scrapers (trend-analysis) → content-generation
Use case: Identify trending topics and create content strategy
Flow:

Run trend analysis: python scripts/analyze_trends.py "AI productivity" --sources all
Review lifecycle stage and opportunity score
Use content-generation to create content for high-opportunity trends
Focus on emerging trends with high velocity scores

Competitive Trend Monitoring

Skills: apify-scrapers (trend-analysis) → parallel-research
Use case: Monitor competitor visibility in trending topics
Flow:

Analyze industry trends: python scripts/analyze_trends.py --category "your-industry" --discover
Compare your brand vs competitors in those trends
Use parallel-research for deep dive on gaps
Generate competitive intelligence report

apify-scrapers

Resources

Install

Apify Scrapers

Overview

Quick Decision Tree

Environment Setup

Common Usage Patterns

Scrape Twitter Trends

Scrape Reddit Discussions

Scrape LinkedIn Author

Auto-detect and Scrape URL

Scrape Instagram Profile

Scrape Instagram Hashtag

Scrape Instagram Reels

Scrape Facebook Page

Scrape Facebook Reviews

Scrape Facebook Marketplace

Scrape Google Maps Businesses

Scrape Google Maps Reviews

Extract Contact Info from Websites

Bulk Contact Enrichment

Scrape Competitor Ads (Single Competitor)

Compare Multiple Competitors' Ads

Discover Advertisers by Keyword

Filter Competitor Ads by Media Type

Analyze Trends (NEW)

Cost Estimates

Output Location

Security Notes

Credential Handling

Data Privacy

Access Scopes

Compliance Considerations

Troubleshooting

Common Issues

Issue: Actor run failed

Issue: Empty results returned

Issue: Rate limited by platform

Issue: Invalid API token

Issue: Proxy connection errors

Resources

Platform References

Business/Places References

Workflow References

Integration Patterns

Scrape and Enrich

Scrape and Summarize

Scrape and Archive

Trend Analysis + Content Strategy

Competitive Trend Monitoring

Categories

Install

Recommended Skills