Features — Pry · 43 Endpoints, Cloudflare Bypass, AI Schema Detection

Core Features

Built for serious scraping

Every feature you need — nothing you don't.

🛡️

Cloudflare Bypass

Built-in FlareSolverr integration. Scrape protected sites without getting blocked. Automatically solves JavaScript challenges, CAPTCHAs, and bot-detection checks.

🌐

Browser Automation

Full Playwright support. Render JavaScript, click buttons, fill forms, capture screenshots. Real Chromium browser — any site that works in Chrome works in Pry.

🧠

AI Schema Detection

Point Pry at any page. It auto-detects product data, prices, articles, tables. LLM-powered with local model support via Ollama. No manual CSS selector writing.

📊

Data Pipeline

Output as JSON, CSV, or SQL. Pipe directly into PostgreSQL, ClickHouse, or your analytics stack. Structured extraction with typed schemas.
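As a rough illustration of what one typed record set looks like in each of the three output formats, the sketch below serializes the same rows as JSON, CSV, and a parameterized SQL INSERT using only the Python standard library. The `Product` schema, the `products` table name, and the helper functions are hypothetical; they are not part of Pry's API.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class Product:
    """Hypothetical typed schema for one extracted record."""
    name: str
    price: float
    in_stock: bool


COLUMNS = [f.name for f in fields(Product)]


def to_json(rows):
    """Serialize the records as a JSON array of objects."""
    return json.dumps([asdict(r) for r in rows])


def to_csv(rows):
    """Serialize the records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for r in rows:
        writer.writerow(asdict(r))
    return buf.getvalue()


def to_sql(rows, table="products"):
    """Return a parameterized INSERT statement plus one parameter tuple per row.

    The "?" placeholders follow the DB-API qmark style (as in sqlite3);
    other drivers may expect a different placeholder such as %s.
    """
    placeholders = ", ".join("?" for _ in COLUMNS)
    stmt = f"INSERT INTO {table} ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    params = [tuple(asdict(r)[c] for c in COLUMNS) for r in rows]
    return stmt, params


rows = [Product("Widget", 9.99, True), Product("Gadget", 24.5, False)]
```

Keeping values out of the SQL string and passing them as parameters avoids quoting bugs when scraped text contains quotes or semicolons.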

🔌

MCP Protocol

Native MCP server. Connect Claude, Cursor, or any MCP agent. Give AI the ability to scrape the web programmatically. The web becomes your agent's database.

⚡

Circuit Breaker

Auto-detects failing targets and backs off exponentially, so aggressive retry logic doesn't get your IP banned. Smart health checks and automatic recovery.
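To make the idea concrete, here is a minimal sketch of a per-target circuit breaker with exponential backoff. It shows the general pattern only: the class name, thresholds, and delays are assumptions for illustration and do not describe Pry's internal implementation.

```python
import time


class CircuitBreaker:
    """Hypothetical breaker for one target.

    After `failure_threshold` consecutive failures the breaker opens and
    blocks requests for a cooldown that doubles with every further failure,
    capped at `max_delay` seconds.
    """

    def __init__(self, failure_threshold=3, base_delay=1.0, max_delay=60.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.failures = 0
        self.open_until = 0.0

    def allow_request(self):
        """True when the cooldown (if any) has elapsed."""
        return self.clock() >= self.open_until

    def record_success(self):
        """A success closes the breaker and resets the failure count."""
        self.failures = 0
        self.open_until = 0.0

    def record_failure(self):
        """Count a failure and, past the threshold, open the breaker."""
        self.failures += 1
        if self.failures >= self.failure_threshold:
            exponent = self.failures - self.failure_threshold
            delay = min(self.base_delay * (2 ** exponent), self.max_delay)
            self.open_until = self.clock() + delay
```

A caller would check `allow_request()` before each fetch and report the outcome with `record_success()` or `record_failure()`; capping the delay keeps a recovered target from staying blocked indefinitely.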

📸

Screenshots

Full-page or element-specific screenshots. PDF generation. Mobile viewport emulation. Perfect for visual regression testing and archiving.

🗺️

Sitemap Generation

Auto-discover all pages on a domain. Generate XML sitemaps. Crawl depth control. Respect robots.txt automatically.
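The two pieces named here, honoring robots.txt and emitting an XML sitemap, can be sketched with the Python standard library. This is an illustration under assumptions, not Pry's crawler: the function names and sample URLs are hypothetical, and page discovery itself is left out.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def allowed_urls(urls, robots_txt, user_agent="*"):
    """Keep only the URLs that the given robots.txt text permits."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]


def build_sitemap(urls):
    """Return a minimal <urlset> document with one <url><loc> per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = u
    return ET.tostring(urlset, encoding="unicode")


robots = "User-agent: *\nDisallow: /admin/\n"
pages = [
    "https://example.com/",
    "https://example.com/admin/login",
    "https://example.com/blog",
]
sitemap_xml = build_sitemap(allowed_urls(pages, robots))
```

Filtering before building the sitemap means disallowed paths such as `/admin/login` never appear in the generated file.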

🔍

Change Monitoring

Watch pages for changes. Get alerts when content updates. Diff two versions. Perfect for price tracking, competitor monitoring, and compliance.
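One simple way to detect and describe a change between two snapshots is to compare content hashes and then compute a line diff. The sketch below uses Python's `hashlib` and `difflib`; the helper names and sample page text are hypothetical, and Pry's own diffing may work differently.

```python
import difflib
import hashlib


def fingerprint(text):
    """Cheap change check: hash the page text and compare digests."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_versions(old, new):
    """Return a unified diff between two snapshots as a list of lines."""
    return list(difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    ))


old_page = "Widget\nPrice: $9.99\nIn stock"
new_page = "Widget\nPrice: $7.49\nIn stock"

changed = fingerprint(old_page) != fingerprint(new_page)
changes = diff_versions(old_page, new_page) if changed else []
```

Hashing first keeps the common no-change case cheap; the diff is only computed when the digests differ.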

API Endpoints

43 endpoints, all documented

Every endpoint comes with curl examples in the docs. A selection is listed here.

POST /v1/scrape
Scrape any URL with optional Cloudflare bypass and JS rendering

POST /v1/crawl
Crawl multiple pages with depth control and domain restrictions

POST /v1/screenshot
Full-page or element-specific screenshots in PNG or PDF

POST /v1/suggest
AI-powered schema detection: auto-detects data structures

POST /v1/extract
Structured extraction with CSS selectors or AI schema

POST /v1/pipe
Data pipeline: output JSON, CSV, or SQL directly to a database

POST /v1/share
Generate shareable result links with expiration

POST /v1/batch
Batch process multiple URLs in parallel

POST /v1/compare
Diff two pages and highlight changes

POST /v1/watch
Monitor a page for changes with configurable intervals

POST /v1/map
Generate a sitemap for any domain with crawl depth control

POST /v1/parse
Parse HTML and extract metadata, links, images, tables

POST /v1/automate
Browser automation: clicks, forms, navigation sequences

GET /v1/breaker/status
Check circuit breaker status for all targets

POST /v1/breaker/reset
Reset circuit breaker for a specific target

GET /health
Health check: returns service status and version

GET /mcp/tools
List available MCP tools for AI agent integration

Quick Example

Scrape in one line

# Scrape with Cloudflare bypass and JS rendering
curl -X POST http://localhost:8005/v1/scrape \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","bypass_cloudflare":true,"render_js":true}'

# AI schema detection
curl -X POST http://localhost:8005/v1/suggest \
  -H "Content-Type: application/json" \
  -d '{"url":"https://shop.example.com"}'

# Full browser screenshot
curl -X POST http://localhost:8005/v1/screenshot \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","full_page":true}' \
  --output screenshot.png
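For callers who prefer a script to curl, the first request above can be built with Python's standard library. This sketch reuses only the URL, port, and JSON fields shown in the curl example; the function names are hypothetical, and because the response format is not described here, the body is returned as raw bytes.

```python
import json
import urllib.request

# Same host and port as the curl examples; adjust for your deployment.
PRY_BASE_URL = "http://localhost:8005"


def build_scrape_request(url, bypass_cloudflare=True, render_js=True):
    """Build (but do not send) a POST request to /v1/scrape."""
    payload = {
        "url": url,
        "bypass_cloudflare": bypass_cloudflare,
        "render_js": render_js,
    }
    return urllib.request.Request(
        f"{PRY_BASE_URL}/v1/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(url):
    """Send the request and return the raw response body.

    Assumption: a Pry server is listening at PRY_BASE_URL.
    """
    with urllib.request.urlopen(build_scrape_request(url), timeout=30) as resp:
        return resp.read()


req = build_scrape_request("https://example.com")
```

Separating request construction from sending makes the payload easy to inspect or log before anything goes over the network.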
