Pry API Reference
Complete reference for the Pry web scraping engine. All endpoints return JSON. Base URL: http://localhost:8005
Quickstart
Pry runs as a Docker Compose stack. After deploying, the API is available on port 8005.
# Health check curl http://localhost:8005/health # Scrape a page curl -X POST http://localhost:8005/v1/scrape \ -H "Content-Type: application/json" \ -d '{"url":"https://example.com"}'
Authentication
By default, Pry does not require authentication — it's designed to run on your internal network behind a firewall. For public-facing deployments, you can enable API key authentication via the config endpoint or environment variable.
# Set an API key curl -X POST http://localhost:8005/v1/config \ -H "Content-Type: application/json" \ -d '{"api_key":"your-secret-key"}' # Then include it in requests curl -X POST http://localhost:8005/v1/scrape \ -H "Content-Type: application/json" \ -H "X-API-Key: your-secret-key" \ -d '{"url":"https://example.com"}'
Rate Limiting
Pry includes built-in rate limiting to prevent abuse. Default limits: 120 requests per minute per IP, with a burst allowance of 200. Configure via /v1/config.
# Check current rate limit config curl http://localhost:8005/v1/config # Response {"rate_limit":{"rpm":120,"burst":200}}
Core Endpoints
The primary endpoint. Fetches and extracts content from any URL. Supports Cloudflare bypass, JavaScript rendering, and multiple output formats.
Request body:
{ "url": "https://example.com", "bypass_cloudflare": true, // Use FlareSolverr "render_js": true, // Use Playwright browser "timeout": 30, // Seconds "format": "markdown" // "markdown" | "html" | "text" }
Response:
{ "success": true, "data": { "url": "https://example.com", "title": "Example Domain", "content": "...", "status_code": 200, "method": "flaresolverr" } }
Crawl a site starting from a URL. Specify depth and max pages. Returns all discovered pages.
{ "url": "https://example.com", "max_pages": 10, "max_depth": 2, "same_domain": true }
Take a full-page or viewport screenshot of any URL using Playwright.
{ "url": "https://example.com", "full_page": true, "width": 1920, "height": 1080, "format": "png" // "png" | "jpeg" }
Analyze a page and auto-detect its data structure. Returns CSS selectors for products, prices, names, descriptions, ratings, and stock levels. Uses local LLM (Ollama) for intelligent detection.
{"url": "https://shop.example.com"}
Response:
{ "success": true, "data": { "_page_title": "Shop", "suggested": { "price": "[class*='price']", "name": "h1.product-title", "image": "[class*='product-image']", "rating": "[class*='review-score']" } } }
Extract specific fields from a page using CSS selectors. Combine with /v1/suggest to auto-detect selectors first.
{ "url": "https://shop.example.com", "fields": { "price": "[class*='price']", "name": "h1.title", "image": "img.product-image" } }
Scrape a URL and transform the output into JSON, CSV, or SQL format. Pipe directly into your database or analytics stack.
{ "url": "https://example.com", "transform": "json" // "json" | "csv" | "sql" }
Advanced Endpoints
Scrape multiple URLs in a single request. Results returned as an array.
{"urls": ["https://example.com", "https://example.org"], "bypass_cloudflare": true}
Scrape two URLs and return a diff of their content. Useful for monitoring competitor changes.
{"url_a": "https://example.com/v1", "url_b": "https://example.com/v2"}
Register a URL to monitor. Pry will periodically check for changes and report diffs.
{"url": "https://example.com", "interval": "1h", "webhook": "https://your-server.com/hook"}
Crawl a domain and generate a sitemap of all discovered URLs.
{"url": "https://example.com", "max_pages": 500}
Parse and extract content from raw HTML input. Useful when you already have the HTML from another source.
{"html": "<html>...</html>", "format": "markdown"}
Execute browser automation steps using Playwright. Navigate, click, type, wait for selectors, then extract.
{ "url": "https://example.com/login", "steps": [ {"action": "type", "selector": "#email", "value": "[email protected]"}, {"action": "type", "selector": "#password", "value": "pass123"}, {"action": "click", "selector": "button[type=submit]"}, {"action": "wait", "selector": ".dashboard"}, {"action": "extract"} ] }
System Endpoints
View the circuit breaker state for all tracked domains. Shows failure counts and backoff timers.
# Check breaker status curl http://localhost:8005/v1/breaker/status # Reset breaker for a domain or all domains curl -X POST http://localhost:8005/v1/breaker/reset \ -H "Content-Type: application/json" \ -d '{"domain":"example.com"}'
The circuit breaker prevents aggressive retries against failing targets. After N consecutive failures, Pry backs off exponentially (max 60 seconds). This prevents IP bans.
Returns service health, version, cache stats, rate limiter status, and enabled features.
{ "status": "ok", "service": "pry", "version": "3.0.0", "backends": {"flaresolverr":true,"playwright":true} }
List all available MCP (Model Context Protocol) tools. Connect any MCP-compatible AI agent (Claude Desktop, Cursor) to give it web scraping capabilities.
curl http://localhost:8005/mcp/tools
Deployment
Pry ships as a Docker Compose project. Minimum requirements: Linux server with Docker, 2GB+ RAM.
# 1. Extract the download tar xzf pry-3.0.0.tar.gz && cd pry # 2. Start the stack docker compose up -d # 3. Verify curl http://localhost:8005/health # 4. (Optional) Enable Tor support docker compose --profile tor up -d
What's included in the stack:
| Service | Port | Description |
|---|---|---|
| munchcrawl | 8005 | Pry API server (FastAPI) |
| flaresolverr | 8191 | Cloudflare bypass service |
| playwright | — | Headless Chromium (bundled) |
| tor (optional) | 9050 | Tor SOCKS5 proxy |
Error Codes
| Status | Meaning | Action |
|---|---|---|
| 200 | Success | — |
| 400 | Bad request — missing URL or invalid params | Check your request body |
| 429 | Rate limit exceeded | Wait and retry; increase burst in config |
| 500 | Scrape failed — target unreachable or blocked | Try bypass_cloudflare:true or render_js:true |
| 502 | FlareSolverr down | Restart flaresolverr container |
| 503 | Circuit breaker open for domain | Check /v1/breaker/status; wait for backoff |
← Back to Pry · Need help? [email protected]