Dev Tools / 2025

Scraper SDK

Describe what data you need. Claude writes, tests, and runs the scraper in real time.


Web scraping is broken by design. Traditional scrapers hardcode CSS selectors that break the moment a site updates its layout. Sites with bot detection block headless browsers. Large scrapes crash mid-run and lose everything. Building a scraper for a new site takes hours of inspecting page structure, handling pagination edge cases, and writing retry logic. Then the site changes and you start over. The tooling assumed a world where websites stayed static. They do not.

Scout and Build

Two-phase workflow. The scout phase loads the target URL, snapshots the page structure, identifies data APIs, determines the pagination method (infinite scroll, URL parameters, or click-to-load), and maps detail page paths. All in under 10 tool calls. No hardcoded selectors. The AI reads and adapts. The build phase generates a Python Playwright script from the scout findings, tests it on 2-3 items to verify accuracy, then scales up. Large scrapes of 50,000+ items stream every result to disk immediately. After a mid-run crash, the scrape resumes from the last checkpoint instead of starting over.
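
The stream-to-disk and resume behavior can be illustrated with a minimal sketch. It is not the project's actual code. It assumes results are appended to a JSON Lines file keyed by URL, and the names run_scrape, load_done, and the fetch callback are hypothetical stand-ins for the generated Playwright script.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def load_done(out_path: Path) -> set[str]:
    """Return the URLs already recorded in the JSONL output file."""
    done: set[str] = set()
    if not out_path.exists():
        return done
    with out_path.open("r", encoding="utf-8") as fh:
        for line in fh:
            try:
                done.add(json.loads(line)["url"])
            except (json.JSONDecodeError, KeyError, TypeError):
                # A crash can leave a half-written last line; skip it.
                continue
    return done


def ends_mid_line(out_path: Path) -> bool:
    """True if the file is non-empty and does not end with a newline."""
    if not out_path.exists() or out_path.stat().st_size == 0:
        return False
    with out_path.open("rb") as fh:
        fh.seek(-1, 2)  # jump to the last byte
        return fh.read(1) != b"\n"


def run_scrape(
    urls: Iterable[str],
    fetch: Callable[[str], dict],  # hypothetical: returns the fields for one item
    out_path: Path,
) -> int:
    """Scrape every URL not yet in out_path; return how many items were written."""
    done = load_done(out_path)
    repair = ends_mid_line(out_path)
    written = 0
    with out_path.open("a", encoding="utf-8") as fh:
        if repair:
            fh.write("\n")  # terminate a partial line left by an earlier crash
        for url in urls:
            if url in done:
                continue  # already checkpointed in a previous run
            record = {"url": url, **fetch(url)}
            fh.write(json.dumps(record) + "\n")
            fh.flush()  # hand the line to the OS before fetching the next item
            written += 1
    return written
```

Calling run_scrape again with the same URL list after a failure skips everything already on disk, so the output file itself acts as the checkpoint.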


Stealth at Scale

Five parallel browser tabs extract detail pages simultaneously. Eight rotating fingerprints across Chrome, Firefox, Safari, and Edge bypass PerimeterX and Akamai bot detection. Residential proxy rotation on every request. Three authentication modes: LiteLLM Proxy for enterprise routing across multiple AI providers, direct Anthropic API key for simple setups, and Claude SDK OAuth for Pro and Max subscribers. Model selection between Sonnet, Opus, and Haiku depending on task complexity. The chat interface streams Claude's reasoning in real time so you see exactly what the agent is thinking as it builds your scraper.
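
The scheduling side of this, five tabs at once with fingerprints assigned in rotation, can be sketched with asyncio. This is an illustration under stated assumptions, not the project's implementation. The fingerprint labels are placeholders, fetch_detail is a hypothetical coroutine standing in for a Playwright page visit, and nothing here performs any bot-detection evasion or proxying.

```python
import asyncio
from itertools import cycle
from typing import Awaitable, Callable

# Placeholder labels only; a real profile would carry user agent, viewport, etc.
FINGERPRINTS = [f"profile-{i}" for i in range(8)]
MAX_TABS = 5


async def extract_all(
    urls: list[str],
    fetch_detail: Callable[[str, str], Awaitable[dict]],  # hypothetical page visit
    max_tabs: int = MAX_TABS,
) -> list[dict]:
    """Fetch every detail page with at most max_tabs requests in flight."""
    slots = asyncio.Semaphore(max_tabs)
    profiles = cycle(FINGERPRINTS)

    async def worker(url: str, profile: str) -> dict:
        async with slots:  # wait here until one of the tab slots is free
            return await fetch_detail(url, profile)

    # Profiles are assigned round-robin in URL order before any task runs.
    tasks = [worker(url, next(profiles)) for url in urls]
    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*tasks)
```

The semaphore caps concurrency without pinning work to specific tabs, so a slow detail page delays only its own slot.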


Built on the Claude Agent SDK, with subagents for specialized research and code review. Session persistence lets you resume conversations. Output exports to CSV and XLSX. The frontend runs React with Vite. The backend runs FastAPI with Uvicorn. Everything is containerized with Docker Compose. When a site redesigns, the scraper adapts because it reads the live page structure instead of relying on memorized selectors.

Stack
Python, FastAPI, Claude SDK, Playwright, BeautifulSoup, Docker
Solo Engineer: Sepehr Moghaddam
