Crawl4AI

🚀🤖 Crawl4AI is an open-source, LLM-friendly web crawler and data scraper optimized for AI application scenarios. It supports multiple mainstream LLM APIs and provides powerful data extraction and processing capabilities, making it an ideal choice for building AI pipelines, RAG systems, and knowledge bases.

GitHub: https://github.com/unclecode/crawl4ai
Official Documentation: https://docs.crawl4ai.com/
Zeabur Environment Variable Guide: https://zeabur.com/docs/configuration/env

Core Features

📝 Markdown Generation

Generate clean, structured Markdown documents
AI-friendly content filtering with automatic noise removal
Smart citation management, converting links to numbered reference lists
Support for custom Markdown generation strategies
BM25-based content filtering to keep the passages most relevant to a query
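
The citation feature above (inline links converted to a numbered reference list) can be illustrated with a small standalone sketch. This is not Crawl4AI's implementation: the function name `links_to_citations` and the output layout are assumptions chosen for illustration, and only the Python standard library is used.

```python
import re

# Matches inline Markdown links such as [text](https://example.com),
# but not images, which start with "!".
LINK_RE = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)\)")


def links_to_citations(markdown: str) -> str:
    """Replace inline links with numbered markers and append a reference list.

    Hypothetical helper for illustration only; not part of the Crawl4AI API.
    """
    numbers: dict[str, int] = {}  # URL -> citation number, in order of first use

    def replace(match: re.Match) -> str:
        text, url = match.group(1), match.group(2)
        # Reuse the existing number when the same URL appears again.
        number = numbers.setdefault(url, len(numbers) + 1)
        return f"{text} [{number}]"

    body = LINK_RE.sub(replace, markdown)
    if not numbers:
        return body  # nothing to cite, leave the text untouched
    references = "\n".join(f"[{n}] {url}" for url, n in numbers.items())
    return f"{body}\n\nReferences\n{references}"
```

Repeated URLs share one number, which keeps the reference list short when a page links to the same target many times.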

📊 Structured Data Extraction

LLM-driven data extraction that works with both open-source and proprietary models
Multiple chunking strategies: topic-based, regex-based, sentence-level
Semantic content retrieval based on cosine similarity
Fast CSS/XPath selector extraction
Custom schema definition for extracting structured JSON data
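
To make sentence-level chunking and cosine-similarity retrieval concrete, here is a minimal sketch using only the standard library. It swaps in plain bag-of-words term counts where a production system would more likely use embeddings, and the names `sentence_chunks`, `vectorize`, and `rank_chunks` are hypothetical, not Crawl4AI functions.

```python
import math
import re
from collections import Counter


def sentence_chunks(text: str) -> list[str]:
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts over lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_chunks(text: str, query: str) -> list[tuple[float, str]]:
    """Score every sentence chunk against the query, best match first."""
    q = vectorize(query)
    scored = [(cosine_similarity(vectorize(c), q), c) for c in sentence_chunks(text)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Chunking first and scoring second means only the highest-ranked chunks need to be sent to an LLM, which is the usual motivation for combining the two steps.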

🌐 Browser Integration

Use your own browser installation to reduce the risk of bot detection
Chrome DevTools Protocol support for remote control
Browser profile management with saved authentication states and cookies
Session management for multi-step crawling
Proxy support with authentication
Full browser control: modify headers, cookies, user agents, etc.
Compatible with Chromium, Firefox, and WebKit
Dynamic viewport adjustment for complete rendering

🔎 Crawling & Scraping

Media support: extract images, audio, videos, and responsive image formats
Dynamic content crawling: execute JS scripts and wait for async content
Page screenshot functionality for debugging and analysis
Support for raw HTML and local file processing
Comprehensive link extraction: internal, external links, and iframe content
Custom hooks for customizing crawling behavior at each step
Smart caching mechanism to improve speed and avoid redundant requests
Metadata extraction and seamless iframe content extraction
Lazy load handling and full-page scanning for infinite scroll pages
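
The internal/external link split mentioned above comes down to resolving each link against the page URL and comparing hosts. The sketch below shows one way to do that with `urllib.parse`; `classify_links` is a hypothetical helper, and treating "same host" as "internal" is an assumption (a real crawler may also compare registered domains or subdomains).

```python
from urllib.parse import urljoin, urlparse


def classify_links(base_url: str, hrefs: list[str]) -> dict[str, list[str]]:
    """Resolve hrefs against base_url and group them by host.

    Hypothetical helper for illustration; "internal" here means the same
    host as the page itself.
    """
    base_host = urlparse(base_url).netloc.lower()
    result: dict[str, list[str]] = {"internal": [], "external": []}
    for href in hrefs:
        absolute = urljoin(base_url, href)  # handles relative paths like "guide.html"
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel: and similar
        bucket = "internal" if parsed.netloc.lower() == base_host else "external"
        if absolute not in result[bucket]:  # de-duplicate while keeping order
            result[bucket].append(absolute)
    return result
```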

🚀 Deployment Features

Docker-optimized image with built-in FastAPI server
JWT token authentication for API security
One-click API gateway deployment
Scalable architecture for large-scale production environments
Cloud deployment ready configurations

Supported LLM APIs

OpenAI, Anthropic, Deepseek, Groq, Together, Mistral, Gemini

Quick Start

Fill in the API keys for the LLM providers you plan to use (optional; only needed for AI-driven extraction features)
After deployment, visit /playground for the interactive crawler interface
Check the official documentation for more advanced usage
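
Besides the /playground interface, a deployed instance can be called from code. The sketch below builds such a request with the standard library. Several details are assumptions rather than facts from this page: the base URL is a placeholder for your own deployment, the `/crawl` path and the `{"urls": [...]}` payload shape should be verified against the official documentation, and the bearer token is only relevant if JWT authentication is enabled.

```python
import json
from typing import Optional
from urllib.request import Request, urlopen


def build_crawl_request(base_url: str, urls: list[str], token: Optional[str] = None) -> Request:
    """Prepare a POST request for a deployed instance.

    Assumptions (verify against the official docs): the service exposes a
    /crawl endpoint that accepts a JSON body of the form {"urls": [...]}.
    """
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"  # only if JWT auth is enabled
    body = json.dumps({"urls": urls}).encode("utf-8")
    return Request(f"{base_url.rstrip('/')}/crawl", data=body, headers=headers, method="POST")


def crawl(base_url: str, urls: list[str], token: Optional[str] = None, timeout: float = 60.0):
    """Send the request and decode the JSON response (performs network I/O)."""
    with urlopen(build_crawl_request(base_url, urls, token), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would look like `crawl("https://your-deployment.example", ["https://example.com"])`, where the first argument is the domain Zeabur assigned to your service.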

References

License

Apache-2.0
