A self-hosted internet archive. Save complete copies of web pages, PDFs, screenshots, DOM snapshots, media files, Git repos, and more from any URL.
Visit your domain to open the web UI. On first launch, ArchiveBox automatically initializes its data directory.
Create an admin account either by setting the ADMIN_USERNAME and ADMIN_PASSWORD environment variables or by running archivebox manage createsuperuser from the console.
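As a rough sketch of the environment-variable route, the snippet below renders the two variables as KEY=value lines of the kind an env file or container configuration usually takes. The helper name render_admin_env and the placeholder credentials are hypothetical; only ADMIN_USERNAME and ADMIN_PASSWORD come from the text above.

```python
def render_admin_env(username: str, password: str) -> str:
    """Return KEY=value lines for the two admin variables (hypothetical helper)."""
    if not username or not password:
        raise ValueError("both username and password are required")
    for value in (username, password):
        # A newline inside a value would silently start a new KEY=value line.
        if "\n" in value:
            raise ValueError("values must not contain newlines")
    return f"ADMIN_USERNAME={username}\nADMIN_PASSWORD={password}\n"


if __name__ == "__main__":
    # Placeholder credentials for illustration only; choose your own.
    print(render_admin_env("admin", "change-me"), end="")
```

Where these lines go depends on how the instance is deployed, so treat the output as a template rather than a finished configuration.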
Add URLs to archive: Paste URLs into the web UI, use the REST API, or schedule imports from bookmarks, RSS feeds, Pocket, Pinboard, and browser history.
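URLs can also be added from the console wherever the archivebox command is available. The following is a minimal sketch, assuming the CLI's add subcommand and its --depth option; build_add_command is a hypothetical helper that only assembles the argument list, and the URL is a placeholder.

```python
import shlex
from urllib.parse import urlparse


def build_add_command(url: str, depth: int = 0) -> list[str]:
    """Assemble an 'archivebox add' argument list (hypothetical helper)."""
    parsed = urlparse(url)
    # Reject anything that is not a plain http(s) URL before it reaches the CLI.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) URL: {url!r}")
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return ["archivebox", "add", f"--depth={depth}", url]


if __name__ == "__main__":
    cmd = build_add_command("https://example.com")
    # Print the shell-quoted command instead of running it.
    # To execute it for real: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems when a URL contains characters such as & or ?.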
Each snapshot saves multiple formats: HTML, PDF, screenshot, WARC, DOM dump, Git clone, media files (via yt-dlp), and more. Browse archived content offline at any time.
Full-text search: Enabled by default with ripgrep. Search across all archived page content instantly.
ADMIN_USERNAME and ADMIN_PASSWORD environment variables
SEARCH_BACKEND_ENGINE is set to ripgrep by default for fast full-text search
/data volume
MIT license (GitHub)