
ArchiveBox

ArchiveBox is a self-hosted internet archiving tool. It saves web pages, PDFs, screenshots, media, and more from URLs, and stores them locally with full-text search. It uses an embedded SQLite database, so no external database service is required.

Deployed: 0 times
Publisher: futurize.rush
Created: 2026-03-28
Tags: Tool, Productivity

ArchiveBox

A self-hosted internet archive. Save complete copies of web pages, PDFs, screenshots, DOM snapshots, media files, Git repos, and more from any URL.

What You Can Do After Deployment

Visit your domain to open the web UI. On first launch, ArchiveBox automatically initializes its data directory.

Create an admin account: Set the ADMIN_USERNAME and ADMIN_PASSWORD environment variables, or run archivebox manage createsuperuser from the console.

Add URLs to archive: Paste URLs into the web UI, use the REST API, or schedule imports from bookmarks, RSS feeds, Pocket, Pinboard, and browser history.
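
For scripted bulk additions from the console, a minimal Python sketch like the following could filter a list of URLs and build an archivebox add command line. The helper name build_add_command and its filtering rules are illustrative assumptions, not part of ArchiveBox; only the archivebox add subcommand itself comes from the ArchiveBox CLI.

```python
from urllib.parse import urlparse


def build_add_command(urls):
    """Build an argv list for `archivebox add` (hypothetical helper).

    Skips blank entries, duplicates, and anything that is not an
    absolute http(s) URL. Raises ValueError if nothing usable remains.
    """
    seen = set()
    cleaned = []
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue  # not an absolute http(s) URL
        if url in seen:
            continue  # already queued
        seen.add(url)
        cleaned.append(url)
    if not cleaned:
        raise ValueError("no valid URLs to archive")
    return ["archivebox", "add", *cleaned]
```

Passing the returned list to subprocess.run(cmd, check=True) from inside the ArchiveBox data directory would start the import. That step is left out here so the sketch has no side effects.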

Each snapshot is saved in multiple formats: HTML, PDF, screenshot, WARC, DOM dump, Git clone, media files (via yt-dlp), and more. Browse archived content offline at any time.

Full-text search: Enabled by default with ripgrep. Search across all archived page content instantly.

Configuration

  • Admin credentials: Set ADMIN_USERNAME and ADMIN_PASSWORD environment variables
  • Search: SEARCH_BACKEND_ENGINE is set to ripgrep by default for fast full-text search
  • Media downloads: yt-dlp is included for archiving video/audio from supported sites
  • Data: All archives are stored in the /data volume
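
The settings above can be collected into an environment file before deployment. The sketch below is one possible way to do that in Python. The function name render_env and the idea of generating a random password are assumptions made for illustration; the variable names ADMIN_USERNAME, ADMIN_PASSWORD, and SEARCH_BACKEND_ENGINE are the ones listed above.

```python
import secrets


def render_env(username, password=None, search_backend="ripgrep"):
    """Return (env_text, password) for the settings listed above.

    Hypothetical helper: if no password is given, a random URL-safe
    one is generated so it can be pasted without shell quoting.
    """
    if not username:
        raise ValueError("username must not be empty")
    if password is None:
        password = secrets.token_urlsafe(24)  # 32 URL-safe characters
    lines = [
        f"ADMIN_USERNAME={username}",
        f"ADMIN_PASSWORD={password}",
        f"SEARCH_BACKEND_ENGINE={search_backend}",
    ]
    return "\n".join(lines) + "\n", password
```

The returned text can be pasted into the platform's environment variable settings or written to an env file. The generated password is returned separately so it can be stored in a password manager.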

License

MIT (source on GitHub)