paperless-ngx

Paperless-ngx is a document management system for digitizing and organizing paper documents. It runs OCR on scanned documents to extract their text, automatically classifies and tags them, and builds a full-text searchable archive.

  • OCR processing via Tesseract-based text recognition for searchable PDF generation
  • Auto-classification using machine learning for document categorization and tagging
  • Full-text search across all OCR-processed documents
  • Email ingestion for automatic import of PDF attachments
  • Barcode splitting for automatic multi-document separation using barcode separator sheets
  • Workflows for document assignment, approval, and consumption management
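
To illustrate the barcode-splitting idea from the list above, here is a minimal sketch of how a scanned stack could be cut into documents at separator sheets. It is not Paperless-ngx's own code: the function name `split_on_separators`, the input format (one decoded barcode value or `None` per page), and the separator value `"PATCHT"` are assumptions made for this example.

```python
from typing import List, Optional, Sequence

# Assumed separator value for this sketch; check your own configuration.
SEPARATOR = "PATCHT"


def split_on_separators(
    page_barcodes: Sequence[Optional[str]],
    separator: str = SEPARATOR,
) -> List[List[int]]:
    """Group page indices into documents, discarding separator pages."""
    documents: List[List[int]] = []
    current: List[int] = []
    for index, barcode in enumerate(page_barcodes):
        if barcode == separator:
            # A separator sheet closes the current document and is dropped.
            if current:
                documents.append(current)
            current = []
        else:
            current.append(index)
    # Flush the trailing document, if any pages remain.
    if current:
        documents.append(current)
    return documents
```

Consecutive separator sheets produce no empty documents in this sketch, which is one reasonable design choice among several.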

Built with Python on Django, Paperless-ngx is self-hosted with Docker. It is aimed at going paperless: it converts paper documents into a searchable digital archive.
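
As a rough sketch of how the searchable archive might be queried from a script, the snippet below builds a full-text search request against a self-hosted instance. The `/api/documents/?query=` endpoint and the `Authorization: Token` header reflect the project's REST API as I understand it, so verify them against the API documentation for your version. The helper name `build_search_request`, the host, and the token are placeholders invented for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url: str, token: str, query: str) -> Request:
    """Build (but do not send) a full-text search request."""
    # Assumed endpoint: /api/documents/ with a `query` parameter.
    params = urlencode({"query": query})
    url = f"{base_url.rstrip('/')}/api/documents/?{params}"
    return Request(
        url,
        headers={
            # Assumed token-based authentication scheme.
            "Authorization": f"Token {token}",
            "Accept": "application/json",
        },
    )


# Placeholder host and token; replace with your own instance's values.
request = build_search_request("http://localhost:8000", "YOUR_TOKEN", "invoice 2023")
```

Passing the resulting `request` to `urllib.request.urlopen` would send it; the sketch stops short of that so it runs without a live server.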

Stars: 37,918
Forks: 2,413
Language: Python
License: GPL-3.0
angular, archiving, django, dms, document-management, document-management-system, machine-learning, ocr, optical-character-recognition, pdf