
paperless-ngx
Paperless-ngx is a document management system for digitizing and managing paper documents. It runs OCR on scanned documents to extract their text, automatically classifies and tags them, and builds a full-text searchable archive.
- OCR processing via Tesseract-based text recognition for searchable PDF generation
- Auto-classification using machine learning for document categorization and tagging
- Full-text search across all OCR-processed documents
- Email ingestion for automatic import of PDF attachments from configured mail accounts
- Barcode splitting for automatic multi-document separation using barcode separator sheets
- Workflows that automatically assign metadata and permissions when documents are consumed, added, or updated
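Paperless-ngx fetches mail itself once mail accounts and mail rules are configured. The sketch below only illustrates the attachment-extraction step with Python's standard `email` package. It assumes the raw message bytes have already been retrieved (for example over IMAP), and `pdf_attachments` is a hypothetical helper, not part of Paperless-ngx.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from typing import List, Tuple


def pdf_attachments(raw_message: bytes) -> List[Tuple[str, bytes]]:
    """Return (filename, content) for every PDF part in a raw email (hypothetical helper)."""
    # policy.default makes the parser return EmailMessage objects with get_content().
    message = BytesParser(policy=policy.default).parsebytes(raw_message)
    found: List[Tuple[str, bytes]] = []
    # walk() visits nested multipart containers too, not just top-level parts.
    for part in message.walk():
        if part.get_content_type() == "application/pdf":
            filename = part.get_filename() or "attachment.pdf"
            # For non-text parts get_content() returns the decoded bytes.
            found.append((filename, part.get_content()))
    return found


# Usage: build a message with one PDF attachment and extract it again.
msg = EmailMessage()
msg["Subject"] = "Invoice"
msg.set_content("See attached.")
msg.add_attachment(b"%PDF-1.4 fake", maintype="application", subtype="pdf", filename="invoice.pdf")
print(pdf_attachments(msg.as_bytes()))
# → [('invoice.pdf', b'%PDF-1.4 fake')]
```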
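The barcode-splitting idea can be sketched in plain Python. This is a simplified illustration, not Paperless-ngx's implementation: it assumes barcodes have already been decoded into one value (or `None`) per page, assumes `PATCHT` as the separator value, and drops the separator pages themselves. The function `split_at_separators` is hypothetical.

```python
from typing import List, Optional, Sequence

SEPARATOR = "PATCHT"  # assumed separator barcode value for this sketch


def split_at_separators(
    page_barcodes: Sequence[Optional[str]],
    separator: str = SEPARATOR,
) -> List[List[int]]:
    """Group zero-based page indices into documents, splitting at separator pages.

    page_barcodes[i] is the decoded barcode on page i, or None if the page has
    no barcode. Separator pages are discarded (an assumption of this sketch),
    and empty groups are never emitted.
    """
    documents: List[List[int]] = []
    current: List[int] = []
    for index, barcode in enumerate(page_barcodes):
        if barcode == separator:
            # Close the document collected so far, if it has any pages.
            if current:
                documents.append(current)
            current = []
        else:
            current.append(index)
    if current:
        documents.append(current)
    return documents


# Example: a 6-page scan with separator sheets at indices 2 and 4.
print(split_at_separators(["INV", None, "PATCHT", None, "PATCHT", None]))
# → [[0, 1], [3], [5]]
```

Discarding empty groups means two consecutive separator sheets do not produce an empty document.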
Paperless-ngx is built with Python on the Django framework, with an Angular frontend, and is self-hosted, typically deployed with Docker. It is a widely used tool for going paperless by converting paper documents into a searchable digital archive.
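Because the self-hosted server exposes a REST API, documents can also be submitted programmatically. Below is a minimal standard-library sketch, assuming an instance at `http://localhost:8000` and an API token for your user. It targets the `/api/documents/post_document/` upload endpoint with token authentication, sending the file in the `document` form field. `build_upload_request` is a hypothetical helper that only builds the request, so it can be inspected before sending.

```python
import urllib.request
import uuid
from typing import Optional


def build_upload_request(
    base_url: str,
    token: str,
    filename: str,
    content: bytes,
    title: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to the document upload endpoint."""
    boundary = uuid.uuid4().hex
    parts = []
    if title is not None:
        # Optional form field overriding the document title.
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="title"\r\n\r\n{title}\r\n'.encode()
        )
    # The file itself goes in the "document" field; PDF content type assumed.
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="document"; '
            f'filename="{filename}"\r\nContent-Type: application/pdf\r\n\r\n'
        ).encode()
        + content
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/documents/post_document/",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


req = build_upload_request("http://localhost:8000", "YOUR_TOKEN", "scan.pdf", b"%PDF-1.4", title="Scan")
print(req.get_method(), req.full_url)
# → POST http://localhost:8000/api/documents/post_document/
```

Sending it with `urllib.request.urlopen(req)` requires a running instance; the server then queues the file for the same consumption pipeline used for other inputs.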
Stars: 37,918
Forks: 2,413
Language: Python
License: GPL-3.0
Topics: angular, archiving, django, dms, document-management, document-management-system, machine-learning, ocr, optical-character-recognition, pdf