Paperless-NGX


Overview

What is Paperless-NGX? (Live-Demo)

Introduction

Paperless-NGX is an open-source document management system that helps you digitize, organize, and archive documents. With a web interface, full-text indexing and search, and container-based deployment, it is a flexible choice for both personal and small-to-medium business use.


Architecture

Paperless-NGX typically runs as a set of containerized services that work together to ingest documents, perform Optical Character Recognition (OCR), and store data efficiently.


Features


Screenshots

[Screenshot: documents view, small cards, dark theme]


More Information


Getting Started

Quick Paperless Stack Setup Guide

Paperless NGX is an open-source document management solution that allows you to digitize and efficiently manage your paperwork. In this guide, we will deploy Paperless NGX on a Docker Swarm cluster using a shared storage volume provided by GlusterFS (or a similar NAS-mounted setup) to ensure all nodes share the same data. If you intend to expose Paperless NGX to the internet, you can use Traefik as a reverse proxy for SSL termination.

Prerequisites

Step 1: Set Up Directory Structure

Create the directories for Paperless NGX data, ensuring they reside on your GlusterFS (or equivalent) mount so that data is shared among all Swarm nodes. For example:

mkdir -p /mnt/glustermount/data/paperless/
mkdir -p /mnt/glustermount/data/paperless/redisdata
mkdir -p /mnt/glustermount/data/paperless/data
mkdir -p /mnt/glustermount/data/paperless/media
mkdir -p /mnt/glustermount/data/paperless/export
mkdir -p /mnt/glustermount/data/paperless/consume
mkdir -p /mnt/glustermount/data/paperless/postgresqldata
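
The same layout can also be created in a loop, which makes it easier to change the base path later. This is a minimal sketch: the function name create_paperless_dirs is hypothetical, and the base path you pass in is whatever your shared mount uses.

```shell
# Create every Paperless NGX data directory below one base path.
# create_paperless_dirs is a hypothetical helper name, not part of Paperless NGX.
create_paperless_dirs() {
  base="$1"
  [ -n "$base" ] || { echo "usage: create_paperless_dirs BASE_PATH" >&2; return 1; }
  for sub in redisdata data media export consume postgresqldata; do
    # -p creates missing parents and does not fail if the directory exists
    mkdir -p "$base/$sub" || return 1
  done
}
```

For the paths used in this guide, call it as create_paperless_dirs /mnt/glustermount/data/paperless on any node that has the shared mount.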

Step 2: Create Your Docker Compose File

Important: In all configurations and code snippets below, replace YOUR-DOMAIN.com with your actual domain wherever applicable.

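Below is an example docker-compose.yml that sets up Paperless NGX alongside Redis, PostgreSQL, Gotenberg, and Apache Tika. This file is intended for Docker Swarm with a GlusterFS-backed volume; adapt paths and replicas to your needs. Note that docker stack deploy ignores the restart and depends_on keys, which only take effect if you run the same file with Docker Compose.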

version: "3.7"

services:
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - /mnt/glustermount/data/paperless/redisdata:/data
    deploy:
      mode: replicated
      replicas: 1
    networks:
      - internal

  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - broker
      - gotenberg
      - tika
    ports:
      - "8000:8000"
    volumes:
      - /mnt/glustermount/data/paperless/data:/usr/src/paperless/data
      - /mnt/glustermount/data/paperless/media:/usr/src/paperless/media
      - /mnt/glustermount/data/paperless/export:/usr/src/paperless/export
      - /mnt/glustermount/data/paperless/consume:/usr/src/paperless/consume
    environment:
      PAPERLESS_REDIS: "redis://broker:6379"
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: "http://gotenberg:3000"
      PAPERLESS_TIKA_ENDPOINT: "http://tika:9998"
      PAPERLESS_URL: "https://paperless.YOUR-DOMAIN.com"
      PAPERLESS_OCR_LANGUAGE: "eng"
      PAPERLESS_TIME_ZONE: "Europe/Zurich"
      PAPERLESS_ADMIN_USER: "${PAPERLESS_ADMIN_USER}"
      PAPERLESS_ADMIN_PASSWORD: "${PAPERLESS_ADMIN_PW}"
      PAPERLESS_ADMIN_MAIL: "${PAPERLESS_ADMIN_EMAIL}"
      PAPERLESS_SECRET_KEY: "${PAPERLESS_SECRET_KEY}"
      PAPERLESS_DBHOST: "db"
      PAPERLESS_DBNAME: "${PAPERLESS_POSTGRES_DB}"
      PAPERLESS_DBUSER: "${PAPERLESS_POSTGRES_USER}"
      PAPERLESS_DBPASS: "${PAPERLESS_POSTGRES_PASSWORD}"
    deploy:
      mode: replicated
      replicas: 1
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.webserver.rule=Host(`paperless.YOUR-DOMAIN.com`)"
        - "traefik.http.routers.webserver.entrypoints=websecure"
        - "traefik.http.services.webserver.loadbalancer.server.port=8000"
        - "traefik.docker.network=management_net"
    networks:
      - management_net
      - internal

  db:
    image: docker.io/library/postgres:16
    restart: unless-stopped
    volumes:
      - /mnt/glustermount/data/paperless/postgresqldata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: "${PAPERLESS_POSTGRES_DB}"
      POSTGRES_USER: "${PAPERLESS_POSTGRES_USER}"
      POSTGRES_PASSWORD: "${PAPERLESS_POSTGRES_PASSWORD}"
    deploy:
      mode: replicated
      replicas: 1
    networks:
      - internal

  gotenberg:
    image: docker.io/gotenberg/gotenberg:8.7
    restart: unless-stopped
    command:
      - "gotenberg"
      - "--chromium-disable-javascript=true"
      - "--chromium-allow-list=file:///tmp/.*"
    deploy:
      mode: replicated
      replicas: 1
    networks:
      - internal

  tika:
    image: docker.io/apache/tika:latest
    restart: unless-stopped
    deploy:
      mode: replicated
      replicas: 1
    networks:
      - internal

networks:
  management_net:
    external: true

  internal:
    driver: overlay
    ipam:
      config:
        - subnet: 172.16.58.0/24

Why We Define a Custom Subnet for the internal Network

The internal network is an overlay network dedicated to internal communication between Paperless NGX services (like Redis, Gotenberg, Tika, and PostgreSQL). Assigning a specific subnet (172.16.58.0/24) keeps the address range predictable, so you can choose one that does not overlap with other networks in your environment.

Defining Environment Variables

If you are using Portainer, you can define the environment variables directly in the Portainer web UI when deploying the stack; Portainer makes them available to the stack as stack.env.

If you are not using Portainer, create a .env file in the same directory as your docker-compose.yml and reference it from the services that need it:

services:
  webserver:
    ...
    env_file:
      - .env

  db:
    ...
    env_file:
      - .env

Then add the following variables in your .env file:

PAPERLESS_POSTGRES_DB=
PAPERLESS_POSTGRES_USER=
PAPERLESS_POSTGRES_PASSWORD=
PAPERLESS_ADMIN_USER=
PAPERLESS_ADMIN_PW=
PAPERLESS_ADMIN_EMAIL=
PAPERLESS_SECRET_KEY=

You can also check out the official sample environment file for Paperless NGX to see additional variables you may configure.
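
PAPERLESS_SECRET_KEY should be a long random string rather than something you type by hand. One way to generate a value with standard tools is sketched below; the helper name gen_secret and the default length of 50 characters are assumptions for illustration, not requirements taken from the Paperless NGX documentation.

```shell
# Print a random alphanumeric string (default length: 50 characters).
# gen_secret is a hypothetical helper name.
gen_secret() {
  len="${1:-50}"
  # Read a fixed amount of random data, keep only letters and digits,
  # then cut the result down to the requested length.
  head -c 4096 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | cut -c "1-$len"
}
```

You can paste the output of gen_secret into the PAPERLESS_SECRET_KEY= line of your .env file. The same helper also works for the database and admin passwords.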

Environment Variables Used in the Compose File

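The docker-compose.yml file above sets a handful of key environment variables: the Redis broker URL, the Tika and Gotenberg endpoints, the public URL, the OCR language, the time zone, the admin account, and the database credentials. For a full list of available variables and their usage, refer to the Paperless NGX Configuration Documentation.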

Step 3: Deploy the Stack

Navigate to the directory containing your docker-compose.yml file and deploy the stack with Docker Swarm:

docker stack deploy -c docker-compose.yml paperless
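
Unlike docker compose, docker stack deploy does not load a .env file on its own, so the ${...} placeholders in the compose file are only filled in if the variables exist in the shell that runs the command. The sketch below exports them first. The function names load_env_file and deploy_paperless are hypothetical, and the approach assumes every line in .env is a plain KEY=value pair that the shell can read (quote values that contain spaces or special characters).

```shell
# Export every KEY=value line of an env file into the current shell.
# load_env_file and deploy_paperless are hypothetical helper names.
load_env_file() {
  env_file="${1:-.env}"
  # Without a slash, "." would search PATH instead of the current directory.
  case "$env_file" in */*) ;; *) env_file="./$env_file" ;; esac
  [ -f "$env_file" ] || { echo "missing $env_file" >&2; return 1; }
  set -a          # auto-export every variable assigned from here on
  . "$env_file"
  set +a          # back to normal assignment behaviour
}

# Load the variables, then deploy the stack with substitution applied.
deploy_paperless() {
  load_env_file "${1:-.env}" || return 1
  docker stack deploy -c docker-compose.yml paperless
}
```

Run deploy_paperless from the directory that contains both docker-compose.yml and .env.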

Alternatively, you can deploy the stack via Portainer or any other Docker Swarm management tool.
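
After deploying, it helps to confirm that every service reached its desired replica count before you open the web interface. A minimal sketch, assuming the stack name paperless used above; the function name paperless_status is hypothetical. Swarm names each service <stack>_<service>, so the web service here is paperless_webserver.

```shell
# Show replica counts for the stack, then recent log lines of the web service.
# paperless_status is a hypothetical helper name.
paperless_status() {
  stack="${1:-paperless}"
  # Lists each service with its current/desired replicas (for example 1/1).
  docker stack services "$stack" || return 1
  # Last 20 log lines of the web service, useful when replicas stay at 0/1.
  docker service logs --tail 20 "${stack}_webserver"
}
```

If a service stays at 0/1 replicas, docker service ps with the service name shows why its tasks failed to start.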

Step 4: Accessing Paperless NGX

Additional Notes

Conclusion

By deploying Paperless NGX on Docker Swarm, you gain the availability and scaling benefits of the cluster, especially when it is backed by a distributed storage solution like GlusterFS. Whether you use Portainer for an easier management interface or rely on .env files for a more traditional Docker workflow, the key is consistent environment configuration and making sure all nodes share the necessary data volumes. With this setup, your document management solution has a resilient base that is easy to extend.

Configuration


Tags, Document Types, Correspondent & more

Paperless-ngx is a wonderful tool to scan, classify, and organize your documents. In this article, we’ll discuss three important organizational elements: Document Types, Correspondent, and Tags. Along the way, we’ll ask guiding questions to help you figure out how best to categorize any piece of paperwork you might want to store in Paperless-ngx.


Document Types

Document Types refer to the broad category of the document in question. Is it a letter, a receipt, or a bill? You don’t need to overthink this category; just assign the document to a generalized type. For example, you might have a Receipts document type for all the receipts you scan in, or even for confirmations you receive after paying certain bills.

Correspondent

The Correspondent is the person or organization associated with the document. A credit card bill from Capital One would have “Capital One” as the correspondent. A W-2 might have the IRS as the correspondent. Defining your correspondents broadly is key so you don’t complicate future searches with overly specific labels.

Tags

Tags let you categorize documents by answering basic questions about them: who the document concerns, what it is about, and when it applies. They can also mark special categories or important groups of documents.

OCR Considerations

Optical Character Recognition (OCR) is undoubtedly helpful for searching within the text of scanned documents. However, it shouldn’t be your only search strategy. Combining OCR with at least one or two well-chosen metadata fields (like Document Type or Correspondent) plus relevant Tags makes finding a specific document much easier, especially when you have many years of paperwork.

Garbage In, Garbage Out

As with any data system, the quality of your searches in Paperless-ngx is only as good as the data you choose to include. Spend a little extra time specifying at least one metadata field and adding a couple of relevant tags. This way, when you need to find an important document, you can rely on your carefully curated system to do the work for you.

In summary, Document Types, Correspondent, and Tags form a powerful trifecta in Paperless-ngx to keep your records neat and easily searchable. Leverage OCR, but don’t depend on it alone. And remember: the small effort to add good data up front will pay big dividends when you need to retrieve those documents later.

SMTP Setup

Setting up SMTP for the Paperless-ngx backend allows the system to send emails directly, most commonly for password resets. The PAPERLESS_EMAIL_* environment variables closely mirror the corresponding Django email settings, which keeps configuration straightforward.

Environment Variables

To configure these variables in a Docker environment, add them to your docker-compose.yml under the environment section of the Paperless-ngx service (named webserver in the Getting Started stack). For example:

services:
  paperless-ngx:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    environment:
      - PAPERLESS_EMAIL_HOST=smtp.yourprovider.com
      - PAPERLESS_EMAIL_PORT=587
      - PAPERLESS_EMAIL_HOST_USER=youremail@provider.com
      - PAPERLESS_EMAIL_HOST_PASSWORD=supersecretpassword
      - PAPERLESS_EMAIL_FROM=youremail@provider.com
      - PAPERLESS_EMAIL_USE_TLS=true

Once set, Paperless-ngx will use these SMTP settings to send necessary notifications, such as password reset emails. Adjust values as needed based on your email provider’s requirements.

It’s generally best practice to use TLS or SSL for secure email communication. Enable the flag that matches your provider (PAPERLESS_EMAIL_USE_TLS or PAPERLESS_EMAIL_USE_SSL), and set only one of the two, since the underlying Django settings are mutually exclusive.
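
Which flag to use usually follows from the port. By common convention, port 587 expects a STARTTLS upgrade (PAPERLESS_EMAIL_USE_TLS) and port 465 expects an implicit TLS connection (PAPERLESS_EMAIL_USE_SSL). The sketch below only encodes that convention; the function name smtp_security_flag is hypothetical, and your provider’s documentation takes precedence if it says otherwise.

```shell
# Print the Paperless-ngx flag that conventionally matches an SMTP port.
# smtp_security_flag is a hypothetical helper name.
smtp_security_flag() {
  case "$1" in
    587) echo "PAPERLESS_EMAIL_USE_TLS=true" ;;   # STARTTLS upgrade on a plain connection
    465) echo "PAPERLESS_EMAIL_USE_SSL=true" ;;   # TLS from the first byte (implicit TLS)
    *)   echo "no convention for port $1: check your provider's documentation" >&2
         return 1 ;;
  esac
}
```

For the example above, smtp_security_flag 587 prints the PAPERLESS_EMAIL_USE_TLS=true line already shown in the compose snippet.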