Create PDF documents from PNG files

Create PDF documents from scanned documents

Preface

There are times when I am asked to scan an important paper document so it can be sent to an authority, be it local or government. I have at my disposal a Canon Lide 600F scanner, my Ubuntu Linux desktop, and gumption. This article details how I create a PDF document from the paper.

Overview

The overall process is thus:

  1. install the tools
  2. scan each page, saving each one as a numerically indexed PNG file
  3. convert the PNGs to JPEG
  4. assemble the PDF document

Install the tools

We need to install ImageMagick and skanlite. On Ubuntu 24.04 (Noble Numbat), the command to invoke is:

sudo apt install imagemagick skanlite
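
Before scanning, it can be worth confirming that both programs ended up on the PATH. The following is a minimal sketch rather than part of my actual workflow; check_tools is a hypothetical helper name, and it assumes the ImageMagick package provides the convert command used later in this article.

```shell
# Report which of the named commands can be found on the PATH.
# check_tools is a hypothetical helper, not something the packages provide.
# Returns non-zero if any of them is missing.
check_tools() {
    missing=0
    for tool in "$@"
    do
        if command -v "$tool" >/dev/null 2>&1
        then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tools convert skanlite || echo 'install the missing tools before continuing'
```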

Scan the pages

When scanning a monochromatic paper document without images, I tend to set skanlite to scan at 150 DPI in greyscale. This resolution is low enough to save storage and high enough to remain clearly legible. I save my pages numerically in order, e.g. 0001.png, 0002.png, 0003.png.

This naming scheme is important as the subsequent steps rely on globbing.
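
To illustrate why the zero padding matters, here is a small sketch that creates empty files in a throwaway directory and prints the order in which the shell expands the glob. The file names are made up for the demonstration. The shell sorts glob matches as text, so without padding 10.png lands before 2.png.

```shell
# Work in a temporary directory so no real scans are touched
demo_dir="$(mktemp -d)"

# Unpadded names: sorted as text, not as numbers
touch "$demo_dir/1.png" "$demo_dir/2.png" "$demo_dir/10.png"
unpadded="$(cd "$demo_dir" && echo *.png)"
echo "unpadded: ${unpadded}"
rm "$demo_dir"/*.png

# Zero-padded names: text order and page order agree
touch "$demo_dir/0001.png" "$demo_dir/0002.png" "$demo_dir/0010.png"
padded="$(cd "$demo_dir" && echo *.png)"
echo "padded:   ${padded}"

# Clean up the throwaway directory
rm -r "$demo_dir"
```

With padded names, the pages reach the PDF in the same order they were scanned.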

Convert the PNG files to JPEGs

The command to convert a PNG file to a JPEG is:

convert 0001.png 0001.jpg

Because the resulting document will be sent via email, I lower the JPEG quality to reduce the file size:

convert 0001.png \
    -quality 80 \
    0001.jpg

I store my JPEGs in a subdirectory to be tidy (and so I may wipe them out in one fell swoop).

Assemble the PDF document

The command to "stitch" the JPEGs together is:

convert \
    -quality 50 \
    0001.jpg \
    0002.jpg \
    ... \
    0020.jpg \
    render.pdf

The final script

The final script that I use is:

#!/bin/bash

# Create the subdirectory for rendered JPEGs
mkdir -p jpg || exit

# Glob the png files and convert them
for png_name in *.png
do
    # get the filename from the full-path
    name="$(basename "${png_name}")"

    # replace the suffix with .jpg
    new_name="${name%.png}.jpg"

    # Inform the user which file is being processed
    echo "- ${name} -> ${new_name}"

    # Convert the PNG to JPEG
    convert "${png_name}" \
        -quality 80 \
        "jpg/${new_name}" || exit
done

# Render the PDF from the JPEGs
convert \
    -quality 50 \
    jpg/* \
    render.pdf || exit

echo 'done'

exit 0

In practice, I also use time to print the time elapsed running each step, but that's entirely optional.
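
As a rough sketch of what per-step timing might look like, the helper below wraps a command and prints the whole seconds it took. This is not the exact code from my script: timed is a hypothetical helper name, and it assumes a date command that supports the %s format (as GNU date on Ubuntu does). The example times a stand-in sleep rather than a real conversion.

```shell
# Run a command and report how many whole seconds it took.
# timed is a hypothetical helper name, not part of the script above.
timed() {
    label="$1"
    shift

    start="$(date +%s)"
    status=0
    "$@" || status=$?
    end="$(date +%s)"

    echo "${label}: $((end - start))s"

    # Pass the wrapped command's exit status back to the caller
    return "$status"
}

# Example: time a stand-in step instead of a real conversion
timed 'example step' sleep 1
```

In the script, a step could then be written as timed 'render PDF' convert ... || exit, because the helper returns the exit status of the command it wraps.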

The output for a document looks like this (without times):

- 0001.png -> 0001.jpg
- 0002.png -> 0002.jpg
- 0003.png -> 0003.jpg
- 0004.png -> 0004.jpg
- 0005.png -> 0005.jpg
- 0006.png -> 0006.jpg
- 0007.png -> 0007.jpg
- 0008.png -> 0008.jpg
- 0009.png -> 0009.jpg
- 0010.png -> 0010.jpg
- 0011.png -> 0011.jpg
- 0012.png -> 0012.jpg
- 0013.png -> 0013.jpg
- 0014.png -> 0014.jpg
- 0015.png -> 0015.jpg
- 0016.png -> 0016.jpg
done

Potential Enhancements

There are some changes I could make to improve the process:

This is all starting to sound very reminiscent of Makefiles!

These steps would save time if I had to rescan any pages, but the cost-benefit ratio places the changes low down on my list of things to attend to.
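
The Makefile comparison suggests only redoing work whose input has changed. The following is a minimal sketch of that idea, assuming the goal is to skip pages whose JPEG is already newer than the PNG. needs_rebuild is a hypothetical helper name, and the loop is a dry run that only reports what it would convert.

```shell
# Succeed if the target is missing or older than its source.
# needs_rebuild is a hypothetical helper name.
needs_rebuild() {
    src="$1"
    target="$2"
    [ ! -e "$target" ] || [ "$src" -nt "$target" ]
}

# Dry run: report which pages would be converted again
for png_name in *.png
do
    # Skip the literal pattern when no PNG files matched the glob
    [ -e "$png_name" ] || continue

    jpg_name="jpg/${png_name%.png}.jpg"

    if needs_rebuild "$png_name" "$jpg_name"
    then
        echo "- ${png_name} -> ${jpg_name} (stale, would convert)"
    else
        echo "- ${png_name} is up to date"
    fi
done
```

The same test could decide whether render.pdf needs rebuilding, which is essentially the dependency check that make performs.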

Round-up

This process proves to be useful several times a year, especially when having to send digital copies of important paperwork to multiple authorities on short deadlines.

This morning I scanned two 18-page documents, each resulting in a 2.2MB PDF which should traverse the e-mail systems without causing a size-limit rejection. Yes, I could upload the file to a secure location and provide a password-protected download URL, but not everyone understands or trusts that kind of mechanism. Sometimes, it's easier to stick to what the recipient knows best: PDFs and e-mail.

Resources