Preface
There are times when I am asked to scan an important paper document so it can be sent to an authority, whether local or national government. I have at my disposal a Canon Lide 600F scanner, my Ubuntu Linux desktop, and gumption. This article details how I create a PDF document from the paper.
Overview
The overall process is thus:
- install the tools
- scan each page, saving each as a numerically indexed PNG file
- convert the PNGs to JPEG
- assemble the PDF document
Install the tools
We need to install ImageMagick and skanlite. On Ubuntu 24.04 (Noble Numbat), the command to invoke is:
sudo apt install imagemagick skanlite
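Before scanning, it can be worth confirming that both tools really are on the PATH. The following is a minimal sketch; require_tools is a hypothetical helper name, not something shipped with either package:

```shell
#!/bin/bash
# require_tools is a hypothetical helper: it reports every named command
# that cannot be found on the PATH and returns non-zero if any is missing.
require_tools() {
    local missing=0
    local tool
    for tool in "$@"
    do
        if ! command -v "${tool}" > /dev/null 2>&1
        then
            echo "missing: ${tool}" >&2
            missing=1
        fi
    done
    return "${missing}"
}

# Possible use at the top of a script:
#   require_tools convert skanlite || exit
```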
Scan the pages
When scanning a monochromatic paper document without images, I tend to set skanlite to scan at 150 DPI in greyscale. This is low enough resolution to save storage and high enough to be clearly legible. I save my pages in numerical order, e.g.
- 0001.png
- 0002.png
- 0003.png
- ...
- 0020.png
This naming scheme is important as the subsequent steps rely on globbing.
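To see why the zero padding matters: the shell expands a glob in sorted order, comparing names character by character rather than as numbers. The sketch below uses a throwaway directory and empty placeholder files (the file names are purely illustrative) to show the difference:

```shell
#!/bin/bash
# Demonstrate glob ordering using empty placeholder files in a temp directory.
demo_dir="$(mktemp -d)" || exit

# Unpadded names: 10.png sorts before 2.png because '1' sorts before '2'.
mkdir "${demo_dir}/unpadded"
touch "${demo_dir}/unpadded/1.png" \
      "${demo_dir}/unpadded/2.png" \
      "${demo_dir}/unpadded/10.png"
unpadded_order="$(cd "${demo_dir}/unpadded" && echo *.png)"

# Zero-padded names: the sorted order matches the page order.
mkdir "${demo_dir}/padded"
touch "${demo_dir}/padded/0001.png" \
      "${demo_dir}/padded/0002.png" \
      "${demo_dir}/padded/0010.png"
padded_order="$(cd "${demo_dir}/padded" && echo *.png)"

echo "unpadded: ${unpadded_order}"
echo "padded:   ${padded_order}"

# Clean up the throwaway directory.
rm -r "${demo_dir}"
```

With unpadded names, page 10 would end up ahead of page 2 in the PDF; with padded names the glob returns the pages in reading order.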
Convert the PNG files to JPEGs
The command to convert a PNG file to a JPEG is:
convert 0001.png 0001.jpg
Because the resulting document will be sent via email, I lower the quality and adjust the conversion accordingly:
convert 0001.png \
-quality 80 \
0001.jpg
I store my JPEGs in a subdirectory to be tidy (and so I may wipe them out in one fell swoop).
Assemble the PDF document
The command to "stitch" the JPEGs together is:
convert \
-quality 50 \
0001.jpg \
0002.jpg \
... \
0020.jpg \
render.pdf
The final script
The final script that I use is:
#!/bin/bash
# Create the subdirectory for rendered JPEGs
mkdir -p jpg || exit
# Glob the png files and convert them
for png_name in *.png
do
    # Get the filename from the full path
    name="$(basename "${png_name}")"
    # Replace the suffix with .jpg
    new_name="${name%.png}.jpg"
    # Inform the user of which file is being processed
    echo "- ${name} -> ${new_name}"
    # Convert the PNG to JPEG
    convert "${png_name}" \
        -quality 80 \
        "jpg/${new_name}" || exit
done
# Render the PDF from the JPEGs
convert \
-quality 50 \
jpg/* \
render.pdf || exit
echo 'done'
exit 0
In fact, I actually add time to print the time elapsed running each step, but that's entirely optional.
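As a sketch of that optional timing: in bash, time is a reserved word that can prefix a single command, a function call, or a { ...; } group. Here convert_step is a hypothetical stand-in (it only sleeps) for the real conversion work:

```shell
#!/bin/bash
# 'time' is a bash reserved word; its report is written to stderr, so it
# does not mix with the progress lines printed on stdout.
# convert_step is a hypothetical stand-in for the real convert calls.
convert_step() {
    sleep 1
    echo 'converted'
}

# Time a single step.
time convert_step

# Time a group of commands as one unit.
time {
    convert_step
    convert_step
}
```

Note that this relies on bash specifically; other shells may only offer an external time command, which cannot time a shell function.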
The output for a document looks like this (without times):
- 0001.png -> 0001.jpg
- 0002.png -> 0002.jpg
- 0003.png -> 0003.jpg
- 0004.png -> 0004.jpg
- 0005.png -> 0005.jpg
- 0006.png -> 0006.jpg
- 0007.png -> 0007.jpg
- 0008.png -> 0008.jpg
- 0009.png -> 0009.jpg
- 0010.png -> 0010.jpg
- 0011.png -> 0011.jpg
- 0012.png -> 0012.jpg
- 0013.png -> 0013.jpg
- 0014.png -> 0014.jpg
- 0015.png -> 0015.jpg
- 0016.png -> 0016.jpg
done
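The renaming inside the loop relies on the shell's suffix-stripping parameter expansion. A small standalone sketch (the file names are only examples) shows what it does, including the case where the suffix is absent:

```shell
#!/bin/bash
# ${variable%pattern} removes the shortest match of pattern from the END
# of the variable's value; the value itself is left unchanged.
png_name='0001.png'
new_name="${png_name%.png}.jpg"
echo "${png_name} -> ${new_name}"

# A name that does not end in .png is not altered by the expansion,
# so .jpg is simply appended to the whole name.
other_name='notes.txt'
echo "${other_name} -> ${other_name%.png}.jpg"
```

Because the loop only ever sees names matched by *.png, the second case should not arise in the script itself.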
Potential Enhancements
There are some changes I could make to improve the process:
- create the JPEG if one doesn't exist already
- overwrite an existing JPEG only if the timestamp on the PNG file is newer than that of the existing JPEG
- conditionally render the PDF if any of the inputs are newer than the existing render
This is all starting to sound very reminiscent of Makefiles!
These steps would save time if I had to rescan any pages, but the cost-benefit ratio places the changes low down on my list of things to attend to.
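For what it's worth, the timestamp checks can be sketched with the shell's -nt ("newer than") file test. The helper name needs_update is hypothetical, and this is an illustration of the idea rather than part of the script above:

```shell
#!/bin/bash
# needs_update TARGET SOURCE...
# Hypothetical helper: succeeds (returns 0) when TARGET does not exist or
# when any SOURCE has a newer modification time than TARGET.
needs_update() {
    local target="$1"
    shift
    # A missing target always needs to be built.
    [ -e "${target}" ] || return 0
    local source
    for source in "$@"
    do
        [ "${source}" -nt "${target}" ] && return 0
    done
    return 1
}

# Possible use inside the conversion loop:
#   if needs_update "jpg/${new_name}" "${png_name}"
#   then
#       convert "${png_name}" -quality 80 "jpg/${new_name}" || exit
#   fi
#
# Possible use for the final render:
#   if needs_update render.pdf jpg/*
#   then
#       convert -quality 50 jpg/* render.pdf || exit
#   fi
```

This covers all three enhancements in the list: a missing JPEG is created, a stale JPEG is regenerated, and the PDF is rendered only when at least one input is newer than it.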
Round-up
This process proves to be useful several times a year, especially when having to send digital copies of important paperwork to multiple authorities on short deadlines.
This morning I scanned two 18-page documents, each resulting in a 2.2 MB PDF which should traverse the e-mail systems without causing a size-limit rejection. Yes, I could upload the file to a secure location and provide a password-protected download URL, but not everyone understands or trusts that kind of mechanism. Sometimes, it's easier to stick to what the recipient knows best - PDFs and e-mail.