Ubuntu Package Archive Mirror Subset

Creating a subset-mirror of the Ubuntu package archive

Preface

Hosts which run the Ubuntu operating system contact the Ubuntu package archive, or mirrors thereof, when updating the system package list (apt update), upgrading packages (apt upgrade) or installing new packages (apt install postgresql). I perform some or all of these actions on my computer every day. I also perform them indirectly when I spin up new instances in LXD, update existing instances, and when I build Snap packages which specify build or stage packages. This results in a lot of packages flowing around the Internet to my humble computer.

The download speed isn't a terrible problem, but I do hammer the servers sometimes when I spin up M containers/VMs, perform tasks, tear them down, and repeat the process N times. There's also the very real concern that if the Internet were, for all intents and purposes, inaccessible, I wouldn't be able to perform maintenance tasks.

I have written about using apt-cacher-ng to cache the packages I have installed. It's a good solution for the packages I have installed at least once; however, it doesn't cover the packages I didn't know I'd need at some future time. Today, I'll tell you about my solution to this potential problem: building a local subset of the package archive!

The Package Archive Anatomy

The package archive is a tree of directories, files and links. As of right now, rsync reports:

total size is 2,995,376,060,090

Yes, 3 TB of data for the entire archive. I don't need all of that; a relevant subset will suffice. For comparison, the parts I need require about 1/10th of that.

The archive root is a subdirectory named "ubuntu", within which is metadata and package files.

/ubuntu
├── dists/
├── indices/
├── ls-lR.gz
├── pool/
├── project/
└── ubuntu -> .

dists contains information pertaining directly to each distribution:

/ubuntu/dists
├── bionic-backports/
├── bionic-proposed/
├── bionic-security/
├── bionic-updates/
├── bionic/
├── devel-backports/
├── devel-proposed/
├── devel-security/
├── devel-updates/
├── devel/
├── focal-backports/
├── focal-proposed/
├── focal-security/
├── focal-updates/
├── focal/
├── jammy-backports/
├── jammy-proposed/
├── jammy-security/
├── jammy-updates/
├── jammy/
├── noble-backports/
├── noble-proposed/
├── noble-security/
├── noble-updates/
├── noble/
├── oracular-backports/
├── oracular-proposed/
├── oracular-security/
├── oracular-updates/
├── oracular/
├── plucky-backports/
├── plucky-proposed/
├── plucky-security/
├── plucky-updates/
├── plucky/
├── trusty-backports/
├── trusty-proposed/
├── trusty-security/
├── trusty-updates/
├── trusty/
├── xenial-backports/
├── xenial-proposed/
├── xenial-security/
├── xenial-updates/
└── xenial/

Peeking inside the noble directory, we see some metadata and the four famous repositories "main", "multiverse", "universe" and "restricted".

├── Contents-amd64.gz
├── Contents-i386.gz
├── InRelease
├── Release
├── Release.gpg
├── by-hash/
├── main/
├── multiverse/
├── restricted/
└── universe/

The InRelease file contains a PGP-signed list of hashes and sizes for the index files (it's the Release file with the Release.gpg signature embedded in it). Such a mechanism allows us to know what the contents of the files should be and prevents us from accepting incorrect files from mirrors.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Origin: Ubuntu
Label: Ubuntu
Suite: noble
Version: 24.04
Codename: noble
Date: Thu, 25 Apr 2024 15:10:33 UTC
Architectures: amd64 arm64 armhf i386 ppc64el riscv64 s390x
Components: main restricted universe multiverse
Description: Ubuntu Noble 24.04
MD5Sum:
 1ae40621b32609d6251d09b2a47ef936        829119597 Contents-amd64
 2fc7d01e0a1c7b351738abcd571eec59         51301092 Contents-amd64.gz
 a78c03f162892e93e91366e0ec2a4f13        826443945 Contents-arm64
 c131a52c95ba1474f94558b33807c46c         51152650 Contents-arm64.gz
 442e01d09bc4c22b093bd1909be896e8        756348967 Contents-armhf
 a95dda51f4cc916db83181910e061265         47450781 Contents-armhf.gz
 53bfdd9ece563664817c6ad2850e669c        671051417 Contents-i386
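
To make the hash mechanism concrete, here is a minimal bash sketch that looks up the hash recorded for one index file and compares it with the local copy. The function names are hypothetical, and it assumes the Release file carries a SHA256 section alongside the MD5Sum section shown above. It checks the hash only; the PGP signature would have to be verified separately, for example with a tool such as gpgv against the Ubuntu archive keyring.

```shell
# Print the hash recorded for rel_path in the given section (e.g. SHA256)
# of a Release or InRelease file. Prints nothing if there is no such entry.
release_expected_hash() {
    local release_file="$1" section="$2" rel_path="$3"
    awk -v section="${section}:" -v path="${rel_path}" '
        # Any unindented line starts a new field; note whether it is ours.
        /^[^ ]/ { in_section = ($0 == section) }
        # Indented entries are "<hash> <size> <path>".
        in_section && /^ / && $3 == path { print $1; exit }
    ' "${release_file}"
}

# Return 0 only if the local copy of rel_path under base_dir matches the
# SHA256 hash recorded in the Release file.
verify_release_entry() {
    local release_file="$1" base_dir="$2" rel_path="$3"
    local expected actual
    expected="$(release_expected_hash "${release_file}" SHA256 "${rel_path}")"
    [ -n "${expected}" ] || return 1
    actual="$(sha256sum "${base_dir}/${rel_path}" | cut -d ' ' -f 1)"
    [ "${expected}" = "${actual}" ]
}
```

Paths in the Release file are relative to the directory that holds it, so a call might look like: verify_release_entry dists/noble/Release dists/noble main/binary-amd64/Packages.gz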

Skipping ahead, we find the Packages files for amd64 and i386 nested inside the <repository>/binary-<arch> directories:

/ubuntu/dists/noble/main/binary-amd64/
├── Packages.gz
├── Packages.xz
├── Release
└── by-hash/

These contain the list of packages, with metadata. Here's the first item in the file:

Package: accountsservice
Architecture: amd64
Version: 23.13.9-2ubuntu6
Priority: optional
Section: gnome
Origin: Ubuntu
Maintainer: Ubuntu Developers <[email protected]>
Original-Maintainer: Debian freedesktop.org maintainers <[email protected]>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 524
Depends: default-dbus-system-bus | dbus-system-bus, libaccountsservice0 (= 23.13.9-2ubuntu6), libc6 (>= 2.34), libglib2.0-0t64 (>= 2.79.0), libpolkit-gobject-1-0 (>= 0.99)
Recommends: default-logind | logind, polkitd
Suggests: gnome-control-center
Filename: pool/main/a/accountsservice/accountsservice_23.13.9-2ubuntu6_amd64.deb
Size: 71126
MD5sum: 7bb2c4b0289d8db1b51dd4db0d9b4b40
SHA1: 021568ec4fd93cff8234576a77418a61937a37bc
SHA256: 92cff947a1b7026d24942de445459d9fbf3cb7254efb98cd581cb4921a4380fd
SHA512: 4b23078f8ca67de99975960913965bd49cc3e08b4f55720f638ec607229ae32751660c3b4e4a5fee0b925e9f2a6790f6680134ca8e01cdff9fba56e4f96701c5
Homepage: https://www.freedesktop.org/wiki/Software/AccountsService/
Description: query and manipulate user account information
Task: ubuntu-desktop-minimal, ubuntu-desktop, ubuntu-desktop-raspi, kubuntu-desktop, xubuntu-minimal, xubuntu-desktop, lubuntu-desktop, ubuntustudio-desktop-core, ubuntustudio-desktop, ubuntukylin-desktop, ubuntukylin-desktop-minimal, ubuntu-mate-core, ubuntu-mate-desktop, ubuntu-budgie-desktop-minimal, ubuntu-budgie-desktop, ubuntu-budgie-desktop-raspi, ubuntu-unity-desktop, edubuntu-desktop-gnome-minimal, edubuntu-desktop-gnome-raspi, ubuntucinnamon-desktop-minimal, ubuntucinnamon-desktop-raspi
Description-md5: 8aeed0a03c7cd494f0c4b8d977483d7e

This tells us about the file, its contents, its dependencies and more. Note that there is a line entitled "Filename".

Filename: pool/main/a/accountsservice/accountsservice_23.13.9-2ubuntu6_amd64.deb

That path is relative to the archive root. These are the contents of that directory:

/ubuntu/pool/main/a/accountsservice/
├── accountsservice_0.6.35-0ubuntu7.3.debian.tar.xz
├── accountsservice_0.6.35-0ubuntu7.3.dsc
├── accountsservice_0.6.35-0ubuntu7.3_amd64.deb
├── accountsservice_0.6.35-0ubuntu7.3_i386.deb
├── accountsservice_0.6.35-0ubuntu7.debian.tar.gz
├── accountsservice_0.6.35-0ubuntu7.dsc
├── accountsservice_0.6.35-0ubuntu7_amd64.deb
├── accountsservice_0.6.35-0ubuntu7_i386.deb
├── accountsservice_0.6.35.orig.tar.xz
├── accountsservice_0.6.40-2ubuntu10.debian.tar.xz
├── accountsservice_0.6.40-2ubuntu10.dsc
├── accountsservice_0.6.40-2ubuntu10_amd64.deb
├── accountsservice_0.6.40-2ubuntu10_i386.deb
├── accountsservice_0.6.40-2ubuntu11.6.debian.tar.xz
├── accountsservice_0.6.40-2ubuntu11.6.dsc
├── accountsservice_0.6.40-2ubuntu11.6_amd64.deb
├── accountsservice_0.6.40-2ubuntu11.6_i386.deb
├── accountsservice_0.6.40.orig.tar.xz
├── accountsservice_0.6.45-1ubuntu1.3.debian.tar.xz
├── accountsservice_0.6.45-1ubuntu1.3.dsc
├── accountsservice_0.6.45-1ubuntu1.3_amd64.deb
├── accountsservice_0.6.45-1ubuntu1.3_i386.deb
├── accountsservice_0.6.45-1ubuntu1.debian.tar.xz
├── accountsservice_0.6.45-1ubuntu1.dsc
├── accountsservice_0.6.45-1ubuntu1_amd64.deb
├── accountsservice_0.6.45-1ubuntu1_i386.deb
├── accountsservice_0.6.45.orig.tar.xz
├── accountsservice_0.6.55-0ubuntu11.debian.tar.xz
├── accountsservice_0.6.55-0ubuntu11.dsc
├── accountsservice_0.6.55-0ubuntu11_amd64.deb
├── accountsservice_0.6.55-0ubuntu12~20.04.7.debian.tar.xz
├── accountsservice_0.6.55-0ubuntu12~20.04.7.dsc
├── accountsservice_0.6.55-0ubuntu12~20.04.7_amd64.deb
├── accountsservice_0.6.55.orig.tar.xz
├── accountsservice_22.07.5-2ubuntu1.5.debian.tar.xz
├── accountsservice_22.07.5-2ubuntu1.5.dsc
├── accountsservice_22.07.5-2ubuntu1.5_amd64.deb
├── accountsservice_22.07.5-2ubuntu1.debian.tar.xz
├── accountsservice_22.07.5-2ubuntu1.dsc
├── accountsservice_22.07.5-2ubuntu1_amd64.deb
├── accountsservice_22.07.5.orig.tar.xz
├── accountsservice_23.13.9-2ubuntu6.debian.tar.xz
├── accountsservice_23.13.9-2ubuntu6.dsc
├── accountsservice_23.13.9-2ubuntu6_amd64.deb
├── accountsservice_23.13.9-7ubuntu1.debian.tar.xz
├── accountsservice_23.13.9-7ubuntu1.dsc
├── accountsservice_23.13.9-7ubuntu1_amd64.deb
├── accountsservice_23.13.9.orig.tar.xz
├── gir1.2-accountsservice-1.0_0.6.35-0ubuntu7.3_amd64.deb
├── gir1.2-accountsservice-1.0_0.6.35-0ubuntu7.3_i386.deb
├── gir1.2-accountsservice-1.0_0.6.35-0ubuntu7_amd64.deb
├── gir1.2-accountsservice-1.0_0.6.35-0ubuntu7_i386.deb
├── gir1.2-accountsservice-1.0_0.6.40-2ubuntu10_amd64.deb
├── gir1.2-accountsservice-1.0_0.6.40-2ubuntu10_i386.deb
├── gir1.2-accountsservice-1.0_0.6.40-2ubuntu11.6_amd64.deb
├── gir1.2-accountsservice-1.0_0.6.40-2ubuntu11.6_i386.deb
├── gir1.2-accountsservice-1.0_0.6.45-1ubuntu1.3_amd64.deb
├── gir1.2-accountsservice-1.0_0.6.45-1ubuntu1.3_i386.deb
├── gir1.2-accountsservice-1.0_0.6.45-1ubuntu1_amd64.deb
├── gir1.2-accountsservice-1.0_0.6.45-1ubuntu1_i386.deb
├── gir1.2-accountsservice-1.0_0.6.55-0ubuntu11_amd64.deb
├── gir1.2-accountsservice-1.0_0.6.55-0ubuntu12~20.04.7_amd64.deb
├── gir1.2-accountsservice-1.0_22.07.5-2ubuntu1.5_amd64.deb
├── gir1.2-accountsservice-1.0_22.07.5-2ubuntu1_amd64.deb
├── gir1.2-accountsservice-1.0_23.13.9-2ubuntu6_amd64.deb
├── gir1.2-accountsservice-1.0_23.13.9-7ubuntu1_amd64.deb
├── libaccountsservice-dbg_0.6.35-0ubuntu7.3_amd64.deb
├── libaccountsservice-dbg_0.6.35-0ubuntu7.3_i386.deb
├── libaccountsservice-dbg_0.6.35-0ubuntu7_amd64.deb
├── libaccountsservice-dbg_0.6.35-0ubuntu7_i386.deb
├── libaccountsservice-dbg_0.6.40-2ubuntu10_amd64.deb
├── libaccountsservice-dbg_0.6.40-2ubuntu10_i386.deb
├── libaccountsservice-dbg_0.6.40-2ubuntu11.6_amd64.deb
├── libaccountsservice-dbg_0.6.40-2ubuntu11.6_i386.deb
├── libaccountsservice-dev_0.6.35-0ubuntu7.3_amd64.deb
├── libaccountsservice-dev_0.6.35-0ubuntu7.3_i386.deb
├── libaccountsservice-dev_0.6.35-0ubuntu7_amd64.deb
├── libaccountsservice-dev_0.6.35-0ubuntu7_i386.deb
├── libaccountsservice-dev_0.6.40-2ubuntu10_amd64.deb
├── libaccountsservice-dev_0.6.40-2ubuntu10_i386.deb
├── libaccountsservice-dev_0.6.40-2ubuntu11.6_amd64.deb
├── libaccountsservice-dev_0.6.40-2ubuntu11.6_i386.deb
├── libaccountsservice-dev_0.6.45-1ubuntu1.3_amd64.deb
├── libaccountsservice-dev_0.6.45-1ubuntu1.3_i386.deb
├── libaccountsservice-dev_0.6.45-1ubuntu1_amd64.deb
├── libaccountsservice-dev_0.6.45-1ubuntu1_i386.deb
├── libaccountsservice-dev_0.6.55-0ubuntu11_amd64.deb
├── libaccountsservice-dev_0.6.55-0ubuntu12~20.04.7_amd64.deb
├── libaccountsservice-dev_22.07.5-2ubuntu1.5_amd64.deb
├── libaccountsservice-dev_22.07.5-2ubuntu1_amd64.deb
├── libaccountsservice-dev_23.13.9-2ubuntu6_amd64.deb
├── libaccountsservice-dev_23.13.9-7ubuntu1_amd64.deb
├── libaccountsservice-doc_0.6.35-0ubuntu7.3_all.deb
├── libaccountsservice-doc_0.6.35-0ubuntu7_all.deb
├── libaccountsservice-doc_0.6.40-2ubuntu10_all.deb
├── libaccountsservice-doc_0.6.40-2ubuntu11.6_all.deb
├── libaccountsservice-doc_0.6.45-1ubuntu1.3_all.deb
├── libaccountsservice-doc_0.6.45-1ubuntu1_all.deb
├── libaccountsservice-doc_0.6.55-0ubuntu11_all.deb
├── libaccountsservice-doc_0.6.55-0ubuntu12~20.04.7_all.deb
├── libaccountsservice-doc_22.07.5-2ubuntu1.5_all.deb
├── libaccountsservice-doc_22.07.5-2ubuntu1_all.deb
├── libaccountsservice-doc_23.13.9-2ubuntu6_all.deb
├── libaccountsservice-doc_23.13.9-7ubuntu1_all.deb
├── libaccountsservice0_0.6.35-0ubuntu7.3_amd64.deb
├── libaccountsservice0_0.6.35-0ubuntu7.3_i386.deb
├── libaccountsservice0_0.6.35-0ubuntu7_amd64.deb
├── libaccountsservice0_0.6.35-0ubuntu7_i386.deb
├── libaccountsservice0_0.6.40-2ubuntu10_amd64.deb
├── libaccountsservice0_0.6.40-2ubuntu10_i386.deb
├── libaccountsservice0_0.6.40-2ubuntu11.6_amd64.deb
├── libaccountsservice0_0.6.40-2ubuntu11.6_i386.deb
├── libaccountsservice0_0.6.45-1ubuntu1.3_amd64.deb
├── libaccountsservice0_0.6.45-1ubuntu1.3_i386.deb
├── libaccountsservice0_0.6.45-1ubuntu1_amd64.deb
├── libaccountsservice0_0.6.45-1ubuntu1_i386.deb
├── libaccountsservice0_0.6.55-0ubuntu11_amd64.deb
├── libaccountsservice0_0.6.55-0ubuntu12~20.04.7_amd64.deb
├── libaccountsservice0_22.07.5-2ubuntu1.5_amd64.deb
├── libaccountsservice0_22.07.5-2ubuntu1_amd64.deb
├── libaccountsservice0_23.13.9-2ubuntu6_amd64.deb
└── libaccountsservice0_23.13.9-7ubuntu1_amd64.deb

We see there are mainly deb files and compressed source tarballs in there. There's also something called a dsc file, which describes a source package and lists the checksums of its tarballs. Here's one such file:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 3.0 (quilt)
Source: accountsservice
Binary: accountsservice, libaccountsservice0, libaccountsservice-dev, gir1.2-accountsservice-1.0, libaccountsservice-dbg, libaccountsservice-doc, libpam-pin
Architecture: any all
Version: 0.6.35-0ubuntu7
Maintainer: Ubuntu Developers 
Homepage: http://cgit.freedesktop.org/accountsservice/
Standards-Version: 3.9.4
Vcs-Browser: http://anonscm.debian.org/gitweb/?p=collab-maint/accountsservice.git
Vcs-Git: git://anonscm.debian.org/collab-maint/accountsservice.git
Build-Depends: debhelper (>= 7.0.50~), dh-autoreconf, dh-exec, gir1.2-freedesktop, gir1.2-glib-2.0 (>= 1.34), gobject-introspection (>= 0.9.12-4~), gtk-doc-tools, intltool, libgcr-3-dev, libgcrypt11-dev, libgirepository1.0-dev (>= 0.9.12), libglib2.0-dev (>= 2.37.3), libgnutls-dev, libpam0g-dev, libpolkit-gobject-1-dev, libsystemd-login-dev (>= 186), libsystemd-daemon-dev, xmlto
Package-List: 
 accountsservice deb admin optional
 gir1.2-accountsservice-1.0 deb introspection optional
 libaccountsservice-dbg deb debug extra
 libaccountsservice-dev deb libdevel optional
 libaccountsservice-doc deb doc optional
 libaccountsservice0 deb libs optional
 libpam-pin deb admin optional
Checksums-Sha1: 
 915cf5df1ce04a2dfc6026ba58734f9cb77a3cae 360824 accountsservice_0.6.35.orig.tar.xz
 89228414db2f4f83f269450fa5f24db93c5f0f09 67410 accountsservice_0.6.35-0ubuntu7.debian.tar.gz
Checksums-Sha256: 
 65a1c7013c9c6785c7feb710ee940bb297207dabdb93561fdfdd140e0dfd3038 360824 accountsservice_0.6.35.orig.tar.xz
 2ecc84f48f8f42c5f253f8ed4a077cc52d8795f5305e77434d39466b9b31c29c 67410 accountsservice_0.6.35-0ubuntu7.debian.tar.gz
Files: 
 3a81133e95faafb603de4475802cb06a 360824 accountsservice_0.6.35.orig.tar.xz
 1a7c37cb660b6d452b829f951ac0b955 67410 accountsservice_0.6.35-0ubuntu7.debian.tar.gz
Original-Maintainer: Alessio Treglia 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)

iEYEARECAAYFAlLZcjgACgkQQxo87aLX0pLMBQCePE7aL9uaRG+4x1DUSwbTVD5k
l5wAni/R4nlsgpZkbwvkDG5QXQZNjmbA
=YyYU
-----END PGP SIGNATURE-----

Minimum Viable Package Mirror

Now that we have an idea about the structure of the archive and what is contained within, we can start to make a few assumptions about what is needed to stand up a minimum viable repository.

  1. A transmission mechanism: http, https, ftp, rsync
  2. Metadata:
    1. dists/noble
    2. dists/noble-backports
    3. dists/noble-proposed
    4. dists/noble-security
    5. dists/noble-updates
    6. indices
    7. ls-lR.gz
    8. project
    9. ubuntu
  3. The parts of pool mentioned in the Packages files within my desired release(s)

I created a file called archive-metadata which lists the majority of these things, notably excluding the pool:

dists/noble
dists/noble-backports
dists/noble-proposed
dists/noble-security
dists/noble-updates
indices
ls-lR.gz
project
ubuntu

I used rsync to bring all of this down to my computer, using the --files-from option.

declare -r ProjectRoot='/path/to/my/mirrors/Ubuntu'
declare -r MirrorRoot="${ProjectRoot}/ubuntu/"
declare -r PoolFile="${ProjectRoot}/pool-info"
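declare -r TopLevelInfo="${ProjectRoot}/archive-metadata"
# Placeholder: substitute the rsync URL of your chosen mirror.
declare -r ArchiveUrl='rsync://mirror.example.com/ubuntu/'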

declare -ar Options=(
    --bwlimit=5MiB
    --archive
    --recursive
    --progress
    --stats
    --log-file="${ProjectRoot}/log.txt"
    --hard-links
    --human-readable
    --atimes
    --checksum-choice=xxh64
    --compress
    --delete
    --partial
)

rsync \
    "${Options[@]}" \
    --files-from="${TopLevelInfo}" \
    "${ArchiveUrl}" \
    "${MirrorRoot}"

Then I parsed the Packages.gz files to extract every filename in the pool that relates to the Noble release, storing the sorted, unique data in a file called pool-info. At this point, I had virtually everything I needed, so I began the initial sync:

rsync \
    "${Options[@]}" \
    --files-from="${PoolFile}" \
    "${ArchiveUrl}" \
    "${MirrorRoot}"

The sync took several days because I was being careful not to infuriate my ISP or the mirror from which I'm syncing. (I didn't use archive.ubuntu.com; I used a faster mirror in my region.)
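
The step that turns the Packages.gz files into pool-info can be sketched roughly as follows. This is a minimal bash sketch rather than the exact script used; the function name build_pool_list is hypothetical, and it assumes the only dists directories present locally are the ones you want to mirror.

```shell
# Print every pool path named in a "Filename:" field of any Packages.gz
# found under dists_dir, sorted and de-duplicated, one path per line.
build_pool_list() {
    local dists_dir="$1"
    find "${dists_dir}" -type f -name 'Packages.gz' -exec gzip -dc -- {} + \
        | sed -n 's/^Filename: //p' \
        | LC_ALL=C sort -u
}
```

With the variables from the sync script, the list could then be written with: build_pool_list "${MirrorRoot}dists" > "${PoolFile}". Note that Packages files only list binary packages, so source files such as the dsc files are not covered by this sketch.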

Resyncing

The Ubuntu package archive is updated throughout the day with mirrors being advised to sync every 6 hours. My mirror was therefore out-of-date already, requiring a resync.

Going forward, a resync schedule will be necessary to keep the archive mirror updated.
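
One thing a schedule needs is protection against a new sync starting while the previous one is still running. A minimal bash sketch, assuming the flock utility from util-linux is installed; the function name, lock path and script path below are hypothetical.

```shell
# Run a command only if no other process currently holds the lock file.
# flock exits with status 1 when the lock is already taken.
# The command must be an executable (flock cannot call shell functions).
run_exclusively() {
    local lock_file="$1"
    shift
    flock --nonblock "${lock_file}" "$@"
}

# Hypothetical crontab entry that would start a sync script every six hours:
# 0 */6 * * * /path/to/my/mirrors/Ubuntu/sync.sh
```

Inside such a script, the rsync invocations could be wrapped as: run_exclusively "${ProjectRoot}/sync.lock" rsync "${Options[@]}" ...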

These are the stats for the resync of the "archive-metadata":

Number of files: 8,062 (reg: 7,507, dir: 551, link: 4)
Number of created files: 50 (reg: 50)
Number of deleted files: 45 (reg: 45)
Number of regular files transferred: 1,286
Total file size: 6.79G bytes
Total transferred file size: 1.02G bytes
Literal data: 44.21M bytes
Matched data: 973.67M bytes
File list size: 422.89K
File list generation time: 0.016 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 3.00M
Total bytes received: 38.33M

sent 3.00M bytes  received 38.33M bytes  1.10M bytes/sec
total size is 6.79G  speedup is 164.39

These are the stats for the resync of the "pool-info":

Number of files: 128,798 (reg: 90,043, dir: 38,278, link: 477)
Number of created files: 0
Number of deleted files: 0
Number of regular files transferred: 0
Total file size: 301.03G bytes
Total transferred file size: 0 bytes
Literal data: 0 bytes
Matched data: 0 bytes
File list size: 3.04M
File list generation time: 111.166 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 6.84M
Total bytes received: 5.92M

sent 6.84M bytes  received 5.92M bytes  23.74K bytes/sec
total size is 301.03G  speedup is 23,587.18

Together they took almost ten minutes to complete.

real    9m37.635s
user    7m51.043s
sys     0m9.903s

Serving The Mirror

Now that the data is ready to be used, it must be made available. For that, I decided to turn to trusty old NGINX. I wrote a server block to serve the files. For some yet-to-be-determined reason, LXD, which manages the DNS/DHCP for my lxdbr0 network, will not perform a DNS lookup on a sub-domain component of "_gateway.lxd". I therefore opted to have NGINX listen on the IP address using a different port, randomly selecting port 801.

server {
    listen 801;
    listen [::]:801;

    server_name _gateway.lxd 10.169.240.1;

    # Note that the "ubuntu" directory is the webroot.
    root /path/to/my/mirrors/Ubuntu/ubuntu;

    location / {
        autoindex on;
        autoindex_exact_size off;
        autoindex_format html;
        autoindex_localtime on;

        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        try_files $uri $uri/ =404;
    }
}

With NGINX reloaded and the firewall opened up for 801/tcp, I spun up a container with some custom "cloud-config" that I modified from the cloud-init module reference:

#cloud-config
apt:
  preserve_sources_list: false
  primary:
    - arches:
        - amd64
        - i386
        - default
      uri: http://10.169.240.1:801/ubuntu
      search_dns: false

Inside the new container, I ripped out the apt proxy information which had been set by my defaults and attempted to install postgresql using my mirror.

sudo apt install postgresql --download-only

Success!

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  libcommon-sense-perl libjson-perl libjson-xs-perl libllvm17t64 libpq5 libtypes-serialiser-perl postgresql-16 postgresql-client-16 postgresql-client-common postgresql-common ssl-cert
Suggested packages:
  postgresql-doc postgresql-doc-16
The following NEW packages will be installed:
  libcommon-sense-perl libjson-perl libjson-xs-perl libllvm17t64 libpq5 libtypes-serialiser-perl postgresql postgresql-16 postgresql-client-16 postgresql-client-common postgresql-common
  ssl-cert
0 upgraded, 12 newly installed, 0 to remove and 4 not upgraded.
Need to get 43.5 MB of archives.
After this operation, 175 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://10.169.240.1:801/ubuntu noble/main amd64 libjson-perl all 4.10000-1 [81.9 kB]
Get:2 http://10.169.240.1:801/ubuntu noble-updates/main amd64 postgresql-client-common all 257build1.1 [36.4 kB]
Get:3 http://10.169.240.1:801/ubuntu noble/main amd64 ssl-cert all 1.1.2ubuntu1 [17.8 kB]
Get:4 http://10.169.240.1:801/ubuntu noble-updates/main amd64 postgresql-common all 257build1.1 [161 kB]
Get:5 http://10.169.240.1:801/ubuntu noble/main amd64 libcommon-sense-perl amd64 3.75-3build3 [20.4 kB]
Get:6 http://10.169.240.1:801/ubuntu noble/main amd64 libtypes-serialiser-perl all 1.01-1 [11.6 kB]
Get:7 http://10.169.240.1:801/ubuntu noble/main amd64 libjson-xs-perl amd64 4.030-2build3 [83.6 kB]
Get:8 http://10.169.240.1:801/ubuntu noble/main amd64 libllvm17t64 amd64 1:17.0.6-9ubuntu1 [26.2 MB]
Get:9 http://10.169.240.1:801/ubuntu noble-updates/main amd64 libpq5 amd64 16.6-0ubuntu0.24.04.1 [141 kB]
Get:10 http://10.169.240.1:801/ubuntu noble-updates/main amd64 postgresql-client-16 amd64 16.6-0ubuntu0.24.04.1 [1271 kB]
Get:11 http://10.169.240.1:801/ubuntu noble-updates/main amd64 postgresql-16 amd64 16.6-0ubuntu0.24.04.1 [15.5 MB]
Get:12 http://10.169.240.1:801/ubuntu noble-updates/main amd64 postgresql all 16+257build1.1 [11.6 kB]
Fetched 43.5 MB in 0s (96.0 MB/s)
Download complete and in download only mode

NGINX logs:

==> /var/log/nginx/access.log <==
10.169.240.224 - - [13/Dec/2024:16:46:24 +0000] "GET /ubuntu/pool/main/libj/libjson-perl/libjson-perl_4.10000-1_all.deb HTTP/1.1" 200 81896 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:24 +0000] "GET /ubuntu/pool/main/p/postgresql-common/postgresql-client-common_257build1.1_all.deb HTTP/1.1" 200 36410 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:24 +0000] "GET /ubuntu/pool/main/s/ssl-cert/ssl-cert_1.1.2ubuntu1_all.deb HTTP/1.1" 200 17826 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:24 +0000] "GET /ubuntu/pool/main/p/postgresql-common/postgresql-common_257build1.1_all.deb HTTP/1.1" 200 161444 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:24 +0000] "GET /ubuntu/pool/main/libc/libcommon-sense-perl/libcommon-sense-perl_3.75-3build3_amd64.deb HTTP/1.1" 200 20430 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:24 +0000] "GET /ubuntu/pool/main/libt/libtypes-serialiser-perl/libtypes-serialiser-perl_1.01-1_all.deb HTTP/1.1" 200 11552 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:24 +0000] "GET /ubuntu/pool/main/libj/libjson-xs-perl/libjson-xs-perl_4.030-2build3_amd64.deb HTTP/1.1" 200 83574 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:25 +0000] "GET /ubuntu/pool/main/l/llvm-toolchain-17/libllvm17t64_17.0.6-9ubuntu1_amd64.deb HTTP/1.1" 200 26162724 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:25 +0000] "GET /ubuntu/pool/main/p/postgresql-16/libpq5_16.6-0ubuntu0.24.04.1_amd64.deb HTTP/1.1" 200 141282 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:25 +0000] "GET /ubuntu/pool/main/p/postgresql-16/postgresql-client-16_16.6-0ubuntu0.24.04.1_amd64.deb HTTP/1.1" 200 1270788 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:25 +0000] "GET /ubuntu/pool/main/p/postgresql-16/postgresql-16_16.6-0ubuntu0.24.04.1_amd64.deb HTTP/1.1" 200 15526230 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:25 +0000] "GET /ubuntu/pool/main/p/postgresql-common/postgresql_16%2b257build1.1_all.deb HTTP/1.1" 200 11586 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"

Benchmarks

I've recorded some modest benchmarks. The local mirror is the one installed on my host, the remote mirror is the Ubuntu Archive.

To obtain this data, I executed the following script, which sets up a container, waits for it to be ready, then executes the three commands of interest sequentially, timing them. There are a number of processes running on my host, so the CPU time available will fluctuate, which makes this benchmark pretty rudimentary.

#!/bin/bash

declare -r Image='ubuntu:noble'
declare -ar ContainerProperties=(
    --no-profiles
    --storage zfs-lxd 
    --network=lxdbr0
)
declare -r ContainerNameLocal='package-mirror-local'
declare -r ContainerNameRemote='package-mirror-remote'
declare -r LogNameLocal="/tmp/mirror-local-$(date -Is).log"
declare -r LogNameRemote="/tmp/mirror-remote-$(date -Is).log"
declare -ra PackagesToDownload=(
    build-essential
    cmake
    nginx
    pkg-config
    postgresql
)

mkdir -p benchmarks/tmp

lxc launch "${Image}" "${ContainerProperties[@]}" --config=user.user-data="$(cat user-data.yaml)" "${ContainerNameLocal}"
lxc exec "${ContainerNameLocal}" -- cloud-init status --wait
lxc exec "${ContainerNameLocal}" -- /usr/bin/time --append --output "${LogNameLocal}" -p apt update -y
lxc exec "${ContainerNameLocal}" -- /usr/bin/time --append --output "${LogNameLocal}" -p apt upgrade -y
lxc exec "${ContainerNameLocal}" -- /usr/bin/time --append --output "${LogNameLocal}" -p apt install -y "${PackagesToDownload[@]}" --download-only
lxc file pull "${ContainerNameLocal}""${LogNameLocal}" "benchmarks/${LogNameLocal}"
lxc delete "${ContainerNameLocal}" --force

lxc launch "${Image}" "${ContainerProperties[@]}" --config=user.user-data="$(cat user-data.yaml)" "${ContainerNameRemote}"
lxc exec "${ContainerNameRemote}" -- cloud-init status --wait
lxc exec "${ContainerNameRemote}" -- /usr/bin/time --append --output "${LogNameRemote}" -p apt update -y
lxc exec "${ContainerNameRemote}" -- /usr/bin/time --append --output "${LogNameRemote}" -p apt upgrade -y
lxc exec "${ContainerNameRemote}" -- /usr/bin/time --append --output "${LogNameRemote}" -p apt install -y "${PackagesToDownload[@]}" --download-only
lxc file pull "${ContainerNameRemote}""${LogNameRemote}" "benchmarks/${LogNameRemote}"
lxc delete "${ContainerNameRemote}" --force

The packages to download total approximately 136 MB, which I think is a good volume to test on. I began the tests at 21:55 UTC on a Friday evening, performing ten iterations of the test, each iteration being separated by one minute as a courtesy measure.

The results show the cumulative time spent for each step of the test. That is to say, all of the update times were accumulated, all of the upgrade times were accumulated, all of the installation download times were accumulated.

Phase                      local (s)   remote (s)   local:remote (%)
update                        163.91       194.05             84.47%
upgrade                       249.88       284.29             87.90%
install --download-only        22.66       162.9              13.91%
Total Result                  436.45       641.24             68.06%

The time improvements for the update and upgrade (30 packages) steps were modest; however, I am certain that these steps included consequential client-side processing time. The "installation" step, which only downloaded the packages, yielded a significant time improvement. We can see this step took less than 14% of the time it took to download from the Ubuntu Package Archive.

Going Forward

Preventing Syncing During An Archive Update

When the Ubuntu Package Archive is undergoing an update, a file is generated to warn users. This can be used by clients to prevent mirror corruption. My solution should be altered to check whether the file exists and take that into account.
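
Such a check could be sketched in bash like this. The marker name is an assumption: Debian-style mirror tooling uses files named Archive-Update-in-Progress-<hostname>, but this should be verified against the mirror you sync from. The function name is hypothetical.

```shell
# Read a directory listing on stdin and succeed if it mentions an
# update-in-progress marker. The default pattern is an assumption and can
# be overridden by passing a different one as the first argument.
listing_has_update_marker() {
    local pattern="${1:-Archive-Update-in-Progress}"
    grep -q -- "${pattern}"
}

# Possible use at the top of the sync script:
#
# if rsync --list-only "${ArchiveUrl}" | listing_has_update_marker; then
#     echo 'Upstream archive is mid-update; skipping this sync.' >&2
#     exit 0
# fi
```

Reading the listing from stdin keeps the check independent of how the listing is obtained.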

Deletion

Files on my mirror which have been removed from the Ubuntu Package Archive must be deleted. The rsync command I wrote deletes such files, but it would probably be better to delete them after the sync.

Preventing Corruption

If a sync should fail, I could be left with an inoperable mirror. This will not do. It would be ideal if I could retain a copy of my mirror until the sync has finished, then serve the new one if the sync succeeds. I have used the --link-dest feature of rsync to accomplish that! The procedure is:

  1. rename the existing root from ubuntu to ubuntu-initial
  2. create a symbolic link called ubuntu which points to ubuntu-initial
  3. rename the scripted sync destination to ubuntu-now where "now" is the current timestamp.
  4. add --link-dest=/path/to/my/mirrors/Ubuntu/ubuntu to the script rsync options
  5. after the sync finishes, delete the symlink and create a new one pointing to the new sync directory

All unchanged files will be hard-linked into the new directory, new ones added, removed ones omitted.

This is the post-sync result:

lrwxrwxrwx 1 daniel daniel   32 Dec 14 17:12 ubuntu -> ubuntu-2024-12-14T16:56:00+00:00
drwxrwxr-x 6 daniel daniel 4096 Dec 14 16:59 ubuntu-2024-12-14T16:56:00+00:00
drwxrwxr-x 6 daniel daniel 4096 Dec 14 17:12 ubuntu-initial

Wonderful! The symlink should only ever point to a successfully synced directory, and the data should never be reachable in a corrupt state. All that's left to do is reap the older syncs and to make it obvious when a sync failed. I'll reap the old syncs manually for now.

As for failed syncs, I'll give the MirrorRoot a dot-name, e.g. ".ubuntu-2024-12-14T17:36:41+00:00", promote it to the non-dot name "ubuntu-2024-12-14T17:36:41+00:00" after the sync has successfully completed, and then symlink to it.
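
The promotion and symlink switch could be sketched as below. This is a minimal bash sketch, not the finished script; the function name and the temporary link name are hypothetical. Instead of deleting the symlink and recreating it, it renames a new link over the old one so that "ubuntu" never goes missing, which relies on mv -T from GNU coreutils.

```shell
# Promote a finished sync: rename ".ubuntu-<stamp>" to "ubuntu-<stamp>" and
# repoint the "ubuntu" symlink at it. Returns non-zero if anything fails.
promote_snapshot() {
    local project_root="$1" stamp="$2"
    local hidden="${project_root}/.ubuntu-${stamp}"
    local final="${project_root}/ubuntu-${stamp}"

    # Only a sync directory that actually exists can be promoted.
    [ -d "${hidden}" ] || return 1
    mv -- "${hidden}" "${final}" || return 1

    # Create the new link under a temporary name, then rename it over the
    # existing "ubuntu" symlink in a single step.
    ln -s -- "ubuntu-${stamp}" "${project_root}/.ubuntu-link.tmp" || return 1
    mv -T -- "${project_root}/.ubuntu-link.tmp" "${project_root}/ubuntu"
}
```

It would be called only after rsync exits successfully, for example: promote_snapshot "${ProjectRoot}" "${Stamp}", where Stamp is whatever timestamp was used to name the sync destination.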

Round-up

Now I have a functional package archive mirror with a script to manage updates. It's working well so far. I intend to perform some more tests in the coming days, but I envisage updating my host and the LXD cloud-config profiles to point to the archive very soon. If all goes well, I will no longer use apt-cacher-ng for the Ubuntu package archive, but will keep it for other Debian repositories I may need. Docker, for instance, doesn't facilitate rsync connections, so apt-cacher-ng remains a useful tool to have.

I hope you've enjoyed this brief introduction to the archive and how to make your own subset.

Go forth and have fun! :)
