Preface
Hosts which run the Ubuntu operating system contact the Ubuntu package archive, or mirrors thereof, when updating the system package list (apt update), upgrading packages (apt upgrade) or installing new packages (apt install postgresql).
I perform some or all of these actions on my computer every day.
I also perform them indirectly when I spin up new instances in LXD, update existing instances, and when I build Snap packages which specify build or stage packages.
This results in a lot of packages flowing around the Internet to my humble computer.
The download speed isn't a terrible problem, but I do hammer the servers sometimes when I spin up M containers/VMs, perform tasks, tear the containers down, and repeat the process N times. There's also the very real concern that if the Internet were to become, for all intents and purposes, inaccessible, I wouldn't be able to perform maintenance tasks.
I have written about using apt-cacher-ng to cache the packages I have installed.
It's a good solution for the packages I have installed at least once; however, it doesn't cover the packages I didn't know I'd need until some future time.
Today, I'll tell you about my solution to this potential problem: building a local subset of the package archive!
The Package Archive Anatomy
The package archive is a tree of directories, files and links. At the time of writing, rsync reports:
total size is 2,995,376,060,090
Yes, that's 3 TB of data for the entire archive. I don't need all of it; a relevant subset will suffice. For comparison, the parts I need come to about a tenth of that.
The archive root is a subdirectory named "ubuntu", within which are the metadata and package files. Note the last entry: "ubuntu" is a symbolic link pointing back to the root itself.
/ubuntu
├── dists/
├── indices/
├── ls-lR.gz
├── pool/
├── project/
└── ubuntu -> .
The dists directory contains information pertaining directly to each distribution:
/ubuntu/dists
├── bionic-backports/
├── bionic-proposed/
├── bionic-security/
├── bionic-updates/
├── bionic/
├── devel-backports/
├── devel-proposed/
├── devel-security/
├── devel-updates/
├── devel/
├── focal-backports/
├── focal-proposed/
├── focal-security/
├── focal-updates/
├── focal/
├── jammy-backports/
├── jammy-proposed/
├── jammy-security/
├── jammy-updates/
├── jammy/
├── noble-backports/
├── noble-proposed/
├── noble-security/
├── noble-updates/
├── noble/
├── oracular-backports/
├── oracular-proposed/
├── oracular-security/
├── oracular-updates/
├── oracular/
├── plucky-backports/
├── plucky-proposed/
├── plucky-security/
├── plucky-updates/
├── plucky/
├── trusty-backports/
├── trusty-proposed/
├── trusty-security/
├── trusty-updates/
├── trusty/
├── xenial-backports/
├── xenial-proposed/
├── xenial-security/
├── xenial-updates/
└── xenial/
Peeking inside the noble directory, we see some metadata and the four famous repositories (called components in apt terminology): "main", "multiverse", "universe" and "restricted".
/ubuntu/dists/noble
├── Contents-amd64.gz
├── Contents-i386.gz
├── InRelease
├── Release
├── Release.gpg
├── by-hash/
├── main/
├── multiverse/
├── restricted/
└── universe/
The InRelease file is the Release file with the signature from Release.gpg embedded in it: a PGP-signed list of the release's index files together with their sizes and checksums (the excerpt below shows the MD5Sum section).
This mechanism tells us exactly what the contents of each file should be, so we can reject files from mirrors which don't match.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Origin: Ubuntu
Label: Ubuntu
Suite: noble
Version: 24.04
Codename: noble
Date: Thu, 25 Apr 2024 15:10:33 UTC
Architectures: amd64 arm64 armhf i386 ppc64el riscv64 s390x
Components: main restricted universe multiverse
Description: Ubuntu Noble 24.04
MD5Sum:
 1ae40621b32609d6251d09b2a47ef936 829119597 Contents-amd64
 2fc7d01e0a1c7b351738abcd571eec59 51301092 Contents-amd64.gz
 a78c03f162892e93e91366e0ec2a4f13 826443945 Contents-arm64
 c131a52c95ba1474f94558b33807c46c 51152650 Contents-arm64.gz
 442e01d09bc4c22b093bd1909be896e8 756348967 Contents-armhf
 a95dda51f4cc916db83181910e061265 47450781 Contents-armhf.gz
 53bfdd9ece563664817c6ad2850e669c 671051417 Contents-i386
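To make that verification concrete, here's a minimal sketch of checking a synced index by hand, assuming GNU coreutils (sha256sum) and awk. It reads the SHA256 section of the same file rather than the MD5Sum one shown above. release_sha256 and verify_index are hypothetical helpers of my own; apt performs these checks itself. The signature check in the closing comment uses gpgv with the keyring from Ubuntu's ubuntu-keyring package.

```shell
# Print the SHA256 recorded in a Release/InRelease file for a path
# relative to dists/<suite>/, e.g. "main/binary-amd64/Packages.gz".
release_sha256() {
    local release_file="$1" path="$2"
    awk -v want="$path" '
        /^[^ ]/ { in_sha256 = ($1 == "SHA256:") }   # an unindented line starts a new field
        in_sha256 && NF == 3 && $3 == want { print $1; exit }
    ' "$release_file"
}

# Succeed only if the file on disk matches the recorded checksum.
verify_index() {
    local release_file="$1" suite_dir="$2" path="$3"
    local expected actual
    expected="$(release_sha256 "$release_file" "$path")"
    [ -n "$expected" ] || return 1   # path not listed at all
    actual="$(sha256sum "${suite_dir}/${path}" | awk '{ print $1 }')"
    [ "$expected" = "$actual" ]
}

# The signature itself can be checked with:
#   gpgv --keyring /usr/share/keyrings/ubuntu-archive-keyring.gpg dists/noble/InRelease
```

Usage: verify_index dists/noble/InRelease dists/noble main/binary-amd64/Packages.gz returns success only when the local copy matches what the signed file promises.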
Skipping ahead, we find the Packages files for amd64 and i386 nested inside the <repository>/binary-<arch> directories:
/ubuntu/dists/noble/main/binary-amd64/
├── Packages.gz
├── Packages.xz
├── Release
└── by-hash/
These contain the list of packages, with metadata. Here's the first item in the file:
Package: accountsservice
Architecture: amd64
Version: 23.13.9-2ubuntu6
Priority: optional
Section: gnome
Origin: Ubuntu
Maintainer: Ubuntu Developers <[email protected]>
Original-Maintainer: Debian freedesktop.org maintainers <[email protected]>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 524
Depends: default-dbus-system-bus | dbus-system-bus, libaccountsservice0 (= 23.13.9-2ubuntu6), libc6 (>= 2.34), libglib2.0-0t64 (>= 2.79.0), libpolkit-gobject-1-0 (>= 0.99)
Recommends: default-logind | logind, polkitd
Suggests: gnome-control-center
Filename: pool/main/a/accountsservice/accountsservice_23.13.9-2ubuntu6_amd64.deb
Size: 71126
MD5sum: 7bb2c4b0289d8db1b51dd4db0d9b4b40
SHA1: 021568ec4fd93cff8234576a77418a61937a37bc
SHA256: 92cff947a1b7026d24942de445459d9fbf3cb7254efb98cd581cb4921a4380fd
SHA512: 4b23078f8ca67de99975960913965bd49cc3e08b4f55720f638ec607229ae32751660c3b4e4a5fee0b925e9f2a6790f6680134ca8e01cdff9fba56e4f96701c5
Homepage: https://www.freedesktop.org/wiki/Software/AccountsService/
Description: query and manipulate user account information
Task: ubuntu-desktop-minimal, ubuntu-desktop, ubuntu-desktop-raspi, kubuntu-desktop, xubuntu-minimal, xubuntu-desktop, lubuntu-desktop, ubuntustudio-desktop-core, ubuntustudio-desktop, ubuntukylin-desktop, ubuntukylin-desktop-minimal, ubuntu-mate-core, ubuntu-mate-desktop, ubuntu-budgie-desktop-minimal, ubuntu-budgie-desktop, ubuntu-budgie-desktop-raspi, ubuntu-unity-desktop, edubuntu-desktop-gnome-minimal, edubuntu-desktop-gnome-raspi, ubuntucinnamon-desktop-minimal, ubuntucinnamon-desktop-raspi
Description-md5: 8aeed0a03c7cd494f0c4b8d977483d7e
This tells us about the package: its contents, its dependencies and more. Note the line entitled "Filename":
Filename: pool/main/a/accountsservice/accountsservice_23.13.9-2ubuntu6_amd64.deb
That is a path relative to the archive root. These are the contents of that directory:
/ubuntu/pool/main/a/accountsservice/
├── accountsservice_0.6.35-0ubuntu7.3.debian.tar.xz
├── accountsservice_0.6.35-0ubuntu7.3.dsc
├── accountsservice_0.6.35-0ubuntu7.3_amd64.deb
├── accountsservice_0.6.35-0ubuntu7.3_i386.deb
├── accountsservice_0.6.35-0ubuntu7.debian.tar.gz
├── accountsservice_0.6.35-0ubuntu7.dsc
├── accountsservice_0.6.35-0ubuntu7_amd64.deb
├── accountsservice_0.6.35-0ubuntu7_i386.deb
├── accountsservice_0.6.35.orig.tar.xz
├── accountsservice_0.6.40-2ubuntu10.debian.tar.xz
├── accountsservice_0.6.40-2ubuntu10.dsc
├── accountsservice_0.6.40-2ubuntu10_amd64.deb
├── accountsservice_0.6.40-2ubuntu10_i386.deb
├── accountsservice_0.6.40-2ubuntu11.6.debian.tar.xz
├── accountsservice_0.6.40-2ubuntu11.6.dsc
├── accountsservice_0.6.40-2ubuntu11.6_amd64.deb
├── accountsservice_0.6.40-2ubuntu11.6_i386.deb
├── accountsservice_0.6.40.orig.tar.xz
├── accountsservice_0.6.45-1ubuntu1.3.debian.tar.xz
├── accountsservice_0.6.45-1ubuntu1.3.dsc
├── accountsservice_0.6.45-1ubuntu1.3_amd64.deb
├── accountsservice_0.6.45-1ubuntu1.3_i386.deb
├── accountsservice_0.6.45-1ubuntu1.debian.tar.xz
├── accountsservice_0.6.45-1ubuntu1.dsc
├── accountsservice_0.6.45-1ubuntu1_amd64.deb
├── accountsservice_0.6.45-1ubuntu1_i386.deb
├── accountsservice_0.6.45.orig.tar.xz
├── accountsservice_0.6.55-0ubuntu11.debian.tar.xz
├── accountsservice_0.6.55-0ubuntu11.dsc
├── accountsservice_0.6.55-0ubuntu11_amd64.deb
├── accountsservice_0.6.55-0ubuntu12~20.04.7.debian.tar.xz
├── accountsservice_0.6.55-0ubuntu12~20.04.7.dsc
├── accountsservice_0.6.55-0ubuntu12~20.04.7_amd64.deb
├── accountsservice_0.6.55.orig.tar.xz
├── accountsservice_22.07.5-2ubuntu1.5.debian.tar.xz
├── accountsservice_22.07.5-2ubuntu1.5.dsc
├── accountsservice_22.07.5-2ubuntu1.5_amd64.deb
├── accountsservice_22.07.5-2ubuntu1.debian.tar.xz
├── accountsservice_22.07.5-2ubuntu1.dsc
├── accountsservice_22.07.5-2ubuntu1_amd64.deb
├── accountsservice_22.07.5.orig.tar.xz
├── accountsservice_23.13.9-2ubuntu6.debian.tar.xz
├── accountsservice_23.13.9-2ubuntu6.dsc
├── accountsservice_23.13.9-2ubuntu6_amd64.deb
├── accountsservice_23.13.9-7ubuntu1.debian.tar.xz
├── accountsservice_23.13.9-7ubuntu1.dsc
├── accountsservice_23.13.9-7ubuntu1_amd64.deb
├── accountsservice_23.13.9.orig.tar.xz
├── gir1.2-accountsservice-1.0_0.6.35-0ubuntu7.3_amd64.deb
├── gir1.2-accountsservice-1.0_0.6.35-0ubuntu7.3_i386.deb
├── gir1.2-accountsservice-1.0_0.6.35-0ubuntu7_amd64.deb
├── gir1.2-accountsservice-1.0_0.6.35-0ubuntu7_i386.deb
├── gir1.2-accountsservice-1.0_0.6.40-2ubuntu10_amd64.deb
├── gir1.2-accountsservice-1.0_0.6.40-2ubuntu10_i386.deb
├── gir1.2-accountsservice-1.0_0.6.40-2ubuntu11.6_amd64.deb
├── gir1.2-accountsservice-1.0_0.6.40-2ubuntu11.6_i386.deb
├── gir1.2-accountsservice-1.0_0.6.45-1ubuntu1.3_amd64.deb
├── gir1.2-accountsservice-1.0_0.6.45-1ubuntu1.3_i386.deb
├── gir1.2-accountsservice-1.0_0.6.45-1ubuntu1_amd64.deb
├── gir1.2-accountsservice-1.0_0.6.45-1ubuntu1_i386.deb
├── gir1.2-accountsservice-1.0_0.6.55-0ubuntu11_amd64.deb
├── gir1.2-accountsservice-1.0_0.6.55-0ubuntu12~20.04.7_amd64.deb
├── gir1.2-accountsservice-1.0_22.07.5-2ubuntu1.5_amd64.deb
├── gir1.2-accountsservice-1.0_22.07.5-2ubuntu1_amd64.deb
├── gir1.2-accountsservice-1.0_23.13.9-2ubuntu6_amd64.deb
├── gir1.2-accountsservice-1.0_23.13.9-7ubuntu1_amd64.deb
├── libaccountsservice-dbg_0.6.35-0ubuntu7.3_amd64.deb
├── libaccountsservice-dbg_0.6.35-0ubuntu7.3_i386.deb
├── libaccountsservice-dbg_0.6.35-0ubuntu7_amd64.deb
├── libaccountsservice-dbg_0.6.35-0ubuntu7_i386.deb
├── libaccountsservice-dbg_0.6.40-2ubuntu10_amd64.deb
├── libaccountsservice-dbg_0.6.40-2ubuntu10_i386.deb
├── libaccountsservice-dbg_0.6.40-2ubuntu11.6_amd64.deb
├── libaccountsservice-dbg_0.6.40-2ubuntu11.6_i386.deb
├── libaccountsservice-dev_0.6.35-0ubuntu7.3_amd64.deb
├── libaccountsservice-dev_0.6.35-0ubuntu7.3_i386.deb
├── libaccountsservice-dev_0.6.35-0ubuntu7_amd64.deb
├── libaccountsservice-dev_0.6.35-0ubuntu7_i386.deb
├── libaccountsservice-dev_0.6.40-2ubuntu10_amd64.deb
├── libaccountsservice-dev_0.6.40-2ubuntu10_i386.deb
├── libaccountsservice-dev_0.6.40-2ubuntu11.6_amd64.deb
├── libaccountsservice-dev_0.6.40-2ubuntu11.6_i386.deb
├── libaccountsservice-dev_0.6.45-1ubuntu1.3_amd64.deb
├── libaccountsservice-dev_0.6.45-1ubuntu1.3_i386.deb
├── libaccountsservice-dev_0.6.45-1ubuntu1_amd64.deb
├── libaccountsservice-dev_0.6.45-1ubuntu1_i386.deb
├── libaccountsservice-dev_0.6.55-0ubuntu11_amd64.deb
├── libaccountsservice-dev_0.6.55-0ubuntu12~20.04.7_amd64.deb
├── libaccountsservice-dev_22.07.5-2ubuntu1.5_amd64.deb
├── libaccountsservice-dev_22.07.5-2ubuntu1_amd64.deb
├── libaccountsservice-dev_23.13.9-2ubuntu6_amd64.deb
├── libaccountsservice-dev_23.13.9-7ubuntu1_amd64.deb
├── libaccountsservice-doc_0.6.35-0ubuntu7.3_all.deb
├── libaccountsservice-doc_0.6.35-0ubuntu7_all.deb
├── libaccountsservice-doc_0.6.40-2ubuntu10_all.deb
├── libaccountsservice-doc_0.6.40-2ubuntu11.6_all.deb
├── libaccountsservice-doc_0.6.45-1ubuntu1.3_all.deb
├── libaccountsservice-doc_0.6.45-1ubuntu1_all.deb
├── libaccountsservice-doc_0.6.55-0ubuntu11_all.deb
├── libaccountsservice-doc_0.6.55-0ubuntu12~20.04.7_all.deb
├── libaccountsservice-doc_22.07.5-2ubuntu1.5_all.deb
├── libaccountsservice-doc_22.07.5-2ubuntu1_all.deb
├── libaccountsservice-doc_23.13.9-2ubuntu6_all.deb
├── libaccountsservice-doc_23.13.9-7ubuntu1_all.deb
├── libaccountsservice0_0.6.35-0ubuntu7.3_amd64.deb
├── libaccountsservice0_0.6.35-0ubuntu7.3_i386.deb
├── libaccountsservice0_0.6.35-0ubuntu7_amd64.deb
├── libaccountsservice0_0.6.35-0ubuntu7_i386.deb
├── libaccountsservice0_0.6.40-2ubuntu10_amd64.deb
├── libaccountsservice0_0.6.40-2ubuntu10_i386.deb
├── libaccountsservice0_0.6.40-2ubuntu11.6_amd64.deb
├── libaccountsservice0_0.6.40-2ubuntu11.6_i386.deb
├── libaccountsservice0_0.6.45-1ubuntu1.3_amd64.deb
├── libaccountsservice0_0.6.45-1ubuntu1.3_i386.deb
├── libaccountsservice0_0.6.45-1ubuntu1_amd64.deb
├── libaccountsservice0_0.6.45-1ubuntu1_i386.deb
├── libaccountsservice0_0.6.55-0ubuntu11_amd64.deb
├── libaccountsservice0_0.6.55-0ubuntu12~20.04.7_amd64.deb
├── libaccountsservice0_22.07.5-2ubuntu1.5_amd64.deb
├── libaccountsservice0_22.07.5-2ubuntu1_amd64.deb
├── libaccountsservice0_23.13.9-2ubuntu6_amd64.deb
└── libaccountsservice0_23.13.9-7ubuntu1_amd64.deb
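Because Filename is relative to the archive root, finding where any package lives only requires reading its stanza. A small sketch, assuming a gzip-compressed Packages file; pool_path_for is a hypothetical helper name.

```shell
# Print the Filename field of the first stanza for PACKAGE in a Packages.gz index.
pool_path_for() {
    local packages_gz="$1" package="$2"
    gzip -dc "$packages_gz" | awk -v want="$package" '
        /^Package: / { current = $2 }                       # remember which stanza we are in
        /^Filename: / && current == want && !found { print $2; found = 1 }
    '
}
```

Run against dists/noble/main/binary-amd64/Packages.gz with accountsservice, it should print the Filename from the stanza above; prefixing the archive root gives the file's location on any mirror.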
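We see mostly binary .deb packages, plus the pieces of each source package: the upstream tarball (.orig.tar.xz), the Debian packaging tarball (.debian.tar.xz or .debian.tar.gz) and a .dsc file, the signed description of the source package. Here's one such .dsc file: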
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Format: 3.0 (quilt)
Source: accountsservice
Binary: accountsservice, libaccountsservice0, libaccountsservice-dev, gir1.2-accountsservice-1.0, libaccountsservice-dbg, libaccountsservice-doc, libpam-pin
Architecture: any all
Version: 0.6.35-0ubuntu7
Maintainer: Ubuntu Developers
Homepage: http://cgit.freedesktop.org/accountsservice/
Standards-Version: 3.9.4
Vcs-Browser: http://anonscm.debian.org/gitweb/?p=collab-maint/accountsservice.git
Vcs-Git: git://anonscm.debian.org/collab-maint/accountsservice.git
Build-Depends: debhelper (>= 7.0.50~), dh-autoreconf, dh-exec, gir1.2-freedesktop, gir1.2-glib-2.0 (>= 1.34), gobject-introspection (>= 0.9.12-4~), gtk-doc-tools, intltool, libgcr-3-dev, libgcrypt11-dev, libgirepository1.0-dev (>= 0.9.12), libglib2.0-dev (>= 2.37.3), libgnutls-dev, libpam0g-dev, libpolkit-gobject-1-dev, libsystemd-login-dev (>= 186), libsystemd-daemon-dev, xmlto
Package-List:
 accountsservice deb admin optional
 gir1.2-accountsservice-1.0 deb introspection optional
 libaccountsservice-dbg deb debug extra
 libaccountsservice-dev deb libdevel optional
 libaccountsservice-doc deb doc optional
 libaccountsservice0 deb libs optional
 libpam-pin deb admin optional
Checksums-Sha1:
 915cf5df1ce04a2dfc6026ba58734f9cb77a3cae 360824 accountsservice_0.6.35.orig.tar.xz
 89228414db2f4f83f269450fa5f24db93c5f0f09 67410 accountsservice_0.6.35-0ubuntu7.debian.tar.gz
Checksums-Sha256:
 65a1c7013c9c6785c7feb710ee940bb297207dabdb93561fdfdd140e0dfd3038 360824 accountsservice_0.6.35.orig.tar.xz
 2ecc84f48f8f42c5f253f8ed4a077cc52d8795f5305e77434d39466b9b31c29c 67410 accountsservice_0.6.35-0ubuntu7.debian.tar.gz
Files:
 3a81133e95faafb603de4475802cb06a 360824 accountsservice_0.6.35.orig.tar.xz
 1a7c37cb660b6d452b829f951ac0b955 67410 accountsservice_0.6.35-0ubuntu7.debian.tar.gz
Original-Maintainer: Alessio Treglia

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)

iEYEARECAAYFAlLZcjgACgkQQxo87aLX0pLMBQCePE7aL9uaRG+4x1DUSwbTVD5k
l5wAni/R4nlsgpZkbwvkDG5QXQZNjmbA
=YyYU
-----END PGP SIGNATURE-----
Minimum Viable Package Mirror
Now that we have an idea about the structure of the archive and what is contained within, we can start to make a few assumptions about what is needed to stand up a minimum viable repository.
- A transmission mechanism: http, https, ftp, rsync
- Metadata:
  - dists/noble
  - dists/noble-backports
  - dists/noble-proposed
  - dists/noble-security
  - dists/noble-updates
  - indices
  - ls-lR.gz
  - project
  - ubuntu
- The parts of pool mentioned in the Packages files within my desired release(s)
I created a file called archive-metadata which lists the majority of these things, notably excluding the pool:
dists/noble
dists/noble-backports
dists/noble-proposed
dists/noble-security
dists/noble-updates
indices
ls-lR.gz
project
ubuntu
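rsync's --files-from expects one path per line, relative to the source root, so the list can also be generated per release, which is handy if a second release is ever wanted. A sketch; write_archive_metadata is a hypothetical helper that reproduces the list above.

```shell
# Write an rsync --files-from list of the top-level metadata for one release.
write_archive_metadata() {
    local release="$1" output="$2" pocket
    {
        printf 'dists/%s\n' "$release"
        for pocket in backports proposed security updates; do
            printf 'dists/%s-%s\n' "$release" "$pocket"
        done
        # Top-level entries that don't depend on the release.
        printf '%s\n' indices ls-lR.gz project ubuntu
    } > "$output"
}
```

write_archive_metadata noble archive-metadata produces the same nine lines as above.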
I used rsync to bring all of this down to my computer, using the --files-from option.
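#!/bin/bash
declare -r ProjectRoot='/path/to/my/mirrors/Ubuntu'
declare -r MirrorRoot="${ProjectRoot}/ubuntu/"
declare -r PoolFile="${ProjectRoot}/pool-info"
declare -r TopLevelInfo="${ProjectRoot}/archive-metadata"
# Placeholder: the rsync URL of the upstream mirror to sync from.
declare -r ArchiveUrl='rsync://mirror.example.com/ubuntu/'
declare -ar Options=(
    --bwlimit=5MiB
    --archive
    # --archive does not imply --recursive when --files-from is used.
    --recursive
    --progress
    --stats
    --log-file="${ProjectRoot}/log.txt"
    --hard-links
    --human-readable
    --atimes
    --checksum-choice=xxh64
    --compress
    --delete
    --partial
)
# Fetch only the paths listed in archive-metadata.
rsync \
    "${Options[@]}" \
    --files-from="${TopLevelInfo}" \
    "${ArchiveUrl}" \
    "${MirrorRoot}"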
Then I parsed the Packages.gz files to extract every filename in the pool that relates to the Noble release, storing the sorted, de-duplicated data in a file called pool-info.
At this point, I had virtually everything I needed so I began the initial sync:
rsync \
"${Options[@]}" \
--files-from="${PoolFile}" \
"${ArchiveUrl}" \
"${MirrorRoot}"
The sync took several days because I throttled it to avoid infuriating my ISP or the mirror I was syncing from. (I didn't use archive.ubuntu.com; I used a faster mirror in my region.)
Resyncing
The Ubuntu package archive is updated throughout the day, and mirrors are advised to sync every 6 hours. By the time the initial sync finished, my mirror was therefore already out of date and needed a resync.
Going forward, a resync schedule will be necessary to keep the archive mirror updated.
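A scheduled resync shouldn't start while a previous one is still running. Here's a minimal sketch of a guard that uses mkdir as an atomic lock; run_exclusive and sync-mirror.sh are hypothetical names, not part of my setup.

```shell
# Run a command only if no other run currently holds LOCK_DIR.
# mkdir either creates the directory or fails, atomically, so it works as a lock.
run_exclusive() {
    local lock_dir="$1"; shift
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "another sync appears to be running (${lock_dir} exists)" >&2
        return 1
    fi
    "$@"
    local status=$?
    rmdir "$lock_dir"
    return "$status"
}

# e.g. from a sync script started by cron every six hours:
#   0 */6 * * * /path/to/my/mirrors/Ubuntu/sync-mirror.sh
# with the rsync calls inside it wrapped as:
#   run_exclusive /tmp/ubuntu-mirror.lock rsync "${Options[@]}" ...
```

A run that is killed outright leaves the lock directory behind and blocks later runs until it's removed; util-linux's flock(1) avoids that, because the kernel releases its lock when the process exits.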
These are the stats for the resync of the "archive-metadata":
Number of files: 8,062 (reg: 7,507, dir: 551, link: 4)
Number of created files: 50 (reg: 50)
Number of deleted files: 45 (reg: 45)
Number of regular files transferred: 1,286
Total file size: 6.79G bytes
Total transferred file size: 1.02G bytes
Literal data: 44.21M bytes
Matched data: 973.67M bytes
File list size: 422.89K
File list generation time: 0.016 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 3.00M
Total bytes received: 38.33M
sent 3.00M bytes received 38.33M bytes 1.10M bytes/sec
total size is 6.79G speedup is 164.39
These are the stats for the resync of the "pool-info":
Number of files: 128,798 (reg: 90,043, dir: 38,278, link: 477)
Number of created files: 0
Number of deleted files: 0
Number of regular files transferred: 0
Total file size: 301.03G bytes
Total transferred file size: 0 bytes
Literal data: 0 bytes
Matched data: 0 bytes
File list size: 3.04M
File list generation time: 111.166 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 6.84M
Total bytes received: 5.92M
sent 6.84M bytes received 5.92M bytes 23.74K bytes/sec
total size is 301.03G speedup is 23,587.18
Together they took almost ten minutes to complete.
real 9m37.635s
user 7m51.043s
sys 0m9.903s
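The "speedup" rsync reports is the total size of the synced files divided by the bytes actually sent and received, which is why the pool resync, with nothing to transfer, scores so highly. Checking against the figures above (rsync's human-readable units here are powers of 1000):

```latex
\begin{aligned}
\text{speedup} &= \frac{\text{total size}}{\text{bytes sent} + \text{bytes received}} \\
\text{metadata} &: \frac{6.79\times10^{9}}{(3.00 + 38.33)\times10^{6}} \approx 164 \\
\text{pool} &: \frac{301.03\times10^{9}}{(6.84 + 5.92)\times10^{6}} \approx 23{,}592
\end{aligned}
```

Both agree with the reported 164.39 and 23,587.18 to within the rounding of the displayed sizes.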
Serving The Mirror
Now that the data is ready to be used, it must be made available. For that, I turned to trusty old NGINX and wrote a server block to serve the files. For some yet-to-be-determined reason, LXD, which manages DNS/DHCP for my lxdbr0 network, will not resolve a sub-domain of "_gateway.lxd". I therefore opted to have NGINX serve the mirror on the bridge's IP address (10.169.240.1) using a different port, arbitrarily choosing port 801.
server {
    listen 801;
    listen [::]:801;
    server_name _gateway.lxd 10.169.240.1;

    # Note that the "ubuntu" directory is the webroot.
    root /path/to/my/mirrors/Ubuntu/ubuntu;

    location / {
        autoindex on;
        autoindex_exact_size off;
        autoindex_format html;
        autoindex_localtime on;

        # First attempt to serve request as file, then
        # as directory, then fall back to displaying a 404.
        try_files $uri $uri/ =404;
    }
}
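Clients request paths such as /ubuntu/dists/noble/InRelease, even though the webroot is already the ubuntu directory. That resolves because the archive root contains the "ubuntu -> ." symlink and NGINX follows symlinks by default. A quick smoke test with curl, sketched under those assumptions; mirror_smoke_test is a hypothetical name.

```shell
# Fetch the release's InRelease and HEAD one pool file; fail if either request fails.
mirror_smoke_test() {
    local base_url="$1" release="$2" pool_path="$3"
    curl --fail --silent --show-error --max-time 30 --output /dev/null \
        "${base_url}/dists/${release}/InRelease" \
    && curl --fail --silent --show-error --max-time 30 --head --output /dev/null \
        "${base_url}/${pool_path}"
}

# With the address and port from the server block:
#   mirror_smoke_test http://10.169.240.1:801/ubuntu noble \
#       pool/main/a/accountsservice/accountsservice_23.13.9-2ubuntu6_amd64.deb
```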
With NGINX reloaded and the firewall opened up for 801/tcp, I spun up a container with some custom "cloud-config" that I modified from the cloud-init module reference:
#cloud-config
apt:
  preserve_sources_list: false
  primary:
    - arches:
        - amd64
        - i386
        - default
      uri: http://10.169.240.1:801/ubuntu
      search_dns: false
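Hosts that aren't provisioned by cloud-init can get the same result by editing apt's sources directly. Ubuntu 24.04 keeps its default sources in deb822 format in /etc/apt/sources.list.d/ubuntu.sources; this sketch writes an equivalent file pointing at the mirror. write_mirror_sources is a hypothetical helper, and the suite list assumes the pockets mirrored above, minus proposed.

```shell
# Write a deb822-style apt sources file that points at MIRROR_URL.
write_mirror_sources() {
    local mirror_url="$1" release="$2" output="$3"
    cat > "$output" <<EOF
Types: deb
URIs: ${mirror_url}
Suites: ${release} ${release}-updates ${release}-backports ${release}-security
Components: main restricted universe multiverse
Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg
EOF
}

# On a noble host (keep a backup of the original first):
#   sudo cp /etc/apt/sources.list.d/ubuntu.sources /root/ubuntu.sources.bak
#   write_mirror_sources http://10.169.240.1:801/ubuntu noble /tmp/ubuntu.sources
#   sudo mv /tmp/ubuntu.sources /etc/apt/sources.list.d/ubuntu.sources && sudo apt update
```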
Inside the new container, I ripped out the apt proxy information which my defaults had set and attempted to install postgresql using my mirror.
sudo apt install postgresql --download-only
Success!
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
libcommon-sense-perl libjson-perl libjson-xs-perl libllvm17t64 libpq5 libtypes-serialiser-perl postgresql-16 postgresql-client-16 postgresql-client-common postgresql-common ssl-cert
Suggested packages:
postgresql-doc postgresql-doc-16
The following NEW packages will be installed:
libcommon-sense-perl libjson-perl libjson-xs-perl libllvm17t64 libpq5 libtypes-serialiser-perl postgresql postgresql-16 postgresql-client-16 postgresql-client-common postgresql-common
ssl-cert
0 upgraded, 12 newly installed, 0 to remove and 4 not upgraded.
Need to get 43.5 MB of archives.
After this operation, 175 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://10.169.240.1:801/ubuntu noble/main amd64 libjson-perl all 4.10000-1 [81.9 kB]
Get:2 http://10.169.240.1:801/ubuntu noble-updates/main amd64 postgresql-client-common all 257build1.1 [36.4 kB]
Get:3 http://10.169.240.1:801/ubuntu noble/main amd64 ssl-cert all 1.1.2ubuntu1 [17.8 kB]
Get:4 http://10.169.240.1:801/ubuntu noble-updates/main amd64 postgresql-common all 257build1.1 [161 kB]
Get:5 http://10.169.240.1:801/ubuntu noble/main amd64 libcommon-sense-perl amd64 3.75-3build3 [20.4 kB]
Get:6 http://10.169.240.1:801/ubuntu noble/main amd64 libtypes-serialiser-perl all 1.01-1 [11.6 kB]
Get:7 http://10.169.240.1:801/ubuntu noble/main amd64 libjson-xs-perl amd64 4.030-2build3 [83.6 kB]
Get:8 http://10.169.240.1:801/ubuntu noble/main amd64 libllvm17t64 amd64 1:17.0.6-9ubuntu1 [26.2 MB]
Get:9 http://10.169.240.1:801/ubuntu noble-updates/main amd64 libpq5 amd64 16.6-0ubuntu0.24.04.1 [141 kB]
Get:10 http://10.169.240.1:801/ubuntu noble-updates/main amd64 postgresql-client-16 amd64 16.6-0ubuntu0.24.04.1 [1271 kB]
Get:11 http://10.169.240.1:801/ubuntu noble-updates/main amd64 postgresql-16 amd64 16.6-0ubuntu0.24.04.1 [15.5 MB]
Get:12 http://10.169.240.1:801/ubuntu noble-updates/main amd64 postgresql all 16+257build1.1 [11.6 kB]
Fetched 43.5 MB in 0s (96.0 MB/s)
Download complete and in download only mode
NGINX logs:
==> /var/log/nginx/access.log <==
10.169.240.224 - - [13/Dec/2024:16:46:24 +0000] "GET /ubuntu/pool/main/libj/libjson-perl/libjson-perl_4.10000-1_all.deb HTTP/1.1" 200 81896 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:24 +0000] "GET /ubuntu/pool/main/p/postgresql-common/postgresql-client-common_257build1.1_all.deb HTTP/1.1" 200 36410 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:24 +0000] "GET /ubuntu/pool/main/s/ssl-cert/ssl-cert_1.1.2ubuntu1_all.deb HTTP/1.1" 200 17826 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:24 +0000] "GET /ubuntu/pool/main/p/postgresql-common/postgresql-common_257build1.1_all.deb HTTP/1.1" 200 161444 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:24 +0000] "GET /ubuntu/pool/main/libc/libcommon-sense-perl/libcommon-sense-perl_3.75-3build3_amd64.deb HTTP/1.1" 200 20430 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:24 +0000] "GET /ubuntu/pool/main/libt/libtypes-serialiser-perl/libtypes-serialiser-perl_1.01-1_all.deb HTTP/1.1" 200 11552 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:24 +0000] "GET /ubuntu/pool/main/libj/libjson-xs-perl/libjson-xs-perl_4.030-2build3_amd64.deb HTTP/1.1" 200 83574 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:25 +0000] "GET /ubuntu/pool/main/l/llvm-toolchain-17/libllvm17t64_17.0.6-9ubuntu1_amd64.deb HTTP/1.1" 200 26162724 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:25 +0000] "GET /ubuntu/pool/main/p/postgresql-16/libpq5_16.6-0ubuntu0.24.04.1_amd64.deb HTTP/1.1" 200 141282 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:25 +0000] "GET /ubuntu/pool/main/p/postgresql-16/postgresql-client-16_16.6-0ubuntu0.24.04.1_amd64.deb HTTP/1.1" 200 1270788 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:25 +0000] "GET /ubuntu/pool/main/p/postgresql-16/postgresql-16_16.6-0ubuntu0.24.04.1_amd64.deb HTTP/1.1" 200 15526230 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
10.169.240.224 - - [13/Dec/2024:16:46:25 +0000] "GET /ubuntu/pool/main/p/postgresql-common/postgresql_16%2b257build1.1_all.deb HTTP/1.1" 200 11586 "-" "Debian APT-HTTP/1.3 (2.7.14) non-interactive"
Benchmarks
I've recorded some modest benchmarks. The local mirror is the one served from my host; the remote mirror is the Ubuntu Archive.
To obtain this data, I executed the following script, which sets up a container, waits for it to be ready, then executes the three commands of interest sequentially, timing them. Other processes were running on my host, so the available CPU time fluctuated, which makes this benchmark fairly rudimentary.
#!/bin/bash
declare -r Image='ubuntu:noble'
declare -ar ContainerProperties=(
    --no-profiles
    --storage zfs-lxd
    --network=lxdbr0
)
declare -r ContainerNameLocal='package-mirror-local'
declare -r ContainerNameRemote='package-mirror-remote'
# Each container needs its own cloud-config: one pointing apt at the local
# mirror, one leaving the Ubuntu Archive in place (file names are illustrative).
declare -r UserDataLocal='user-data-local.yaml'
declare -r UserDataRemote='user-data-remote.yaml'
declare -r LogNameLocal="/tmp/mirror-local-$(date -Is).log"
declare -r LogNameRemote="/tmp/mirror-remote-$(date -Is).log"
declare -ra PackagesToDownload=(
    build-essential
    cmake
    nginx
    pkg-config
    postgresql
)
mkdir -p benchmarks/tmp
lxc launch "${Image}" "${ContainerProperties[@]}" --config=user.user-data="$(cat "${UserDataLocal}")" "${ContainerNameLocal}"
lxc exec "${ContainerNameLocal}" -- cloud-init status --wait
lxc exec "${ContainerNameLocal}" -- /usr/bin/time --append --output "${LogNameLocal}" -p apt update -y
lxc exec "${ContainerNameLocal}" -- /usr/bin/time --append --output "${LogNameLocal}" -p apt upgrade -y
lxc exec "${ContainerNameLocal}" -- /usr/bin/time --append --output "${LogNameLocal}" -p apt install -y "${PackagesToDownload[@]}" --download-only
# The log path starts with "/", so this expands to <instance>/tmp/<log name>.
lxc file pull "${ContainerNameLocal}${LogNameLocal}" "benchmarks/${LogNameLocal}"
lxc delete "${ContainerNameLocal}" --force
lxc launch "${Image}" "${ContainerProperties[@]}" --config=user.user-data="$(cat "${UserDataRemote}")" "${ContainerNameRemote}"
lxc exec "${ContainerNameRemote}" -- cloud-init status --wait
lxc exec "${ContainerNameRemote}" -- /usr/bin/time --append --output "${LogNameRemote}" -p apt update -y
lxc exec "${ContainerNameRemote}" -- /usr/bin/time --append --output "${LogNameRemote}" -p apt upgrade -y
lxc exec "${ContainerNameRemote}" -- /usr/bin/time --append --output "${LogNameRemote}" -p apt install -y "${PackagesToDownload[@]}" --download-only
lxc file pull "${ContainerNameRemote}${LogNameRemote}" "benchmarks/${LogNameRemote}"
lxc delete "${ContainerNameRemote}" --force
The packages to download total approximately 136 MB, which I think is a good volume to test with. I began the tests at 21:55 UTC on a Friday evening, performing ten iterations of the test, each separated by one minute as a courtesy.
The results show the cumulative time spent on each step of the test: the update times from all iterations were summed, as were the upgrade times and the download-only installation times.
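The per-phase totals can be accumulated with a little awk. This sketch assumes each log file holds the three /usr/bin/time -p reports in the order the benchmark script writes them (update, upgrade, install); sum_phases is a hypothetical name.

```shell
# Sum the "real" seconds for each phase across any number of benchmark logs.
sum_phases() {
    awk '
        FNR == 1 { phase = 0 }                # new log file: restart the phase count
        $1 == "real" { total[++phase] += $2 }
        END {
            printf "update %.2f\nupgrade %.2f\ninstall %.2f\n", total[1], total[2], total[3]
        }
    ' "$@"
}

# e.g. sum_phases benchmarks/tmp/mirror-local-*.log
```

The local:remote column is then the local total divided by the remote total, e.g. 22.66 / 162.90 ≈ 13.91%.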
Phase | Local time (s) | Remote time (s) | Local:remote (%) |
---|---|---|---|
update | 163.91 | 194.05 | 84.47% |
upgrade | 249.88 | 284.29 | 87.90% |
install --download-only | 22.66 | 162.90 | 13.91% |
Total | 436.45 | 641.24 | 68.06% |
The time improvements for the update and upgrade (30 packages) steps were modest, but I am certain that these steps include substantial client-side processing time, which a faster mirror can't reduce. The "installation" step, which only downloaded the packages, yielded a significant improvement: it took less than 14% of the time it took to download from the Ubuntu Package Archive.
Going Forward
Preventing Syncing During An Archive Update
When the Ubuntu Package Archive is being updated, a file is generated to warn users. Syncing clients can check for it to avoid copying a half-updated archive and corrupting their mirror. My solution should be altered to check whether the file exists and postpone the sync while it does.
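A sketch of that check. The marker's name is an assumption here: I'm supposing it follows the Debian ftpsync convention of a top-level file named Archive-Update-in-Progress-<hostname>. archive_update_in_progress is a hypothetical helper that reads an rsync --list-only listing on stdin, which keeps it testable without a network.

```shell
# Succeed if a top-level listing (from rsync --list-only) contains the
# assumed update-in-progress marker; the file name is the last field.
archive_update_in_progress() {
    awk '$NF ~ /^Archive-Update-in-Progress/ { found = 1 } END { exit !found }'
}

# Before syncing:
#   if rsync --list-only "${ArchiveUrl}" | archive_update_in_progress; then
#       echo "upstream is mid-update; skipping this run" >&2
#       exit 0
#   fi
```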
Deletion
Files on my mirror which have been removed from the Ubuntu Package Archive must be deleted.
The rsync command I wrote deletes such files, but it would probably be better to delete them after the sync has finished, which is what rsync's --delete-after option does.
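One caveat worth checking: rsync's --delete only acts within directories that are actually being synchronized, so with a --files-from list of individual pool files, superseded packages may never be removed. A sketch that lists local pool files no longer named in pool-info; stale_pool_files is a hypothetical helper and needs bash (process substitution) plus coreutils' comm.

```shell
# Print pool files present under MIRROR_ROOT but absent from POOL_INFO.
# comm requires both inputs sorted with the same collation, hence LC_ALL=C throughout.
stale_pool_files() {
    local mirror_root="$1" pool_info="$2"
    LC_ALL=C comm -13 \
        <(LC_ALL=C sort -u "$pool_info") \
        <(cd "$mirror_root" && find pool -type f | LC_ALL=C sort)
}

# Review before deleting anything, e.g.:
#   stale_pool_files /path/to/my/mirrors/Ubuntu/ubuntu /path/to/my/mirrors/Ubuntu/pool-info | head
```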
Preventing Corruption
If a sync should fail, I could be left with an inoperable mirror.
This will not do.
It would be ideal if I could retain a copy of my mirror until the sync has finished, then serve the new one if the sync succeeds.
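I have used rsync's --link-dest feature to accomplish that! With --link-dest, files that are unchanged relative to the previous copy are hard-linked into the new directory instead of being transferred, so each new copy only costs the space of what actually changed.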
The procedure is:
- rename the existing root from ubuntu to ubuntu-initial
- create a symbolic link called ubuntu which points to ubuntu-initial
- rename the scripted sync destination to ubuntu-now, where "now" is the current timestamp
- add --link-dest=/path/to/my/mirrors/Ubuntu/ubuntu to the script's rsync options
- after the sync finishes, delete the symlink and create a new one pointing to the new sync directory
This is the post-sync result:
lrwxrwxrwx 1 daniel daniel   32 Dec 14 17:12 ubuntu -> ubuntu-2024-12-14T16:56:00+00:00
drwxrwxr-x 6 daniel daniel 4096 Dec 14 16:59 ubuntu-2024-12-14T16:56:00+00:00
drwxrwxr-x 6 daniel daniel 4096 Dec 14 17:12 ubuntu-initial
Wonderful! The symlink should only ever point to a successfully synced directory, and the data should never be reachable in a corrupt state. All that's left to do is reap the older syncs and make it obvious when a sync failed. I'll reap the old syncs manually for now.
As for the failed syncs, I'll give the MirrorRoot a dot-name, e.g. ".ubuntu-2024-12-14T17:36:41+00:00", promote it to the non-dot "ubuntu-2024-12-14T17:36:41+00:00" only after the sync has completed successfully, and then point the symlink at it. Any leftover dot-directory then marks a failed sync.
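The promotion and reaping steps could look like the sketch below, assuming GNU coreutils (for mv -T) and the layout shown above. promote_snapshot and reap_snapshots are hypothetical helpers, not the script I'm running. Creating the new link beside the old one and renaming it over "ubuntu" means the link is replaced in a single rename(2), so it never goes missing.

```shell
# Rename .ubuntu-<stamp> to ubuntu-<stamp>, then repoint the "ubuntu" symlink.
promote_snapshot() {
    local project_root="$1" stamp="$2"
    local staged=".ubuntu-${stamp}" final="ubuntu-${stamp}"
    (
        cd "$project_root" || exit 1
        [ -d "$staged" ] || { echo "no staged sync: ${staged}" >&2; exit 1; }
        mv -- "$staged" "$final" || exit 1
        ln -sfn -- "$final" ubuntu.new || exit 1
        mv -T -- ubuntu.new ubuntu          # -T: replace the link itself, not move into it
    )
}

# Delete timestamped snapshots except the symlink's target and the newest KEEP others.
# ubuntu-initial (non-numeric suffix) is deliberately left alone.
reap_snapshots() {
    local project_root="$1" keep="${2:-1}"
    (
        cd "$project_root" || exit 1
        target="$(readlink ubuntu)" || exit 1
        current="$(basename -- "$target")"
        for dir in ubuntu-[0-9]*; do
            if [ -d "$dir" ] && [ ! -L "$dir" ]; then printf '%s\n' "$dir"; fi
        done | LC_ALL=C sort -r | grep -vxF -- "$current" | tail -n +"$((keep + 1))" |
            while IFS= read -r dir; do rm -rf -- "$dir"; done
    )
}
```

In the sync script, rsync would write into .ubuntu-<stamp> with --link-dest pointing at the current ubuntu, and promotion would happen only on success, e.g. rsync ... && promote_snapshot /path/to/my/mirrors/Ubuntu "${stamp}".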
Round-up
Now I have a functional package archive mirror with a script to manage updates.
It's working well so far. I intend to perform some more tests in the coming days, but I envisage pointing my host and the LXD cloud-config profiles at the mirror very soon. If all goes well, I will no longer use apt-cacher-ng for the Ubuntu package archive, but will keep it for the other apt repositories I may need. Docker's, for instance, doesn't offer rsync access, so apt-cacher-ng is a useful tool to have.
I hope you've enjoyed this brief introduction to the archive and how to make your own subset.
Go forth and have fun! :)
Resources