Repository Synchronization
Synchronization is one of two ways to obtain APT content for your Pulp instance.
Quickstart Example
A working example for synchronizing and hosting Debian bookworm content from nginx.org:
NAME='quickstart-nginx-bookworm-amd64'
REMOTE_OPTIONS=(
--url=http://nginx.org/packages/debian/
--distribution=bookworm
--component=nginx
--architecture=amd64
)
pulp deb remote create --name=${NAME} "${REMOTE_OPTIONS[@]}"
pulp deb repository create --name=${NAME} --remote=${NAME}
pulp deb repository sync --name=${NAME}
pulp deb publication create --repository=${NAME}
pulp deb distribution create --name=${NAME} --base-path=${NAME} --repository=${NAME}
The final command above will include the base_url parameter in its output.
The accompanying value tells you where the Pulp content app serves your newly created repository.
To re-sync, re-publish and re-distribute a newer version of the upstream repository:
pulp deb repository sync --name=${NAME}
pulp deb publication create --repository=${NAME}
The distribution is updated automatically because it was created using the --repository flag.
This enables auto-distribution of the latest publication created from the repository.
To configure our example repo in the /etc/apt/sources.list file on a consuming host:
deb http://<your_pulp_host>/pulp/content/quickstart-nginx-bookworm-amd64/ bookworm nginx
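As a minimal sketch, the sources.list entry above can be assembled from its parts with a small shell helper. The function name apt_source_line and the host pulp.example.com are hypothetical, and the sketch assumes the content app is served over plain HTTP under /pulp/content/. The base_url reported by the distribution create command remains the authoritative value.

```shell
# Build an APT sources.list entry for a repository served by the Pulp content app.
# Arguments: <pulp_host> <base_path> <distribution> <component>...
# Assumption: plain HTTP and the /pulp/content/ prefix shown in the example above.
apt_source_line() {
  host=$1
  base_path=$2
  shift 2
  # The remaining arguments are the distribution followed by its components.
  printf 'deb http://%s/pulp/content/%s/ %s\n' "$host" "$base_path" "$*"
}

# pulp.example.com is a placeholder host name.
apt_source_line pulp.example.com quickstart-nginx-bookworm-amd64 bookworm nginx
```

The printed line could then be appended to /etc/apt/sources.list on the consuming host.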
Variation 1: Maximum Flexibility
This second example trades some convenience for increased flexibility:
NAME='flexible-nginx-bookworm-amd64'
pulp deb repository create --name=${NAME}
pulp deb repository sync --name=${NAME} --remote=quickstart-nginx-bookworm-amd64
PUB_HREF=$(pulp deb publication create --repository=${NAME} | jq -r '.pulp_href')
pulp deb distribution create --name=${NAME} --base-path=${NAME} --publication=${PUB_HREF}
- The repository is created first and is not linked to any remote. As a result, we can and must specify the remote for each sync, in this case reusing the remote from the previous example.
- Rather than linking our distribution to our repository to enable auto-distribution, the publication is specified explicitly. To do so, the publication href is parsed from the API response via jq at creation time and stored in a variable. This is necessary since publications have no name.
The re-sync workflow for this example is more complicated, since the distribution must be updated explicitly:
pulp deb repository sync --name=${NAME} --remote=quickstart-nginx-bookworm-amd64
PUB_HREF=$(pulp deb publication create --repository=${NAME} | jq -r '.pulp_href')
pulp deb distribution update --name=${NAME} --publication=${PUB_HREF}
One advantage of updating the distribution manually like this is increased control over when attached clients are served the new version.
For large repositories, the sync and publication create actions can take a long time to complete, while a distribution update is nearly instantaneous and could be scheduled to run at a precise time.
Important APT Remote Flags
The above examples are designed so that they can be adapted to synchronize arbitrary upstream repositories simply by modifying the REMOTE_OPTIONS.
A remote describes the sync options for some upstream repository.
As a result, we will now describe how to set some important sync flags:
- --url (required): The URL to the remote repository root. The repository root folder can normally be identified by the presence of a dists/ and a pool/ folder. For example, if you open http://ftp.de.debian.org/debian/ in a browser, you will see these folders there.
- --distribution (required): The path between the dists/ folder and some Release/InRelease file that should be synchronized. For example, if you open http://ftp.de.debian.org/debian/dists/bullseye/ in a browser, you will find the release files there, so this distribution must be given as bullseye. A single APT repository may host many different APT distributions, so the --distribution flag may be specified multiple times.
- --component: An APT repo component to sync. The Release/InRelease file of every APT repo distribution includes a Components: field with a list of valid components for that distribution. For example, if you check the file at http://ftp.de.debian.org/debian/dists/bullseye/InRelease, it includes the line Components: main contrib non-free, so main, contrib, and non-free would all be valid values for the --component flag. If you do not supply any components on a remote, then all that are available will be synchronized.
- --architecture: A Debian machine architecture to sync. This flag works exactly like the --component flag, with the only difference that the relevant field in the Release/InRelease file is the Architectures: field. For example, if some Release file includes the line Architectures: all amd64 arm64 i386, then amd64, arm64, and i386 are all valid values for the --architecture flag. An architecture value of all has special meaning, and never has to be specified on your remote.
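To find the valid values for --component and --architecture, the relevant field can be extracted from a Release or InRelease file. The following is a minimal sketch: the helper name release_field is hypothetical, and the sample input only mirrors the fields discussed above rather than reproducing a real Release file.

```shell
# Print the values of one field (for example Components or Architectures)
# from a Release/InRelease file read on standard input, one value per line.
release_field() {
  awk -v field="$1:" '$1 == field { for (i = 2; i <= NF; i++) print $i }'
}

# Sample input for illustration only (not a complete Release file).
printf '%s\n' \
  'Origin: Debian' \
  'Components: main contrib non-free' \
  'Architectures: all amd64 arm64 i386' |
  release_field Components
```

Against a real upstream, the same helper could be fed by a download tool, for example: curl -s http://ftp.de.debian.org/debian/dists/bullseye/InRelease | release_field Architectures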
Putting all of this together in a single example, we could create the following remote:
NAME='debian-bullseye-amd64'
REMOTE_OPTIONS=(
--url=http://ftp.de.debian.org/debian/
--distribution=bullseye
--component=main
--component=contrib
--architecture=amd64
--architecture=nonsense
)
pulp deb remote create --name=${NAME} "${REMOTE_OPTIONS[@]}"
- By having specified the components main and contrib, we are excluding the non-free component from our sync.
- By specifying the architecture amd64, we are synchronizing all packages with architecture amd64, but also packages with architecture all, which are always synchronized.
- By also specifying the non-existent architecture nonsense, we are not changing the sync result at all, since specifying architectures or components that do not exist for some distribution does not result in any errors (though it will log a warning).
You can view the full list of available remote creation options using pulp deb remote create --help.
Best Practice Recommendations
We recommend the following best practices:
- Once you sync a remote into a repository, don't modify what is synced to that repository. Keep using the same remote for that repository, and don't modify the distributions, components, or architectures parameters on the remote. If you do want to change these values, it is almost always best to create a new remote and sync it to a new Pulp repository.
- Use a single --distribution per remote. While it is possible to set multiple distributions on a single remote and sync them into a single Pulp repository, this can easily lead to huge, confusing repositories with performance issues. On the flip side, it is cheap to create one remote and one repository for each distribution you want to sync. If you want to sync a lot of distributions from the same upstream repository, this can easily be scripted.
- For official Debian repositories, never use values like stable, oldstable, oldoldstable-updates, etc. for the distribution. Always use Debian distribution names like bookworm or bookworm-updates instead. The reason is that stable, oldstable, or oldoldstable are symlinks that might suddenly be redirected to an entirely different APT repo distribution when a new Debian version is released.
- Always consider explicitly setting any --architecture values you want. If you know you just need amd64, syncing all the other architectures could cost you a multiple in sync times and storage requirements compared to just syncing amd64. Unless you have hosts with multiarch environments, consider syncing just one architecture per Pulp repository (similarly to syncing just a single distribution).
- For official Debian and Ubuntu repositories, you normally want all the components, so it is OK not to set any components explicitly (this is interpreted as sync all that are available). However, some third-party repositories host a large number of components that you may not need, so you should set just the components you need in such cases.
- Use a naming scheme for your remotes that reflects the above. For example, nginx-bookworm-amd64 is a good name using the structure <repo_name>-<distribution>-<architecture>.
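The one-remote-per-distribution recommendation and the naming scheme above could be scripted roughly as follows. This is a sketch under stated assumptions: the helper name print_sync_commands and the repo name debian are hypothetical, and the function only prints the pulp CLI commands (using the flags shown earlier on this page) rather than running them, so the result can be reviewed first.

```shell
# Print the pulp CLI commands that create and sync one remote and one
# repository per distribution, named <repo_name>-<distribution>-<architecture>.
# Arguments: <repo_name> <url> <architecture> <distribution>...
print_sync_commands() {
  repo_name=$1
  url=$2
  arch=$3
  shift 3
  for dist in "$@"; do
    name="${repo_name}-${dist}-${arch}"
    echo "pulp deb remote create --name=${name} --url=${url} --distribution=${dist} --architecture=${arch}"
    echo "pulp deb repository create --name=${name} --remote=${name}"
    echo "pulp deb repository sync --name=${name}"
  done
}

# No --component is set, so all available components would be synchronized.
print_sync_commands debian http://ftp.de.debian.org/debian/ amd64 bookworm bookworm-updates
```

Once the printed commands look right, they could be saved to a file and executed, or the echo calls could be replaced with direct invocations.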
Flat Repository Format Example
pulp_deb supports synchronization from repositories using the deprecated flat repository format.
Note
An APT repo using flat repository format does not have a dists/ folder.
Rather, it is characterized by a single Release and/or InRelease file, with a single package index right next to it.
Most commonly, all metadata files and all packages are stored directly in the repository root.
Hence the name "flat repo format".
The following workflow synchronizes an example flat APT repo:
NAME='nvidia-cuda-flat-amd64'
REMOTE_OPTIONS=(
--url=http://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/
--distribution=/
--architecture=amd64
)
pulp deb remote create --name=${NAME} "${REMOTE_OPTIONS[@]}"
pulp deb repository create --name=${NAME} --remote=${NAME}
pulp deb repository sync --name=${NAME}
pulp deb publication create --repository=${NAME}
pulp deb distribution create --name=${NAME} --base-path=${NAME} --repository=${NAME}
- For a flat repository, the specified distribution must always end with a /; most commonly, it will just be a /. Conversely, a distribution string provided for a repository not using flat repository format must not end with /!
- You must not provide more than one distribution for a flat repository.
- Since flat repositories do not contain components, there is no reason to use the --component flag.
- You may still filter by architecture using the --architecture flag.
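The trailing-slash rule above can be checked before a remote is created. The following is a minimal sketch with a hypothetical helper name, is_flat_distribution. It only inspects the distribution string and does not contact the upstream repository.

```shell
# Return success if the distribution string denotes a flat repository,
# i.e. it ends with a slash (most commonly it is just "/").
is_flat_distribution() {
  case $1 in
    */) return 0 ;;
    *) return 1 ;;
  esac
}

for dist in / bookworm buster/updates; do
  if is_flat_distribution "$dist"; then
    echo "$dist: flat repository format"
  else
    echo "$dist: regular repository format"
  fi
done
```

A slash inside the distribution string, as in buster/updates, does not make it a flat repository; only a trailing slash does.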
Warning
Even though you are synchronizing a flat repository, pulp_deb will convert it to a regular structured APT repository on publish.
A distribution of / will be converted into a single distribution named flat-repo, which will contain a single component named flat-repo-component.
To configure the above repo in the /etc/apt/sources.list file on a consuming host:
deb http://<your_pulp_host>/pulp/content/nvidia-cuda-flat-amd64/ flat-repo flat-repo-component
This contrasts with how you would configure the upstream flat repository:
deb http://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/ /
Debian Security Repository Example
The Debian security repository, up to the buster release, uses a rare variation on the standard APT repository structure, where the distribution name contains a /.
It is possible to create a remote for this as follows:
NAME='debian-security-buster-amd64'
REMOTE_OPTIONS=(
--url=http://security.debian.org/debian-security/
--distribution=buster/updates
--component=updates/contrib
--component=non-free
--architecture=amd64
)
pulp deb remote create --name=${NAME} "${REMOTE_OPTIONS[@]}"
Note
For the example distribution above, the Release file components are listed as:
Components: updates/main updates/contrib updates/non-free
You may specify a component of updates/main as either updates/main or simply as main; pulp_deb will understand either way.
The example is chosen to demonstrate both versions. As a matter of best practice, we recommend being consistent.
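One hypothetical way to stay consistent in your own scripts is to reduce every component to its short form before passing it to --component. The helper name short_component is an assumption for illustration. It is a plain string operation on your side and says nothing about how pulp_deb resolves components internally.

```shell
# Reduce a component such as "updates/main" to its final path segment ("main").
# Components without a slash are returned unchanged.
short_component() {
  echo "${1##*/}"
}

for component in updates/contrib non-free; do
  echo "--component=$(short_component "$component")"
done
```

Using the long form everywhere would work equally well, as long as one form is used throughout.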
Synchronizing from Partial Mirrors
By default, syncs will fail if the upstream repository is missing package indices that are present in its Release file.
This breaks synchronization from partial mirrors, and can be overridden by setting ignore_missing_package_indices=True on the remote.
Alternatively, use FORCE_IGNORE_MISSING_PACKAGE_INDICES=True in your Pulp configuration file to force this behaviour for all syncs irrespective of the individual remotes.
Note
Currently, the remote option ignore_missing_package_indices cannot be set using Pulp CLI.