Spotlight: Content Delivery Server Basics
One of the more recent features in Pulp is the ability to use the Pulp server to push repositories out to be hosted on external servers. These Content Delivery Servers (CDS) can be used to distribute content across firewalls and geographic locations, and they add a layer of security by selectively deploying only a subset of the Pulp server’s content to a specific instance.
Installation and Configuration
A server is configured to run as a CDS by first installing the Pulp CDS package and its dependencies:
[root@localhost] yum install pulp-cds
If not already installed, httpd will be installed as part of the installation process. The virtual host for the CDS is also installed through the
The CDS must be able to resolve the hostname of the Pulp server in order to download content from it. This is typically done through DNS, but in development environments it often means editing /etc/hosts to add an entry for the Pulp server.
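A quick way to confirm that resolution works before going further is to ask the local resolver directly. The following is a minimal sketch, not part of Pulp; the helper name `can_resolve` is made up for illustration, and it only reports whether the name resolves, not whether the Pulp server is reachable.

```python
import socket


def can_resolve(hostname):
    """Return True if the local resolver (DNS or /etc/hosts) knows hostname."""
    try:
        # getaddrinfo consults the same sources the system uses for lookups
        # and returns a non-empty list when the name resolves.
        return len(socket.getaddrinfo(hostname, None)) > 0
    except (socket.gaierror, UnicodeError):
        # gaierror: the name is unknown; UnicodeError: the name is malformed.
        return False


# Example, run on the CDS host (pulp.example.com is the placeholder name
# used throughout this post):
#     can_resolve("pulp.example.com")
```

If this returns False on the CDS host, fix DNS or /etc/hosts before registering the CDS.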
Additionally, the CDS will have to be configured to use the messaging broker on the Pulp server so it can be manipulated from the server itself. This is done through the
[server]
host = pulp.example.com
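The `[server]` section above is plain INI syntax, so the setting can be sanity-checked with Python's standard `configparser`. This is a sketch under assumptions: the text is parsed from an inline string rather than from the real configuration file, and the helper name `read_server_host` is hypothetical.

```python
import configparser

# Inline copy of the section shown above; on a real CDS this text would come
# from the CDS configuration file instead.
SAMPLE_CONFIG = """\
[server]
host = pulp.example.com
"""


def read_server_host(config_text):
    """Return the Pulp server hostname from the [server] section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.get("server", "host")


print(read_server_host(SAMPLE_CONFIG))
```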
Like the Pulp server itself, the CDS uses Apache to host its repositories. The typical steps to configure the Apache instance with an SSL certificate should be taken at this point.
Once the configuration changes have been made, the CDS processes are started through the init script:
[root@localhost ~] service pulp-cds start
Starting goferd [ OK ]
Starting httpd: [ OK ]
Once the CDS is running, it must be registered to a Pulp server. A CDS may only be registered to one Pulp server at a given time. Once registered, the CDS will only accept commands from the Pulp server that it is currently registered to. When a CDS is unregistered from a Pulp server, it is once again open to be registered by a different Pulp server.
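The registration rule can be pictured as a tiny state machine. The class below is a toy model written for this post, not Pulp's implementation; the class and method names are invented, and `other.example.com` is a made-up second server.

```python
class CdsInstance:
    """Toy model of the one-Pulp-server-at-a-time registration rule."""

    def __init__(self, hostname):
        self.hostname = hostname
        self.registered_to = None  # hostname of the owning Pulp server, if any

    def register(self, server):
        # A CDS that already belongs to a server refuses a second registration.
        if self.registered_to is not None:
            raise RuntimeError("already registered to " + self.registered_to)
        self.registered_to = server

    def unregister(self, server):
        # Only the server that owns the CDS may release it.
        if self.registered_to != server:
            raise RuntimeError("not registered to " + server)
        self.registered_to = None

    def accepts_commands_from(self, server):
        # Commands are honored only from the currently registered server.
        return self.registered_to is not None and self.registered_to == server


cds = CdsInstance("cds.example.com")
cds.register("pulp.example.com")
print(cds.accepts_commands_from("pulp.example.com"))   # owning server
print(cds.accepts_commands_from("other.example.com"))  # any other server
```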
Registration is done through the pulp-admin cds commands. The registration command requires the hostname of the CDS as identification. Keep in mind, however, that the Pulp server itself does not need to be able to resolve the CDS hostname to an IP address. Rather, the hostname is used to determine the unique message bus ID for that CDS. A display name can also be specified using the
[root@localhost ~] pulp-admin cds register --hostname cds.example.com
Successfully registered CDS [cds.example.com]
Registered CDS instances can be displayed through the pulp-admin cds list command:
[root@localhost ~] pulp-admin cds list
+------------------------------------------+
              CDS Instances
+------------------------------------------+

Name                cds.example.com
Hostname            cds.example.com
Description         None
Repos               None
Last Sync           Never
Status:
   Responding       Yes
   Last Heartbeat   2011-05-12 17:07:21.834959+00:00
A registered CDS is neat, but not very useful on its own. Repositories are associated with a CDS to customize the content it serves. Repositories that are protected on the Pulp server using repository authentication will also be protected on any CDS instance they are associated with (more on that in a future blog post).
[root@localhost ~] pulp-admin cds associate_repo --hostname cds.example.com --repoid demo
Successfully associated CDS [cds.example.com] with repo [demo]
Repository association configures the server’s knowledge of which repositories belong on which CDS instances. In order to get the bits to the CDS instance, it must be synchronized. The pulp-admin cds sync command is used to trigger this process and the pulp-admin cds status command is used to check its progress (in the future, this will be enhanced to more closely resemble the abilities for monitoring repo syncs).
[root@localhost ~] pulp-admin cds sync --hostname cds.example.com
Sync for CDS [rhino.marvel.u] started
Use "cds status" to check on the progress

[root@localhost ~] pulp-admin cds status --hostname cds.example.com
+------------------------------------------+
               CDS Status
+------------------------------------------+

Name                cds.example.com
Hostname            cds.example.com
Description         None
Repos               demo
Last Sync           2011-05-12 13:20:35-04:00
Status:
   Responding       Yes
   Last Heartbeat   2011-05-12 17:23:54.328154+00:00
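When scripting around the status command, the label/value lines of its output can be turned into a dictionary. This is a rough sketch that assumes labels and values are separated by two or more spaces, which is a guess at the column layout rather than a documented format; the function name `parse_cds_status` is hypothetical.

```python
import re

# Sample text modeled on the status output above; the exact column spacing
# is an assumption.
SAMPLE_STATUS = """\
Name                cds.example.com
Hostname            cds.example.com
Description         None
Repos               demo
Last Sync           2011-05-12 13:20:35-04:00
Status:
   Responding       Yes
   Last Heartbeat   2011-05-12 17:23:54.328154+00:00
"""


def parse_cds_status(text):
    """Map each label in the status listing to its value (both as strings)."""
    fields = {}
    for line in text.splitlines():
        # Split once on a run of 2+ whitespace characters so that values
        # containing single spaces (timestamps) stay intact.
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        if len(parts) == 2:
            fields[parts[0]] = parts[1]
    return fields


status = parse_cds_status(SAMPLE_STATUS)
has_synced = status["Last Sync"] != "Never"
print(status["Repos"], has_synced)
```

Lines without a value, such as the banner or the "Status:" heading, are skipped.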
Repository contents are stored in the /var/lib/pulp-cds directory on the CDS instance:
[root@localhost] ll -R /var/lib/pulp-cds/
/var/lib/pulp-cds/:
total 8
-rw-r--r--. 1 root root    5 May 12 13:23 cds_repo_list
drwxr-xr-x. 3 root root 4096 May 12 13:23 demo

/var/lib/pulp-cds/demo:
total 8
-rw-r--r--. 1 root root 2184 May 12 13:23 pulp-demo-1.0-1.fc14.x86_64.rpm
drwxr-xr-x. 2 root root 4096 May 12 13:23 repodata

/var/lib/pulp-cds/demo/repodata:
total 28
-rw-r--r--. 1 root root  745 May 12 13:23 filelists.sqlite.bz2
-rw-r--r--. 1 root root  308 May 12 13:23 filelists.xml.gz
-rw-r--r--. 1 root root  736 May 12 13:23 other.sqlite.bz2
-rw-r--r--. 1 root root  357 May 12 13:23 other.xml.gz
-rw-r--r--. 1 root root 1721 May 12 13:23 primary.sqlite.bz2
-rw-r--r--. 1 root root  662 May 12 13:23 primary.xml.gz
-rw-r--r--. 1 root root 2685 May 12 13:23 repomd.xml
Repositories on a CDS are hosted at the same URL path as on the Pulp server itself, so only the hostname differs:
[root@localhost] wget --no-check-certificate https://cds.example.com/pulp/repos/demo/pulp-demo-1.0-1.fc14.x86_64.rpm
HTTP request sent, awaiting response... 200 OK
Length: 2184 (2.1K) [text/plain]
Saving to: “pulp-demo-1.0-1.fc14.x86_64.rpm”
100%[==============================================================>] 2,184 --.-K/s in 0s
2011-05-12 13:28:21 (24.3 MB/s) - “pulp-demo-1.0-1.fc14.x86_64.rpm” saved [2184/2184]
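Because the path is shared, a client-side script can build the URL for any host that serves the repository by swapping the hostname. The helper below is only an illustration; its name is made up, and the /pulp/repos/ prefix is taken from the wget example above.

```python
from urllib.parse import quote


def repo_file_url(host, repo_id, filename):
    """Build the HTTPS URL of a file in a repository hosted on host."""
    # quote() percent-encodes characters that are not safe in a URL path.
    return "https://{}/pulp/repos/{}/{}".format(
        host, quote(repo_id), quote(filename)
    )


rpm = "pulp-demo-1.0-1.fc14.x86_64.rpm"
print(repo_file_url("cds.example.com", "demo", rpm))
print(repo_file_url("pulp.example.com", "demo", rpm))
```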
Binding and Distribution
When a consumer is bound to a repository, the Pulp server takes into account any CDS instances that host the repository in question. A list of all locations at which the repository is hosted is returned to the consumer and used as a mirror list for the repository. The Pulp server keeps track of these distributions and will organize these mirror lists to distribute consumer load across all hosts of a given repository. In the future, this distribution can be enhanced to consider factors beyond simple round-robin ordering and to determine the “best” CDS for a given repository for each consumer.
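One way to picture round-robin distribution of mirror lists is to rotate the list of hosts by one position for each consumer that binds. The sketch below is not Pulp's actual code; the class name and the hostnames cds1.example.com and cds2.example.com are invented for illustration.

```python
from collections import deque


class MirrorListRotator:
    """Hand out mirror lists whose first entry cycles through all hosts."""

    def __init__(self, hosts):
        self._hosts = deque(hosts)

    def next_mirror_list(self):
        # Snapshot the current order for this consumer, then rotate left by
        # one so the next consumer sees a different host first.
        mirrors = list(self._hosts)
        self._hosts.rotate(-1)
        return mirrors


rotator = MirrorListRotator(
    ["pulp.example.com", "cds1.example.com", "cds2.example.com"]
)
for consumer in ("consumer-a", "consumer-b", "consumer-c"):
    print(consumer, rotator.next_mirror_list())
```

Each consumer still receives every host, so a client can fall back to the next entry if its first choice is unavailable.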
In the middle of writing this, my boss pinged me to tell me he set up a demo environment using a Pulp server in Boston and a CDS instance in Tel Aviv. Geographic distribution is just one of the problems addressed by the CDS functionality. CDS instances can be configured with only a subset of repositories to better control which content is available to which clients.
The next big CDS-related feature on the horizon is the introduction of CDS groups. The implications extend beyond the simple administrative benefit of being able to affect more than a single CDS at a time. The intention is that CDS instances will be able to use other CDS instances in the same group for load balancing and failover scenarios.
In the meantime, the basic story for Content Delivery Servers is implemented and available in Pulp. This story includes the ability to register CDS instances, associate and synchronize repositories on a per-CDS basis, and pass this information to consumers when they are bound, balancing repository access across all possible sources.