pulpcore.plugin.download

This module implements downloaders that solve many of the common problems plugin writers face when downloading remote data. At a high level, these downloaders provide:

  • auto-configuration from remote settings (auth, ssl, proxy)
  • synchronous or parallel downloading
  • digest and size validation computed during download
  • grouping downloads together to return to the user when all files are downloaded
  • customizable download behaviors via subclassing

All classes documented here should be imported directly from the pulpcore.plugin.download namespace.

Basic Downloading

The most basic downloading from a url can be done like this:

from pulpcore.plugin.download import HttpDownloader

downloader = HttpDownloader('http://example.com/')
result = downloader.fetch()  # blocks until the download finishes

The example above downloads the data synchronously. The pulpcore.plugin.download.HttpDownloader.fetch call blocks until the data is downloaded and the pulpcore.plugin.download.DownloadResult is returned or a fatal exception is raised.

Parallel Downloading

Any downloader in the pulpcore.plugin.download package can be run in parallel with the asyncio event loop. Each downloader has a pulpcore.plugin.download.BaseDownloader.run method which returns a coroutine object that asyncio can schedule in parallel. Consider this example:

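import asyncio

from pulpcore.plugin.download import HttpDownloader

download_coroutines = [
    HttpDownloader('http://example.com/').run(),
    HttpDownloader('http://pulpproject.org/').run(),
]

loop = asyncio.get_event_loop()
# Pass the list itself; wrapping it in another list would hand asyncio.wait a list of lists.
done, not_done = loop.run_until_complete(asyncio.wait(download_coroutines))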

for task in done:
    try:
        task.result()  # This is a DownloadResult
    except Exception as error:
        pass  # fatal exceptions are raised by result()
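The same pattern can be tried without Pulp installed. The sketch below schedules stand-in coroutines with asyncio.gather and return_exceptions=True, so that one failed download does not hide the results of the others. The fake_download coroutine and its return value are hypothetical placeholders for a downloader's run() call; they are not part of pulpcore.

```python
import asyncio


async def fake_download(url, fail=False):
    """Hypothetical stand-in for a downloader's run() coroutine."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    if fail:
        raise RuntimeError(f"download failed: {url}")
    return {"url": url}


async def download_all(jobs):
    # return_exceptions=True collects errors as values instead of raising,
    # and results come back in the same order as the input coroutines.
    return await asyncio.gather(*jobs, return_exceptions=True)


results = asyncio.run(download_all([
    fake_download("http://example.com/"),
    fake_download("http://pulpproject.org/", fail=True),
]))

succeeded = [r for r in results if not isinstance(r, Exception)]
failed = [r for r in results if isinstance(r, Exception)]
```

Whether to collect errors this way or let the first one propagate depends on whether a single failed download should fail the whole task.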

Download Results

The download result contains all the information about a completed download and is returned from the downloader's run() method when the download is complete.

pulpcore.plugin.download.DownloadResult = namedtuple('DownloadResult', ['url', 'artifact_attributes', 'path', 'headers']) module-attribute

Parameters:

  • url (str) –

    The url corresponding with the download.

  • path (str) –

    The absolute path to the saved file

  • artifact_attributes (dict) –

    Contains keys corresponding with pulpcore.plugin.models.Artifact fields. This includes the computed digest values along with size information.

  • headers (MultiDict) –

    HTTP response headers. The keys are header names and the values are header content. None when not using the HttpDownloader or a subclass.
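As a small illustration of how a result can be consumed, the sketch below rebuilds a namedtuple with the same field layout as the definition above and reads its fields. Every value in it (the url, path, size, and digest placeholder) is made-up illustration data, and the sha256 key is assumed to be one of the Artifact digest fields.

```python
from collections import namedtuple

# Same field layout as the DownloadResult definition shown above.
DownloadResult = namedtuple(
    "DownloadResult", ["url", "artifact_attributes", "path", "headers"]
)

# All values below are made-up illustration data.
result = DownloadResult(
    url="http://example.com/file.txt",
    artifact_attributes={"size": 5, "sha256": "<hex digest>"},
    path="/tmp/example-download",
    headers=None,  # None when the download did not come from an HttpDownloader
)

size = result.artifact_attributes["size"]
url, attrs, path, headers = result  # namedtuples also unpack positionally
```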

Configuring from a Remote

When fetching content during a sync, the remote has settings like SSL certs, SSL validation, basic auth credentials, and proxy settings. Downloaders commonly want to use these settings while downloading. The Remote's settings can automatically configure a downloader either to download a url or a pulpcore.plugin.models.RemoteArtifact using the pulpcore.plugin.models.Remote.get_downloader call. Here is an example download from a URL:

downloader = my_remote.get_downloader(url='http://example.com')
downloader.fetch()  # This downloader is configured with the remote's settings

Here is an example of a download configured from a RemoteArtifact, which also configures the downloader with digest and size validation:

remote_artifact = RemoteArtifact.objects.get(...)
downloader = my_remote.get_downloader(remote_artifact=remote_artifact)
downloader.fetch()  # This downloader has the remote's settings plus digest and size validation

The pulpcore.plugin.models.Remote.get_downloader internally calls the DownloaderFactory, so it expects a url that the DownloaderFactory can build a downloader for. See the pulpcore.plugin.download.DownloaderFactory for more information on supported urls.

Tip

The pulpcore.plugin.models.Remote.get_downloader accepts kwargs that enable size or digest based validation, or that specify a file-like object for the data to be written into. See pulpcore.plugin.models.Remote.get_downloader for more information.

Note

All pulpcore.plugin.download.HttpDownloader downloaders produced by the same remote instance share an aiohttp session, which provides a connection pool, connection reuse, and keep-alives.

Automatic Retry

The pulpcore.plugin.download.HttpDownloader will automatically retry 10 times if the server responds with one of the following error codes:

  • 429 - Too Many Requests
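The retry behavior can be pictured with a minimal sketch of retry-with-exponential-backoff. This is not Pulp's implementation: the TooManyRequests exception, the retry_on_429 helper, its max_tries and base_delay parameters, and the flaky_request coroutine are all hypothetical names used only for illustration.

```python
import asyncio


class TooManyRequests(Exception):
    """Hypothetical error standing in for an HTTP 429 response."""


async def retry_on_429(attempt, max_tries=10, base_delay=0.001):
    """Call attempt() until it succeeds or max_tries is reached."""
    for try_number in range(1, max_tries + 1):
        try:
            return await attempt()
        except TooManyRequests:
            if try_number == max_tries:
                raise  # out of tries: let the final exception propagate
            # Exponential backoff: the delay doubles after every failure.
            await asyncio.sleep(base_delay * 2 ** (try_number - 1))


calls = []


async def flaky_request():
    # Fails twice with a simulated 429, then succeeds.
    calls.append(1)
    if len(calls) < 3:
        raise TooManyRequests()
    return "ok"


outcome = asyncio.run(retry_on_429(flaky_request))
```

Other errors are deliberately not caught here, which mirrors the idea that only the listed response code triggers a retry.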

Exception Handling

Unrecoverable errors of several types can be raised during downloading. One example is a validation exception (see Validation Exceptions), which is raised if the downloaded content fails size or digest validation. There can also be protocol-specific errors, such as an aiohttp.ClientResponseError being raised when a server responds with a status of 400 or greater, for example an HTTP 403.

Plugin writers can choose to halt the entire task by leaving the exception uncaught, which marks the entire task as failed.

Note

The pulpcore.plugin.download.HttpDownloader automatically retries in some cases, but if unsuccessful it will raise an exception for any HTTP response code that is 400 or greater.

Custom Download Behavior

Custom download behavior is provided by subclassing a downloader and providing a new run() method. For example, you could catch a specific error code like a 404 and try another mirror if your downloader knew of several mirrors.
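A minimal sketch of the mirror idea follows. So that it runs without Pulp or a network, it subclasses a hypothetical FakeDownloader instead of a real downloader, and a hypothetical NotFound exception stands in for the error a 404 would raise. The mirrors argument and the example urls are likewise assumptions.

```python
import asyncio


class NotFound(Exception):
    """Hypothetical error standing in for an HTTP 404 response."""


class FakeDownloader:
    """Hypothetical stand-in for a real downloader class."""

    available = {"http://mirror-b.example.com/pkg"}  # urls that "exist"

    def __init__(self, url):
        self.url = url

    async def run(self, extra_data=None):
        if self.url not in self.available:
            raise NotFound(self.url)
        return self.url


class MirrorDownloader(FakeDownloader):
    """Try each known mirror in turn when the current url returns a 404."""

    def __init__(self, url, mirrors=()):
        super().__init__(url)
        self.mirrors = list(mirrors)

    async def run(self, extra_data=None):
        candidates = [self.url] + self.mirrors
        for position, candidate in enumerate(candidates):
            self.url = candidate
            try:
                return await super().run(extra_data=extra_data)
            except NotFound:
                if position == len(candidates) - 1:
                    raise  # every mirror failed


downloader = MirrorDownloader(
    "http://mirror-a.example.com/pkg",
    mirrors=["http://mirror-b.example.com/pkg"],
)
served_from = asyncio.run(downloader.run())
```

With a real downloader, the except clause would catch the protocol-specific error and inspect its status code rather than a custom exception type.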

A custom downloader can be given as the downloader to use for a given protocol using the downloader_overrides on the pulpcore.plugin.download.DownloaderFactory. Additionally, you can implement the pulpcore.plugin.models.Remote.get_downloader method to specify the downloader_overrides to the pulpcore.plugin.download.DownloaderFactory.

Adding New Protocol Support

To create a new protocol downloader, implement a subclass of the pulpcore.plugin.download.BaseDownloader. See the docs on pulpcore.plugin.download.BaseDownloader for more information on the requirements.

Download Factory

The DownloaderFactory constructs and configures a downloader for any given url. Specifically:

  1. Select the appropriate downloader based on the url scheme. The supported schemes are http, https, and file.
  2. Auto-configure the selected downloader with settings from a remote, including auth, ssl, and proxy.
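The first step, choosing a class from the url scheme, can be sketched as a dictionary lookup that downloader_overrides is allowed to modify. The MiniFactory class and the three stand-in downloader classes below are hypothetical, and the sketch leaves out the second step (applying the remote's settings) entirely.

```python
from urllib.parse import urlparse


class HttpStandIn:
    """Hypothetical placeholder for an http/https downloader class."""

    def __init__(self, url, **kwargs):
        self.url = url
        self.kwargs = kwargs


class FileStandIn(HttpStandIn):
    """Hypothetical placeholder for a file downloader class."""


class CustomHttps(HttpStandIn):
    """Hypothetical subclass supplied through downloader_overrides."""


class MiniFactory:
    def __init__(self, downloader_overrides=None):
        # Defaults keyed on url scheme; overrides replace individual entries.
        self._classes = {"http": HttpStandIn, "https": HttpStandIn, "file": FileStandIn}
        self._classes.update(downloader_overrides or {})

    def build(self, url, **kwargs):
        scheme = urlparse(url).scheme
        try:
            downloader_class = self._classes[scheme]
        except KeyError:
            raise ValueError(f"unsupported url scheme: {scheme!r}")
        # Extra kwargs are passed straight through to the downloader.
        return downloader_class(url, **kwargs)


factory = MiniFactory(downloader_overrides={"https": CustomHttps})
plain = factory.build("http://example.com/a")
custom = factory.build("https://example.com/a", expected_size=10)
```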

The pulpcore.plugin.download.DownloaderFactory.build method constructs one downloader for any given url.

Note

Any HttpDownloader objects produced by an instantiated DownloaderFactory share an aiohttp session, which provides a connection pool, connection reuse, and keep-alives.

Tip

The pulpcore.plugin.download.DownloaderFactory.build method accepts kwargs that enable size or digest based validation or the specification of a file-like object for the data to be written into. See pulpcore.plugin.download.DownloaderFactory.build for more information.

pulpcore.plugin.download.DownloaderFactory(remote, downloader_overrides=None)

A factory for creating downloader objects that are configured with remote settings.

The DownloaderFactory correctly handles SSL settings, basic auth settings, proxy settings, and connection limit settings.

It supports handling urls with the http, https, and file protocols. The downloader_overrides option allows the caller to specify the download class to be used for any given protocol. This allows the user to specify custom, subclassed downloaders to be built by the factory.

Usage:

the_factory = DownloaderFactory(remote)
downloader = the_factory.build(url_a)
result = downloader.fetch()  # 'result' is a DownloadResult

For http and https urls, in addition to the remote settings, non-default timing values are used. Specifically, the "total" timeout is set to None and the "sock_connect" and "sock_read" are both 5 minutes. For more info on these settings, see the aiohttp docs: http://aiohttp.readthedocs.io/en/stable/client_quickstart.html#timeouts

Behaviorally, it should allow for an active download to be arbitrarily long, while still detecting dead or closed sessions even when TCPKeepAlive is disabled.

Parameters:

  • downloader_overrides (dict, default: None ) –

    Keyed on a scheme name, e.g. 'https' or 'ftp' and the value is the downloader class to be used for that scheme, e.g. {'https': MyCustomDownloader}. These override the default values.

user_agent() staticmethod

Produce a User-Agent string to identify Pulp and relevant system info.

build(url, **kwargs)

Build a downloader which can optionally verify integrity using either digest or size.

The built downloader also provides concurrency restriction if specified by the remote.

Parameters:

  • url (str) –

    The download URL.

  • kwargs (dict, default: {} ) –

    All kwargs are passed along to the downloader. At a minimum, these include the pulpcore.plugin.download.BaseDownloader parameters.

Returns:

  • subclass of BaseDownloader –

    A downloader configured with the remote's settings.

HttpDownloader

This downloader is an asyncio-aware parallel downloader which is the default downloader produced by the DownloaderFactory for urls starting with http:// or https://. It also supports synchronous downloading using pulpcore.plugin.download.HttpDownloader.fetch.

pulpcore.plugin.download.HttpDownloader(url, session=None, auth=None, proxy=None, proxy_auth=None, headers_ready_callback=None, headers=None, throttler=None, max_retries=0, **kwargs)

Bases: BaseDownloader

An HTTP/HTTPS Downloader built on aiohttp.

This downloader downloads data from one url and is not reused.

The downloader optionally takes a session argument, which is an aiohttp.ClientSession. This allows many downloaders to share one aiohttp.ClientSession which provides a connection pool, connection reuse, and keep-alives across multiple downloaders. When creating many downloaders, have one session shared by all of your HttpDownloader objects.

A session is optional; if omitted, one session will be created, used for this downloader, and then closed when the download is complete. A session that is passed in will not be closed when the download is complete.

If a session is not provided, the one created by HttpDownloader uses non-default timing values. Specifically, the "total" timeout is set to None and the "sock_connect" and "sock_read" are both 5 minutes. For more info on these settings, see the aiohttp docs: http://aiohttp.readthedocs.io/en/stable/client_quickstart.html#timeouts

Behaviorally, it should allow for an active download to be arbitrarily long, while still detecting dead or closed sessions even when TCPKeepAlive is disabled.

aiohttp.ClientSession objects allow you to configure options that apply to all downloaders using that session, such as auth, timeouts, and headers. For more information on these options, see the aiohttp.ClientSession docs: http://aiohttp.readthedocs.io/en/stable/client_reference.html#aiohttp.ClientSession

The aiohttp.ClientSession can additionally be configured for SSL by passing in an aiohttp.TCPConnector. For information on configuring either server or client certificate based identity verification, see the aiohttp documentation: http://aiohttp.readthedocs.io/en/stable/client.html#ssl-control-for-tcp-sockets

For more information on aiohttp.BasicAuth objects, see their docs: http://aiohttp.readthedocs.io/en/stable/client_reference.html#aiohttp.BasicAuth

Synchronous Download:

downloader = HttpDownloader('http://example.com/')
result = downloader.fetch()

Parallel Download:

download_coroutines = [
    HttpDownloader('http://example.com/').run(),
    HttpDownloader('http://pulpproject.org/').run(),
]

loop = asyncio.get_event_loop()
done, not_done = loop.run_until_complete(asyncio.wait(download_coroutines))

for task in done:
    try:
        task.result()  # This is a DownloadResult
    except Exception as error:
        pass  # fatal exceptions are raised by result()

HttpDownloader objects contain automatic retry logic for the case where the server responds with an HTTP 429 response. The coroutine will automatically retry 10 times with exponential backoff before allowing a final exception to be raised.

Attributes:

  • session (ClientSession) –

    The session to be used by the downloader.

  • auth (BasicAuth) –

    An object that represents HTTP Basic Authorization or None

  • proxy (str) –

    An optional proxy URL or None

  • proxy_auth (BasicAuth) –

    An optional object that represents proxy HTTP Basic Authorization or None

  • headers_ready_callback (callable) –

    An optional callback that accepts a single dictionary as its argument. The callback will be called when the response headers are available. The dictionary passed has the header names as the keys and header values as its values. e.g. {'Transfer-Encoding': 'chunked'}. This can also be None.

This downloader also has all of the attributes of pulpcore.plugin.download.BaseDownloader

Parameters:

  • url (str) –

    The url to download.

  • session (ClientSession, default: None ) –

    The session to be used by the downloader (optional). If not specified, the downloader opens its own session and closes it when the download is complete.

  • auth (BasicAuth, default: None ) –

    An object that represents HTTP Basic Authorization (optional)

  • proxy (str, default: None ) –

    An optional proxy URL.

  • proxy_auth (BasicAuth, default: None ) –

    An optional object that represents proxy HTTP Basic Authorization.

  • headers_ready_callback (callable, default: None ) –

    An optional callback that accepts a single dictionary as its argument. The callback will be called when the response headers are available. The dictionary passed has the header names as the keys and header values as its values. e.g. {'Transfer-Encoding': 'chunked'}

  • headers (dict, default: None ) –

    Headers to be submitted with the request.

  • throttler (Throttler, default: None ) –

    Throttler for asyncio.

  • max_retries (int, default: 0 ) –

    The maximum number of times to retry a download upon failure.

  • kwargs (dict, default: {} ) –

    This accepts the parameters of pulpcore.plugin.download.BaseDownloader.

artifact_attributes property

A property that returns a dictionary with size and digest information. The keys of this dictionary correspond with pulpcore.plugin.models.Artifact fields.

handle_data(data) async

A coroutine that writes data to the file object and computes its digests.

All subclassed downloaders are expected to pass all data downloaded to this method. Similar to the hashlib docstring, repeated calls are equivalent to a single call with the concatenation of all the arguments: m.handle_data(a); m.handle_data(b) is equivalent to m.handle_data(a+b).

Parameters:

  • data (bytes) –

    The data to be handled by the downloader.

finalize() async

A coroutine to flush downloaded data, close the file writer, and validate the data.

All subclasses are required to call this method after all data has been passed to pulpcore.plugin.download.BaseDownloader.handle_data.

Raises:

  • pulpcore.exceptions.DigestValidationError

    When any of the expected_digests values don't match the digest of the data passed to pulpcore.plugin.download.BaseDownloader.handle_data.

  • pulpcore.exceptions.SizeValidationError

    When the expected_size value doesn't match the size of the data passed to pulpcore.plugin.download.BaseDownloader.handle_data.

fetch(extra_data=None)

Run the download synchronously and return the DownloadResult.

Returns:

  • pulpcore.plugin.download.DownloadResult

Raises:

  • Exception

    Any fatal exception emitted during downloading

validate_digests()

Validate all digests if expected_digests is set.

Raises:

  • pulpcore.exceptions.DigestValidationError

    When any of the expected_digests values don't match the digest of the data passed to pulpcore.plugin.download.BaseDownloader.handle_data.

validate_size()

Validate the size if expected_size is set.

Raises:

  • pulpcore.exceptions.SizeValidationError

    When the expected_size value doesn't match the size of the data passed to pulpcore.plugin.download.BaseDownloader.handle_data.

raise_for_status(response)

Raise error if aiohttp response status is >= 400 and not silenced.

Parameters:

  • response (ClientResponse) –

    The response to handle.

Raises:

  • ClientResponseError

    When the response status is >= 400.

run(extra_data=None) async

Run the downloader with concurrency restriction and retry logic.

This method acquires self.semaphore before calling the actual download implementation contained in _run(). This ensures that the semaphore stays acquired even as the backoff wrapper around _run() handles backoff-and-retry logic.

Parameters:

  • extra_data (dict, default: None ) –

    Extra data passed to the downloader.

Returns:

  • pulpcore.plugin.download.DownloadResult

FileDownloader

This downloader is an asyncio-aware parallel file reader which is the default downloader produced by the DownloaderFactory for urls starting with file://.

pulpcore.plugin.download.FileDownloader(url, *args, **kwargs)

Bases: BaseDownloader

A downloader for downloading files from the filesystem.

It provides digest and size validation along with computation of the digests needed to save the file as an Artifact. It writes a new file to the disk and the return path is included in the pulpcore.plugin.download.DownloadResult.

This downloader has all of the attributes of pulpcore.plugin.download.BaseDownloader

Download files from a url that starts with file://

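Parameters:

  • url (str) –

    The url to the file. This is expected to begin with file://

  • kwargs (dict, default: {} ) –

    This accepts the parameters of pulpcore.plugin.download.BaseDownloader.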

Raises:

  • ValidationError

    When the url starts with file://, but is not a subfolder of a path in the ALLOWED_IMPORT_PATH setting.

artifact_attributes property

A property that returns a dictionary with size and digest information. The keys of this dictionary correspond with pulpcore.plugin.models.Artifact fields.

handle_data(data) async

A coroutine that writes data to the file object and computes its digests.

All subclassed downloaders are expected to pass all data downloaded to this method. Similar to the hashlib docstring, repeated calls are equivalent to a single call with the concatenation of all the arguments: m.handle_data(a); m.handle_data(b) is equivalent to m.handle_data(a+b).

Parameters:

  • data (bytes) –

    The data to be handled by the downloader.

finalize() async

A coroutine to flush downloaded data, close the file writer, and validate the data.

All subclasses are required to call this method after all data has been passed to pulpcore.plugin.download.BaseDownloader.handle_data.

Raises:

  • pulpcore.exceptions.DigestValidationError

    When any of the expected_digests values don't match the digest of the data passed to pulpcore.plugin.download.BaseDownloader.handle_data.

  • pulpcore.exceptions.SizeValidationError

    When the expected_size value doesn't match the size of the data passed to pulpcore.plugin.download.BaseDownloader.handle_data.

fetch(extra_data=None)

Run the download synchronously and return the DownloadResult.

Returns:

  • pulpcore.plugin.download.DownloadResult

Raises:

  • Exception

    Any fatal exception emitted during downloading

validate_digests()

Validate all digests if expected_digests is set.

Raises:

  • pulpcore.exceptions.DigestValidationError

    When any of the expected_digests values don't match the digest of the data passed to pulpcore.plugin.download.BaseDownloader.handle_data.

validate_size()

Validate the size if expected_size is set.

Raises:

  • pulpcore.exceptions.SizeValidationError

    When the expected_size value doesn't match the size of the data passed to pulpcore.plugin.download.BaseDownloader.handle_data.

run(extra_data=None) async

Run the downloader with concurrency restriction.

This method acquires self.semaphore before calling the actual download implementation contained in _run(). This ensures that the semaphore stays acquired even as the backoff decorator on _run() handles backoff-and-retry logic.

Parameters:

  • extra_data (dict, default: None ) –

    Extra data passed to the downloader.

Returns:

  • pulpcore.plugin.download.DownloadResult

BaseDownloader

This is an abstract downloader that is meant for subclassing. All downloaders are expected to be descendants of BaseDownloader.

pulpcore.plugin.download.BaseDownloader(url, expected_digests=None, expected_size=None, semaphore=None, *args, **kwargs)

The base class of all downloaders, providing digest calculation, validation, and file handling.

This is an abstract class and is meant to be subclassed. Subclasses are required to implement the pulpcore.plugin.download.BaseDownloader.run method and do two things:

1. Pass all downloaded data to pulpcore.plugin.download.BaseDownloader.handle_data and schedule it.

2. Schedule pulpcore.plugin.download.BaseDownloader.finalize after all data has been delivered to pulpcore.plugin.download.BaseDownloader.handle_data.

Passing all downloaded data into pulpcore.plugin.download.BaseDownloader.handle_data allows the file digests to be computed while data is written to disk. The computed digests are required if the download is to be saved as a pulpcore.plugin.models.Artifact, and computing them during the download avoids having to re-read the data later.

The pulpcore.plugin.download.BaseDownloader.handle_data method by default writes to a random file in the current working directory.

The call to pulpcore.plugin.download.BaseDownloader.finalize ensures that all data written to the file-like object is quiesced to disk before the file-like object has close() called on it.
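The two requirements can be illustrated with a self-contained miniature that does not import pulpcore. SketchDownloader below is a hypothetical class, not the real BaseDownloader: it takes its data from an in-memory list of chunks, writes to a system temporary file rather than the working directory, hashes with sha256 and sha512 only, and raises a plain ValueError where the real downloaders raise the validation exceptions.

```python
import asyncio
import hashlib
import os
import tempfile


class SketchDownloader:
    """Hypothetical miniature of the handle_data()/finalize() contract."""

    def __init__(self, chunks, expected_digests=None, expected_size=None):
        self.chunks = chunks  # stands in for data arriving from a remote
        self.expected_digests = expected_digests or {}
        self.expected_size = expected_size
        self._hashers = {name: hashlib.new(name) for name in ("sha256", "sha512")}
        self._size = 0
        self._writer = tempfile.NamedTemporaryFile(delete=False)
        self.path = self._writer.name

    async def handle_data(self, data):
        # Write and hash each chunk as it arrives, so nothing is re-read later.
        self._writer.write(data)
        self._size += len(data)
        for hasher in self._hashers.values():
            hasher.update(data)

    async def finalize(self):
        # Flush and close first, then validate what was written.
        self._writer.flush()
        self._writer.close()
        if self.expected_size is not None and self._size != self.expected_size:
            raise ValueError("size mismatch")
        for name, expected in self.expected_digests.items():
            if self._hashers[name].hexdigest() != expected:
                raise ValueError(f"{name} digest mismatch")

    async def run(self):
        for chunk in self.chunks:
            await self.handle_data(chunk)  # requirement 1
        await self.finalize()  # requirement 2
        return self.path


payload = [b"hello ", b"world"]
good = SketchDownloader(
    payload,
    expected_size=11,
    expected_digests={"sha256": hashlib.sha256(b"hello world").hexdigest()},
)
saved_path = asyncio.run(good.run())

with open(saved_path, "rb") as saved_file:
    saved_bytes = saved_file.read()
os.remove(saved_path)  # clean up the demo file
```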

Attributes:

  • url (str) –

    The url to download.

  • expected_digests (dict) –

    Keyed on the algorithm name provided by hashlib and stores the value of the expected digest. e.g. {'md5': '912ec803b2ce49e4a541068d495ab570'}

  • expected_size (int) –

    The number of bytes the download is expected to have.

  • path (str) –

    The full path to the file containing the downloaded data.

Create a BaseDownloader object. This is expected to be called by all subclasses.

Parameters:

  • url (str) –

    The url to download.

  • expected_digests (dict, default: None ) –

    Keyed on the algorithm name provided by hashlib and stores the value of the expected digest. e.g. {'md5': '912ec803b2ce49e4a541068d495ab570'}

  • expected_size (int, default: None ) –

    The number of bytes the download is expected to have.

  • semaphore (Semaphore, default: None ) –

    A semaphore the downloader must acquire before running. Useful for limiting the number of outstanding downloaders in various ways.
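To show how a shared semaphore caps the number of downloads in flight, here is a minimal asyncio sketch. The limited_download coroutine, the limit of two, and the example urls are hypothetical; a real downloader would receive the semaphore through its semaphore parameter instead.

```python
import asyncio

active = 0  # downloads currently holding the semaphore
peak = 0    # highest value of `active` observed


async def limited_download(semaphore, url):
    """Hypothetical download that must hold the semaphore while running."""
    global active, peak
    async with semaphore:
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # simulated transfer time
        active -= 1
    return url


async def main():
    semaphore = asyncio.Semaphore(2)  # at most two downloads at once
    urls = [f"http://example.com/{n}" for n in range(6)]
    return await asyncio.gather(*(limited_download(semaphore, u) for u in urls))


finished = asyncio.run(main())
```

All six coroutines are scheduled immediately, but only two at a time get past the semaphore.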

artifact_attributes property

A property that returns a dictionary with size and digest information. The keys of this dictionary correspond with pulpcore.plugin.models.Artifact fields.

handle_data(data) async

A coroutine that writes data to the file object and computes its digests.

All subclassed downloaders are expected to pass all data downloaded to this method. Similar to the hashlib docstring, repeated calls are equivalent to a single call with the concatenation of all the arguments: m.handle_data(a); m.handle_data(b) is equivalent to m.handle_data(a+b).

Parameters:

  • data (bytes) –

    The data to be handled by the downloader.
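The hashlib property that this contract borrows can be checked directly. The snippet below uses only the standard library, and the two byte strings are arbitrary sample data.

```python
import hashlib

a = b"first chunk, "
b = b"second chunk"

# Feed the data in two pieces ...
incremental = hashlib.sha256()
incremental.update(a)
incremental.update(b)

# ... or all at once.
one_shot = hashlib.sha256(a + b)

same_digest = incremental.hexdigest() == one_shot.hexdigest()
```

This is why a downloader can hash each chunk as it arrives and still end up with the digest of the whole file.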

finalize() async

A coroutine to flush downloaded data, close the file writer, and validate the data.

All subclasses are required to call this method after all data has been passed to pulpcore.plugin.download.BaseDownloader.handle_data.

Raises:

  • pulpcore.exceptions.DigestValidationError

    When any of the expected_digests values don't match the digest of the data passed to pulpcore.plugin.download.BaseDownloader.handle_data.

  • pulpcore.exceptions.SizeValidationError

    When the expected_size value doesn't match the size of the data passed to pulpcore.plugin.download.BaseDownloader.handle_data.

fetch(extra_data=None)

Run the download synchronously and return the DownloadResult.

Returns:

  • pulpcore.plugin.download.DownloadResult

Raises:

  • Exception

    Any fatal exception emitted during downloading

validate_digests()

Validate all digests if expected_digests is set.

Raises:

  • pulpcore.exceptions.DigestValidationError

    When any of the expected_digests values don't match the digest of the data passed to pulpcore.plugin.download.BaseDownloader.handle_data.

validate_size()

Validate the size if expected_size is set.

Raises:

  • pulpcore.exceptions.SizeValidationError

    When the expected_size value doesn't match the size of the data passed to pulpcore.plugin.download.BaseDownloader.handle_data.

run(extra_data=None) async

Run the downloader with concurrency restriction.

This method acquires self.semaphore before calling the actual download implementation contained in _run(). This ensures that the semaphore stays acquired even as the backoff decorator on _run() handles backoff-and-retry logic.

Parameters:

  • extra_data (dict, default: None ) –

    Extra data passed to the downloader.

Returns:

  • pulpcore.plugin.download.DownloadResult

Validation Exceptions

pulpcore.exceptions.DigestValidationError(actual, expected, *args, url=None, **kwargs)

Bases: ValidationError

Raised when a file fails to validate a digest checksum.

pulpcore.exceptions.SizeValidationError(actual, expected, *args, url=None, **kwargs)

Bases: ValidationError

Raised when a file fails to validate a size checksum.

pulpcore.exceptions.ValidationError(error_code)

Bases: PulpException

A base class for all Validation Errors.

Parameters:

  • error_code (str) –

    Unique error code.