pulpcore.plugin.download
This module implements downloaders that solve many of the common problems plugin writers face when downloading remote data. At a high level, these downloaders provide:
- auto-configuration from remote settings (auth, ssl, proxy)
- synchronous or parallel downloading
- digest and size validation computed during download
- grouping downloads together to return to the user when all files are downloaded
- customizable download behaviors via subclassing
All classes documented here should be imported directly from the pulpcore.plugin.download namespace.
Basic Downloading
The most basic download from a url can be done like this:
from pulpcore.plugin.download import HttpDownloader

downloader = HttpDownloader('http://example.com/')
result = downloader.fetch()  # blocks until a DownloadResult is returned
The example above downloads the data synchronously. The pulpcore.plugin.download.HttpDownloader.fetch() call blocks until the data is downloaded and the pulpcore.plugin.download.DownloadResult is returned, or until a fatal exception is raised.
Parallel Downloading
Any downloader in the pulpcore.plugin.download package can be run in parallel with the asyncio event loop. Each downloader has a pulpcore.plugin.download.BaseDownloader.run() method which returns a coroutine object that asyncio can schedule in parallel. Consider this example:
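import asyncio

from pulpcore.plugin.download import HttpDownloader

download_coroutines = [
    HttpDownloader('http://example.com/').run(),
    HttpDownloader('http://pulpproject.org/').run(),
]

loop = asyncio.get_event_loop()
# asyncio.wait() only accepts tasks (not bare coroutines) since Python 3.11
tasks = [loop.create_task(coroutine) for coroutine in download_coroutines]
done, not_done = loop.run_until_complete(asyncio.wait(tasks))
for task in done:
    try:
        task.result()  # This is a DownloadResult
    except Exception as error:
        pass  # fatal exceptions are raised by result()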
Download Results
The download result contains all the information about a completed download and is returned from the downloader's run() method when the download is complete.
pulpcore.plugin.download.DownloadResult = namedtuple('DownloadResult', ['url', 'artifact_attributes', 'path', 'headers']) (module attribute)
Parameters:
- url (str) – The url corresponding with the download.
- path (str) – The absolute path to the saved file.
- artifact_attributes (dict) – Contains keys corresponding with pulpcore.plugin.models.Artifact fields. This includes the computed digest values along with size information.
- headers (MultiDict) – HTTP response headers. The keys are header names and the values are header content. None when not using the HttpDownloader or a subclass of it.
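Because DownloadResult is a plain namedtuple, a plugin can read its fields by name. The sketch below is illustrative only: it rebuilds a local namedtuple with the same field names instead of importing pulpcore, the path is hypothetical, and the 'size' and 'sha256' keys in artifact_attributes are assumed examples of Artifact field names.

```python
import hashlib
from collections import namedtuple

# Local stand-in with the same fields as pulpcore.plugin.download.DownloadResult;
# a plugin would normally receive this object from run() or fetch().
DownloadResult = namedtuple('DownloadResult', ['url', 'artifact_attributes', 'path', 'headers'])

data = b'hello world'
result = DownloadResult(
    url='http://example.com/file.txt',
    # Assumed example keys; the real dict is keyed on Artifact fields.
    artifact_attributes={'size': len(data), 'sha256': hashlib.sha256(data).hexdigest()},
    path='/tmp/example/file.txt',  # hypothetical path
    headers=None,  # None when the download did not come from an HttpDownloader
)


def describe(result):
    """Summarize a download result in one line."""
    size = result.artifact_attributes['size']
    return f"{result.url} -> {result.path} ({size} bytes)"


print(describe(result))  # http://example.com/file.txt -> /tmp/example/file.txt (11 bytes)
```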
Configuring from a Remote
When fetching content during a sync, the remote has settings like SSL certs, SSL validation, basic auth credentials, and proxy settings. Downloaders commonly want to use these settings while downloading. The Remote's settings can automatically configure a downloader to download either a url or a pulpcore.plugin.models.RemoteArtifact using the pulpcore.plugin.models.Remote.get_downloader() call. Here is an example download from a URL:
downloader = my_remote.get_downloader(url='http://example.com')
downloader.fetch() # This downloader is configured with the remote's settings
Here is an example of a download configured from a RemoteArtifact, which also configures the downloader with digest and size validation:
remote_artifact = RemoteArtifact.objects.get(...)
downloader = my_remote.get_downloader(remote_artifact=remote_artifact)
downloader.fetch()  # This downloader has the remote's settings plus digest and size validation
The pulpcore.plugin.models.Remote.get_downloader() method internally calls the DownloaderFactory, so it expects a url that the DownloaderFactory can build a downloader for. See pulpcore.plugin.download.DownloaderFactory for more information on supported urls.
Tip
The pulpcore.plugin.models.Remote.get_downloader() method accepts kwargs that can enable size or digest based validation and specify a file-like object for the data to be written into. See pulpcore.plugin.models.Remote.get_downloader() for more information.
Note
All pulpcore.plugin.download.HttpDownloader downloaders produced by the same remote instance share an aiohttp session, which provides a connection pool, connection reuse, and keep-alives shared across all downloaders produced by a single remote.
Automatic Retry
The pulpcore.plugin.download.HttpDownloader will automatically retry 10 times if the server responds with one of the following error codes:
- 429 - Too Many Requests
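The retry behavior described above can be pictured with a small, self-contained sketch. It is not pulpcore's implementation: TooManyRequests and retry_on_429 are hypothetical names, the doubling delay schedule is an assumption, and max_tries is assumed to count every attempt including the first. The sleep function is injectable so the example runs instantly.

```python
import asyncio


class TooManyRequests(Exception):
    """Hypothetical stand-in for an HTTP 429 response error."""


async def retry_on_429(make_attempt, max_tries=10, base_delay=1.0, sleep=asyncio.sleep):
    """Await make_attempt(), retrying with exponential backoff while it raises TooManyRequests.

    max_tries counts every attempt, including the first; the last error is re-raised.
    """
    for attempt in range(max_tries):
        try:
            return await make_attempt()
        except TooManyRequests:
            if attempt == max_tries - 1:
                raise  # out of attempts: let the final exception propagate
            await sleep(base_delay * 2 ** attempt)  # waits 1, 2, 4, 8, ... seconds


async def demo():
    calls, delays = [], []

    async def flaky_download():
        calls.append(1)
        if len(calls) < 3:  # the first two attempts are rate limited
            raise TooManyRequests()
        return 'ok'

    async def fake_sleep(seconds):
        delays.append(seconds)  # record the delay instead of actually waiting

    result = await retry_on_429(flaky_download, sleep=fake_sleep)
    return result, len(calls), delays


print(asyncio.run(demo()))  # ('ok', 3, [1.0, 2.0])
```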
Exception Handling
Unrecoverable errors of several types can be raised during downloading. One example is a validation exception that is raised if the content downloaded fails size or digest validation. There can also be protocol-specific errors, such as an aiohttp.ClientResponseError being raised when a server responds with a 400+ response such as an HTTP 403.
Plugin writers can choose to halt the entire task by leaving the exception uncaught, which marks the entire task as failed.
Note
The pulpcore.plugin.download.HttpDownloader automatically retries in some cases, but if unsuccessful it will raise an exception for any HTTP response code that is 400 or greater.
Custom Download Behavior
Custom download behavior is provided by subclassing a downloader and providing a new run() method.
For example, you could catch a specific error code like a 404 and try another mirror if your downloader knew of several mirrors. Here is an example of that in code.
A custom downloader can be given as the downloader to use for a given protocol using the downloader_overrides argument of pulpcore.plugin.download.DownloaderFactory.
Additionally, you can implement the pulpcore.plugin.models.Remote.get_downloader() method to pass downloader_overrides to the pulpcore.plugin.download.DownloaderFactory.
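The mirror fallback mentioned above can be sketched without any network access. This is only an outline of the control flow: NotFound, download_from_mirrors, and fake_fetch are hypothetical, and fetch stands in for whatever actually performs a download. In a real plugin the loop would presumably live in the run() method of an HttpDownloader subclass and catch an aiohttp.ClientResponseError whose status is 404.

```python
import asyncio


class NotFound(Exception):
    """Hypothetical stand-in for an HTTP 404 error raised by a downloader."""


async def download_from_mirrors(relative_path, mirrors, fetch):
    """Try each mirror in order, moving on only when a mirror reports 404.

    `fetch` is any coroutine function taking a full url. Errors other than
    NotFound propagate immediately instead of silently trying the next mirror.
    """
    for base_url in mirrors:
        url = base_url.rstrip('/') + '/' + relative_path.lstrip('/')
        try:
            return await fetch(url)
        except NotFound:
            continue  # this mirror does not have the file; try the next one
    raise NotFound(f"{relative_path} not found on any of {len(mirrors)} mirrors")


async def fake_fetch(url):
    # Pretend only mirror-b has the file.
    if url.startswith('http://mirror-b.example/'):
        return f"downloaded {url}"
    raise NotFound(url)


mirrors = ['http://mirror-a.example/repo/', 'http://mirror-b.example/repo/']
print(asyncio.run(download_from_mirrors('pkg.tar.gz', mirrors, fake_fetch)))
# downloaded http://mirror-b.example/repo/pkg.tar.gz
```

Only NotFound triggers a fallback, so an authentication or validation failure on the first mirror still surfaces instead of being hidden by a later success.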
Adding New Protocol Support
To create a new protocol downloader, implement a subclass of pulpcore.plugin.download.BaseDownloader. See the docs on pulpcore.plugin.download.BaseDownloader for more information on the requirements.
Download Factory
The DownloaderFactory constructs and configures a downloader for any given url. Specifically, it:
- Selects the appropriate downloader based on these supported schemes: http, https, or file.
- Auto-configures the selected downloader with settings from a remote, including auth, ssl, and proxy.
The pulpcore.plugin.download.DownloaderFactory.build() method constructs one downloader for any given url.
Note
Any HttpDownloader objects produced by an instantiated DownloaderFactory share an aiohttp session, which provides a connection pool, connection reuse, and keep-alives shared across all downloaders produced by a single factory.
Tip
The pulpcore.plugin.download.DownloaderFactory.build() method accepts kwargs that enable size or digest based validation or the specification of a file-like object for the data to be written into. See pulpcore.plugin.download.DownloaderFactory.build() for more information.
pulpcore.plugin.download.DownloaderFactory(remote, downloader_overrides=None)
A factory for creating downloader objects that are configured with remote settings.
The DownloaderFactory correctly handles SSL settings, basic auth settings, proxy settings, and connection limit settings.
It supports handling urls with the http, https, and file protocols. The downloader_overrides option allows the caller to specify the download class to be used for any given protocol. This allows the user to specify custom, subclassed downloaders to be built by the factory.
Usage:
the_factory = DownloaderFactory(remote)
downloader = the_factory.build(url_a)
result = downloader.fetch() # 'result' is a DownloadResult
For http and https urls, in addition to the remote settings, non-default timing values are used. Specifically, the "total" timeout is set to None, and the "sock_connect" and "sock_read" timeouts are both set to 5 minutes. For more info on these settings, see the aiohttp docs:
http://aiohttp.readthedocs.io/en/stable/client_quickstart.html#timeouts
Behaviorally, this should allow an active download to take arbitrarily long while still detecting dead or closed sessions, even when TCPKeepAlive is disabled.
Parameters:
- downloader_overrides (dict, default: None) – Keyed on a scheme name, e.g. 'https' or 'ftp', with the downloader class to be used for that scheme as the value, e.g. {'https': MyCustomDownloader}. These override the default values.
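A rough sketch of how scheme-based selection with downloader_overrides can work, using only the standard library. The stub classes, DEFAULT_DOWNLOADERS, pick_downloader_class(), and build() are hypothetical placeholders, not pulpcore code; the point is only that an override keyed on a scheme replaces the default class for that scheme and leaves the others alone.

```python
from urllib.parse import urlparse


class StubHttpDownloader:
    """Placeholder for the default http/https downloader class."""

    def __init__(self, url):
        self.url = url


class StubFileDownloader(StubHttpDownloader):
    """Placeholder for the default file:// downloader class."""


class MyCustomDownloader(StubHttpDownloader):
    """Placeholder for a plugin's subclassed downloader."""


DEFAULT_DOWNLOADERS = {
    'http': StubHttpDownloader,
    'https': StubHttpDownloader,
    'file': StubFileDownloader,
}


def pick_downloader_class(url, downloader_overrides=None):
    """Return the class for the url's scheme; entries in downloader_overrides win."""
    handlers = {**DEFAULT_DOWNLOADERS, **(downloader_overrides or {})}
    scheme = urlparse(url).scheme
    if scheme not in handlers:
        raise ValueError(f"no downloader registered for scheme {scheme!r}")
    return handlers[scheme]


def build(url, downloader_overrides=None):
    """Instantiate the selected class for a single url."""
    return pick_downloader_class(url, downloader_overrides)(url)


overrides = {'https': MyCustomDownloader}
print(type(build('https://example.com/a', overrides)).__name__)  # MyCustomDownloader
print(type(build('http://example.com/a', overrides)).__name__)   # StubHttpDownloader
print(type(build('file:///tmp/a')).__name__)                     # StubFileDownloader
```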
user_agent() (staticmethod)
Produce a User-Agent string to identify Pulp and relevant system info.
build(url, **kwargs)
Build a downloader which can optionally verify integrity using either digest or size.
The built downloader also provides concurrency restriction if specified by the remote.
Parameters:
- url (str) – The download URL.
- kwargs (dict, default: {}) – All kwargs are passed along to the downloader. At a minimum, these include the pulpcore.plugin.download.BaseDownloader parameters.
Returns:
- subclass of pulpcore.plugin.download.BaseDownloader: A downloader that is configured with the remote settings.
HttpDownloader
This downloader is an asyncio-aware parallel downloader which is the default downloader produced by the DownloaderFactory for urls starting with http:// or https://. It also supports synchronous downloading using pulpcore.plugin.download.HttpDownloader.fetch().
pulpcore.plugin.download.HttpDownloader(url, session=None, auth=None, proxy=None, proxy_auth=None, headers_ready_callback=None, headers=None, throttler=None, max_retries=0, **kwargs)
Bases: BaseDownloader
An HTTP/HTTPS downloader built on aiohttp.
This downloader downloads data from one url and is not reused.
The downloader optionally takes a session argument, which is an aiohttp.ClientSession. This allows many downloaders to share one aiohttp.ClientSession, which provides a connection pool, connection reuse, and keep-alives across multiple downloaders. When creating many downloaders, have one session shared by all of your HttpDownloader objects.
A session is optional; if omitted, one session will be created, used for this downloader, and then closed when the download is complete. A session that is passed in will not be closed when the download is complete.
If a session is not provided, the one created by HttpDownloader uses non-default timing values. Specifically, the "total" timeout is set to None, and the "sock_connect" and "sock_read" timeouts are both set to 5 minutes. For more info on these settings, see the aiohttp docs:
http://aiohttp.readthedocs.io/en/stable/client_quickstart.html#timeouts
Behaviorally, this should allow an active download to take arbitrarily long while still detecting dead or closed sessions, even when TCPKeepAlive is disabled.
aiohttp.ClientSession objects allow you to configure options that will apply to all downloaders using that session, such as auth, timeouts, headers, etc. For more information on these options, see the aiohttp.ClientSession docs:
http://aiohttp.readthedocs.io/en/stable/client_reference.html#aiohttp.ClientSession
The aiohttp.ClientSession can additionally be configured for SSL by passing in an aiohttp.TCPConnector. For information on configuring either server or client certificate based identity verification, see the aiohttp documentation:
http://aiohttp.readthedocs.io/en/stable/client.html#ssl-control-for-tcp-sockets
For more information on aiohttp.BasicAuth objects, see their docs:
http://aiohttp.readthedocs.io/en/stable/client_reference.html#aiohttp.BasicAuth
Synchronous Download:
downloader = HttpDownloader('http://example.com/')
result = downloader.fetch()
Parallel Download:
import asyncio

download_coroutines = [
    HttpDownloader('http://example.com/').run(),
    HttpDownloader('http://pulpproject.org/').run(),
]

loop = asyncio.get_event_loop()
# asyncio.wait() only accepts tasks (not bare coroutines) since Python 3.11
tasks = [loop.create_task(coroutine) for coroutine in download_coroutines]
done, not_done = loop.run_until_complete(asyncio.wait(tasks))
for task in done:
    try:
        task.result()  # This is a DownloadResult
    except Exception as error:
        pass  # fatal exceptions are raised by result()
HttpDownloader objects contain automatic retry logic for when the server responds with an HTTP 429 response. The coroutine will automatically retry 10 times with exponential backoff before allowing a final exception to be raised.
Attributes:
- session (ClientSession) – The session to be used by the downloader.
- auth (BasicAuth) – An object that represents HTTP Basic Authorization, or None.
- proxy (str) – An optional proxy URL, or None.
- proxy_auth (BasicAuth) – An optional object that represents proxy HTTP Basic Authorization, or None.
- headers_ready_callback (callable) – An optional callback that accepts a single dictionary as its argument. The callback will be called when the response headers are available. The dictionary passed has the header names as the keys and header values as its values, e.g. {'Transfer-Encoding': 'chunked'}. This can also be None.
This downloader also has all of the attributes of pulpcore.plugin.download.BaseDownloader.
Parameters:
- url (str) – The url to download.
- session (ClientSession, default: None) – The session to be used by the downloader (optional). If not specified, the downloader opens a session and closes it when the download is complete.
- auth (BasicAuth, default: None) – An object that represents HTTP Basic Authorization (optional).
- proxy (str, default: None) – An optional proxy URL.
- proxy_auth (BasicAuth, default: None) – An optional object that represents proxy HTTP Basic Authorization.
- headers_ready_callback (callable, default: None) – An optional callback that accepts a single dictionary as its argument. The callback will be called when the response headers are available. The dictionary passed has the header names as the keys and header values as its values, e.g. {'Transfer-Encoding': 'chunked'}.
- headers (dict, default: None) – Headers to be submitted with the request.
- throttler (Throttler, default: None) – Throttler for asyncio.
- max_retries (int, default: 0) – The maximum number of times to retry a download upon failure.
- kwargs (dict, default: {}) – This accepts the parameters of pulpcore.plugin.download.BaseDownloader.
artifact_attributes (property)
A property that returns a dictionary with size and digest information. The keys of this dictionary correspond with pulpcore.plugin.models.Artifact fields.
handle_data(data) (async)
A coroutine that writes data to the file object and computes its digests.
All subclassed downloaders are expected to pass all data downloaded to this method. Similar to the hashlib docstring, repeated calls are equivalent to a single call with the concatenation of all the arguments: m.handle_data(a); m.handle_data(b) is equivalent to m.handle_data(a+b).
Parameters:
- data (bytes) – The data to be handled by the downloader.
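The equivalence between repeated calls and one call on the concatenated data follows from how hashlib objects behave, and it is what lets digests be computed while data streams to disk. DigestingWriter below is a hypothetical, synchronous helper (the real handle_data() is a coroutine) that writes to an in-memory file and updates SHA-256 and SHA-512 hashers in the same pass; the algorithm choice and the result keys are assumptions for illustration.

```python
import hashlib
import io


class DigestingWriter:
    """Hypothetical, synchronous helper mirroring the handle_data() contract:
    write each chunk and update every digest in the same pass."""

    def __init__(self, fileobj, algorithms=('sha256', 'sha512')):
        self.fileobj = fileobj
        self.size = 0
        self.hashers = {name: hashlib.new(name) for name in algorithms}

    def handle_data(self, data):
        self.fileobj.write(data)
        self.size += len(data)
        for hasher in self.hashers.values():
            hasher.update(data)

    def attributes(self):
        """Size plus hex digests, keyed by algorithm name (assumed Artifact-style keys)."""
        return {'size': self.size, **{name: h.hexdigest() for name, h in self.hashers.items()}}


chunked = DigestingWriter(io.BytesIO())
chunked.handle_data(b'hello ')
chunked.handle_data(b'world')

whole = DigestingWriter(io.BytesIO())
whole.handle_data(b'hello world')

print(chunked.attributes() == whole.attributes())  # True
```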
finalize() (async)
A coroutine to flush downloaded data, close the file writer, and validate the data.
All subclasses are required to call this method after all data has been passed to handle_data().
Raises:
- pulpcore.exceptions.DigestValidationError – When any of the expected_digests values don't match the digest of the data passed to handle_data().
- pulpcore.exceptions.SizeValidationError – When the expected_size value doesn't match the size of the data passed to handle_data().
fetch(extra_data=None)
Run the download synchronously and return the DownloadResult.
Returns:
- DownloadResult – The result of the completed download.
Raises:
- Exception – Any fatal exception emitted during downloading.
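One way to picture the relationship between fetch() and run(): fetch() drives the run() coroutine to completion and lets any fatal exception propagate to the caller. SketchDownloader is hypothetical and uses asyncio.run(), which assumes no event loop is already running in the calling thread; pulpcore's own fetch() may drive the event loop differently.

```python
import asyncio


class SketchDownloader:
    """Hypothetical downloader showing a synchronous fetch() built on an async run()."""

    def __init__(self, url, fail=False):
        self.url = url
        self.fail = fail

    async def run(self, extra_data=None):
        await asyncio.sleep(0)  # stands in for real network or disk I/O
        if self.fail:
            raise RuntimeError(f"could not download {self.url}")
        return {'url': self.url, 'extra_data': extra_data}

    def fetch(self, extra_data=None):
        # Block until run() finishes; an exception raised inside run() reaches the caller.
        return asyncio.run(self.run(extra_data=extra_data))


print(SketchDownloader('http://example.com/').fetch())
# {'url': 'http://example.com/', 'extra_data': None}
```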
validate_digests()
Validate all digests if expected_digests is set.
Raises:
- pulpcore.exceptions.DigestValidationError – When any of the expected_digests values don't match the digest of the data passed to handle_data().
validate_size()
Validate the size if expected_size is set.
Raises:
- pulpcore.exceptions.SizeValidationError – When the expected_size value doesn't match the size of the data passed to handle_data().
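A minimal sketch of the checks that validate_digests() and validate_size() are documented to perform. The exception classes are local stand-ins for the pulpcore.exceptions classes, and validate() is a hypothetical helper that checks bytes already in memory, whereas the real downloader compares against digests accumulated by handle_data().

```python
import hashlib


class DigestValidationError(Exception):
    """Local stand-in for pulpcore.exceptions.DigestValidationError."""


class SizeValidationError(Exception):
    """Local stand-in for pulpcore.exceptions.SizeValidationError."""


def validate(data, expected_digests=None, expected_size=None):
    """Hypothetical helper: check bytes against optional digest and size expectations.

    expected_digests has the same shape as on BaseDownloader, e.g. {'sha256': '<hex>'};
    each check is skipped when its expectation is not set.
    """
    if expected_digests:
        for algorithm, expected in expected_digests.items():
            actual = hashlib.new(algorithm, data).hexdigest()
            if actual != expected:
                raise DigestValidationError(f"{algorithm}: expected {expected}, got {actual}")
    if expected_size is not None and len(data) != expected_size:
        raise SizeValidationError(f"expected {expected_size} bytes, got {len(data)}")


data = b'hello world'
validate(data, expected_digests={'sha256': hashlib.sha256(data).hexdigest()}, expected_size=11)

try:
    validate(data, expected_size=10)
except SizeValidationError as error:
    print(error)  # expected 10 bytes, got 11
```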
raise_for_status(response)
Raise an error if the aiohttp response status is >= 400 and not silenced.
Parameters:
- response (ClientResponse) – The response to handle.
Raises:
- ClientResponseError – When the response status is >= 400.
run(extra_data=None) (async)
Run the downloader with concurrency restriction and optional retry logic.
This method acquires self.semaphore before calling the actual download implementation contained in _run(). This ensures that the semaphore stays acquired even while the backoff wrapper around _run() handles backoff-and-retry logic.
Parameters:
- extra_data (dict, default: None) – Extra data passed to the downloader. It may contain disable_retry_list: a list of exceptions which should not be retried.
Returns:
- pulpcore.plugin.download.DownloadResult from _run().
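Acquiring a shared semaphore in run() caps how many downloads are in flight at once. The sketch below demonstrates that effect with asyncio.Semaphore; LimitedDownloader, its tracker dict, and the limit of 3 are hypothetical, and the short sleep stands in for the real transfer.

```python
import asyncio


class LimitedDownloader:
    """Hypothetical downloader whose run() holds a shared semaphore around _run()."""

    def __init__(self, name, semaphore, tracker):
        self.name = name
        self.semaphore = semaphore
        self.tracker = tracker  # shared dict with the current and peak number of active downloads

    async def _run(self):
        self.tracker['active'] += 1
        self.tracker['peak'] = max(self.tracker['peak'], self.tracker['active'])
        await asyncio.sleep(0.01)  # stands in for the actual transfer
        self.tracker['active'] -= 1
        return self.name

    async def run(self):
        # Hold the semaphore for the whole of _run(), so at most `limit` transfers run at once.
        async with self.semaphore:
            return await self._run()


async def main(n_downloads=10, limit=3):
    semaphore = asyncio.Semaphore(limit)  # created inside the running event loop
    tracker = {'active': 0, 'peak': 0}
    downloaders = [LimitedDownloader(f"file-{i}", semaphore, tracker) for i in range(n_downloads)]
    results = await asyncio.gather(*(d.run() for d in downloaders))
    return results, tracker['peak']


results, peak = asyncio.run(main())
print(len(results), peak)  # 10 3
```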
FileDownloader
This downloader is an asyncio-aware parallel file reader which is the default downloader produced by the DownloaderFactory for urls starting with file://.
pulpcore.plugin.download.FileDownloader(url, *args, **kwargs)
Bases: BaseDownloader
A downloader for downloading files from the filesystem.
It provides digest and size validation along with computation of the digests needed to save the file as an Artifact. It writes a new file to the disk and the return path is included in the pulpcore.plugin.download.DownloadResult.
This downloader has all of the attributes of pulpcore.plugin.download.BaseDownloader.
Download files from a url that starts with file://.
Parameters:
- url (str) – The url to the file. This is expected to begin with file://.
- kwargs (dict, default: {}) – This accepts the parameters of pulpcore.plugin.download.BaseDownloader.
Raises:
- ValidationError – When the url starts with file:// but is not a subfolder of a path in the ALLOWED_IMPORT_PATHS setting.
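The containment rule in the Raises entry can be approximated with pathlib. This is a hypothetical helper, not Pulp's validation code: it resolves the path (collapsing '..' segments and symlinks) and then checks whether an allowed directory is the path itself or one of its parents. Comparing whole path components avoids the classic string-prefix mistake of accepting /srv/imports-evil for /srv/imports; the directory names are illustrative assumptions.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def is_allowed_file_url(url, allowed_import_paths):
    """Hypothetical check: is this file:// url inside one of the allowed directories?"""
    parsed = urlparse(url)
    if parsed.scheme != 'file':
        return False
    # resolve() makes the path absolute and collapses '..' segments (and symlinks).
    target = Path(unquote(parsed.path)).resolve()
    for allowed in allowed_import_paths:
        base = Path(allowed).resolve()
        # Compare whole path components, so /srv/imports-evil does not match /srv/imports.
        if target == base or base in target.parents:
            return True
    return False


allowed = ['/srv/imports']
print(is_allowed_file_url('file:///srv/imports/repo/a.rpm', allowed))        # True
print(is_allowed_file_url('file:///srv/imports/../../etc/passwd', allowed))  # False
print(is_allowed_file_url('file:///srv/imports-evil/a.rpm', allowed))        # False
```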
artifact_attributes (property)
A property that returns a dictionary with size and digest information. The keys of this dictionary correspond with pulpcore.plugin.models.Artifact fields.
handle_data(data) (async)
A coroutine that writes data to the file object and computes its digests.
All subclassed downloaders are expected to pass all data downloaded to this method. Similar to the hashlib docstring, repeated calls are equivalent to a single call with the concatenation of all the arguments: m.handle_data(a); m.handle_data(b) is equivalent to m.handle_data(a+b).
Parameters:
- data (bytes) – The data to be handled by the downloader.
finalize() (async)
A coroutine to flush downloaded data, close the file writer, and validate the data.
All subclasses are required to call this method after all data has been passed to handle_data().
Raises:
- pulpcore.exceptions.DigestValidationError – When any of the expected_digests values don't match the digest of the data passed to handle_data().
- pulpcore.exceptions.SizeValidationError – When the expected_size value doesn't match the size of the data passed to handle_data().
fetch(extra_data=None)
Run the download synchronously and return the DownloadResult.
Returns:
- DownloadResult – The result of the completed download.
Raises:
- Exception – Any fatal exception emitted during downloading.
validate_digests()
Validate all digests if expected_digests is set.
Raises:
- pulpcore.exceptions.DigestValidationError – When any of the expected_digests values don't match the digest of the data passed to handle_data().
validate_size()
Validate the size if expected_size is set.
Raises:
- pulpcore.exceptions.SizeValidationError – When the expected_size value doesn't match the size of the data passed to handle_data().
run(extra_data=None) (async)
Run the downloader with concurrency restriction.
This method acquires self.semaphore before calling the actual download implementation contained in _run(). This ensures that the semaphore stays acquired even while the backoff decorator on _run() handles backoff-and-retry logic.
Parameters:
- extra_data (dict, default: None) – Extra data passed to the downloader.
Returns:
- pulpcore.plugin.download.DownloadResult from _run().
BaseDownloader
This is an abstract downloader that is meant for subclassing. All downloaders are expected to be descendants of BaseDownloader.
pulpcore.plugin.download.BaseDownloader(url, expected_digests=None, expected_size=None, semaphore=None, *args, **kwargs)
The base class of all downloaders, providing digest calculation, validation, and file handling.
This is an abstract class and is meant to be subclassed. Subclasses are required to implement the run() method and do two things:
1. Pass all downloaded data to handle_data() and schedule it.
2. Schedule finalize() after all data has been delivered to handle_data().
Passing all downloaded data into handle_data() allows the file digests to be computed while data is written to disk. The computed digests are required if the download is to be saved as a pulpcore.plugin.models.Artifact, and computing them during the download avoids having to re-read the data later.
By default, the handle_data() method writes to a random file in the current working directory.
The call to finalize() ensures that all data written to the file-like object is quiesced to disk before the file-like object has close() called on it.
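The two obligations above (every chunk goes through handle_data(), then finalize() runs once) can be illustrated with a reduced, self-contained model. MiniBaseDownloader is a hypothetical stand-in that keeps data in memory, computes only SHA-256, and raises ValueError on a size mismatch; InMemoryDownloader serves bytes for a made-up mem:// scheme. Neither is part of pulpcore, and a real protocol downloader would subclass pulpcore.plugin.download.BaseDownloader instead.

```python
import asyncio
import hashlib
import io


class MiniBaseDownloader:
    """Hypothetical, much-reduced stand-in for BaseDownloader (not the pulpcore class)."""

    def __init__(self, url, expected_size=None):
        self.url = url
        self.expected_size = expected_size
        self._writer = io.BytesIO()  # the real class writes to a file on disk
        self._sha256 = hashlib.sha256()
        self._size = 0

    async def handle_data(self, data):
        self._writer.write(data)
        self._sha256.update(data)
        self._size += len(data)

    async def finalize(self):
        self._writer.flush()
        if self.expected_size is not None and self._size != self.expected_size:
            raise ValueError(f"expected {self.expected_size} bytes, got {self._size}")


class InMemoryDownloader(MiniBaseDownloader):
    """Example subclass for a made-up protocol that serves bytes from a dict."""

    SOURCE = {'mem://greeting': [b'hello ', b'world']}

    async def run(self):
        for chunk in self.SOURCE[self.url]:
            await self.handle_data(chunk)  # step 1: every chunk goes through handle_data()
        await self.finalize()  # step 2: finalize() once all data has been delivered
        return {'size': self._size, 'sha256': self._sha256.hexdigest()}


print(asyncio.run(InMemoryDownloader('mem://greeting', expected_size=11).run()))
```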
Attributes:
- url (str) – The url to download.
- expected_digests (dict) – Keyed on the algorithm name provided by hashlib, storing the value of the expected digest, e.g. {'md5': '912ec803b2ce49e4a541068d495ab570'}.
- expected_size (int) – The number of bytes the download is expected to have.
- path (str) – The full path to the file containing the downloaded data.
Create a BaseDownloader object. This is expected to be called by all subclasses.
Parameters:
- url (str) – The url to download.
- expected_digests (dict, default: None) – Keyed on the algorithm name provided by hashlib, storing the value of the expected digest, e.g. {'md5': '912ec803b2ce49e4a541068d495ab570'}.
- expected_size (int, default: None) – The number of bytes the download is expected to have.
- semaphore (Semaphore, default: None) – A semaphore the downloader must acquire before running. Useful for limiting the number of outstanding downloaders in various ways.
artifact_attributes (property)
A property that returns a dictionary with size and digest information. The keys of this dictionary correspond with pulpcore.plugin.models.Artifact fields.
handle_data(data) (async)
A coroutine that writes data to the file object and computes its digests.
All subclassed downloaders are expected to pass all data downloaded to this method. Similar to the hashlib docstring, repeated calls are equivalent to a single call with the concatenation of all the arguments: m.handle_data(a); m.handle_data(b) is equivalent to m.handle_data(a+b).
Parameters:
- data (bytes) – The data to be handled by the downloader.
finalize() (async)
A coroutine to flush downloaded data, close the file writer, and validate the data.
All subclasses are required to call this method after all data has been passed to handle_data().
Raises:
- pulpcore.exceptions.DigestValidationError – When any of the expected_digests values don't match the digest of the data passed to handle_data().
- pulpcore.exceptions.SizeValidationError – When the expected_size value doesn't match the size of the data passed to handle_data().
fetch(extra_data=None)
Run the download synchronously and return the DownloadResult.
Returns:
- DownloadResult – The result of the completed download.
Raises:
- Exception – Any fatal exception emitted during downloading.
validate_digests()
Validate all digests if expected_digests is set.
Raises:
- pulpcore.exceptions.DigestValidationError – When any of the expected_digests values don't match the digest of the data passed to handle_data().
validate_size()
Validate the size if expected_size is set.
Raises:
- pulpcore.exceptions.SizeValidationError – When the expected_size value doesn't match the size of the data passed to handle_data().
run(extra_data=None) (async)
Run the downloader with concurrency restriction.
This method acquires self.semaphore before calling the actual download implementation contained in _run(). This ensures that the semaphore stays acquired even while the backoff decorator on _run() handles backoff-and-retry logic.
Parameters:
- extra_data (dict, default: None) – Extra data passed to the downloader.
Returns:
- pulpcore.plugin.download.DownloadResult from _run().