Skip to content

Pulp Import/Export

The Pulp Import/Export process is based around the Django Import/Export library . To be 'exportable/importable', your plugin must define a modelresource module at <plugin>/app/modelresource.py. The module must contain a ModelResource subclass for each Model you want to expose, and it must define an IMPORT_ORDER ordered list for all such ModelResources.

QueryModelResource

If you don't need to do anything "special" to export your Model you can subclass pulpcore.plugin.importexport.QueryModelResource. This only requires you to provide the Meta.model class for the Model being export/imported, and to override the set_up_queryset(self) method to define a limiting filter. QueryModelResource is instantiated by the export process with the RepositoryVersion being exported (self.repo_version).

An example QueryModelResource subclasses, for import/exporting the Bar Model from pulp_foo, would look like this:

class BarResource(QueryModelResource):
    """
    Resource for import/export of foo_bar entities
    """

    def set_up_queryset(self):
        """
        :return: Bars specific to a specified repo-version.
        """
        return Bar.objects.filter(pk__in=self.repo_version.content)

    class Meta:
        model = Bar

BaseContentResource

The BaseContentResource class provides a base class for exporting Content. BaseContentResource provides extra functionality on top of QueryModelResource specific to handling the exporting and importing of Content such as handling of Content-specific fields like upstream_id.

An example of subclassing BaseContentResource looks like:

class MyContentResource(BaseContentResource):
    """
    Resource for import/export of MyContent.
    """

    def set_up_queryset(self):
        """
        :return: MyContent specific to a specified repo-version.
        """
        return MyContent.objects.filter(pk__in=self.repo_version.content)

    class Meta:
        model = MyContent

modelresource.py

A simple modelresource.py module is the one for the pulp_file plugin. It looks like this:

from pulpcore.plugin.importexport import BaseContentResource
from pulp_file.app.models import FileContent

class FileContentResource(BaseContentResource):
    """
    Resource for import/export of file_filecontent entities
    """

    def set_up_queryset(self):
        """
        :return: FileContents specific to a specified repo-version.
        """
        return FileContent.objects.filter(pk__in=self.repo_version.content)

    class Meta:
        model = FileContent


IMPORT_ORDER = [FileContentResource]

Plugin writers are encouraged to subclass the RepositoryResource class to enable automatic repository creation during the import. For the pulp_file plugin, the following implementation should be considered:

from pulpcore.plugin.modelresources import RepositoryResource
from pulp_file.app.models import FileRepository

class FileRepositoryResource(RepositoryResource):
    """
    A resource for importing/exporting file repository entities
    """

    def set_up_queryset(self):
        """
        :return: A queryset containing one repository that will be exported.
        """
        return FileRepository.objects.filter(pk=self.repo_version.repository)

    class Meta:
        model = FileRepository


# the list signifying the order of imports must also include the repository resource class
IMPORT_ORDER = [FileContentResource, FileRepositoryResource]

For performance reasons, it is important that care is taken when writing resource definitions. If your model has foreign keys that are exported as such (raw UUID key values), you should define a should a custom "dehydrate" method for that field to avoid an unnecessary lookup for each instance as seen in this issue. Else, if foreign keys are exported using some natural key of the referenced row, then the definition of set_up_queryset() should ensure those references are pre-selected using select_related(), otherwise an N+1 query scenario is likely.

content_mapping

By default, all the Content that gets imported is automatically associated with the Repository it is stored with inside the export archive. In some cases, this may not be desirable. One such case is when there is Content that is tied to a sub_repo but not directly to the Repository itself. Another case is where you may have Content you want imported but not associated with a Repositoy. In such cases, you can set a content_mapping property on the Resource.

The content_mapping property should be a dictionary that maps repository names to a list of content_ids. The importer code in pulp will combine the content_mappings across Resources and export them to a content_mapping.json file that it will use during import to map Content to Repositories.

Here is an example that deals with subrepos:

class MyContentResource(BaseContentResource):
    """
    Resource for import/export of MyContent.
    """

    def __init__(self, *args, **kwargs):
        """Override __init__ to set content_mapping to a dict."""
        self.content_mapping = {}
        super().__init__(*args, **kwargs)

    def set_up_queryset(self):
        """Set up the queryset and our content_mapping."""
        content = MyContent.objects.filter(pk__in=self.repo_version.content)
        self.content_mapping[self.repository_version.repository.name] = content

        for repo in self.subrepos(self.repo_version):
            subrepo_content = repo.latest_repository_version.content
            self.content_mapping[repo.name] = subrepo_content
            content |= subrepo_content

        return content

    class Meta:
        model = MyContent