Using with Hugging Face CLI

This guide explains how to use your Pulp Hugging Face cache with the official Hugging Face CLI and Python libraries.

Overview

The pulp_hugging_face plugin is designed to be compatible with Hugging Face tools. By setting the HF_ENDPOINT environment variable, you can redirect all Hugging Face client requests through your Pulp instance.

Environment Setup

Set the HF_ENDPOINT environment variable to point to your Pulp distribution:

Bash (Linux/macOS):

export HF_ENDPOINT="http://your-pulp-instance/pulp/content/huggingface"

PowerShell:

$env:HF_ENDPOINT = "http://your-pulp-instance/pulp/content/huggingface"

Windows Command Prompt:

set HF_ENDPOINT=http://your-pulp-instance/pulp/content/huggingface
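
To confirm the client libraries pick up the variable, you can print the endpoint that huggingface_hub resolved. This is a quick sanity check, assuming a recent huggingface_hub release that exposes constants.ENDPOINT:

python -c "from huggingface_hub import constants; print(constants.ENDPOINT)"
# Should print: http://your-pulp-instance/pulp/content/huggingface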

Persistent Configuration

Add the export command to your shell profile (~/.bashrc, ~/.zshrc) for persistence.
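
For example, for Bash:

echo 'export HF_ENDPOINT="http://your-pulp-instance/pulp/content/huggingface"' >> ~/.bashrc
source ~/.bashrc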

Using huggingface-cli

Download Models

# Set endpoint
export HF_ENDPOINT="http://your-pulp-instance/pulp/content/huggingface"

# Download a model
huggingface-cli download bert-base-uncased
# Download only specific files
huggingface-cli download bert-base-uncased config.json tokenizer.json
# Download to a specific directory
huggingface-cli download bert-base-uncased --local-dir ./my-model

Download Datasets

export HF_ENDPOINT="http://your-pulp-instance/pulp/content/huggingface"

# Download a dataset
huggingface-cli download --repo-type dataset squad

Verify Cache Status

# Check what's in your local cache
huggingface-cli scan-cache

Using with Transformers Library

Loading Models

import os

# Set the endpoint
os.environ["HF_ENDPOINT"] = "http://your-pulp-instance/pulp/content/huggingface"

from transformers import AutoModel, AutoTokenizer

# Load model - will use Pulp cache
model = AutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

Loading Pipelines

import os
os.environ["HF_ENDPOINT"] = "http://your-pulp-instance/pulp/content/huggingface"

from transformers import pipeline

# Create a pipeline - model downloaded through Pulp
classifier = pipeline("sentiment-analysis")
result = classifier("I love using Pulp for model caching!")
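
Pipelines resolve a default model for each task, so the exact repository pulled through Pulp depends on your transformers version. To make the download deterministic, you can pin the model explicitly; the model name below is illustrative:

# Pin a specific model instead of relying on the task default
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)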

Using with Hugging Face Hub Library

import os
os.environ["HF_ENDPOINT"] = "http://your-pulp-instance/pulp/content/huggingface"

from huggingface_hub import hf_hub_download, snapshot_download

# Download a single file
config_path = hf_hub_download(
    repo_id="bert-base-uncased",
    filename="config.json"
)

# Download entire repository
model_path = snapshot_download(repo_id="bert-base-uncased")

Using with Datasets Library

import os
os.environ["HF_ENDPOINT"] = "http://your-pulp-instance/pulp/content/huggingface"

from datasets import load_dataset

# Load a dataset - will use Pulp cache
dataset = load_dataset("squad")

Using with Diffusers Library

import os
os.environ["HF_ENDPOINT"] = "http://your-pulp-instance/pulp/content/huggingface"

from diffusers import DiffusionPipeline

# Load a diffusion model through Pulp
pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

Docker Configuration

When running containers, pass the environment variable:

Docker:

docker run -e HF_ENDPOINT="http://pulp-host/pulp/content/huggingface" my-ml-app

Docker Compose:

version: '3.8'
services:
  ml-app:
    image: my-ml-app
    environment:
      - HF_ENDPOINT=http://pulp-host/pulp/content/huggingface

Kubernetes:

apiVersion: v1
kind: Pod
metadata:
  name: ml-app
spec:
  containers:
  - name: ml-app
    image: my-ml-app
    env:
    - name: HF_ENDPOINT
      value: "http://pulp-service/pulp/content/huggingface"

Offline Mode

Once models are cached in Pulp, you can use them even when the original Hugging Face Hub is unavailable:

import os

# Point to your Pulp instance
os.environ["HF_ENDPOINT"] = "http://your-pulp-instance/pulp/content/huggingface"

# Enable offline mode for transformers
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from transformers import AutoModel

# Will use locally cached content only
model = AutoModel.from_pretrained("bert-base-uncased")

Note

Offline mode only works for content that has already been cached. If you request content that hasn't been cached yet, you'll get an error.
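
If you call huggingface_hub directly, the equivalent controls are the HF_HUB_OFFLINE environment variable or the local_files_only argument; a minimal sketch:

import os
os.environ["HF_ENDPOINT"] = "http://your-pulp-instance/pulp/content/huggingface"

from huggingface_hub import hf_hub_download

# With local_files_only=True, the call never touches the network and
# raises an error if the file is not already in the local cache
config_path = hf_hub_download(
    repo_id="bert-base-uncased",
    filename="config.json",
    local_files_only=True,
)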

Troubleshooting

Connection Refused

If you get connection errors:

  1. Verify Pulp is running and accessible
  2. Check the distribution exists and has the correct base_path
  3. Ensure there are no firewalls blocking the connection

# Test connectivity
curl -I http://your-pulp-instance/pulp/content/huggingface/api/models/bert-base-uncased

SSL Certificate Errors

For self-signed certificates:

import os

# Disable SSL verification (development only!)
os.environ["HF_HUB_DISABLE_SSL_VERIFICATION"] = "1"
os.environ["HF_ENDPOINT"] = "https://your-pulp-instance/pulp/content/huggingface"

Security

Disabling SSL verification is not recommended for production environments.
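
A safer alternative is to keep verification enabled and point the client at your internal CA bundle. huggingface_hub makes its HTTP requests with the requests library, which honors REQUESTS_CA_BUNDLE; the certificate path below is a placeholder:

export REQUESTS_CA_BUNDLE=/etc/ssl/certs/internal-ca.pem
export HF_ENDPOINT="https://your-pulp-instance/pulp/content/huggingface"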

Slow Downloads

If downloads are slow:

  1. Check network connectivity between client and Pulp
  2. Verify Pulp has adequate resources
  3. Consider increasing download_concurrency on the remote, as shown below
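
download_concurrency is a field on the remote, so it can be changed through the Pulp REST API. A sketch using curl, assuming admin credentials and that you have already looked up the remote's pulp_href (REMOTE_HREF below is a placeholder):

# REMOTE_HREF is the remote's pulp_href, e.g. taken from a prior list call
curl -u admin:password -X PATCH \
  -H "Content-Type: application/json" \
  -d '{"download_concurrency": 10}' \
  "http://your-pulp-instance${REMOTE_HREF}"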