Horizontal Pod Autoscaler (HPA) Support¶
The Pulp Operator now supports Horizontal Pod Autoscaler (HPA) for automatic scaling of Pulp components based on resource utilization.
Overview¶
HPA automatically scales the number of pods in a deployment based on observed CPU and/or memory utilization. This feature is available for the following Pulp components:
- API (pulpcore-api)
- Content (pulpcore-content)
- Worker (pulpcore-worker)
- Web (pulpcore-web)
Configuration¶
HPA can be configured independently for each component in the Pulp Custom Resource.
Basic Example¶
apiVersion: repo-manager.pulpproject.org/v1
kind: Pulp
metadata:
  name: example-pulp
spec:
  api:
    replicas: 2  # Ignored when HPA is enabled
    hpa:
      enabled: true
      min_replicas: 2
      max_replicas: 10
      target_cpu_utilization_percentage: 70
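For orientation, the basic example corresponds roughly to a standard Kubernetes HorizontalPodAutoscaler like the one sketched below. This is an illustration under assumptions, not the operator's actual output: the name example-pulp-api is taken from the monitoring output later on this page, and the use of the autoscaling/v2 API is assumed.

```yaml
# Sketch only: an HPA roughly equivalent to the basic example, assuming autoscaling/v2.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-pulp-api          # name as it appears in the "kubectl get hpa" output
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-pulp-api        # the Deployment whose replica count is managed
  minReplicas: 2                  # from min_replicas
  maxReplicas: 10                 # from max_replicas
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # from target_cpu_utilization_percentage
```

Comparing this sketch with the output of kubectl get hpa example-pulp-api -o yaml is a quick way to confirm what the operator actually created.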
Complete Example with All Components¶
apiVersion: repo-manager.pulpproject.org/v1
kind: Pulp
metadata:
  name: example-pulp
spec:
  # API component with HPA
  api:
    replicas: 2
    resource_requirements:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "1000m"
        memory: "1Gi"
    hpa:
      enabled: true
      min_replicas: 2
      max_replicas: 10
      target_cpu_utilization_percentage: 70
      target_memory_utilization_percentage: 80
  # Content component with HPA
  content:
    replicas: 2
    resource_requirements:
      requests:
        cpu: "500m"
        memory: "512Mi"
    hpa:
      enabled: true
      min_replicas: 2
      max_replicas: 8
      target_cpu_utilization_percentage: 75
  # Worker component with HPA
  worker:
    replicas: 2
    resource_requirements:
      requests:
        cpu: "500m"
        memory: "512Mi"
    hpa:
      enabled: true
      min_replicas: 1
      max_replicas: 20
      target_cpu_utilization_percentage: 80
  # Web component with HPA (only when not using Route/Ingress)
  web:
    replicas: 2
    resource_requirements:
      requests:
        cpu: "200m"
        memory: "256Mi"
    hpa:
      enabled: true
      min_replicas: 2
      max_replicas: 5
      target_cpu_utilization_percentage: 70
HPA Configuration Fields¶
enabled¶
- Type: Boolean
- Default: false
- Description: Enables or disables HPA for the component
min_replicas¶
- Type: Integer
- Default: 1
- Minimum: 1
- Description: Minimum number of replicas. HPA will not scale below this value.
max_replicas¶
- Type: Integer
- Required: Yes (when HPA is enabled)
- Minimum: 1
- Description: Maximum number of replicas. HPA will not scale above this value.
target_cpu_utilization_percentage¶
- Type: Integer
- Optional: Yes
- Range: 1-100
- Default: 50 (if no metrics are specified)
- Description: Target average CPU utilization across all pods (as a percentage of requested CPU)
target_memory_utilization_percentage¶
- Type: Integer
- Optional: Yes
- Range: 1-100
- Description: Target average memory utilization across all pods (as a percentage of requested memory)
Important Considerations¶
1. Resource Requests are Required¶
Utilization targets are expressed as a percentage of the pod's resource requests, so you must define a request for every resource you set a target for (CPU, memory, or both):
api:
  resource_requirements:
    requests:
      cpu: "500m"      # Required for CPU-based autoscaling
      memory: "512Mi"  # Required for memory-based autoscaling
  hpa:
    enabled: true
    max_replicas: 10
    target_cpu_utilization_percentage: 70
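Why the requests matter can be seen from how utilization is computed. As a sketch of the standard Kubernetes calculation (the symbols are introduced here for illustration only), the utilization of a resource is the total usage across the component's pods divided by the total of their requests:

```latex
\[
\text{utilization} = \frac{\sum_{i} \text{usage}_i}{\sum_{i} \text{request}_i} \times 100\%
\]
% Illustrative numbers: two pods each requesting 500m CPU, using 300m and 400m.
\[
\frac{300\,\text{m} + 400\,\text{m}}{500\,\text{m} + 500\,\text{m}} \times 100\% = 70\%
\]
```

Without a request there is no denominator, so the HPA cannot evaluate a utilization target for that resource.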
2. Replicas Field is Ignored¶
When HPA is enabled, the replicas field is ignored. The HPA controller manages the replica count based on the observed metrics.
3. Default Metrics¶
If neither target_cpu_utilization_percentage nor target_memory_utilization_percentage is specified, HPA defaults to:
- CPU: 50% utilization
4. Multiple Metrics¶
You can specify both CPU and memory targets. HPA will scale based on whichever metric requires more replicas:
hpa:
  enabled: true
  max_replicas: 10
  target_cpu_utilization_percentage: 70
  target_memory_utilization_percentage: 80
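The "whichever metric requires more replicas" rule follows from the standard Kubernetes HPA calculation, sketched here with made-up numbers. For each metric the controller computes a desired replica count, then takes the largest:

```latex
\[
\text{desired}_m = \left\lceil \text{currentReplicas} \times
  \frac{\text{currentUtilization}_m}{\text{targetUtilization}_m} \right\rceil,
\qquad
\text{desiredReplicas} = \max_m \, \text{desired}_m
\]
% Illustrative case: 4 replicas, CPU at 84% (target 70%), memory at 60% (target 80%).
\[
\text{desired}_{\text{cpu}} = \lceil 4 \times 84/70 \rceil = \lceil 4.8 \rceil = 5,
\qquad
\text{desired}_{\text{mem}} = \lceil 4 \times 60/80 \rceil = \lceil 3 \rceil = 3
\]
```

In this case CPU drives the decision and the HPA scales to 5 replicas. The result is always clamped between min_replicas and max_replicas.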
5. Metrics Server Required¶
HPA requires the Kubernetes Metrics Server to be installed in your cluster. Verify it's running:
kubectl get deployment metrics-server -n kube-system
Monitoring HPA¶
Check HPA Status¶
# List all HPAs
kubectl get hpa -n <namespace>
# Describe specific HPA
kubectl describe hpa example-pulp-api -n <namespace>
Example HPA Output¶
NAME               REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
example-pulp-api   Deployment/example-pulp-api   45%/70%   2         10        3          5m
View HPA Events¶
kubectl get events -n <namespace> --field-selector involvedObject.name=example-pulp-api
Disabling HPA¶
To disable HPA for a component, set enabled: false or remove the hpa section:
api:
  replicas: 3  # This will now be used
  hpa:
    enabled: false
The operator will automatically delete the HPA resource and revert to using the static replicas value.
Best Practices¶
- Start Conservative: Begin with higher target utilization percentages (70-80%) and adjust based on observed behavior
- Set Appropriate Min/Max:
  - min_replicas: Should handle baseline load
  - max_replicas: Should be based on cluster capacity and cost considerations
- Monitor Scaling Behavior: Watch for:
  - Frequent scaling up/down (thrashing)
  - Hitting max replicas frequently (may need to increase)
  - Staying at min replicas (may be over-provisioned)
- Resource Requests: Set realistic resource requests based on actual usage patterns
- Combine with PDB: Use PodDisruptionBudget to ensure availability during scaling events:
api:
  hpa:
    enabled: true
    min_replicas: 3
    max_replicas: 10
  pdb:
    maxUnavailable: 1
Troubleshooting¶
HPA Shows "unknown" for Metrics¶
Cause: Metrics Server is not installed or not working
Solution:
# Check Metrics Server
kubectl get apiservice v1beta1.metrics.k8s.io -o yaml
# Install Metrics Server (if needed)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
HPA Not Scaling¶
Possible causes:
1. Resource requests not defined
2. Metrics Server not running
3. Target utilization already met
4. Insufficient cluster resources
Debug:
# Check HPA status
kubectl describe hpa <hpa-name> -n <namespace>
# Check pod metrics
kubectl top pods -n <namespace>
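# Check HPA controller logs (the pod label varies by distribution, e.g. kubeadm uses
# component=kube-controller-manager; managed control planes may not expose these logs)
kubectl logs -n kube-system -l k8s-app=kube-controller-manager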
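When reading the kubectl describe hpa output, the Conditions section usually narrows the cause down fastest. The fragment below sketches the shape of those conditions as they appear in the HPA's status under the upstream autoscaling/v2 API. The status values are illustrative only and were not captured from a real cluster.

```yaml
# Illustrative shape of .status.conditions on an HPA (autoscaling/v2); values are examples.
status:
  conditions:
    - type: AbleToScale      # "False": the controller cannot read or update the scale target
      status: "True"
    - type: ScalingActive    # "False": usually a metrics problem (no Metrics Server, missing requests)
      status: "False"
    - type: ScalingLimited   # "True": the result was capped by the minimum or maximum replica count
      status: "False"
```

A ScalingActive condition of "False" maps to the first two causes listed above, while ScalingLimited set to "True" suggests revisiting min_replicas or max_replicas.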
Pods Not Scaling Down¶
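The HPA controller waits before acting on changed recommendations. In upstream Kubernetes (autoscaling/v2), unless overridden on the HPA object, the default stabilization windows are:
- Scale up: none (0 seconds)
- Scale down: 5 minutes (300 seconds)
This prevents rapid fluctuations. Wait for the scale-down window to elapse before expecting scale-down events.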
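In the upstream autoscaling/v2 API, these delays are expressed as stabilization windows on the HPA object itself. The fragment below is a sketch of that upstream field populated with the upstream default values. Whether the Pulp Operator exposes or overrides it is not covered on this page, so treat it as background rather than as a supported Pulp CR setting.

```yaml
# Sketch of the upstream HorizontalPodAutoscaler "behavior" field (autoscaling/v2).
# Values shown are the upstream defaults; this is not a field of the Pulp CR.
spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0     # act on higher recommendations immediately
    scaleDown:
      stabilizationWindowSeconds: 300   # use the highest recommendation from the last 5 minutes
```

To see the values in effect on a running cluster, inspect the HPA with kubectl get hpa <hpa-name> -n <namespace> -o yaml.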
Example Scenarios¶
Scenario 1: High-Traffic API¶
api:
  resource_requirements:
    requests:
      cpu: "1000m"
      memory: "1Gi"
  hpa:
    enabled: true
    min_replicas: 3
    max_replicas: 20
    target_cpu_utilization_percentage: 70
Scenario 2: Batch Processing Workers¶
worker:
  resource_requirements:
    requests:
      cpu: "500m"
      memory: "512Mi"
  hpa:
    enabled: true
    min_replicas: 1
    max_replicas: 50
    target_cpu_utilization_percentage: 80
Scenario 3: Memory-Intensive Content Serving¶
content:
  resource_requirements:
    requests:
      cpu: "500m"
      memory: "1Gi"
  hpa:
    enabled: true
    min_replicas: 2
    max_replicas: 10
    target_memory_utilization_percentage: 75