Cluster Health Session - 2026-06-23
Issue: hermes-memory-overcommit-check Init:Error (init container pip failure)
Task: d826c7f8
Root Cause: The hermes-memory-overcommit-check CronJob uses the python:3.12-slim base image for its install-deps init container, which runs a bare pip install -q --no-cache-dir kubernetes. According to this investigation, the slim image ships with the PEP 668 EXTERNALLY-MANAGED marker set but without pip installed by default, so the command fails with ModuleNotFoundError: No module named 'pip'.
Investigation Summary:
- Last schedule time: 2026-06-23T01:30:00Z (failed)
- CronJob spec unchanged since creation 31d ago
- failedJobsHistoryLimit is 1, so only the most recent failed Job is retained and older failures are cleaned up
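The summary fields above map onto standard CronJob fields (status.lastScheduleTime, metadata.creationTimestamp, spec.failedJobsHistoryLimit). As a rough sketch, assuming the object has been exported with kubectl get cronjob -o json and loaded as a dictionary, they could be extracted like this. The helper names are hypothetical, and the sample values are illustrative only (the creation timestamp is inferred from the "31d ago" note, not taken from the cluster).

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse a Kubernetes-style UTC timestamp such as 2026-06-23T01:30:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def summarize_cronjob(obj: dict) -> dict:
    """Pull out the fields used in the investigation summary."""
    spec = obj.get("spec", {})
    status = obj.get("status", {})
    created = parse_ts(obj["metadata"]["creationTimestamp"])
    last = status.get("lastScheduleTime")
    last_dt = parse_ts(last) if last else None
    return {
        # The API default for failedJobsHistoryLimit is 1 when unset.
        "failed_history_limit": spec.get("failedJobsHistoryLimit", 1),
        "last_schedule": last,
        "age_days_at_last_schedule": (last_dt - created).days if last_dt else None,
    }


# Illustrative sample only; not real cluster output.
SAMPLE = {
    "metadata": {"creationTimestamp": "2026-05-23T01:30:00Z"},
    "spec": {"failedJobsHistoryLimit": 1},
    "status": {"lastScheduleTime": "2026-06-23T01:30:00Z"},
}

if __name__ == "__main__":
    print(summarize_cronjob(SAMPLE))
```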
Proposed Fix Options:
Option A - Use python3 -m ensurepip before pip install:
initContainers:
- command:
  - sh
  - -c
  - "python3 -m ensurepip --default-pip 2>/dev/null; python3 -m pip install -q --no-cache-dir kubernetes"
  image: python:3.12-slim
Option B - Switch to python:3.12-slim-bookworm which includes pip (but may be newer base):
Same spec, different image tag.
Option C - Use get-pip.py from a trusted source:
initContainers:
- command:
  - sh
  - -c
  # Note: curl may need to be installed first; slim images typically omit it.
  - "curl -sS https://bootstrap.pypa.io/get-pip.py | python3 && python3 -m pip install -q --no-cache-dir kubernetes"
  image: python:3.12-slim
Recommendation: Option A is safest: it uses the built-in ensurepip module, has no external network dependency for get-pip, and keeps the same base image.
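The order of operations in Option A (bootstrap pip with ensurepip only when it is missing, then install the package) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the proposed change: bootstrap_commands and run_all are hypothetical helpers, and the script defaults to a dry run that only prints the commands.

```python
import importlib.util
import subprocess
import sys


def bootstrap_commands(package: str, have_pip: bool) -> list[list[str]]:
    """Build the command sequence: ensurepip first if pip is missing, then install."""
    commands = []
    if not have_pip:
        commands.append([sys.executable, "-m", "ensurepip", "--default-pip"])
    commands.append(
        [sys.executable, "-m", "pip", "install", "-q", "--no-cache-dir", package]
    )
    return commands


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command, or execute it and stop at the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    have_pip = importlib.util.find_spec("pip") is not None
    run_all(bootstrap_commands("kubernetes", have_pip), dry_run=True)
```

One design difference from the shell one-liner in Option A: that command joins the two steps with `;` and discards ensurepip's stderr, so a failed bootstrap only surfaces at the pip install step, whereas check=True here stops at the first failing command.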
Status: Blocked awaiting pvs sign-off per infrastructure policy (CronJob modification in hermes namespace). Waiting for explicit permission to apply fix.