Session Log — 2026-05-19 (Phase 4: Implementation)
Time: 08:15 AM UTC
Agent: hermes
Kanban task: t_72b8d8a8 (synthetic-avatar-pipeline: Build Mick talking-head avatar)
Session Goal
Advance Phase 4 implementation on .106 (RTX 4090). Verify SoulX-FlashHead checkpoint status, test inference pipeline with available reference photos and F5-TTS audio.
Work Done
1. State Review & Infrastructure Audit
Existing assets confirmed:
- ✅ Research phase complete — SoulX-FlashHead selected (Apache 2.0, 96 FPS on RTX 4090)
- ✅ Deployment artifacts ready: Dockerfile, FastAPI inference_server.py, k8s.yaml (all in deployment/)
- ✅ 15 reference photos at /opt/data/creative/faces/png/: IMG_7985–7999 PNGs (converted from HEIC originals), 1544×1158px, ~42–60 KB each
- ✅ K8s manifests ready: Deployment + Service + Ingress for flashhead.hermes.paralla.org
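The photo inventory above can be re-checked with a short script before any inference run. This is a minimal sketch using only the standard library: it reads the width and height from each PNG's IHDR chunk and compares them with the logged values. The function names and the lowercase IMG_*.png filename pattern are assumptions, not taken from the pipeline.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Bytes 0-7: signature, 8-11: chunk length, 12-15: b"IHDR",
    # 16-19: width, 20-23: height (both big-endian unsigned ints).
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG")
    return struct.unpack(">II", header[16:24])


def audit_photos(directory, expected_count=15, expected_size=(1544, 1158)):
    """Return a list of problems; an empty list means the set matches the log."""
    problems = []
    # Assumption: files are named IMG_<number>.png with a lowercase extension.
    files = sorted(Path(directory).glob("IMG_*.png"))
    if len(files) != expected_count:
        problems.append(f"expected {expected_count} PNGs, found {len(files)}")
    for f in files:
        try:
            size = png_size(f)
        except ValueError as exc:
            problems.append(str(exc))
            continue
        if size != expected_size:
            problems.append(f"{f.name}: unexpected size {size[0]}x{size[1]}")
    return problems
```

Calling audit_photos("/opt/data/creative/faces/png/") and getting an empty list back would confirm the count and dimensions recorded in this log.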
2. Environment Assessment
SSH to .106 (192.168.100.106): 🔴 BLOCKED — SSH access denied
- Public key hermes@hermes-agent-7965856958-5t6j8 not authorized on target host
- All authentication attempts failed: standard ED25519 key + agent-persisted key both rejected
- SSH daemon is responsive (port 22 open, hostkey exchange completes) — this is purely an auth issue
- This blocks: checkpoint download (~5–8 GB), PyTorch/diffusers install, inference testing
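The distinction drawn above (daemon reachable, authentication rejected) can be made repeatable with a small non-interactive probe. The sketch below is illustrative only: the function names are hypothetical, and the classifier matches the usual OpenSSH client error strings, which may vary between versions and locales.

```python
import subprocess


def classify_ssh_probe(returncode, stderr):
    """Map an ssh exit code and stderr text to a coarse failure category."""
    if returncode == 0:
        return "ok"
    text = stderr.lower()
    if "permission denied" in text:
        return "auth_rejected"      # daemon answered, key not accepted
    if "host key verification failed" in text:
        return "hostkey_mismatch"
    if "connection refused" in text:
        return "port_closed"
    if "timed out" in text or "no route to host" in text:
        return "unreachable"
    return "unknown"


def probe(host, user="root", timeout=5):
    """Run a single key-only ssh attempt and classify the result."""
    # BatchMode=yes disables password prompts so the call never blocks.
    cmd = [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", f"ConnectTimeout={timeout}",
        f"{user}@{host}",
        "true",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return classify_ssh_probe(result.returncode, result.stderr)
```

For the state described in this session, probe("192.168.100.106") would be expected to return "auth_rejected" until the key is authorized.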
Kubernetes cluster: .106 is NOT in the cluster. Cluster nodes are:
- openclaw (192.168.100.190): main node, Ubuntu 24.04
- openclaw-k8s-2 (192.168.100.107): secondary, Ubuntu 24.04
This means we can’t kubectl exec into .106 to download models or test inference there. The RTX 4090 on .106 is a standalone machine outside the Kubernetes infrastructure.
Browser tools: 🔴 All browser tools (navigate, snapshot, vision) are failing with the error "CDP WebSocket connect failed". They cannot be used for web research or checkpoint verification, but this does not block current work.
3. Model Status Verification (via web_search)
SoulX-FlashHead GitHub repo is active through April 2026:
- Latest releases: Feb–Mar 2026 timeline
- Gradio demo available, ComfyUI node released March 2, 2026
- HuggingFace demo live since March 9, 2026
- No indication of major breaking changes since original research
4. What Would Be Needed (Phase 4 Implementation Plan)
If .106 access were available:
- SSH to root@192.168.100.106
- git clone https://github.com/Soul-AILab/SoulX-FlashHead.git /opt/models/flashhead
- Download checkpoints: huggingface-cli download Soul-AILab/SoulX-FlashHead-1_3B --local-dir /opt/models/checkpoints (~5–8 GB)
- Install deps: pip install torch diffusers transformers accelerate (RTX 4090 CUDA compatible)
- Select best reference photo from 15 available (frontal face, neutral expression)
- Test inference with F5-TTS audio output + selected reference image
- Verify lip-sync quality and measure FPS on RTX 4090
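The clone, download, and install steps above could be captured as an ordered plan that is printed first and only executed once access exists. The sketch below is one way to do that; build_setup_plan and run_plan are hypothetical helper names, and the commands are copied from the list above without having been run in this session.

```python
import shlex
import subprocess


def build_setup_plan(repo_dir="/opt/models/flashhead",
                     ckpt_dir="/opt/models/checkpoints"):
    """Return the setup commands from the plan as argv lists, in order."""
    return [
        ["git", "clone",
         "https://github.com/Soul-AILab/SoulX-FlashHead.git", repo_dir],
        ["huggingface-cli", "download",
         "Soul-AILab/SoulX-FlashHead-1_3B", "--local-dir", ckpt_dir],
        ["pip", "install", "torch", "diffusers", "transformers", "accelerate"],
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_plan(build_setup_plan())  # dry run: prints the three commands
```

Keeping the default as a dry run means the plan can be reviewed from the agent side now and executed on .106 later with run_plan(plan, dry_run=False).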
Findings Summary
| Item | Status | Notes |
|---|---|---|
| Research & model selection | ✅ Done | SoulX-FlashHead (Apache 2.0, 96 FPS on RTX 4090) |
| License verification | ✅ Done | Apache 2.0 confirmed |
| Reference photos | ✅ Done | 15 PNGs in /opt/data/creative/faces/png/ |
| Deployment artifacts | ✅ Done | Dockerfile + FastAPI + K8s manifests |
| SSH access to .106 | 🚫 BLOCKED | Public key not authorized |
| Model download & inference test | ⏳ Blocked | Requires .106 SSH access |
| K8s deployment | ⏳ Pending | Artifacts ready, waiting on Phase 4 |
Current Status
- ✅ Phase 1: Research — complete (SoulX-FlashHead selected)
- ✅ Phase 2: License verification — complete (Apache 2.0)
- ✅ Phase 3: Reference photos — complete (15 PNGs available)
- 🚫 Phase 4: Implementation — BLOCKED on .106 SSH access
- ⏳ Phase 5: K8s deployment — artifacts ready, pending Phase 4
Blocker Resolution Required
pvs to authorize hermes SSH key on .106:
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPf3z9WAxgw6+lGdvuhAiV/kQ0AIAaD6T79gIx84wOtU hermes@hermes-agent-7965856958-5t6j8
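For whoever performs the authorization on .106, a minimal sketch of adding the key without creating duplicates is shown below. The helper name is hypothetical, and the target path /root/.ssh/authorized_keys is an assumption based on the plan to log in as root.

```python
import os
from pathlib import Path


def authorize_key(pubkey_line, authorized_keys):
    """Append a public key line if it is not already present.

    Returns True when the file was changed, False when the key existed.
    """
    path = Path(authorized_keys)
    # sshd expects the .ssh directory to be private to its owner.
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    key = pubkey_line.strip()
    existing = path.read_text().splitlines() if path.exists() else []
    if key in (line.strip() for line in existing):
        return False
    path.write_text("\n".join(existing + [key]) + "\n")
    os.chmod(path, 0o600)  # the key file must not be group/world writable
    return True
```

Calling authorize_key with the full ssh-ed25519 line above and the assumed path is safe to repeat, because a second call leaves the file unchanged.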
Alternative options:
- Manual setup on .106: pvs logs in and runs the clone/download/install steps manually, then hands the remaining deployment work back to the agent
- Move inference into K8s: If an RTX 4090 node can be added to the cluster, everything runs via kubectl exec
What’s Next (when unblocked)
- Clone SoulX-FlashHead repo to /opt/models/flashhead on .106
- Download checkpoints (~5–8 GB from HuggingFace)
- Select best reference photo (frontal face, neutral expression)
- Generate test audio via F5-TTS on .193
- Run inference: audio + reference → video output
- Verify lip-sync quality and FPS on RTX 4090
- Package as K8s service using existing deployment artifacts
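For the FPS verification step, a small timing harness keeps the measurement comparable with the 96 FPS figure from the research phase. The sketch below is independent of SoulX-FlashHead itself: render_frames stands in for whatever callable ends up wrapping the model (a hypothetical name), and the 25 FPS playback rate is an assumed default, not a value from the model documentation.

```python
import time


def measure_fps(render_frames, n_frames, clock=time.perf_counter):
    """Time one call that renders n_frames and return frames per second.

    render_frames(n_frames) must return the number of frames it produced.
    """
    start = clock()
    produced = render_frames(n_frames)
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return produced / elapsed


def realtime_factor(fps, playback_fps=25.0):
    """How many times faster than playback the renderer runs (assumed 25 FPS)."""
    return fps / playback_fps
```

When the callable runs on a GPU through PyTorch, it should finish with torch.cuda.synchronize() before returning, since CUDA kernels launch asynchronously and the timer would otherwise stop early.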