Session Log — 2026-05-19 (Phase 4: Implementation)

Time: 08:15 AM UTC
Agent: hermes
Kanban task: t_72b8d8a8 (synthetic-avatar-pipeline: Build Mick talking-head avatar)

Session Goal

Advance Phase 4 implementation on .106 (RTX 4090): verify SoulX-FlashHead checkpoint status and test the inference pipeline with the available reference photos and F5-TTS audio.

Work Done

1. State Review & Infrastructure Audit

Existing assets confirmed:

  • ✅ Research phase complete — SoulX-FlashHead selected (Apache 2.0, 96 FPS on RTX 4090)
  • ✅ Deployment artifacts ready: Dockerfile, FastAPI inference_server.py, k8s.yaml (all in deployment/)
  • ✅ 15 reference photos at /opt/data/creative/faces/png/ — IMG_7985–7999 PNGs (converted from HEIC originals), 1544×1158px, ~42–60 KB each
  • ✅ K8s manifests ready: Deployment + Service + Ingress for flashhead.hermes.paralla.org
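
The photo check above can be repeated without any image library. This is a minimal sketch, assuming the PNGs sit directly in the logged directory and all share the logged 1544×1158 size; `png_size` and `audit_photos` are hypothetical helper names, not part of the pipeline.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) from the PNG IHDR chunk, or None if not a PNG."""
    with open(path, "rb") as fh:
        header = fh.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", width, height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    return struct.unpack(">II", header[16:24])


def audit_photos(directory, expected_count=15, expected_size=(1544, 1158)):
    """Return a list of problems found among the *.png files in `directory`."""
    problems = []
    files = sorted(Path(directory).glob("*.png"))
    if len(files) != expected_count:
        problems.append(f"expected {expected_count} PNGs, found {len(files)}")
    for f in files:
        size = png_size(f)
        if size != expected_size:
            problems.append(f"{f.name}: size {size}, expected {expected_size}")
    return problems
```

Calling `audit_photos("/opt/data/creative/faces/png")` returns an empty list when the directory matches what this log records. It only reads file headers, so it says nothing about whether a photo is frontal or neutral.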

2. Environment Assessment

SSH to .106 (192.168.100.106): 🔴 BLOCKED — SSH access denied

  • Public key hermes@hermes-agent-7965856958-5t6j8 not authorized on target host
  • All authentication attempts failed: standard ED25519 key + agent-persisted key both rejected
  • SSH daemon is responsive (port 22 open, hostkey exchange completes) — this is purely an auth issue
  • This blocks: checkpoint download (~5–8 GB), PyTorch/diffusers install, inference testing
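
The "daemon responsive, auth failing" diagnosis can be re-checked with a banner probe. This is a rough sketch, assuming plain TCP reachability from the agent pod; `classify_banner` and `probe_ssh` are hypothetical names. It confirms only that an SSH daemon answers, not that any key is accepted.

```python
import socket


def classify_banner(banner):
    """Return the SSH identification string, or 'no-banner' if it is not one."""
    text = banner.decode("ascii", errors="replace").strip()
    return text if text.startswith("SSH-") else "no-banner"


def probe_ssh(host, port=22, timeout=5.0):
    """Return 'unreachable', 'no-banner', or the server's SSH banner."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Connection refused, timed out, or no route to host.
        return "unreachable"
    with sock:
        try:
            banner = sock.recv(256)
        except OSError:
            # TCP connected, but nothing was sent within the timeout.
            return "no-banner"
    return classify_banner(banner)
```

A result beginning with `SSH-2.0-` from `probe_ssh("192.168.100.106")` would be consistent with the finding above: the network path and daemon are fine, and the remaining failure is key authorization.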

Kubernetes cluster: .106 is NOT in the cluster. Cluster nodes are:

  • openclaw (192.168.100.190) — main node, Ubuntu 24.04
  • openclaw-k8s-2 (192.168.100.107) — secondary, Ubuntu 24.04

This means we can’t schedule a pod on .106 and kubectl exec into it to download models or test inference. The RTX 4090 on .106 is a standalone machine outside the Kubernetes infrastructure.

Browser tools: 🔴 All browser tools (navigate, snapshot, vision) are failing with the error "CDP WebSocket connect failed". They cannot be used for web research or checkpoint verification, but this doesn’t block current work.

3. Upstream Repository Status

SoulX-FlashHead GitHub repo is active through April 2026:

  • Latest releases date from February–March 2026
  • Gradio demo available, ComfyUI node released March 2, 2026
  • HuggingFace demo live since March 9, 2026
  • No indication of major breaking changes since original research

4. What Would Be Needed (Phase 4 Implementation Plan)

If .106 access were available:

  1. SSH to root@192.168.100.106
  2. git clone https://github.com/Soul-AILab/SoulX-FlashHead.git /opt/models/flashhead
  3. Download checkpoints: huggingface-cli download Soul-AILab/SoulX-FlashHead-1_3B --local-dir /opt/models/checkpoints (~5–8 GB)
  4. Install deps: pip install torch diffusers transformers accelerate (using a CUDA-enabled PyTorch build that supports the RTX 4090)
  5. Select best reference photo from 15 available (frontal face, neutral expression)
  6. Test inference with F5-TTS audio output + selected reference image
  7. Verify lip-sync quality and measure FPS on RTX 4090
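
Steps 2–4 above can be captured as a dry-run script so the exact commands are reviewable before anything executes on .106. This is a minimal sketch: the commands are copied from the plan, while `run_steps` and its `dry_run` flag are hypothetical scaffolding.

```python
import shlex
import subprocess

REPO_URL = "https://github.com/Soul-AILab/SoulX-FlashHead.git"

# Commands taken from the plan above; paths are the ones listed there.
STEPS = [
    ["git", "clone", REPO_URL, "/opt/models/flashhead"],
    ["huggingface-cli", "download", "Soul-AILab/SoulX-FlashHead-1_3B",
     "--local-dir", "/opt/models/checkpoints"],
    ["pip", "install", "torch", "diffusers", "transformers", "accelerate"],
]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False.

    Execution stops at the first failing command (check=True raises).
    Returns the rendered command lines.
    """
    rendered = []
    for cmd in steps:
        line = shlex.join(cmd)
        rendered.append(line)
        print(("DRY-RUN " if dry_run else "RUN ") + line)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return rendered
```

With the default `dry_run=True` nothing is executed, so the script is safe to run from the agent pod. Passing `dry_run=False` only makes sense on .106 itself once access is restored.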

Findings Summary

| Item | Status | Notes |
| --- | --- | --- |
| Research & model selection | ✅ Done | SoulX-FlashHead (Apache 2.0, 96 FPS on RTX 4090) |
| License verification | ✅ Done | Apache 2.0 confirmed |
| Reference photos | ✅ Done | 15 PNGs in /opt/data/creative/faces/png/ |
| Deployment artifacts | ✅ Done | Dockerfile + FastAPI + K8s manifests |
| SSH access to .106 | 🚫 BLOCKED | Public key not authorized |
| Model download & inference test | ⏳ Blocked | Requires .106 SSH access |
| K8s deployment | ⏳ Pending | Artifacts ready, waiting on Phase 4 |

Current Status

  • ✅ Phase 1: Research — complete (SoulX-FlashHead selected)
  • ✅ Phase 2: License verification — complete (Apache 2.0)
  • ✅ Phase 3: Reference photos — complete (15 PNGs available)
  • 🚫 Phase 4: Implementation — BLOCKED on .106 SSH access
  • ⏳ Phase 5: K8s deployment — artifacts ready, pending Phase 4

Blocker Resolution Required

Action for pvs: authorize the hermes SSH public key on .106 by adding the following line to the target account’s authorized_keys file:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPf3z9WAxgw6+lGdvuhAiV/kQ0AIAaD6T79gIx84wOtU hermes@hermes-agent-7965856958-5t6j8
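
For whoever applies the fix on .106, here is a sketch of an idempotent append. It assumes a standard OpenSSH layout with `authorized_keys` inside the account's `.ssh` directory; `authorize_key` is a hypothetical helper. The duplicate check is naive: it compares only the key type and base64 body, and does not handle entries that carry leading options.

```python
import os
from pathlib import Path

# Public key copied verbatim from this log.
HERMES_KEY = (
    "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPf3z9WAxgw6+lGdvuhAiV/kQ0AIAaD6T79gIx84wOtU "
    "hermes@hermes-agent-7965856958-5t6j8"
)


def authorize_key(key_line, ssh_dir):
    """Append key_line to <ssh_dir>/authorized_keys unless already present.

    Returns True if the key was added, False if it was already there.
    """
    ssh_dir = Path(ssh_dir)
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    auth_file = ssh_dir / "authorized_keys"
    text = auth_file.read_text() if auth_file.exists() else ""
    # Compare on "<type> <base64>" only, ignoring the trailing comment.
    wanted = key_line.split()[:2]
    if any(line.split()[:2] == wanted for line in text.splitlines()):
        return False
    with open(auth_file, "a") as fh:
        if text and not text.endswith("\n"):
            fh.write("\n")
        fh.write(key_line.strip() + "\n")
    # sshd's default StrictModes rejects group- or world-writable key files.
    os.chmod(auth_file, 0o600)
    return True
```

If the plan's `root@192.168.100.106` login is the intended account, the call would be `authorize_key(HERMES_KEY, "/root/.ssh")`, run as root on .106. That path is an assumption based on the plan, not something verified in this session.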

Alternative options:

  1. Manual setup on .106: pvs logs in and runs the clone/download/install steps manually, then hands back to the agent for deployment
  2. Move inference into K8s: if an RTX 4090 node can be added to the cluster, setup and testing can run inside a pod via kubectl exec

What’s Next (when unblocked)

  1. Clone SoulX-FlashHead repo to /opt/models/flashhead on .106
  2. Download checkpoints (~5–8 GB from HuggingFace)
  3. Select best reference photo (frontal face, neutral expression)
  4. Generate test audio via F5-TTS on .193
  5. Run inference: audio + reference → video output
  6. Verify lip-sync quality and FPS on RTX 4090
  7. Package as K8s service using existing deployment artifacts
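
For step 6, a small timing harness is enough to compare measured throughput against the 96 FPS figure from the research phase. This is a sketch only: `render_frame` is a hypothetical stand-in for whatever callable produces one output frame, since the actual SoulX-FlashHead inference API has not been exercised yet.

```python
import time


def measure_fps(render_frame, n_frames=100, warmup=5):
    """Call render_frame() repeatedly and return frames per second.

    Warm-up calls are excluded from timing so one-off costs (model load,
    kernel compilation) do not skew the result. For GPU work, render_frame
    should block until the frame is finished (for example by calling
    torch.cuda.synchronize() inside it), otherwise only kernel launch time
    is measured.
    """
    for _ in range(warmup):
        render_frame()
    start = time.perf_counter()
    for _ in range(n_frames):
        render_frame()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed
```

Usage would look like `fps = measure_fps(lambda: pipeline_step(), n_frames=200)`, where `pipeline_step` is again a placeholder. Lip-sync quality still needs a visual check; this measures speed only.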