Session Log — 2026-05-19 (Phase 4: Implementation)

Time: 08:15 AM UTC
Agent: hermes
Kanban task: t_72b8d8a8 (synthetic-avatar-pipeline: Build Mick talking-head avatar)

Session Goal

Advance the Phase 4 implementation on .106 (RTX 4090): verify SoulX-FlashHead checkpoint status and test the inference pipeline with the available reference photos and F5-TTS audio.

Work Done

1. State Review & Infrastructure Audit

Existing assets confirmed:

✅ Research phase complete — SoulX-FlashHead selected (Apache 2.0, 96 FPS on RTX 4090)
✅ Deployment artifacts ready: Dockerfile, FastAPI inference_server.py, k8s.yaml (all in deployment/)
✅ 15 reference photos at /opt/data/creative/faces/png/ — IMG_7985–7999 PNGs (converted from HEIC originals), 1544×1158px, ~42–60 KB each
✅ K8s manifests ready: Deployment + Service + Ingress for flashhead.hermes.paralla.org
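A quick stdlib-only audit can confirm the photo set before it is copied to .106. This is a sketch and not part of the existing deployment artifacts. It assumes the files are named `IMG_7985.png` through `IMG_7999.png` (the lowercase `.png` extension is an assumption) and reads width and height straight from each PNG's IHDR header, so the 1544×1158 size can be checked without installing Pillow.

```python
import struct
from pathlib import Path

# Assumption: files are named IMG_<n>.png for n = 7985..7999 (15 files).
EXPECTED_IDS = range(7985, 8000)
EXPECTED_SIZE = (1544, 1158)  # width x height reported in this log
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path: Path) -> tuple:
    """Return (width, height) from the IHDR chunk, which must follow the signature."""
    with path.open("rb") as f:
        header = f.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", 4-byte width, 4-byte height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG")
    return struct.unpack(">II", header[16:24])


def audit_faces(directory: str) -> dict:
    """List missing files and files whose pixel size differs from EXPECTED_SIZE."""
    root = Path(directory)
    missing, wrong_size = [], {}
    for n in EXPECTED_IDS:
        path = root / f"IMG_{n}.png"
        if not path.is_file():
            missing.append(path.name)
            continue
        size = png_size(path)
        if size != EXPECTED_SIZE:
            wrong_size[path.name] = size
    return {"missing": missing, "wrong_size": wrong_size}


if __name__ == "__main__":
    print(audit_faces("/opt/data/creative/faces/png"))
```

An empty `missing` list and an empty `wrong_size` dict mean the set matches what this log describes.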

2. Environment Assessment

SSH to .106 (192.168.100.106): 🔴 BLOCKED — SSH access denied

Public key hermes@hermes-agent-7965856958-5t6j8 not authorized on target host
All authentication attempts failed: standard ED25519 key + agent-persisted key both rejected
SSH daemon is responsive (port 22 open, host key exchange completes), so the failure is in authentication, not connectivity
This blocks: checkpoint download (~5–8 GB), PyTorch/diffusers install, inference testing
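To separate a network problem from an auth problem without going through the SSH client, a sketch like the one below connects to port 22 and reads the server's identification line. SSH servers normally send a line starting with `SSH-` as soon as the TCP connection opens. The function name and timeout are illustrative assumptions, not existing tooling. A returned banner means sshd is reachable and the rejection happens at key authentication, which matches the diagnosis above.

```python
import socket
from typing import Optional


def ssh_banner(host: str, port: int = 22, timeout: float = 3.0) -> Optional[str]:
    """Return the server's SSH identification line, or None if unreachable or not SSH."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            buf = b""
            # The version line is normally the first thing the server sends.
            while b"\n" not in buf and len(buf) < 255:
                chunk = sock.recv(256)
                if not chunk:
                    break
                buf += chunk
    except OSError:  # connection refused, host unreachable, or timed out
        return None
    line = buf.split(b"\n", 1)[0].decode("ascii", errors="replace").strip()
    return line if line.startswith("SSH-") else None
```

From the agent pod, `ssh_banner("192.168.100.106")` returning a string such as `SSH-2.0-OpenSSH_...` confirms the network path is fine; `None` would point to a firewall or routing problem instead.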

Kubernetes cluster: .106 is NOT in the cluster. Cluster nodes are:

openclaw (192.168.100.190) — main node, Ubuntu 24.04
openclaw-k8s-2 (192.168.100.107) — secondary, Ubuntu 24.04

This means we can’t kubectl exec into .106 to download models or test inference there. The RTX 4090 on .106 is a standalone machine outside the Kubernetes infrastructure.

Browser tools: 🔴 All browser tools (navigate, snapshot, vision) are failing with "CDP WebSocket connect failed". They cannot be used for web research or checkpoint verification, but this does not block current work.

3. Model Status Verification (via web_search)

SoulX-FlashHead GitHub repo is active through April 2026:

Latest releases: February–March 2026
Gradio demo available, ComfyUI node released March 2, 2026
HuggingFace demo live since March 9, 2026
No indication of major breaking changes since original research

4. What Would Be Needed (Phase 4 Implementation Plan)

If .106 access were available:

SSH to root@192.168.100.106
git clone https://github.com/Soul-AILab/SoulX-FlashHead.git /opt/models/flashhead
Download checkpoints: huggingface-cli download Soul-AILab/SoulX-FlashHead-1_3B --local-dir /opt/models/checkpoints (~5–8 GB)
Install deps: pip install torch diffusers transformers accelerate (use a CUDA-enabled PyTorch build so inference runs on the RTX 4090)
Select best reference photo from 15 available (frontal face, neutral expression)
Test inference with F5-TTS audio output + selected reference image
Verify lip-sync quality and measure FPS on RTX 4090
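The setup steps above could be scripted so they can be re-run safely after a partial failure. This is a minimal sketch, assuming Python 3.8+ on .106 and that `git`, `huggingface-cli`, and `pip` are on PATH. Each step is an argv list copied from the plan, paired with an optional path that marks it as already done. It defaults to a dry run that only prints the commands. Photo selection, inference, and the lip-sync/FPS checks are left out because they have no fixed command yet.

```python
import shlex
import subprocess
from pathlib import Path

# (argv, marker): argv is copied from the plan; if marker exists, the step is skipped.
PLAN = [
    (["git", "clone", "https://github.com/Soul-AILab/SoulX-FlashHead.git",
      "/opt/models/flashhead"], "/opt/models/flashhead"),
    (["huggingface-cli", "download", "Soul-AILab/SoulX-FlashHead-1_3B",
      "--local-dir", "/opt/models/checkpoints"], None),
    # Assumption: the installed torch build supports the host's CUDA driver;
    # confirm afterwards with torch.cuda.is_available().
    (["pip", "install", "torch", "diffusers", "transformers", "accelerate"], None),
]


def run_plan(steps, dry_run=True):
    """Print each pending step and run it only if dry_run is False. Returns the steps taken."""
    taken = []
    for argv, marker in steps:
        if marker is not None and Path(marker).exists():
            print(f"skip (exists: {marker}): {shlex.join(argv)}")
            continue
        print(f"$ {shlex.join(argv)}")
        if not dry_run:
            subprocess.run(argv, check=True)  # raises CalledProcessError and stops on failure
        taken.append(argv)
    return taken


if __name__ == "__main__":
    run_plan(PLAN, dry_run=True)
```

Keeping the dry run as the default means the printed commands can be reviewed (or pasted by hand, as in the manual-setup option) before anything touches .106.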

Findings Summary

| Item | Status | Notes |
|---|---|---|
| Research & model selection | ✅ Done | SoulX-FlashHead (Apache 2.0, 96 FPS on RTX 4090) |
| License verification | ✅ Done | Apache 2.0 confirmed |
| Reference photos | ✅ Done | 15 PNGs in `/opt/data/creative/faces/png/` |
| Deployment artifacts | ✅ Done | Dockerfile + FastAPI + K8s manifests |
| SSH access to .106 | 🚫 BLOCKED | Public key not authorized |
| Model download & inference test | ⏳ Blocked | Requires .106 SSH access |
| K8s deployment | ⏳ Pending | Artifacts ready, waiting on Phase 4 |

Current Status

✅ Phase 1: Research — complete (SoulX-FlashHead selected)
✅ Phase 2: License verification — complete (Apache 2.0)
✅ Phase 3: Reference photos — complete (15 PNGs available)
🚫 Phase 4: Implementation — BLOCKED on .106 SSH access
⏳ Phase 5: K8s deployment — artifacts ready, pending Phase 4

Blocker Resolution Required

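Action for pvs: authorize the hermes SSH key on .106 by adding it to ~/.ssh/authorized_keys for the target account (root, per the plan above):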

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPf3z9WAxgw6+lGdvuhAiV/kQ0AIAaD6T79gIx84wOtU hermes@hermes-agent-7965856958-5t6j8
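If pvs prefers a script to hand-editing, the authorization step amounts to appending the key to `~/.ssh/authorized_keys` once and keeping permissions that sshd's StrictModes check accepts. This is a sketch under stated assumptions: it runs on .106 as the account that owns the target home directory, that account is root (per the `root@192.168.100.106` step in the plan), and `authorize_key` is a hypothetical helper name. It skips the append if the same key material is already present.

```python
import os
from pathlib import Path

# Public key from this log.
HERMES_KEY = ("ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPf3z9WAxgw6+lGdvuhAiV/kQ0AIAaD6T79gIx84wOtU "
              "hermes@hermes-agent-7965856958-5t6j8")


def authorize_key(pubkey: str, home: str = "/root") -> bool:
    """Append pubkey to <home>/.ssh/authorized_keys once; return True if it was added.

    Hypothetical helper. Assumes it runs as the owner of `home` (root by default).
    """
    ssh_dir = Path(home) / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    os.chmod(ssh_dir, 0o700)  # mkdir's mode is filtered by umask, so set it explicitly
    auth = ssh_dir / "authorized_keys"
    text = auth.read_text() if auth.exists() else ""
    blob = pubkey.split()[1]  # base64 key material, ignoring key type and comment
    if any(blob in line.split() for line in text.splitlines()):
        return False  # already authorized
    with auth.open("a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # avoid gluing the new key onto an unterminated last line
        f.write(pubkey.strip() + "\n")
    os.chmod(auth, 0o600)  # StrictModes rejects group/world-writable key files
    return True
```

Running `authorize_key(HERMES_KEY)` a second time is a no-op and returns `False`, so it is safe to repeat.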

Alternative options:

Manual setup on .106: pvs logs in and runs the clone/download/install steps by hand, then hands back to the agent for deployment
Move inference into K8s: if an RTX 4090 node can be added to the cluster, all steps can run via kubectl exec

What’s Next (when unblocked)

Clone SoulX-FlashHead repo to /opt/models/flashhead on .106
Download checkpoints (~5–8 GB from HuggingFace)
Select best reference photo (frontal face, neutral expression)
Generate test audio via F5-TTS on .193
Run inference: audio + reference → video output
Verify lip-sync quality and FPS on RTX 4090
Package as K8s service using existing deployment artifacts
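For the FPS check, one way to get a number comparable to the reported 96 FPS is to time several inference calls after a warm-up. This is a minimal sketch: `render` is a hypothetical callable standing in for one SoulX-FlashHead inference run (audio + reference image → frames), since the repo's actual entry point has not been tested yet. Because CUDA kernels run asynchronously, `render` should only return once the frames are ready (for example by calling `torch.cuda.synchronize()` before returning); otherwise the timing misses GPU work.

```python
import time
from typing import Callable, Sequence


def measure_fps(render: Callable[[], Sequence], runs: int = 3, warmup: int = 1) -> float:
    """Frames per second over `runs` timed calls of render(), after untimed warm-up calls.

    `render` is a hypothetical stand-in for one inference run that returns the
    generated frames; on a GPU it must block until those frames are ready.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    for _ in range(warmup):
        render()  # excludes one-time costs such as CUDA context setup
    frames = 0
    elapsed = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        frames += len(render())
        elapsed += time.perf_counter() - start
    return frames / elapsed
```

This measures generation throughput only; F5-TTS audio synthesis on .193 is outside the timed section, so the result reflects the avatar model alone.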
