Queue Observability Snapshot — 2026-06-11

Queue Depth

State     Count   Trend (vs 2026-06-07)
done      604
active    64
blocked   3
queued    2

Stuck Tasks (>4h idle in active/)

  • 57 stuck tasks detected (threshold: 4h)
  • Oldest: (114.4h idle) — CRITICAL if >72h
Task                                                         Age (hrs)  Severity
schema-drift-20260530.md                                     216.6      CRITICAL
cluster-fix-calliope-kanban-bug.md                           179.4      CRITICAL
cluster-fix-chat-shim-gosu.md                                179.3      CRITICAL
4a346f5d-vt-004-mastodon-public-timeline-unauthenticated.md  178.9      CRITICAL
fix-test-assertion-failed-20260604.md                        173.5      CRITICAL
crashloop-calliope-kanban-2026-06-04.md                      150.1      CRITICAL
crashloop-chat-shim-gosu-2026-06-04.md                       149.7      CRITICAL
cluster-janitor-long-jobs-2026-06-04.md                      135.7      CRITICAL
2010f0a5-long-running-job-analysis.md                        117.7      CRITICAL
44c52e92-pods-pending-10-min.md                              115.9      CRITICAL
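
The stuck-task check above can be sketched as a scan over file modification times. This is a minimal illustration, not the monitor's actual code: it assumes tasks are *.md files in an active/ directory and that "idle" means time since last modification. The 4h and 72h cutoffs come from this report; the function names and the "STUCK" and "OK" labels are placeholders invented for the example.

```python
import time
from pathlib import Path
from typing import List, Optional, Tuple

STUCK_HOURS = 4.0      # idle threshold quoted in this report
CRITICAL_HOURS = 72.0  # CRITICAL cutoff quoted in this report


def classify_idle(idle_hours: float) -> str:
    """Map idle hours to a severity label.

    Only CRITICAL (>72h) appears in the report; "STUCK" and "OK" are
    placeholder labels assumed for this sketch.
    """
    if idle_hours > CRITICAL_HOURS:
        return "CRITICAL"
    if idle_hours > STUCK_HOURS:
        return "STUCK"
    return "OK"


def find_stuck(active_dir: Path, now: Optional[float] = None) -> List[Tuple[str, float, str]]:
    """Return (name, idle_hours, severity) for stuck *.md tasks, oldest first."""
    now = time.time() if now is None else now
    rows = []
    for path in active_dir.glob("*.md"):
        # Idle time is approximated by the file's last-modified timestamp.
        idle_hours = (now - path.stat().st_mtime) / 3600.0
        if idle_hours > STUCK_HOURS:
            rows.append((path.name, round(idle_hours, 1), classify_idle(idle_hours)))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Calling find_stuck(Path("active")) would list candidates in the same order as the table above, assuming that directory layout. Passing now explicitly keeps the function deterministic for testing.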

Failure Tag Triage (active/ tasks modified in last 24h)

Category              Count   Severity
Unknown tag           0       ✅ CLEAN
CrashLoopBackOff      1       LOW
Timeout               3       LOW
API connection fail   0       NORMAL
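
A tally like the one above could be produced along the following lines. This is a hedged sketch: how tags are stored on a task is not stated in this report, so the example takes already-extracted (tag, hours_since_modified) pairs as input. The triage function, the KNOWN_TAGS set, and the sample data are all hypothetical; only the category names and the 24h window come from the table.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Category names taken from the triage table; treating them as the full
# set of recognized tags is an assumption.
KNOWN_TAGS = {"CrashLoopBackOff", "Timeout", "API connection fail"}


def triage(tasks: Iterable[Tuple[Optional[str], float]], window_hours: float = 24.0) -> Counter:
    """Count failure tags among tasks modified within the window.

    Each task is (tag, hours_since_modified). Tags outside KNOWN_TAGS are
    bucketed as "Unknown tag"; tasks with no tag (None) are skipped.
    """
    counts = Counter({tag: 0 for tag in KNOWN_TAGS | {"Unknown tag"}})
    for tag, hours_ago in tasks:
        if tag is None or hours_ago > window_hours:
            continue
        counts[tag if tag in KNOWN_TAGS else "Unknown tag"] += 1
    return counts


# Illustrative input only, not real queue data.
sample = [("Timeout", 0.6), ("Timeout", 3.0), ("Timeout", 20.0),
          ("CrashLoopBackOff", 5.0), ("Timeout", 30.0), (None, 1.0)]
counts = triage(sample)
```

Pre-seeding the counter with zeros means categories with no hits still appear in the output, which matches how the table reports "Unknown tag" and "API connection fail" at 0.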

Active Task Recency

  • Most recent modification in active/: 0.6h ago
  • Baseline snapshots on disk: 11 (last: 2026-06-07)

Assessment

Metric         Status       Notes
Throughput     ✅ Healthy   604 total done, consistent growth
Active load    ✅ Normal    64 active tasks
Blocked        ✅ Clean     3 blocked (3 items pending unblock)
Unknown tags   ✅ Fixed     0 in 24h (was 47 before fix-recurring-unknown)
Stuck tasks    ⚠️ High      57 idle >4h, 45 stale >72h

Key Findings

  1. 57 tasks have been idle >4 hours. Of these, 42 are older than 4 days (96h) and are candidates for archival or re-pickup.
  2. Unknown failure tag is zero — fix-recurring-unknown remains effective (47 → 0 since May 6).
  3. Blocked queue has 3 items: one awaiting pvs sign-off on the autopilot-db secret rotation (stale >10 days), one where the stuck guard triggered 5+ days ago, and one calliope crashloop fix.
  4. Baseline snapshots stopped on Jun 7 — 4-day gap in daily collection. This run restores continuity.
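
The snapshot gap in finding 4 is simple date arithmetic, sketched below. The find_gaps helper is hypothetical and assumes snapshots are expected once per day; the two dates are the ones given in this report (last baseline 2026-06-07, this run 2026-06-11).

```python
from datetime import date
from typing import List, Tuple


def find_gaps(snapshot_dates: List[date], today: date) -> List[Tuple[date, date, int]]:
    """Return (previous, next, days_between) wherever daily collection skipped.

    `today` is treated as the date of the current run, so a stale last
    snapshot shows up as a trailing gap.
    """
    days = sorted(set(snapshot_dates) | {today})
    gaps = []
    for prev, nxt in zip(days, days[1:]):
        delta = (nxt - prev).days
        if delta > 1:  # consecutive days differ by exactly 1
            gaps.append((prev, nxt, delta))
    return gaps


# Last baseline on disk vs. the date of this run.
gaps = find_gaps([date(2026, 6, 7)], today=date(2026, 6, 11))
print(gaps)
```

With those two dates the function reports a single gap of 4 days, matching the figure in finding 4. Feeding it the full list of baseline dates on disk would also surface any earlier skipped days.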

Recommendations for pvs

  1. Archive stale active tasks: 45 tasks have been idle >72h with no progress. Move each to done/ or failed/ after reviewing its content.
  2. Unblock coaching note: this meta-task has been stuck since May 30. Either unblock it for autopilot re-processing or archive it as completed (tag taxonomy drift was partially fixed, schema updated).
  3. Secret rotation: the blocked task has a fully diagnosed root cause (autopilot-db password mismatch) and needs pvs sign-off for the secret update. Blocked since June 1 (>10 days).