Created: 2026-06-01 | Updated: 2026-06-01 | Type: Reference
Purpose: Canonical catalog of build/CI failures for automated detection and triage by hermes-03-retro-signals cron.
Structure
Each entry has:
- ID: Unique short code (e.g., `DEPLOY-MISMATCH`)
- Name: Human-readable description
- Category: Grouping (infra, test, deployment, model, external)
- Severity: critical / high / medium / low
- DetectionPattern: Regex or string pattern for log matching
- ContextMatcher: Optional secondary filter to reduce false positives
- AutoFileTask: Whether a fix task should be auto-created when threshold hit (yes/no)
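
As a minimal sketch, the fields above could be modeled as a record like the one below. The class name `FailureEntry`, the `matches` helper, the sample log line, and the choice to treat both patterns as Python regular expressions applied to a single log line are assumptions for illustration, not the cron's actual implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FailureEntry:
    """One catalog entry (hypothetical representation of the fields above)."""
    id: str
    name: str
    category: str            # infra, test, deployment, model, external
    severity: str            # critical, high, medium, low
    detection_pattern: str   # regex or literal string for log matching
    context_matcher: Optional[str] = None  # optional secondary filter
    auto_file_task: bool = False

    def matches(self, log_line: str) -> bool:
        """True if the detection pattern matches and, when a context
        matcher is set, that pattern matches the same line as well."""
        if not re.search(self.detection_pattern, log_line):
            return False
        if self.context_matcher is None:
            return True
        return re.search(self.context_matcher, log_line) is not None


entry = FailureEntry(
    id="INFRA-CRASHLOOPBACKOFF",
    name="Kubernetes CrashLoopBackOff",
    category="infra",
    severity="critical",
    detection_pattern="CrashLoopBackOff",
    auto_file_task=True,
)

# Made-up log line, used only to exercise the matcher.
print(entry.matches("pod api-7f9c status: CrashLoopBackOff"))  # True
```

Whether the real matcher applies ContextMatcher to the same line or to surrounding log lines is not stated here; the sketch assumes the same line.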
Infrastructure Failures
INFRA-CRASHLOOPBACKOFF
| Field | Value |
|---|---|
| ID | INFRA-CRASHLOOPBACKOFF |
| Name | Kubernetes CrashLoopBackOff |
| Category | infra |
| Severity | critical |
| DetectionPattern | CrashLoopBackOff |
| ContextMatcher | (none) |
| AutoFileTask | yes |
Container restarts repeatedly and the pod never stabilizes. Often caused by an OOM kill or a startup error.
INFRA-OOMKILLED
| Field | Value |
|---|---|
| ID | INFRA-OOMKILLED |
| Name | Container OOM Killed |
| Category | infra |
| Severity | critical |
| DetectionPattern | `OOMKilled` |
| ContextMatcher | (none) |
| AutoFileTask | yes |
Container killed by the kernel OOM killer. Either the memory limit is too low or the container leaks memory.
INFRA-CRASH-ON-STARTUP
| Field | Value |
|---|---|
| ID | INFRA-CRASH-ON-STARTUP |
| Name | Container crashed on startup |
| Category | infra |
| Severity | high |
| DetectionPattern | `crash loop detected` |
| ContextMatcher | (none) |
| AutoFileTask | yes |
Container fails immediately on startup, likely due to a binary error or a missing dependency.
INFRA-IMAGE-PULL-ERROR
| Field | Value |
|---|---|
| ID | INFRA-IMAGE-PULL-ERROR |
| Name | Container image pull failure |
| Category | infra |
| Severity | high |
| DetectionPattern | `Failed to pull image` |
| ContextMatcher | (none) |
| AutoFileTask | yes |
Image registry auth failure, rate limit, or corrupt image.
Test Failures
TEST-CI-TIMEOUT
| Field | Value |
|---|---|
| ID | TEST-CI-TIMEOUT |
| Name | CI test timed out |
| Category | test |
| Severity | medium |
| DetectionPattern | `timed out` |
| ContextMatcher | (none) |
| AutoFileTask | yes |
Test job exceeded its time limit. Usually indicates a slow test or a deadlock.
TEST-ASSERTION-FAILED
| Field | Value |
|---|---|
| ID | TEST-ASSERTION-FAILED |
| Name | Test assertion failed |
| Category | test |
| Severity | high |
| DetectionPattern | `AssertionError:` |
| ContextMatcher | (none) |
| AutoFileTask | yes |
Logic regression — expected output doesn’t match actual.
TEST-DEPENDENCY-MISSING
| Field | Value |
|---|---|
| ID | TEST-DEPENDENCY-MISSING |
| Name | Test dependency not found |
| Category | test |
| Severity | medium |
| DetectionPattern | `ModuleNotFoundError:` |
| ContextMatcher | (none) |
| AutoFileTask | yes |
Missing Python package or system dependency in CI environment.
TEST-SERVICE-UNREACHABLE
| Field | Value |
|---|---|
| ID | TEST-SERVICE-UNREACHABLE |
| Name | Test service unreachable |
| Category | test |
| Severity | medium |
| DetectionPattern | `Connection refused.*8080` |
| ContextMatcher | (none) |
| AutoFileTask | no |
Inference server, database, or test fixture not responding. Often transient.
Deployment Failures
DEPLOY-MERGE-CONFLICT
| Field | Value |
|---|---|
| ID | DEPLOY-MERGE-CONFLICT |
| Name | Merge conflict preventing deploy |
| Category | deployment |
| Severity | high |
| DetectionPattern | `CONFLICT.*merging` |
| ContextMatcher | (none) |
| AutoFileTask | yes |
CI can’t auto-merge due to conflicting changes. Manual resolution needed.
DEPLOY-HOOK-FAILURE
| Field | Value |
|---|---|
| ID | DEPLOY-HOOK-FAILURE |
| Name | Deployment hook failed |
| Category | deployment |
| Severity | high |
| DetectionPattern | `post-deploy.*hook.*failed` |
| ContextMatcher | (none) |
| AutoFileTask | yes |
Post-deploy notification, integration hook, or callback failed.
DEPLOY-REVISION-MISMATCH
| Field | Value |
|---|---|
| ID | DEPLOY-REVISION-MISMATCH |
| Name | Deployment revision doesn’t match commit |
| Category | deployment |
| Severity | medium |
| DetectionPattern | `revision.*does not match` |
| ContextMatcher | (none) |
| AutoFileTask | yes |
Rollback or cache issue — wrong version deployed.
Model / Inference Failures
MODEL-SERVER-DOWN
| Field | Value |
|---|---|
| ID | MODEL-SERVER-DOWN |
| Name | Inference server unreachable |
| Category | model |
| Severity | critical |
| DetectionPattern | `connection refused.*8080` |
| ContextMatcher | `192.168.100.(106 |
| AutoFileTask | no |
Local or remote model-serving endpoint not responding. Often a transient GPU/CPU load issue.
MODEL-OUTPUT-MALFORMED
| Field | Value |
|---|---|
| ID | MODEL-OUTPUT-MALFORMED |
| Name | Model output not parseable as JSON/YAML |
| Category | model |
| Severity | medium |
| DetectionPattern | `invalid JSON` |
| ContextMatcher | (none) |
| AutoFileTask | yes |
Structured output pipeline broke — usually indicates prompt template regression or context overflow.
MODEL-VISION-MISMATCH
| Field | Value |
|---|---|
| ID | MODEL-VISION-MISMATCH |
| Name | Vision model output doesn’t match ground truth |
| Category | model |
| Severity | medium |
| DetectionPattern | `vision.*mismatch` |
| ContextMatcher | (none) |
| AutoFileTask | no |
The small vision/OCR model returned a wrong result. Usually indicates a prompt issue, not a model bug.
External Service Failures
EXTERNAL-API-DOWN
| Field | Value |
|---|---|
| ID | EXTERNAL-API-DOWN |
| Name | External API service unavailable |
| Category | external |
| Severity | medium |
| DetectionPattern | `external.*service.*unavailable` |
| ContextMatcher | (none) |
| AutoFileTask | no |
Third-party API down — don’t auto-file tasks for things we can’t fix. Monitor only.
EXTERNAL-RATE-LIMITED
| Field | Value |
|---|---|
| ID | EXTERNAL-RATE-LIMITED |
| Name | Rate limited by external service |
| Category | external |
| Severity | medium |
| DetectionPattern | `429 Too Many Requests` |
| ContextMatcher | (none) |
| AutoFileTask | no |
Hit API rate limits. Usually self-resolving with backoff.
Error Patterns
ERROR-500-INTEG-FALLBACK
| Field | Value |
|---|---|
| ID | ERROR-500-INTEG-FALLBACK |
| Name | 500 errors from integration fallback (hermes-gateway) |
| Category | external |
| Severity | high |
| DetectionPattern | `integration fallback.*500` |
| ContextMatcher | (none) |
| AutoFileTask | yes |
The integration fallback endpoint at the hermes gateway is returning 500 errors. This often happens when the primary service falls back to a secondary integration path that itself fails. Auto-file for review and fix.
ERROR-DEADLINK-HACK
| Field | Value |
|---|---|
| ID | ERROR-DEADLINK-HACK |
| Name | Deadlink hack attack detected |
| Category | external |
| Severity | critical |
| DetectionPattern | `Deadlink Hack:.*attacker is attempting to inject malicious links` |
| ContextMatcher | (none) |
| AutoFileTask | no |
Security alert — external service flagged a Deadlink Hack injection attempt. This is an attack, not a bug. Log it but don’t create fix tasks for attacker payloads.
Taxonomy Meta-Rules
- Pattern specificity: Detection patterns must match the failure mode with high precision to avoid false positives
- Threshold: Auto-file tasks when count >= 3 within rolling 24-hour window
- Escalation: Critical severity → alert + task; High severity → task only; Medium/Low → log only (no auto-task)
- ContextMatcher: Secondary filter; when specified, it must also match in the log context for a detection to count
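
The Threshold and Escalation rules above can be sketched roughly as follows. The names `RollingCounter` and `actions` are hypothetical, and the sketch makes several assumptions the rules do not spell out: hits arrive in chronological order, a hit exactly 24 hours old has left the window, the threshold gates only task creation (a critical match alerts immediately), and a task is filed only when the entry's AutoFileTask is also yes.

```python
from collections import deque
from datetime import datetime, timedelta

THRESHOLD = 3                  # count >= 3 ...
WINDOW = timedelta(hours=24)   # ... within a rolling 24-hour window


class RollingCounter:
    """Tracks match timestamps for one failure ID inside a rolling window."""

    def __init__(self) -> None:
        self._hits: deque = deque()

    def record(self, when: datetime) -> int:
        """Add a hit and return the number of hits in the window ending at `when`."""
        self._hits.append(when)
        # Assumption: hits that are 24 hours old or older no longer count.
        while self._hits and when - self._hits[0] >= WINDOW:
            self._hits.popleft()
        return len(self._hits)


def actions(severity: str, count: int, auto_file_task: bool) -> list:
    """Map severity and window count to actions under the assumptions above."""
    result = ["log"]
    if severity == "critical":
        result.append("alert")
    if severity in ("critical", "high") and auto_file_task and count >= THRESHOLD:
        result.append("task")
    return result


counter = RollingCounter()
start = datetime(2026, 6, 1, 9, 0)
for offset_hours in (0, 5, 10):
    count = counter.record(start + timedelta(hours=offset_hours))

print(actions("high", count, auto_file_task=True))  # ['log', 'task']
```

A deque keeps pruning cheap because expired hits are always at the front when timestamps arrive in order; out-of-order log ingestion would need a different structure.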