Session 2026-05-30 — Unblock Coles Scraper (Cron)

Goal

Resolve the Imperva anti-bot block on Coles.com.au to restore the primary data source.

Actions Taken

1. Assessed existing knowledge base

  • Loaded australian-supermarket-apis skill — contains full API specs for both Coles and Woolworths
  • Reviewed project history (sessions from May 12–29): Imperva blocker documented since at least May 6, consistently blocking across all attempts

2. Checked camofox health endpoint

  • GET http://10.1.17.135:9377/health → no response (empty body, JSON parse error)
  • This is expected in cron context: the health check returned nothing, which suggests camofox may be intermittently unavailable from this pod's network path
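
The empty-body case above can be told apart from a genuine JSON reply with a small guard before parsing. This is a minimal sketch, not the check used in this session: the function names are hypothetical, and since the camofox health response schema is not documented here, the code only assumes that a healthy reply is a JSON object.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def parse_health_body(body: bytes) -> Optional[dict]:
    """Return the decoded JSON object, or None if the body is empty or not a JSON object."""
    text = body.decode("utf-8", errors="replace").strip()
    if not text:
        # An empty body is what produced the "JSON parse error" in this session.
        return None
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def check_health(url: str, timeout: float = 5.0) -> Optional[dict]:
    """Fetch the health URL; return None on network failure or an unusable body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_health_body(resp.read())
    except (urllib.error.URLError, OSError):
        return None
```

A cron job could call check_health("http://10.1.17.135:9377/health") and skip the browser-based steps when it returns None, rather than failing on the parse error.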

3. Analyzed known approaches from skill file

Woolworths: WORKING PATH ✅

  • camofox JS evaluation bypass verified (2026-05-06 scrape)
  • 89K+ products scraped across 16 categories
  • Rate: ~44 products/page at 2.5s/page
  • REST endpoints blocked by Akamai, but in-browser JS fetch() calls work
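
As a rough cross-check of the figures above, the page count and wall-clock time for a full Woolworths pass can be estimated from the stated rate. This is a back-of-envelope sketch that assumes sequential paging with no retries or per-category overhead, and it treats "89K+" as exactly 89,000, so the result is a lower bound.

```python
import math


def estimate_scrape(products: int, per_page: int, seconds_per_page: float) -> tuple[int, float]:
    """Return (pages, total_seconds) for a sequential scrape at a fixed page rate."""
    pages = math.ceil(products / per_page)
    return pages, pages * seconds_per_page


# Figures from the 2026-05-06 scrape: ~44 products/page at 2.5 s/page.
pages, seconds = estimate_scrape(89_000, 44, 2.5)
print(f"{pages} pages, about {seconds / 60:.0f} minutes")
```

At these rates a full pass is on the order of 2,000 pages and under an hour and a half.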

Coles: BLOCKED ❌

  • Imperva WAF blocks ALL tested approaches:
    • Direct HTTP → challenge HTML (~4KB, no __NEXT_DATA__)
    • camofox browser JS evaluation → /browse.json returns NetworkError
  • No build ID extractable from challenge page
  • Requires residential proxy (Smartproxy ~$75/mo AU)
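
The "no __NEXT_DATA__, no build ID" symptom can be checked mechanically. The sketch below relies on the general Next.js convention of embedding page data in a script tag with id "__NEXT_DATA__" whose JSON carries a "buildId" field. Whether the real Coles pages follow that exact shape is an assumption taken from the notes above, and the function name is hypothetical.

```python
import json
import re
from typing import Optional

# Matches the JSON payload of the Next.js data script, if present.
_NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_build_id(html: str) -> Optional[str]:
    """Return the Next.js buildId, or None when the page looks like a challenge page."""
    match = _NEXT_DATA_RE.search(html)
    if match is None:
        # No data script at all: consistent with the ~4KB Imperva challenge HTML.
        return None
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    build_id = data.get("buildId") if isinstance(data, dict) else None
    return build_id if isinstance(build_id, str) else None
```

A None result means the response should be treated as blocked and not retried blindly; a string result would indicate that the real page was served.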

4. Strategic pivot recommendation

Since Coles requires external procurement (residential proxy), and Woolworths scraping is proven working via camofox, the pragmatic approach is:

  1. Ship Woolworths-first data pipeline — get value flowing now
  2. Document Coles blocker clearly — proxy proposal for pvs approval
  3. Queue deploy tasks — GitLab + Docker + K8s pipeline can be worked while waiting on Coles resolution

Assessment: BLOCKED on Coles, WORKING on Woolworths

The Imperva block on Coles has persisted across 20+ session attempts since May 6. None of the code-level approaches tested has worked, so access requires a residential proxy subscription. This is a procurement decision for pvs, not a technical problem I can solve autonomously.

Priority  Action                                          Owner         Effort
P0        Ship Woolworths scraper (proven pattern)        Hermes        1 session
P1        Draft proxy proposal for Coles access           Hermes        30 min
P2        Procure residential proxy (~$75/mo)             pvs           Approval
P3        Deploy pipeline to K8s (Woolworths data first)  pvs + Hermes  Multi-session

Blocker Details

  • Coles.com.au → Imperva WAF, blocks all non-residential IPs at challenge level
  • Woolworths.com.au → Akamai WAF, bypassed via camofox in-browser JS evaluation (verified working)
  • Aldi → Not yet investigated

Issues / Blockers

  • Coles requires residential proxy — procurement decision needed from pvs
  • Cannot autonomously purchase service or approve budget