Session goal: Run bill scanner, process new email attachments with OCR/text extraction.

Progress log:

  • 09:00 — Ran bill-scanner.py --scan. Found 5 emails, downloaded 3 attachment(s).
  • 09:01 — Attempted --process-attachments. PyMuPDF (fitz) missing from system Python — venv had it but shebang was wrong. Confirmed /opt/data/.venv/bin/python3 loads PyMuPDF v1.27.2.3.
  • 09:05 — Extracted text via PyMuPDF directly (heredoc script) for the 3 new attachments:
    • ID 112 — Invoice #33519 from The Lawnfeed Company ($125, due 12 Jun 2026). New vendor.
    • ID 111 — Unitywater Bill (455 KB PDF) — duplicate of existing; no new data.
    • ID 111 — “What does your water bill pay for?” — marketing flyer, not a bill.
  • 09:08 — Extracted key details from both bills (the Lawnfeed invoice and the Unitywater bill) via the OCR text output.
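
The extraction steps above can be sketched roughly as follows. This is a minimal illustration, not the heredoc script used in the session: the function names, the regex patterns, and the sample text layout are all assumptions, and real invoice layouts will need their own patterns. Only `fitz.open` and `page.get_text()` are PyMuPDF calls; the import is deferred so the parsing half loads even under an interpreter without PyMuPDF.

```python
import re


def extract_pdf_text(pdf_path):
    """Return the text of every page in a PDF, joined with newlines."""
    # Lazy import: PyMuPDF was only installed in the venv during this session,
    # so importing it at module level would fail under the system Python.
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)


# Hypothetical patterns for illustration; they are not taken from bill-scanner.py.
INVOICE_RE = re.compile(
    r"invoice\s*(?:#|no\.?|number)?\s*:?\s*([A-Za-z0-9-]+)", re.IGNORECASE
)
AMOUNT_RE = re.compile(r"\$\s*([0-9][0-9,]*(?:\.[0-9]{2})?)")
DUE_RE = re.compile(
    r"due(?:\s+date)?\s*:?\s*(\d{1,2}\s+[A-Za-z]{3,9}\s+\d{4})", re.IGNORECASE
)


def parse_bill_fields(text):
    """Pull the first invoice number, dollar amount, and due date from text.

    Fields that cannot be found are returned as None rather than guessed.
    """

    def first(pattern):
        match = pattern.search(text)
        return match.group(1) if match else None

    return {
        "invoice": first(INVOICE_RE),
        "amount": first(AMOUNT_RE),
        "due": first(DUE_RE),
    }


if __name__ == "__main__":
    # Assumed text layout; the values mirror the Lawnfeed invoice in this log.
    sample = "Invoice #33519\nTotal (inc GST): $125.00\nDue date: 12 Jun 2026\n"
    print(parse_bill_fields(sample))
```

Returning None for missing fields keeps a marketing flyer or other non-bill attachment from producing invented values; such results can be flagged for manual review instead.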

Outputs:

| Vendor | Bill # | Amount | Due Date | Notes |
|---|---|---|---|---|
| The Lawnfeed Co. | INV-33519 | $125.00 (inc GST) | 12 Jun 2026 | New vendor; lawn fertiliser/treatment. Bank: Westpac ANTHONY PECK, BSB 034243, AC# 228865 |
| Unitywater | #7128760918 | $493.71 | 26 Jun 2026 | Duplicate of previously processed bill. Account #100114688 |

Issues / Questions:

  • The Lawnfeed Company is a new vendor not seen in prior bills. The invoice reference doubles as the payment reference for the bank transfer. Needs pvs verification that this service is expected/authorised.
  • PyMuPDF works from venv only (/opt/data/.venv/bin/python3). The scanner script shebang should point there permanently to avoid future “No module named ‘fitz’” errors.
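
One way to confirm the interpreter issue before changing anything is a small check that reports which Python is running and whether it can import `fitz`. This is a sketch using only the standard library; the helper names are hypothetical. Running it under both the system Python and /opt/data/.venv/bin/python3 should show the difference, after which the scanner's first line could be set to `#!/opt/data/.venv/bin/python3`.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


def describe_interpreter(module="fitz"):
    """Summarise which interpreter is running and whether it sees the module."""
    status = "available" if module_available(module) else "MISSING"
    return f"{sys.executable}: {module} {status}"


if __name__ == "__main__":
    print(describe_interpreter())
```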

Status: done