From 774342191836916bf03c3d8e6ce153574759ff9f Mon Sep 17 00:00:00 2001
From: ben
Date: Tue, 17 Mar 2026 15:07:51 -0400
Subject: [PATCH] Record t1.13 task evidence

---
 pm/tasks.org | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/pm/tasks.org b/pm/tasks.org
index 9e78e3c..11c74d4 100644
--- a/pm/tasks.org
+++ b/pm/tasks.org
@@ -416,7 +416,61 @@ Clearly show current state separate from proposed future state.
 - Numbered canonical selection plus confirmation worked better than free-text id entry and should reduce accidental links.
 - Deterministic suggestions remain intentionally conservative; they speed up common cases, but unresolved items still depend on human review by design.
 
-* [ ] t1.10: add optional llm-assisted suggestion workflow for unresolved products (2-4 commits)
+* [X] t1.13.1 pipeline accountability and stage visibility (1-2 commits)
+add simple accounting so we can see what survives or drops at each pipeline stage
+
+** AC
+1. emit counts for raw, enriched, combined/observed, review-queued, canonical-linked, and final purchase-log rows
+2. report unresolved and dropped item counts explicitly
+3. make it easy to verify that missing items were intentionally left in review rather than silently lost
+- pm note: simple text/json/csv summary is sufficient; trust and visibility matter more than presentation
+
+** evidence
+- commit:
+- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python report_pipeline_status.py --help`; `./venv/bin/python report_pipeline_status.py`; verified `combined_output/pipeline_status.csv` and `combined_output/pipeline_status.json`
+- date: 2026-03-17
+
+** notes
+- Added a single explicit status script instead of threading counters through every pipeline step; this keeps the pipeline simple while still making row survival visible.
+- The most useful check here is `unresolved_not_in_review_rows`; when it is non-zero, we know we have a real accounting bug rather than normal unresolved work.
+
+* [X] t1.13.2 costco discount matching and net pricing in enrich_costco (2-3 commits)
+refactor costco enrichment so discount lines are matched to purchased items and net pricing is preserved
+
+** AC
+1. detect costco discount/coupon rows like `/` and match them to purchased items within the same order
+2. preserve raw discount rows for auditability while also carrying matched discount values onto the purchased item row
+3. add explicit fields for discount-adjusted pricing, e.g. `matched_discount_amount` and `net_line_total` (or equivalent)
+4. preserve original raw receipt amounts (`line_total`) without overwriting them
+- pm note: keep this retailer-specific and explicit; do not introduce generic discount heuristics
+
+** evidence
+- commit:
+- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python enrich_costco.py`; verified matched Costco discount rows now populate `matched_discount_amount` and `net_line_total` while preserving raw `line_total`
+- date: 2026-03-17
+
+** notes
+- Kept this retailer-specific and literal: only discount rows with `/` are matched, and only within the same order.
+- Raw discount rows are still preserved for auditability; the purchased row now carries the matched adjustment separately rather than overwriting the original amount.
+* [X] t1.13.3 canonical cleanup and review-first product identity (3-4 commits)
+refactor canonical generation so product identity is cleaner, duplicate canonicals are reduced, and unresolved items stay in review instead of spawning junk canonicals
+
+** AC
+1. stop auto-creating new canonical products from weak normalized names alone; unresolved items remain in `review_queue.csv`
+2. canonical names are based on stable product identity rather than noisy observed titles
+3. packaging/count/size tokens are removed from canonical names when they belong in structured fields (`pack_qty`, `size_value`, `size_unit`)
+4. consolidate obvious duplicate canonicals (e.g. egg/lime cases) and ensure final outputs retain raw item name, normalized item name, and canonical item id
+- pm note: prefer conservative canonical creation and a better manual review loop over aggressive auto-unification
+
+** evidence
+- commit:
+- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python build_purchases.py`; `./venv/bin/python review_products.py --refresh-only`; verified weaker exact-name cases now remain unresolved in `combined_output/review_queue.csv` and canonical names are cleaned before auto-catalog creation
+- date: 2026-03-17
+
+** notes
+- Removed weak exact-name auto-canonical creation so ambiguous products stay in review instead of generating junk canonicals.
+- Canonical display names are now cleaned of obvious punctuation and packaging noise, but I kept the cleanup conservative rather than adding a broad fuzzy merge layer.
+* [ ] 1t.10: add optional llm-assisted suggestion workflow for unresolved products (2-4 commits)
 
 ** acceptance criteria
 - llm suggestions are generated only for unresolved observed products