Record t1.13 task evidence

2026-03-17 15:07:51 -04:00
parent 08e2a86cbd
commit 7743421918
1 changed file with 55 additions and 1 deletion
--- a/pm/tasks.org
+++ b/pm/tasks.org
@@ -416,7 +416,61 @@ Clearly show current state separate from proposed future state.
 - Numbered canonical selection plus confirmation worked better than free-text id entry and should reduce accidental links.
 - Deterministic suggestions remain intentionally conservative; they speed up common cases, but unresolved items still depend on human review by design.

-* [ ] t1.10: add optional llm-assisted suggestion workflow for unresolved products (2-4 commits)
+* [X] t1.13.1 pipeline accountability and stage visibility (1-2 commits)
+add simple accounting so we can see what survives or drops at each pipeline stage
+
+** AC
+1. emit counts for raw, enriched, combined/observed, review-queued, canonical-linked, and final purchase-log rows
+2. report unresolved and dropped item counts explicitly
+3. make it easy to verify that missing items were intentionally left in review rather than silently lost
+- pm note: a simple text/json/csv summary is sufficient; trust and visibility matter more than presentation
+
+** evidence
+- commit:
+- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python report_pipeline_status.py --help`; `./venv/bin/python report_pipeline_status.py`; verified `combined_output/pipeline_status.csv` and `combined_output/pipeline_status.json`
+- date: 2026-03-17
+
+** notes
+- Added a single explicit status script instead of threading counters through every pipeline step; this keeps the pipeline simple while still making row survival visible.
+- The most useful check here is `unresolved_not_in_review_rows`; when it is non-zero, we know we have a real accounting bug rather than normal unresolved work.
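The accounting idea above can be sketched as follows. This is a minimal, hypothetical illustration, not the contents of `report_pipeline_status.py`. It assumes each stage can expose a set of shared row keys. The stage names in `STAGES` and the `summarize` helper are invented here. The `unresolved_not_in_review_rows` check is taken from the note above.

```python
import json

# Hypothetical stage names; the real report_pipeline_status.py may use others.
STAGES = ["raw", "enriched", "observed", "review_queued", "canonical_linked", "purchase_log"]


def summarize(stage_ids: dict[str, set[str]], unresolved_ids: set[str]) -> dict[str, int]:
    """Count rows per stage and flag rows that vanished without a review entry.

    Assumes every stage exposes a shared row key (e.g. an order/line id).
    """
    summary = {f"{stage}_rows": len(stage_ids.get(stage, set())) for stage in STAGES}
    review_ids = stage_ids.get("review_queued", set())
    raw_ids = stage_ids.get("raw", set())
    final_ids = stage_ids.get("purchase_log", set())

    summary["unresolved_rows"] = len(unresolved_ids)
    # Non-zero means an unresolved item never reached review: an accounting bug.
    summary["unresolved_not_in_review_rows"] = len(unresolved_ids - review_ids)
    # Raw rows that reached neither the final purchase log nor the review queue.
    summary["dropped_rows"] = len(raw_ids - final_ids - review_ids)
    return summary


if __name__ == "__main__":
    ids = {
        "raw": {"a", "b", "c", "d"},
        "enriched": {"a", "b", "c", "d"},
        "observed": {"a", "b", "c", "d"},
        "review_queued": {"c"},
        "canonical_linked": {"a", "b"},
        "purchase_log": {"a", "b"},
    }
    # Row "d" is unresolved but missing from review, so it shows up as a bug.
    print(json.dumps(summarize(ids, unresolved_ids={"c", "d"}), indent=2))
```

In this sketch, rows that are excluded on purpose would also count as dropped. Raw discount lines are one example. A fuller report would probably break those rows out as a separate count.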
+
+* [X] t1.13.2 costco discount matching and net pricing in enrich_costco (2-3 commits)
+refactor costco enrichment so discount lines are matched to purchased items and net pricing is preserved
+
+** AC
+1. detect costco discount/coupon rows like `/<retailer_item_id>` and match them to purchased items within the same order
+2. preserve raw discount rows for auditability while also carrying matched discount values onto the purchased item row
+3. add explicit fields for discount-adjusted pricing, e.g. `matched_discount_amount` and `net_line_total` (or equivalent)
+4. preserve original raw receipt amounts (`line_total`) without overwriting them
+- pm note: keep this retailer-specific and explicit; do not introduce generic discount heuristics
+
+** evidence
+- commit:
+- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python enrich_costco.py`; verified matched Costco discount rows now populate `matched_discount_amount` and `net_line_total` while preserving raw `line_total`
+- date: 2026-03-17
+
+** notes
+- Kept this retailer-specific and literal: only discount rows with `/<retailer_item_id>` are matched, and only within the same order.
+- Raw discount rows are still preserved for auditability; the purchased row now carries the matched adjustment separately rather than overwriting the original amount.
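The literal, same-order matching rule can be sketched as below. This is a hypothetical illustration, not the code in `enrich_costco.py`. The row field names `order_id`, `item_name`, `retailer_item_id` and `is_discount_line` are assumed. The exact `/<digits>` description format is also assumed. Discount amounts are normalized with `abs()`, which is another assumption because receipts may record them with either sign. Only `line_total`, `matched_discount_amount` and `net_line_total` come from the task text.

```python
import re
from collections import defaultdict
from decimal import Decimal

# Assumed format: a discount row's description is "/" plus the purchased
# item's retailer_item_id, e.g. "/1234567".
DISCOUNT_RE = re.compile(r"^/(\d+)$")


def apply_costco_discounts(rows: list[dict]) -> list[dict]:
    """Return copies of rows with matched discounts carried onto purchased items.

    Discount rows are kept (flagged) for auditability, and line_total is never
    overwritten. Matching is restricted to the same order_id.
    """
    out = [dict(row) for row in rows]
    # Total discount per (order_id, retailer_item_id); Decimal() starts at 0.
    discounts: dict[tuple[str, str], Decimal] = defaultdict(Decimal)
    for row in out:
        match = DISCOUNT_RE.match(row["item_name"].strip())
        row["is_discount_line"] = match is not None
        if match:
            discounts[(row["order_id"], match.group(1))] += abs(row["line_total"])

    for row in out:
        if row["is_discount_line"]:
            continue
        key = (row["order_id"], row.get("retailer_item_id", ""))
        amount = discounts.get(key, Decimal("0"))
        row["matched_discount_amount"] = amount
        # Net price lives in its own field; the raw line_total is untouched.
        row["net_line_total"] = row["line_total"] - amount
    return out
```

Using `Decimal` instead of `float` keeps receipt arithmetic exact. Returning copies keeps the raw rows intact for later audits.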
+* [X] t1.13.3 canonical cleanup and review-first product identity (3-4 commits)
+refactor canonical generation so product identity is cleaner, duplicate canonicals are reduced, and unresolved items stay in review instead of spawning junk canonicals
+
+** AC
+1. stop auto-creating new canonical products from weak normalized names alone; unresolved items remain in `review_queue.csv`
+2. canonical names are based on stable product identity rather than noisy observed titles
+3. packaging/count/size tokens are removed from canonical names when they belong in structured fields (`pack_qty`, `size_value`, `size_unit`)
+4. consolidate obvious duplicate canonicals (e.g. egg/lime cases) and ensure final outputs retain raw item name, normalized item name, and canonical item id
+- pm note: prefer conservative canonical creation and a better manual review loop over aggressive auto-unification
+
+** evidence
+- commit:
+- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python build_purchases.py`; `./venv/bin/python review_products.py --refresh-only`; verified weaker exact-name cases now remain unresolved in `combined_output/review_queue.csv` and canonical names are cleaned before auto-catalog creation
+- date: 2026-03-17
+
+** notes
+- Removed weak exact-name auto-canonical creation so ambiguous products stay in review instead of generating junk canonicals.
+- Canonical display names are now cleaned of obvious punctuation and packaging noise, but I kept the cleanup conservative rather than adding a broad fuzzy merge layer.
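Here is a rough sketch of conservative name cleanup that moves packaging tokens into structured fields. The `pack_qty`, `size_value` and `size_unit` field names come from the acceptance criteria. The function name `clean_canonical_name`, the token regexes and the unit list are all hypothetical. This is not the actual cleanup in `build_purchases.py`. The sketch strips only a narrow set of punctuation, and it does not lowercase or fuzzy-merge.

```python
import re

# Hypothetical token patterns; real receipts may need more units or forms.
PACK_RE = re.compile(r"\b(\d+)\s*(?:ct|count|pk|pack)\b", re.IGNORECASE)
SIZE_RE = re.compile(r"\b(\d+(?:\.\d+)?)\s*(oz|lb|lbs|gal|ml|l|kg|g)\b", re.IGNORECASE)


def clean_canonical_name(name: str) -> dict:
    """Split packaging/size tokens out of a name and strip punctuation noise."""
    fields = {"pack_qty": None, "size_value": None, "size_unit": None}
    text = name

    pack = PACK_RE.search(text)
    if pack:
        fields["pack_qty"] = int(pack.group(1))
        text = text[: pack.start()] + " " + text[pack.end():]

    size = SIZE_RE.search(text)
    if size:
        fields["size_value"] = float(size.group(1))
        fields["size_unit"] = size.group(2).lower()
        text = text[: size.start()] + " " + text[size.end():]

    # Drop punctuation except apostrophes, ampersands and hyphens, then
    # collapse whitespace. Case is preserved for display.
    text = re.sub(r"[^\w\s'&-]", " ", text)
    return {"canonical_name": " ".join(text.split()), **fields}


if __name__ == "__main__":
    print(clean_canonical_name("KS Large Eggs, 24 ct"))
    print(clean_canonical_name("Limes, 3 lb."))
```

With this rule, names like `Eggs 18ct` and `Eggs, 24 ct` both clean to `Eggs`. That is the kind of obvious duplicate the acceptance criteria mention. Anything less literal would still go to manual review.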
+* [ ] t1.10: add optional llm-assisted suggestion workflow for unresolved products (2-4 commits)

 ** acceptance criteria
 - llm suggestions are generated only for unresolved observed products