Record t1.13 task evidence

ben
2026-03-17 15:07:51 -04:00
parent 08e2a86cbd
commit 7743421918

@@ -416,7 +416,61 @@ Clearly show current state separate from proposed future state.
- Numbered canonical selection plus confirmation worked better than free-text id entry and should reduce accidental links.
- Deterministic suggestions remain intentionally conservative; they speed up common cases, but unresolved items still depend on human review by design.
* [ ] t1.10: add optional llm-assisted suggestion workflow for unresolved products (2-4 commits)
* [X] t1.13.1 pipeline accountability and stage visibility (1-2 commits)
add simple accounting so we can see what survives or drops at each pipeline stage
** AC
1. emit counts for raw, enriched, combined/observed, review-queued, canonical-linked, and final purchase-log rows
2. report unresolved and dropped item counts explicitly
3. make it easy to verify that missing items were intentionally left in review rather than silently lost
- pm note: simple text/json/csv summary is sufficient; trust and visibility matter more than presentation
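The snippet below is a minimal sketch of the kind of summary the pm note asks for. It assumes each stage is available as a list of rows and that unresolved and dropped counts are computed upstream. The stage keys mirror AC 1, but the function names and the `stage`/`rows` CSV layout are hypothetical and are not necessarily what `report_pipeline_status.py` emits.

```python
import csv
import io
import json

# Hypothetical stage keys, taken from AC 1; the real script may label them differently.
STAGES = [
    "raw",
    "enriched",
    "combined_observed",
    "review_queued",
    "canonical_linked",
    "final_purchase_log",
]


def build_status(stage_rows, unresolved, dropped):
    """Count rows per stage. A missing stage reports 0 instead of being skipped."""
    status = {stage: len(stage_rows.get(stage, [])) for stage in STAGES}
    # AC 2: unresolved and dropped counts are reported explicitly.
    status["unresolved"] = unresolved
    status["dropped"] = dropped
    return status


def to_json(status):
    return json.dumps(status, indent=2)


def to_csv(status):
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["stage", "rows"])
    for stage, count in status.items():  # dict order matches STAGES, then extras
        writer.writerow([stage, count])
    return buf.getvalue()
```

Always emitting every stage, even when it is empty, keeps a zero visible instead of letting a missing key hide a lost stage.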
** evidence
- commit:
- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python report_pipeline_status.py --help`; `./venv/bin/python report_pipeline_status.py`; verified `combined_output/pipeline_status.csv` and `combined_output/pipeline_status.json`
- date: 2026-03-17
** notes
- Added a single explicit status script instead of threading counters through every pipeline step; this keeps the pipeline simple while still making row survival visible.
- The most useful check here is `unresolved_not_in_review_rows`; when it is non-zero, we know we have a real accounting bug rather than normal unresolved work.
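The `unresolved_not_in_review_rows` check amounts to a set difference between unresolved rows and rows present in the review queue. The sketch below assumes both row sets can be keyed by a shared id. The key name `observed_product_id` and the function name are hypothetical placeholders.

```python
def unresolved_not_in_review(unresolved_rows, review_rows, key="observed_product_id"):
    """Return ids that are unresolved but absent from the review queue.

    A non-empty result (non-zero count) points to an accounting bug:
    those rows were lost rather than intentionally left for review.
    """
    queued = {row[key] for row in review_rows}
    return sorted({row[key] for row in unresolved_rows} - queued)
```

Returning the ids rather than only a count makes a non-zero result directly actionable.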
* [X] t1.13.2 costco discount matching and net pricing in enrich_costco (2-3 commits)
refactor costco enrichment so discount lines are matched to purchased items and net pricing is preserved
** AC
1. detect costco discount/coupon rows like `/<retailer_item_id>` and match them to purchased items within the same order
2. preserve raw discount rows for auditability while also carrying matched discount values onto the purchased item row
3. add explicit fields for discount-adjusted pricing, e.g. `matched_discount_amount` and `net_line_total` (or equivalent)
4. preserve original raw receipt amounts (`line_total`) without overwriting them
- pm note: keep this retailer-specific and explicit; do not introduce generic discount heuristics
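The snippet below is a minimal sketch of AC 1 through 4. It assumes several details the task text does not state: rows are dicts with `order_id`, `retailer_item_id`, `item_name`, and `line_total` (as a string); a discount row's `item_name` is exactly `/` followed by the purchased item's id; and discount amounts are negative. It also treats a discount as unmatched when the same item id appears on more than one purchased row in the order. That tie-breaking rule is an illustrative choice, not necessarily what `enrich_costco.py` does.

```python
import re
from collections import defaultdict
from decimal import Decimal

# Assumed discount row shape: item_name is "/" followed by the retailer_item_id.
DISCOUNT_RE = re.compile(r"^/(\d+)$")


def match_costco_discounts(rows):
    """Attach Costco discount rows to purchased rows within the same order.

    Discount rows are kept in the output for auditability, and the raw
    line_total is never overwritten (AC 2 and AC 4).
    """
    out = [dict(row) for row in rows]  # copy so the raw input stays untouched
    purchased = defaultdict(list)  # (order_id, retailer_item_id) -> purchased rows
    discounts = []
    for row in out:
        m = DISCOUNT_RE.match(row["item_name"].strip())
        row["is_discount_line"] = m is not None
        if m:
            discounts.append((row, m.group(1)))
        else:
            row["matched_discount_amount"] = Decimal("0")
            purchased[(row["order_id"], row["retailer_item_id"])].append(row)

    for disc, item_id in discounts:
        # Only look inside the same order (AC 1).
        targets = purchased.get((disc["order_id"], item_id), [])
        if len(targets) == 1:  # assumption: skip ambiguous matches
            targets[0]["matched_discount_amount"] += Decimal(disc["line_total"])
            disc["matched_to_retailer_item_id"] = item_id

    # AC 3: net pricing lives in its own field next to the raw amount.
    for group in purchased.values():
        for row in group:
            row["net_line_total"] = Decimal(row["line_total"]) + row["matched_discount_amount"]
    return out
```

`Decimal` avoids float rounding noise in the money arithmetic. Only purchased rows get `net_line_total`, so summing that field does not double-count the discount rows.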
** evidence
- commit:
- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python enrich_costco.py`; verified matched Costco discount rows now populate `matched_discount_amount` and `net_line_total` while preserving raw `line_total`
- date: 2026-03-17
** notes
- Kept this retailer-specific and literal: only discount rows with `/<retailer_item_id>` are matched, and only within the same order.
- Raw discount rows are still preserved for auditability; the purchased row now carries the matched adjustment separately rather than overwriting the original amount.
* [X] t1.13.3 canonical cleanup and review-first product identity (3-4 commits)
refactor canonical generation so product identity is cleaner, duplicate canonicals are reduced, and unresolved items stay in review instead of spawning junk canonicals
** AC
1. stop auto-creating new canonical products from weak normalized names alone; unresolved items remain in `review_queue.csv`
2. canonical names are based on stable product identity rather than noisy observed titles
3. packaging/count/size tokens are removed from canonical names when they belong in structured fields (`pack_qty`, `size_value`, `size_unit`)
4. consolidate obvious duplicate canonicals (e.g. egg/lime cases) and ensure final outputs retain raw item name, normalized item name, and canonical item id
- pm note: prefer conservative canonical creation and a better manual review loop over aggressive auto-unification
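AC 3 could look roughly like the sketch below, which moves count and size tokens out of a display name and into structured fields. The token patterns, the unit list, and the lowercasing of the canonical name are all hypothetical. Real receipt titles would need a curated, retailer-aware token list.

```python
import re

# Hypothetical token patterns; only a few common count and size forms are covered.
PACK_RE = re.compile(r"\b(\d+)\s*(?:CT|PK|COUNT|PACK)\b", re.IGNORECASE)
SIZE_RE = re.compile(r"\b(\d+(?:\.\d+)?)\s*(OZ|LB|G|KG|ML|L)\b", re.IGNORECASE)


def split_canonical_name(name):
    """Strip pack/size tokens from a name into pack_qty, size_value, size_unit."""
    fields = {"pack_qty": None, "size_value": None, "size_unit": None}
    pack = PACK_RE.search(name)
    if pack:
        fields["pack_qty"] = int(pack.group(1))
        name = PACK_RE.sub(" ", name, count=1)
    size = SIZE_RE.search(name)
    if size:
        fields["size_value"] = float(size.group(1))
        fields["size_unit"] = size.group(2).lower()
        name = SIZE_RE.sub(" ", name, count=1)
    # Replace leftover punctuation with spaces, then collapse whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", name)
    fields["canonical_name"] = " ".join(cleaned.split()).lower()
    return fields
```

For example, `"Large Eggs, 24 ct"` becomes the name `large eggs` with `pack_qty` 24. The count then lives in a structured field instead of splitting otherwise identical canonicals.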
** evidence
- commit:
- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python build_purchases.py`; `./venv/bin/python review_products.py --refresh-only`; verified weaker exact-name cases now remain unresolved in `combined_output/review_queue.csv` and canonical names are cleaned before auto-catalog creation
- date: 2026-03-17
** notes
- Removed weak exact-name auto-canonical creation so ambiguous products stay in review instead of generating junk canonicals.
- Canonical display names are now cleaned of obvious punctuation and packaging noise, but I kept the cleanup conservative rather than adding a broad fuzzy merge layer.
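The review-first rule can be summarized as: link only on an explicit, strong signal, and otherwise queue for review without ever creating a canonical from a normalized name. The sketch below assumes the strong signal is a pre-existing `(retailer, retailer_item_id)` to canonical-id mapping. That mapping, the field names, and the `Resolution` type are hypothetical illustrations, not the project's actual matching rules.

```python
from dataclasses import dataclass, field


@dataclass
class Resolution:
    linked: dict = field(default_factory=dict)  # observed id -> canonical id
    review_queue: list = field(default_factory=list)  # observed ids needing a human


def resolve(observed, retailer_id_links):
    """Link observed products only through an explicit retailer-id mapping.

    Everything else goes to the review queue; no canonical product is
    created from a normalized name alone.
    """
    result = Resolution()
    for obs in observed:
        key = (obs["retailer"], obs["retailer_item_id"])
        canonical_id = retailer_id_links.get(key)
        if canonical_id is not None:
            result.linked[obs["observed_product_id"]] = canonical_id
        else:
            result.review_queue.append(obs["observed_product_id"])
    return result
```

Because the function has no code path that creates a canonical, junk canonicals cannot come from it. The cost is a longer review queue, which matches the pm note's preference.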
* [ ] t1.10: add optional llm-assisted suggestion workflow for unresolved products (2-4 commits)
** AC
- llm suggestions are generated only for unresolved observed products