Record t1.18.4 task evidence

ben
2026-03-23 15:28:05 -04:00
parent a45522c110
commit a93229408b


@@ -927,7 +927,7 @@ beef patty by weight not made into effective price
- Giant loose-weight rows already had deterministic `picked_weight` and `price_per_lb`; this task reuses that basis when parsed size/pack is absent.
- Parsed package size still wins when present, so fixed-size products keep their original comparison basis and `normalized_item_id` behavior does not change.
- * [x] t1.18.3: fix costco normalization quantity carry-through for weight-based items (1-3 commits)
+ * [X] t1.18.3: fix costco normalization quantity carry-through for weight-based items (1-3 commits)
** acceptance criteria
1. add regression tests covering known broken Costco quantity-basis cases before changing parser logic
2. Costco normalization correctly parses explicit weight-bearing package text into normalized quantity fields for known cases such as:
@@ -962,6 +962,104 @@ Costco 25# FLOUR not parsed into normalized weight - measure_type says each
** notes
- Costco `25#` weight text was falling through to `each` because the hash-size parser missed sizes followed by whitespace.
- This fix is intentionally narrow: explicit `#`-weight parsing now feeds the existing quantity and effective-price flow without changing `normalized_item_id` behavior.
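The narrow `#`-weight fix described above can be sketched as follows. This is an illustration, not the project's actual parser: the helper name `parse_hash_weight` and its return shape are assumptions. The key point is the lookahead that accepts a size followed by whitespace (or end of string), which is the case that previously fell through to `each`:

```python
import re

def parse_hash_weight(text):
    """Hypothetical sketch of explicit '#'-weight parsing.

    Returns (quantity, unit). The (?=\\s|$) lookahead is the narrow fix:
    a size like '25#' followed by whitespace or end-of-string now parses
    as a weight instead of falling through to 'each'.
    """
    m = re.search(r"(\d+(?:\.\d+)?)\s*#(?=\s|$)", text)
    if m:
        return float(m.group(1)), "lb"
    return None, "each"
```

A string like `25# FLOUR` or `FLOUR 25#` yields a pound quantity, while text with no explicit `#`-weight keeps the `each` basis, leaving `normalized_item_id` behavior unchanged.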
* [x] t1.18.4: clean purchases output and finalize effective price fields (2-4 commits)
make `purchases.csv` easier to inspect and ensure price fields support weighted cost analysis
** acceptance criteria
1. reorder `data/purchases.csv` columns for human inspection, with analysis fields first:
- `purchase_date`
- `retailer`
- `catalog_name`
- `product_type`
- `category`
- `net_line_total`
- `normalized_quantity`
- `effective_price`
- `effective_price_unit`
- followed by order/item/provenance fields
2. populate `net_line_total` for all purchase rows:
- preserve existing `net_line_total` when already populated;
- otherwise, derive `net_line_total = line_total + matched_discount_amount` when a matched discount exists;
- else `net_line_total = line_total`
3. compute `effective_price` as `net_line_total / normalized_quantity` when `normalized_quantity > 0`
4. add `effective_price_unit` and populate it consistently from the normalized quantity basis
5. preserve blanks rather than writing `0` or a divide-by-zero result when no valid denominator exists
- pm note: this task is about final purchase output correctness and usability, not review/catalog logic
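The price-field rules above can be sketched in a few lines. This is a minimal illustration under assumptions, not the pipeline's actual code: the helper name, the dict-row shape, and the use of `""` for blanks are all illustrative, and `normalized_unit` stands in for whatever column carries the quantity basis:

```python
def finalize_price_fields(row):
    """Sketch of the net_line_total / effective_price rules (hypothetical helper)."""
    line_total = float(row["line_total"])
    if row.get("net_line_total") not in (None, ""):
        net = float(row["net_line_total"])  # preserve existing value
    elif row.get("matched_discount_amount") not in (None, ""):
        net = line_total + float(row["matched_discount_amount"])
    else:
        net = line_total
    row["net_line_total"] = net

    qty = row.get("normalized_quantity")
    if qty not in (None, "") and float(qty) > 0:
        row["effective_price"] = net / float(qty)
        # unit mirrors the normalized quantity basis (lb, oz, count, each)
        row["effective_price_unit"] = row.get("normalized_unit", "")
    else:
        # no valid denominator: preserve blanks, never write 0 or divide by zero
        row["effective_price"] = ""
        row["effective_price_unit"] = ""
    return row
```

The blank-preserving else branch is what keeps rows without a usable `normalized_quantity` visible but honest in downstream analysis.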
** evidence
- commit: `a45522c` `Finalize purchase effective price fields`
- tests: `./venv/bin/python -m unittest tests.test_purchases`; `./venv/bin/python build_purchases.py`
- datetime: 2026-03-23 15:27:42 EDT
** notes
- `purchases.csv` now carries a filled `net_line_total` for every row, preserving existing values from normalization and deriving the rest from `line_total` plus matched discounts.
- `effective_price_unit` now mirrors the normalized quantity basis, so downstream analysis can tell whether an `effective_price` is per `lb`, `oz`, `count`, or `each`.
* [ ] t1.19: make review_products.py robust to orphaned and incomplete catalog links (2-4 commits)
refresh review state from the current normalized universe so missing or broken links re-enter review instead of silently disappearing
** acceptance criteria
1. `review_products.py` regenerates review candidates from the current normalized item universe (`/data/<provider>/normalized_items.csv`), not just previously queued items
2. items are added or re-added to review when:
- they have no valid `catalog_id`
- their linked `catalog_id` no longer exists
- their linked catalog row does not have both `catalog_name` and `product_type`
3. `review_products.py` compares and reconciles:
- current normalized items
- current product_links
- current catalog
- current review_queue
4. rerunning review after manual cleanup of `product_links.csv` or `catalog.csv` surfaces newly orphaned normalized items
5. unresolved items remain visible and are not silently dropped from review or purchases accounting
- pm note: keep the logic explicit and auditable; this is a refresh/reconciliation task, not a new matching system
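The re-add conditions above amount to a simple reconciliation pass. A minimal sketch, assuming dict-based inputs keyed as shown (the function name, argument shapes, and field names are illustrative, not the real schema):

```python
def find_review_candidates(normalized_items, product_links, catalog):
    """Sketch: which normalized items should (re-)enter review.

    normalized_items: list of dicts with 'normalized_item_id'
    product_links:    {normalized_item_id: catalog_id}
    catalog:          {catalog_id: row dict}
    """
    candidates = []
    for item in normalized_items:
        link = product_links.get(item["normalized_item_id"])
        cat_row = catalog.get(link) if link is not None else None
        needs_review = (
            link is None                        # no valid catalog_id
            or cat_row is None                  # linked catalog_id no longer exists
            or not cat_row.get("catalog_name")  # incomplete catalog row
            or not cat_row.get("product_type")
        )
        if needs_review:
            candidates.append(item["normalized_item_id"])
    return candidates
```

Because the pass starts from the full normalized universe rather than the existing queue, manual cleanup of `product_links.csv` or `catalog.csv` automatically surfaces newly orphaned items on the next run.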
** evidence
- commit:
- tests:
- datetime:
** notes
* [ ] t1.20: add visit-level fields and outputs for spend analysis (2-4 commits)
ensure purchases retains enough visit/order context to support spend-by-visit and store-level analysis
** acceptance criteria
1. `data/purchases.csv` retains or adds the visit/order fields needed for visit analysis:
- `order_id`
- `purchase_date`
- `store_name`
- `store_number`
- `store_city`
- `store_state`
- `retailer`
2. purchases output supports these analyses without additional joins:
- spend by visit
- items per visit
- category spend by visit
- retailer/store breakdown
3. documentation or task notes make clear that `purchases.csv` is the primary analysis artifact for both item-level and visit-level reporting
- pm note: do not build dash/plotly here; this task is only about carrying the right data through
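Once the visit fields are carried through, visit-level rollups need no joins. A stdlib-only sketch of spend and items per visit, assuming `(order_id, purchase_date, retailer)` identifies a visit (the key choice and function name are illustrative):

```python
import csv
from collections import defaultdict

def spend_by_visit(path="data/purchases.csv"):
    """Sketch: spend and item count per visit, read straight from purchases.csv."""
    totals = defaultdict(lambda: {"spend": 0.0, "items": 0})
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            key = (row["order_id"], row["purchase_date"], row["retailer"])
            totals[key]["spend"] += float(row["net_line_total"] or 0)
            totals[key]["items"] += 1
    return dict(totals)
```

Category spend by visit and store-level breakdowns are the same shape with `category` or `store_number` folded into the grouping key.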
** evidence
- commit:
- tests:
- datetime:
** notes
* [ ] t1.21: add lightweight charting/analysis surface on top of purchases.csv (2-4 commits)
build a minimal analysis layer for common price and visit charts without changing the csv pipeline
** acceptance criteria
1. support charting of:
- item price over time
- spend by visit
- items per visit
- category spend over time
- retailer/store comparison
2. use `data/purchases.csv` as the source of truth
3. keep excel/pivot compatibility intact
- pm note: thin reader layer only; do not move business logic out of the pipeline
** evidence
- commit:
- tests:
- datetime:
** notes
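A thin-reader sketch for the first chart above (item price over time), assuming `purchases.csv` carries the columns named in t1.18.4; the function name is illustrative and no pipeline logic is duplicated here:

```python
import csv
from collections import defaultdict

def item_price_series(path, catalog_name):
    """Sketch: (purchase_date, mean effective_price) series for one catalog item."""
    by_date = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            # skip other items and rows with a blank effective_price
            if row["catalog_name"] == catalog_name and row["effective_price"]:
                by_date[row["purchase_date"]].append(float(row["effective_price"]))
    # average repeat purchases on the same date, sorted by ISO date string
    return sorted((d, sum(v) / len(v)) for d, v in by_date.items())
```

The returned series can be handed to matplotlib/plotly or pasted into an Excel pivot, keeping the CSV pipeline itself untouched.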
* [ ] t1.10: add optional llm-assisted suggestion workflow for unresolved normalized retailer items (2-4 commits)