Record t1.18.4 task evidence

ben
2026-03-23 15:28:05 -04:00
parent a45522c110
commit a93229408b


@@ -927,7 +927,7 @@ beef patty by weight not made into effective price
- Giant loose-weight rows already had deterministic `picked_weight` and `price_per_lb`; this task reuses that basis when parsed size/pack is absent.
- Parsed package size still wins when present, so fixed-size products keep their original comparison basis and `normalized_item_id` behavior does not change.
- * [x] t1.18.3: fix costco normalization quantity carry-through for weight-based items (1-3 commits)
+ * [X] t1.18.3: fix costco normalization quantity carry-through for weight-based items (1-3 commits)
** acceptance criteria
1. add regression tests covering known broken Costco quantity-basis cases before changing parser logic
2. Costco normalization correctly parses explicit weight-bearing package text into normalized quantity fields for known cases such as:
@@ -962,6 +962,104 @@ Costco 25# FLOUR not parsed into normalized weight - measure_type says each
** notes
- Costco `25#` weight text was falling through to `each` because the hash-size parser missed sizes followed by whitespace.
- This fix is intentionally narrow: explicit `#`-weight parsing now feeds the existing quantity and effective-price flow without changing `normalized_item_id` behavior.
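The narrow `#`-weight fix described above can be sketched as follows. This is an illustration, not the project's actual parser: the helper name `parse_hash_weight` and its return shape are assumptions. The key point is the lookahead that accepts a size followed by whitespace (or end of string), which is the case that previously fell through to `each`:

```python
import re

def parse_hash_weight(text):
    """Hypothetical sketch of explicit '#'-weight parsing.

    Returns (quantity, unit). The (?=\\s|$) lookahead is the narrow fix:
    a size like '25#' followed by whitespace or end-of-string now parses
    as a weight instead of falling through to 'each'.
    """
    m = re.search(r"(\d+(?:\.\d+)?)\s*#(?=\s|$)", text)
    if m:
        return float(m.group(1)), "lb"
    return None, "each"
```

A string like `25# FLOUR` or `FLOUR 25#` yields a pound quantity, while text with no explicit `#`-weight keeps the `each` basis, leaving `normalized_item_id` behavior unchanged.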
* [x] t1.18.4: clean purchases output and finalize effective price fields (2-4 commits)
make `purchases.csv` easier to inspect and ensure price fields support weighted cost analysis
** acceptance criteria
1. reorder `data/purchases.csv` columns for human inspection, with analysis fields first:
- `purchase_date`
- `retailer`
- `catalog_name`
- `product_type`
- `category`
- `net_line_total`
- `normalized_quantity`
- `effective_price`
- `effective_price_unit`
- followed by order/item/provenance fields
2. populate `net_line_total` for all purchase rows:
- preserve existing `net_line_total` when already populated;
- otherwise, derive `net_line_total = line_total + matched_discount_amount` when a matched discount exists;
- else `net_line_total = line_total`
3. compute `effective_price` as `net_line_total / normalized_quantity` when `normalized_quantity > 0`
4. add `effective_price_unit` and populate it consistently from the normalized quantity basis
5. preserve blanks rather than writing `0` or a divide-by-zero result when no valid denominator exists
- pm note: this task is about final purchase output correctness and usability, not review/catalog logic
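The price-field rules above can be sketched in a few lines. This is a minimal illustration under assumptions, not the pipeline's actual code: the helper name, the dict-row shape, and the use of `""` for blanks are all illustrative, and `normalized_unit` stands in for whatever column carries the quantity basis:

```python
def finalize_price_fields(row):
    """Sketch of the net_line_total / effective_price rules (hypothetical helper)."""
    line_total = float(row["line_total"])
    if row.get("net_line_total") not in (None, ""):
        net = float(row["net_line_total"])  # preserve existing value
    elif row.get("matched_discount_amount") not in (None, ""):
        net = line_total + float(row["matched_discount_amount"])
    else:
        net = line_total
    row["net_line_total"] = net

    qty = row.get("normalized_quantity")
    if qty not in (None, "") and float(qty) > 0:
        row["effective_price"] = net / float(qty)
        # unit mirrors the normalized quantity basis (lb, oz, count, each)
        row["effective_price_unit"] = row.get("normalized_unit", "")
    else:
        # no valid denominator: preserve blanks, never write 0 or divide by zero
        row["effective_price"] = ""
        row["effective_price_unit"] = ""
    return row
```

The blank-preserving else branch is what keeps rows without a usable `normalized_quantity` visible but honest in downstream analysis.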
** evidence
- commit: `a45522c` `Finalize purchase effective price fields`
- tests: `./venv/bin/python -m unittest tests.test_purchases`; `./venv/bin/python build_purchases.py`
- datetime: 2026-03-23 15:27:42 EDT
** notes
- `purchases.csv` now carries a filled `net_line_total` for every row, preserving existing values from normalization and deriving the rest from `line_total` plus matched discounts.
- `effective_price_unit` now mirrors the normalized quantity basis, so downstream analysis can tell whether an `effective_price` is per `lb`, `oz`, `count`, or `each`.
* [ ] t1.19: make review_products.py robust to orphaned and incomplete catalog links (2-4 commits)
refresh review state from the current normalized universe so missing or broken links re-enter review instead of silently disappearing
** acceptance criteria
1. `review_products.py` regenerates review candidates from the current normalized item universe (`/data/<provider>/normalized_items.csv`), not just previously queued items
2. items are added or re-added to review when:
- they have no valid `catalog_id`
- their linked `catalog_id` no longer exists
- their linked catalog row does not have both `catalog_name` and `product_type`
3. `review_products.py` compares and reconciles:
- current normalized items
- current product_links
- current catalog
- current review_queue
4. rerunning review after manual cleanup of `product_links.csv` or `catalog.csv` surfaces newly orphaned normalized items
5. unresolved items remain visible and are not silently dropped from review or purchases accounting
- pm note: keep the logic explicit and auditable; this is a refresh/reconciliation task, not a new matching system
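The re-add conditions above amount to a simple reconciliation pass. A minimal sketch, assuming dict-based inputs keyed as shown (the function name, argument shapes, and field names are illustrative, not the real schema):

```python
def find_review_candidates(normalized_items, product_links, catalog):
    """Sketch: which normalized items should (re-)enter review.

    normalized_items: list of dicts with 'normalized_item_id'
    product_links:    {normalized_item_id: catalog_id}
    catalog:          {catalog_id: row dict}
    """
    candidates = []
    for item in normalized_items:
        link = product_links.get(item["normalized_item_id"])
        cat_row = catalog.get(link) if link is not None else None
        needs_review = (
            link is None                        # no valid catalog_id
            or cat_row is None                  # linked catalog_id no longer exists
            or not cat_row.get("catalog_name")  # incomplete catalog row
            or not cat_row.get("product_type")
        )
        if needs_review:
            candidates.append(item["normalized_item_id"])
    return candidates
```

Because the pass starts from the full normalized universe rather than the existing queue, manual cleanup of `product_links.csv` or `catalog.csv` automatically surfaces newly orphaned items on the next run.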
** evidence
- commit:
- tests:
- datetime:
** notes
* [ ] t1.20: add visit-level fields and outputs for spend analysis (2-4 commits)
ensure purchases retains enough visit/order context to support spend-by-visit and store-level analysis
** acceptance criteria
1. `data/purchases.csv` retains or adds the visit/order fields needed for visit analysis:
- `order_id`
- `purchase_date`
- `store_name`
- `store_number`
- `store_city`
- `store_state`
- `retailer`
2. purchases output supports these analyses without additional joins:
- spend by visit
- items per visit
- category spend by visit
- retailer/store breakdown
3. documentation or task notes make clear that `purchases.csv` is the primary analysis artifact for both item-level and visit-level reporting
- pm note: do not build dash/plotly here; this task is only about carrying the right data through
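Once the visit fields are carried through, visit-level rollups need no joins. A stdlib-only sketch of spend and items per visit, assuming `(order_id, purchase_date, retailer)` identifies a visit (the key choice and function name are illustrative):

```python
import csv
from collections import defaultdict

def spend_by_visit(path="data/purchases.csv"):
    """Sketch: spend and item count per visit, read straight from purchases.csv."""
    totals = defaultdict(lambda: {"spend": 0.0, "items": 0})
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            key = (row["order_id"], row["purchase_date"], row["retailer"])
            totals[key]["spend"] += float(row["net_line_total"] or 0)
            totals[key]["items"] += 1
    return dict(totals)
```

Category spend by visit and store-level breakdowns are the same shape with `category` or `store_number` folded into the grouping key.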
** evidence
- commit:
- tests:
- datetime:
** notes
* [ ] t1.21: add lightweight charting/analysis surface on top of purchases.csv (2-4 commits)
build a minimal analysis layer for common price and visit charts without changing the csv pipeline
** acceptance criteria
1. support charting of:
- item price over time
- spend by visit
- items per visit
- category spend over time
- retailer/store comparison
2. use `data/purchases.csv` as the source of truth
3. keep excel/pivot compatibility intact
- pm note: thin reader layer only; do not move business logic out of the pipeline
** evidence
- commit:
- tests:
- datetime:
** notes
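A thin-reader sketch for the first chart above (item price over time), assuming `purchases.csv` carries the columns named in t1.18.4; the function name is illustrative and no pipeline logic is duplicated here:

```python
import csv
from collections import defaultdict

def item_price_series(path, catalog_name):
    """Sketch: (purchase_date, mean effective_price) series for one catalog item."""
    by_date = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            # skip other items and rows with a blank effective_price
            if row["catalog_name"] == catalog_name and row["effective_price"]:
                by_date[row["purchase_date"]].append(float(row["effective_price"]))
    # average repeat purchases on the same date, sorted by ISO date string
    return sorted((d, sum(v) / len(v)) for d, v in by_date.items())
```

The returned series can be handed to matplotlib/plotly or pasted into an Excel pivot, keeping the CSV pipeline itself untouched.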
* [ ] t1.10: add optional llm-assisted suggestion workflow for unresolved normalized retailer items (2-4 commits)