diff --git a/pm/tasks.org b/pm/tasks.org index 8728d04..95b82af 100644 --- a/pm/tasks.org +++ b/pm/tasks.org @@ -763,8 +763,46 @@ enable fast lookup of catalog items during review via tokenized search and repla - Search intentionally optimizes for manual speed rather than smart ranking: simple token overlap, max 10 rows, and immediate persistence on selection. - Follow-up fix: search moved to `[f]ind` so `[s]kip` remains available at the main prompt. -* [ ] t1.16.2: catalog search refinement +* [x] t1.17: fix normalized quantity derivation and carry it through purchases (2-4 commits) +correct and document deterministic normalized quantity fields so unit-cost analysis works across package sizes +** Acceptance Criteria +1. populate and validate `normalized_quantity` and `normalized_quantity_unit` in `data//normalized_items.csv` + - these columns already exist and must be corrected rather than reintroduced +2. carry `normalized_quantity` and `normalized_quantity_unit` through to `data/review/purchases.csv` +3. derive normalized quantity deterministically from existing parsed fields only: + - `qty` + - `pack_qty` + - `size_value` + - `size_unit` + - `measure_type` +4. prefer the best deterministic basis rather than falling back to `each` too early: + - count items when count is explicit + - weight items when parsed weight is explicit + - volume items when parsed volume is explicit + - `each` only when no better basis is available +5. handle common cases explicitly, including totals derived from deterministic patterns such as: + - `18 count` + - `5 lb` + - `64 oz` + - `2 each` +6. preserve blanks when no reliable normalized quantity basis can be derived +7. existing `normalized_item_id` values remain stable; this task must not change retailer-level grouping identity +8. document the derivation rules and any intentional conversions or non-conversions in `pm/data-model.org` or task notes + - if unit conversions are allowed, they must be explicit and minimal +- pm note: keep this deterministic and conservative; do not introduce fuzzy inference +- pm note: if `lb <-> oz` or volume conversions are used, document them directly rather than hiding them in code +- pm note: this task enables cost analysis and charting, not catalog/review changes + +** evidence +- commit: `d25448b` +- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python normalize_giant_web.py`; `./venv/bin/python normalize_costco_web.py`; `./venv/bin/python build_purchases.py` +- datetime: 2026-03-21 21:02:21 EDT + +** notes +- The missing purchases fields were a carry-through bug: normalization had `normalized_quantity` and `normalized_quantity_unit`, but `build_purchases.py` never wrote them into `data/review/purchases.csv`. +- Normalized quantity now prefers explicit package basis over `each`, so rows like `PEPSI 6PK 7.5Z` resolve to `90 oz` and `KS ALMND BAR US 1.74QTS` purchased twice resolves to `3.48 qt`. +- The derivation stays conservative and does not convert units during normalization; parsed units such as `oz`, `lb`, `qt`, and `count` are preserved as-is. * [ ] 1t.10: add optional llm-assisted suggestion workflow for unresolved normalized retailer items (2-4 commits) ** acceptance criteria