Record t1.17 task evidence

2026-03-21 21:50:16 -04:00
parent d25448b690
commit 38c2c2ea2e
1 changed file with 39 additions and 1 deletion
--- a/pm/tasks.org
+++ b/pm/tasks.org
@@ -763,8 +763,46 @@ enable fast lookup of catalog items during review via tokenized search and repla
 - Search intentionally optimizes for manual speed rather than smart ranking: simple token overlap, max 10 rows, and immediate persistence on selection.
 - Follow-up fix: search moved to `[f]ind` so `[s]kip` remains available at the main prompt.

-* [ ] t1.16.2: catalog search refinement
+* [x] t1.17: fix normalized quantity derivation and carry it through purchases (2-4 commits)
+correct and document deterministic normalized quantity fields so unit-cost analysis works across package sizes

+** acceptance criteria
+1. populate and validate `normalized_quantity` and `normalized_quantity_unit` in `data/<retailer-method>/normalized_items.csv`
+   - these columns already exist and must be corrected rather than reintroduced
+2. carry `normalized_quantity` and `normalized_quantity_unit` through to `data/review/purchases.csv`
+3. derive normalized quantity deterministically from existing parsed fields only:
+   - `qty`
+   - `pack_qty`
+   - `size_value`
+   - `size_unit`
+   - `measure_type`
+4. prefer the best deterministic basis rather than falling back to `each` too early:
+   - count items when count is explicit
+   - weight items when parsed weight is explicit
+   - volume items when parsed volume is explicit
+   - `each` only when no better basis is available
+5. handle common cases explicitly, including totals derived from deterministic patterns such as:
+   - `18 count`
+   - `5 lb`
+   - `64 oz`
+   - `2 each`
+6. preserve blanks when no reliable normalized quantity basis can be derived
+7. existing `normalized_item_id` values remain stable; this task must not change retailer-level grouping identity
+8. document the derivation rules and any intentional conversions or non-conversions in `pm/data-model.org` or task notes
+   - if unit conversions are allowed, they must be explicit and minimal
+- pm note: keep this deterministic and conservative; do not introduce fuzzy inference
+- pm note: if `lb <-> oz` or volume conversions are used, document them directly rather than hiding them in code
+- pm note: this task enables cost analysis and charting, not catalog/review changes
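Below is a minimal Python sketch of the derivation in criteria 3 to 6. It rests on several assumptions. Rows arrive as `csv.DictReader` dicts keyed by the five parsed field names. `measure_type` uses the hypothetical labels `count`, `weight`, `volume`, and `each`. The total is `qty × pack_qty × size_value`, with a blank `pack_qty` read as 1. `derive_normalized_quantity` is a hypothetical name, not the function in `normalize_giant_web.py` or `normalize_costco_web.py`. The parsed unit passes through unchanged, matching the conservative no-conversion rule, and both columns stay blank whenever no basis is reliable.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical measure_type labels that carry an explicit parsed size.
SIZED_BASES = {"count", "weight", "volume"}


def _to_decimal(value):
    """Parse a CSV cell as a finite Decimal; return None if blank or not numeric."""
    text = (value or "").strip()
    if not text:
        return None
    try:
        number = Decimal(text)
    except InvalidOperation:
        return None
    return number if number.is_finite() else None


def _fmt(number):
    """Render without exponent or trailing zeros, e.g. Decimal('90.0') -> '90'."""
    return format(number.normalize(), "f")


def derive_normalized_quantity(row):
    """Hypothetical helper: return (normalized_quantity, normalized_quantity_unit).

    Reads only qty, pack_qty, size_value, size_unit, and measure_type.
    Units pass through unchanged (no lb <-> oz or volume conversion).
    """
    qty = _to_decimal(row.get("qty"))
    if qty is None or qty <= 0:
        return ("", "")  # assumption: no purchased quantity, no reliable basis

    pack_qty = _to_decimal(row.get("pack_qty"))
    if pack_qty is None:
        pack_qty = Decimal(1)  # assumption: blank pack_qty means a single pack

    size_value = _to_decimal(row.get("size_value"))
    size_unit = (row.get("size_unit") or "").strip()
    measure_type = (row.get("measure_type") or "").strip()

    # Preferred basis: an explicit parsed count, weight, or volume.
    if measure_type in SIZED_BASES and size_value is not None and size_unit:
        return (_fmt(qty * pack_qty * size_value), size_unit)

    # Fallback: plain "each", only when nothing better was parsed.
    if measure_type == "each":
        return (_fmt(qty * pack_qty), "each")

    # No reliable basis: keep both columns blank rather than guess.
    return ("", "")
```

Under these assumptions, `KS ALMND BAR US 1.74QTS` bought twice yields `("3.48", "qt")`. `PEPSI 6PK 7.5Z` yields `("90", "oz")` only if its `qty` is 2, which the task notes leave implicit. `Decimal` is used instead of `float` so that totals such as `3 × 1.1` stay exact and the CSV output is reproducible.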
+
+** evidence
+- commit: `d25448b`
+- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python normalize_giant_web.py`; `./venv/bin/python normalize_costco_web.py`; `./venv/bin/python build_purchases.py`
+- datetime: 2026-03-21 21:02:21 EDT
+
+** notes
+- The missing purchases fields were a carry-through bug: normalization had `normalized_quantity` and `normalized_quantity_unit`, but `build_purchases.py` never wrote them into `data/review/purchases.csv`.
+- Normalized quantity now prefers explicit package basis over `each`, so rows like `PEPSI 6PK 7.5Z` resolve to `90 oz` and `KS ALMND BAR US 1.74QTS` purchased twice resolves to `3.48 qt`.
+- The derivation stays conservative and does not convert units during normalization; parsed units such as `oz`, `lb`, `qt`, and `count` are preserved as-is.
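Criterion 2 and the first note reduce to two steps: copying two columns into each purchase row and declaring them in the output header. The following is a minimal sketch using Python's `csv.DictWriter`. `PURCHASE_FIELDS`, `build_purchase_row`, `write_purchases`, and every column other than `normalized_item_id` and the two `normalized_quantity*` fields are hypothetical stand-ins, since the actual layout of `build_purchases.py` is not shown here.

```python
import csv
import io

# Hypothetical header for data/review/purchases.csv. Only normalized_item_id and
# the two normalized_quantity* columns come from the task; the rest are stand-ins.
PURCHASE_FIELDS = [
    "order_id",
    "line_total",
    "normalized_item_id",
    "normalized_quantity",
    "normalized_quantity_unit",
]

# Columns copied from normalized_items.csv into every purchase row.
CARRIED_FIELDS = ("normalized_quantity", "normalized_quantity_unit")


def build_purchase_row(order_line, normalized_item):
    """Combine order-level values with fields carried from the normalized item."""
    row = {name: order_line.get(name, "") for name in ("order_id", "line_total")}
    # Reuse the existing grouping id as-is; this step must not change it.
    row["normalized_item_id"] = normalized_item["normalized_item_id"]
    for name in CARRIED_FIELDS:
        # Copy verbatim, blanks included, so "no reliable basis" stays blank.
        row[name] = normalized_item.get(name, "")
    return row


def write_purchases(rows, stream):
    """Write rows under an explicit header; any missing key is written as ''."""
    writer = csv.DictWriter(stream, fieldnames=PURCHASE_FIELDS, restval="")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    item = {"normalized_item_id": "example-id", "normalized_quantity": "90",
            "normalized_quantity_unit": "oz"}
    buffer = io.StringIO()
    write_purchases([build_purchase_row({"order_id": "1"}, item)], buffer)
    print(buffer.getvalue())
```

Copying the values verbatim, blanks included, keeps normalization as the single source of truth for these columns and leaves `normalized_item_id` untouched.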
 * [ ] 1t.10: add optional llm-assisted suggestion workflow for unresolved normalized retailer items (2-4 commits)

 ** acceptance criteria