Record t1.18 and t1.18.1 evidence

ben
2026-03-23 12:54:09 -04:00
parent dc0d0614bb
commit 3bc76ed243

@@ -803,26 +803,19 @@ correct and document deterministic normalized quantity fields so unit-cost analy
 - The missing purchases fields were a carry-through bug: normalization had `normalized_quantity` and `normalized_quantity_unit`, but `build_purchases.py` never wrote them into `data/review/purchases.csv`.
 - Normalized quantity now prefers explicit package basis over `each`, so rows like `PEPSI 6PK 7.5Z` resolve to `90 oz` and `KS ALMND BAR US 1.74QTS` purchased twice resolves to `3.48 qt`.
 - The derivation stays conservative and does not convert units during normalization; parsed units such as `oz`, `lb`, `qt`, and `count` are preserved as-is.
-* [ ] t1.18: add regression tests for known quantity/price failures (1-2 commits)
+* [x] t1.18: add regression tests for known quantity/price failures (1-2 commits)
 capture the currently broken comparison cases before changing normalization or purchases logic
 ** acceptance criteria
-1. when generating `data/purchases.csv`, add `effective_price` = `effective_total` / `normalized_quantity`
-2. define `effective_price` behavior explicitly from the covered cases:
-- use `net_line_total` when present and non-zero, else use `line_total`
-- divide by `normalized_quantity` when `normalized_quantity > 0`
-- leave blank when no valid denominator exists
-- never emit `0` or divide-by-zero for missing-basis cases
-- `effective_price` only comparable within same `normalized_quantity_unit` unless later analysis converts the units
-3. ensure the new tests assert the intended `effective_price` behavior for the known banana, ice, and beef patty examples
-4. add tests covering known broken cases:
+1. ensure the new tests assert the intended `effective_price` behavior for the known banana, ice, and beef patty examples
+2. add tests covering known broken cases:
 - giant bananas produce non-blank effective price
 - giant bagged ice produces non-zero effective price
 - costco bananas retain correct effective price
 - beef patty comparison rows preserve expected quantity basis behavior
-5. tests fail against current broken behavior and document the expected outcome
-6. include at least one assertion that effective_price is blank rather than `0` or divide-by-zero when no denominator exists
-7. pm note: this task should only add tests/fixtures and not change business logic
+3. tests fail against current broken behavior and document the expected outcome
+4. include at least one assertion that effective_price is blank rather than `0` or divide-by-zero when no denominator exists
+- pm note: this task should only add tests/fixtures and not change business logic
 ** pm identified problems
 we have a few problems to scope. looks like:
 1. normalize_giant_web not always propagating weight data to price_per
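The context notes in this hunk describe the normalized-quantity rule. An explicit package size wins over `each`. A pack count multiplies the size. The purchased quantity multiplies the result. Units are never converted. The sketch below illustrates that rule. It is not the project's normalizer, which this diff does not show. The function name `derive_normalized_quantity`, the regexes, the abbreviation map (`Z` as oz, `QTS` as qt), and the `each` fallback are all assumptions made for illustration.

```python
import re

# Assumed receipt abbreviations -> unit labels. Only the spelling is normalized;
# quantities are never converted between units (oz stays oz, qt stays qt).
UNIT_LABELS = {"Z": "oz", "OZ": "oz", "LB": "lb", "LBS": "lb", "QT": "qt", "QTS": "qt"}

# Size token: a number followed (optionally after spaces) by a unit abbreviation,
# e.g. "7.5Z" or "1.74QTS". \b stops a unit from matching inside a longer word.
SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(QTS|QT|OZ|LBS|LB|Z)\b")
# Pack token, e.g. "6PK".
PACK_RE = re.compile(r"(\d+)\s*PK\b")


def derive_normalized_quantity(item_name, purchased_qty):
    """Hypothetical helper: return (normalized_quantity, normalized_quantity_unit).

    An explicit package size is preferred over 'each'. A pack count multiplies
    the size, and the purchased quantity multiplies the result.
    """
    name = item_name.upper()
    size = SIZE_RE.search(name)
    if size is None:
        # No package basis in the name: fall back to counting units as 'each'.
        return purchased_qty, "each"
    pack = PACK_RE.search(name)
    pack_count = int(pack.group(1)) if pack else 1
    amount = float(size.group(1)) * pack_count * purchased_qty
    # Rounding only trims floating-point noise; it does not change the unit.
    return round(amount, 4), UNIT_LABELS[size.group(2)]


print(derive_normalized_quantity("KS ALMND BAR US 1.74QTS", 2))  # (3.48, 'qt')
```

Under this sketch, a `6PK 7.5Z` name contributes 6 × 7.5 = 45 oz for each unit purchased. Names with no parseable size keep an `each` basis.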
@@ -862,37 +855,45 @@ purchase_date retailer normalized_item_name catalog_name category product_type q
 10/10/2025 giant BAGGED ICE bagged ice cubes frozen ice 1 EA 20 lb 20 lb weight 4.99 4.99 4.99 line_total_over_qty 0.2495 parsed_size_lb 0.0156 parsed_size_lb_to_oz 0
 ```
 ** evidence
-- commit:
-- tests:
-- datetime:
+- commit: `605c944`
+- tests: `./venv/bin/python -m unittest tests.test_purchases` (fails as expected before implementation: missing `effective_price` in purchases rows)
+- datetime: 2026-03-23 12:52:32 EDT
 ** notes
-* [ ] t1.18.1: fix effective price calculation precedence and blank handling (1-3 commits)
+- Added purchases-level regression coverage for the known comparison cases before implementation: Giant banana, Costco banana, Giant bagged ice, Costco beef patties, and a blank-denominator case.
+- The current failure mode is the intended one for this task: `build_purchase_rows()` does not yet emit `effective_price`, so the tests document the missing behavior before `t1.18.1`.
+* [x] t1.18.1: fix effective price calculation precedence and blank handling (1-3 commits)
 correct purchases/effective price logic for the known broken cases using existing normalized fields
 ** acceptance criteria
-1. effective_price uses explicit numerator precedence:
+1. when generating `data/purchases.csv`, add `effective_price` = `effective_total` / `normalized_quantity`
+2. effective_price uses explicit numerator precedence:
 - prefer `net_line_total`
 - fallback to `line_total`
-2. effective_price uses `normalized_quantity` when present and > 0
-3. effective_price is blank when no valid denominator exists
-4. effective_price is never written as `0` or divide-by-zero for missing-basis cases
-5. existing regression tests for bananas and ice pass
+3. effective_price uses `normalized_quantity` if not blank
+4. effective_price is blank when no valid denominator exists
+5. effective_price is never written as `0` or divide-by-zero for missing-basis cases
+6. effective_price is only comparable within same `normalized_quantity_unit` unless later analysis converts the units
+7. existing regression tests for bananas and ice pass
 - pm note: keep this limited to calculation logic; do not broaden into catalog or review changes
 ** evidence
-- commit:
-- tests:
-- datetime:
+- commit: `dc0d061`
+- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python build_purchases.py`
+- datetime: 2026-03-23 12:53:34 EDT
 ** notes
+- `effective_price` is now a downstream purchases field only. It does not replace `price_per_lb` / `price_per_each`; it gives one deterministic comparison value based on the existing normalized quantity basis.
+- The implemented precedence is: use non-zero `net_line_total` when present, otherwise `line_total`; divide by `normalized_quantity` when that denominator is > 0; otherwise leave blank.
+- This keeps the calculation conservative for mixed-quality data: Costco bananas and ice now compute correctly, while rows like Giant patties with no quantity basis stay blank instead of producing `0` or a divide-by-zero artifact.
 * [ ] t1.18.2: fix giant normalization quantity carry-through for weight-based items (1-3 commits)
 ensure giant normalization emits usable normalized quantity for known weight-based cases
 ** acceptance criteria
 1. giant bananas populate normalized quantity and unit from deterministic weight basis
 2. giant weight-based items that already produce `price_per_lb` also carry enough quantity basis for effective price calculation where supported
 3. existing regression tests pass without changing normalized_item_id behavior
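The t1.18.1 notes state the implemented precedence. The numerator is a non-zero `net_line_total` when present, otherwise `line_total`. The denominator is `normalized_quantity`, used only when it is > 0. Anything else yields a blank. The standalone sketch below illustrates that precedence. It is not the code in `build_purchases.py`, which this diff does not show. The function names, the dict-of-CSV-strings row shape, and the 4-decimal output format are assumptions.

```python
import math


def _to_number(value):
    """Parse a CSV cell into a finite float; return None for blank or non-numeric cells."""
    if value is None:
        return None
    text = str(value).strip()
    if not text:
        return None
    try:
        number = float(text)
    except ValueError:
        return None
    return number if math.isfinite(number) else None


def effective_price(row):
    """Hypothetical sketch of effective_price for one purchases row (dict of CSV strings).

    Numerator: non-zero net_line_total, else line_total.
    Denominator: normalized_quantity, used only when > 0.
    Returns "" (blank) rather than 0 or a divide-by-zero when no valid basis exists.
    """
    numerator = _to_number(row.get("net_line_total"))
    if not numerator:  # blank, unparseable, or 0.0 -> fall back to line_total
        numerator = _to_number(row.get("line_total"))
    denominator = _to_number(row.get("normalized_quantity"))
    if numerator is None or denominator is None or denominator <= 0:
        return ""
    # Assumed output format: 4 decimal places.
    return f"{numerator / denominator:.4f}"


# A row shaped like the Giant bagged ice example (4.99 for 20 lb).
ice = {"net_line_total": "", "line_total": "4.99", "normalized_quantity": "20"}
print(effective_price(ice))  # 0.2495
```

Returning an empty string keeps missing-basis rows, such as the Giant patties, blank in the CSV. The resulting value is only comparable between rows that share the same `normalized_quantity_unit`.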