Refactor retailer normalization outputs
@@ -502,7 +502,7 @@ move Giant and Costco collection into the new collect structure and make both re
- Added lightweight deprecation nudges on the legacy `scrape_*` commands rather than removing them immediately, so the move is inspectable and low-risk.
- The main schema fix was on Giant collection, which was missing retailer/provenance/audit fields that Costco collection already carried.
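A deprecation nudge of the kind described above could look roughly like the sketch below. It is illustrative only: the helper name `deprecation_nudge` and the command names in the usage line are hypothetical, not taken from the repository.

```python
import sys


def deprecation_nudge(legacy_name: str, replacement: str, stream=None) -> str:
    """Print a one-line deprecation note and return it.

    Hypothetical helper: the legacy command keeps running after the note,
    so the old entry point stays usable during the transition.
    """
    if stream is None:
        stream = sys.stderr  # keep stdout clean for the command's real output
    message = f"note: `{legacy_name}` is deprecated; use `{replacement}` instead."
    print(message, file=stream)
    return message


if __name__ == "__main__":
    # Assumed names, for illustration only.
    deprecation_nudge("scrape_giant", "collect_giant_web")
```

Writing the note to stderr rather than raising an error keeps existing scripts working while still making the move visible.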
* [ ] t1.14.1: refactor retailer normalization into the new normalized_items schema (3-5 commits)
* [X] t1.14.1: refactor retailer normalization into the new normalized_items schema (3-5 commits)
make Giant and Costco emit the shared normalized line-item schema without introducing cross-retailer identity logic
** Acceptance Criteria
@@ -538,10 +538,13 @@ make Giant and Costco emit the shared normalized line-item schema without introd
- pm note: normalized_item_id is the only retailer-level grouping identity; do not introduce observed_products or a second grouping artifact
** evidence
- commit:
- tests:
- datetime:
- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python -m unittest tests.test_enrich_giant tests.test_costco_pipeline tests.test_purchases`; `./venv/bin/python normalize_giant_web.py --help`; `./venv/bin/python normalize_costco_web.py --help`; `./venv/bin/python enrich_giant.py --help`; `./venv/bin/python enrich_costco.py --help`
- datetime: 2026-03-18
** notes
- Kept the existing Giant and Costco parsing logic intact and added the new normalized schema fields in place, rather than rewriting the enrichers from scratch.
- `normalized_item_id` is always present, but it only collapses repeated rows when the evidence is strong; otherwise it falls back to row-level identity via `normalized_row_id`.
- Added `normalize_*` entry points for the new data-model layout while leaving the legacy `enrich_*` commands available during the transition.
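The fallback rule for `normalized_item_id` in the notes above can be sketched as follows. This is a minimal illustration, not the code in the enrichers: it assumes that a non-empty retailer item code counts as "strong evidence", and the field name `retailer_item_id`, the helper names, and the id format are all hypothetical. Only `normalized_item_id` and `normalized_row_id` come from the notes.

```python
import hashlib


def _digest(*parts: str) -> str:
    """Short, stable hash of the given parts (hypothetical id format)."""
    joined = "|".join(parts)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()[:12]


def assign_normalized_item_id(row: dict) -> str:
    """Return a grouping id for one normalized row.

    Assumption: a retailer-assigned item code is the strong evidence.
    The real criteria may differ.
    """
    retailer = row["retailer"]
    item_code = (row.get("retailer_item_id") or "").strip()
    if item_code:
        # Strong evidence: rows sharing the same code collapse to one id.
        return f"{retailer}:item:{_digest(retailer, item_code)}"
    # Weak evidence: fall back to row-level identity so nothing is merged.
    return f"{retailer}:row:{row['normalized_row_id']}"
```

With this shape every row gets an id, two rows with the same item code share one, and rows without a code each keep a distinct id derived from `normalized_row_id`.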
* [ ] t1.15: refactor review/combine pipeline around normalized_item_id and catalog links (4-8 commits)
replace the old observed/canonical workflow with a review-first pipeline that uses normalized_item_id as the retailer-level review unit and links it to catalog items