Refactor retailer collection entrypoints

ben
2026-03-18 15:18:47 -04:00
parent e74253f6fb
commit 48c6eaf753
6 changed files with 179 additions and 11 deletions

@@ -472,7 +472,7 @@ refactor canonical generation so product identity is cleaner, duplicate canonica
** notes
- Removed weak exact-name auto-canonical creation so ambiguous products stay in review instead of generating junk canonicals.
- Canonical display names are now cleaned of obvious punctuation and packaging noise, but I kept the cleanup conservative rather than adding a broad fuzzy merge layer.
-* [ ] t1.14: refactor retailer collection into the new data model (2-4 commits)
+* [X] t1.14: refactor retailer collection into the new data model (2-4 commits)
move Giant and Costco collection into the new collect structure and make both retailers emit the same collected schemas
** Acceptance Criteria
@@ -494,10 +494,13 @@ move Giant and Costco collection into the new collect structure and make both re
** evidence
- commit:
-- tests:
-- datetime:
+- tests: `./venv/bin/python -m unittest tests.test_scraper tests.test_costco_pipeline tests.test_browser_session`; `./venv/bin/python collect_giant_web.py --help`; `./venv/bin/python collect_costco_web.py --help`; `./venv/bin/python scrape_giant.py --help`; `./venv/bin/python scrape_costco.py --help`
+- datetime: 2026-03-18
** notes
+- Kept this as a path/schema move, not a parsing rewrite: the existing Giant and Costco collection behavior remains in place behind new `collect_*` entry points.
+- Added lightweight deprecation nudges on the legacy `scrape_*` commands rather than removing them immediately, so the move is inspectable and low-risk.
+- The main schema fix was on Giant collection, which was missing retailer/provenance/audit fields that Costco collection already carried.
* [ ] t1.14.1: refactor retailer normalization into the new normalized_items schema (3-5 commits)
make Giant and Costco emit the shared normalized line-item schema without introducing cross-retailer identity logic
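
Illustration only, not part of the diff: the t1.14 notes mention lightweight deprecation nudges on the legacy `scrape_*` commands. A minimal sketch of what such a nudge could look like, assuming the legacy script only warns and then delegates to the new entry point. The function names `legacy_main` and `collect_main` are hypothetical and are not taken from the repository; only the script names come from the evidence line above.

```python
import sys
import warnings


def collect_main(argv):
    """Hypothetical stand-in for a new collect_* entry point."""
    return f"collected with args: {argv}"


def legacy_main(argv, old_name="scrape_giant.py", new_name="collect_giant_web.py"):
    """Warn that the legacy command is deprecated, then delegate unchanged."""
    message = f"{old_name} is deprecated; use {new_name} instead."
    # Print to stderr so the nudge is visible without polluting stdout.
    print(f"warning: {message}", file=sys.stderr)
    # Also raise a DeprecationWarning so tests and -W filters can detect it.
    warnings.warn(message, DeprecationWarning, stacklevel=2)
    # Behavior is unchanged: the legacy path just forwards its arguments.
    return collect_main(argv)
```

Calling `legacy_main(["--help"])` would print the warning to stderr and return whatever the new entry point returns, so existing callers keep working while being pointed at the new command.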