* [X] t1.1: harden giant receipt fetch cli (2-4 commits)
** acceptance criteria
- giant scraper runs from cli with prompts or env-backed defaults for `user_id` and `loyalty`
- script reuses current browser session via firefox cookies + `curl_cffi`
- script only fetches unseen orders
- script appends to `orders.csv` and `items.csv` without duplicating prior visits
- script prints a note that giant only exposes the most recent 50 visits
** notes
- keep this giant-specific
- no canonical product logic here
- raw json archive remains source of truth
** evidence
- commit: `d57b9cf` on branch `cx`
- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python scraper.py --help`; verified `.env` loading via `scraper.load_config()`
- date: 2026-03-14

* [X] t1.2: define grocery data model and file layout (1-2 commits)
** acceptance criteria
- decide and document the files/directories for:
  - retailer raw exports
  - enriched line items
  - observed products
  - canonical products
  - product links
- define stable column schemas for each file
- explicitly separate retailer-specific parsing from cross-retailer canonicalization
** notes
- this is the guardrail task so we don't make giant-specific hacks the system of record
- keep schema minimal but extensible
** evidence
- commit:
- tests: reviewed `giant_output/raw/history.json`, one sample raw order json, `giant_output/orders.csv`, `giant_output/items.csv`; documented schemas in `pm/data-model.org`
- date: 2026-03-15

* [ ] t1.3: build giant parser/enricher from raw json (2-4 commits)
** acceptance criteria
- parser reads giant raw order json files
- outputs `items_enriched.csv`
- preserves core raw values plus parsed fields such as:
  - normalized item name
  - image url
  - size value/unit guesses
  - pack/count guesses
  - fee/store-brand flags
  - per-unit/per-weight derived price where possible
- parser is deterministic and rerunnable
** notes
- do not attempt canonical cross-store matching yet
- parser should preserve ambiguity rather than hallucinating precision
** evidence
- commit:
- tests:
- date:

* [ ] t1.4: generate observed-product layer from enriched items (2-3 commits)
** acceptance criteria
- distinct observed products are generated from enriched giant items
- each observed product has a stable `observed_product_id`
- observed products aggregate:
  - first seen / last seen
  - times seen
  - representative upc
  - representative image url
  - representative normalized name
- outputs `products_observed.csv`
** notes
- observed product is retailer-facing, not yet canonical
- likely key is some combo of retailer + upc + normalized name
** evidence
- commit:
- tests:
- date:

* [ ] t1.5: build review queue for unresolved or low-confidence products (1-3 commits)
** acceptance criteria
- produce a review file containing observed products needing manual review
- include enough context to review quickly:
  - raw names
  - parsed names
  - upc
  - image url
  - example prices
  - seen count
- reviewed status can be stored and reused
** notes
- this is where human-in-the-loop starts
- optimize for “approve once, remember forever”
** evidence
- commit:
- tests:
- date:

* [ ] t1.6: create canonical product layer and observed→canonical links (2-4 commits)
** acceptance criteria
- define and create `products_canonical.csv`
- define and create `product_links.csv`
- support linking one or more observed products to one canonical product
- canonical product schema supports food-cost comparison fields such as:
  - product type
  - variant
  - size
  - measure type
  - normalized quantity basis
** notes
- this is the first cross-retailer abstraction layer
- do not require llm assistance for v1
** evidence
- commit:
- tests:
- date:

* [ ] t1.7: implement auto-link rules for easy matches (2-3 commits)
** acceptance criteria
- auto-link can match observed products to canonical products using deterministic rules
- rules include at least:
  - exact upc
  - exact normalized name
  - exact size/unit match where available
- low-confidence cases remain unlinked for review
** notes
- keep the rules conservative
- false positives are worse than unresolved items
** evidence
- commit:
- tests:
- date:

* [ ] t1.8: support costco raw ingest path (2-5 commits)
** acceptance criteria
- add a costco-specific raw ingest/export path
- output costco line items into the same shared raw/enriched schema family
- confirm at least one product class can exist as:
  - giant observed product
  - costco observed product
  - one shared canonical product
** notes
- this is the proof that the architecture generalizes
- don’t chase perfection before the second retailer lands
** evidence
- commit:
- tests:
- date:

* [ ] t1.9: compute normalized comparison metrics (2-3 commits)
** acceptance criteria
- derive normalized comparison fields where possible:
  - price per lb
  - price per oz
  - price per each
  - price per count
- metrics are attached at canonical or linked-observed level as appropriate
- emit obvious nulls when basis is unknown rather than inventing values
** notes
- this is where “gala apples 5 lb bag vs other gala apples” becomes possible
- units discipline matters a lot here
** evidence
- commit:
- tests:
- date:

* [ ] t1.10: add optional llm-assisted suggestion workflow for unresolved products (2-4 commits)
** acceptance criteria
- llm suggestions are generated only for unresolved observed products
- llm outputs are stored as suggestions, not auto-applied truth
- reviewer can approve/edit/reject suggestions
- approved decisions are persisted into canonical/link files
** notes
- bounded assistant, not autonomous goblin
- image urls may become useful here
** evidence
- commit:
- tests:
- date:
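** sketch
a minimal sketch of the t1.10 acceptance criteria, to make the "suggestions are not truth" rule concrete. everything here is an assumption for illustration: the `Suggestion` class, the `review`, `link_rows`, and `write_links` helpers, the status values, and the two link columns are hypothetical, and the real columns are whatever t1.6 settles on for `product_links.csv`. no llm call is shown; the sketch only covers what happens to a suggestion after it exists.

```python
import csv
from dataclasses import dataclass, replace
from typing import Iterable, Optional, TextIO

# Hypothetical link columns; the real schema belongs to t1.6.
LINK_FIELDS = ["observed_product_id", "canonical_product_id"]


@dataclass(frozen=True)
class Suggestion:
    """One stored llm suggestion. It stays 'pending' until a human reviews it."""
    observed_product_id: str
    suggested_canonical_id: str
    status: str = "pending"  # pending | approved | edited | rejected
    final_canonical_id: Optional[str] = None


def review(s: Suggestion, decision: str, edited_id: Optional[str] = None) -> Suggestion:
    """Return a copy of the suggestion with the reviewer's decision recorded."""
    if decision == "approve":
        return replace(s, status="approved", final_canonical_id=s.suggested_canonical_id)
    if decision == "edit":
        if not edited_id:
            raise ValueError("an edit needs the corrected canonical id")
        return replace(s, status="edited", final_canonical_id=edited_id)
    if decision == "reject":
        return replace(s, status="rejected", final_canonical_id=None)
    raise ValueError(f"unknown decision: {decision!r}")


def link_rows(suggestions: Iterable[Suggestion]) -> list:
    """Only approved or edited suggestions become link rows; pending and
    rejected ones never reach the link file."""
    return [
        {
            "observed_product_id": s.observed_product_id,
            "canonical_product_id": s.final_canonical_id,
        }
        for s in suggestions
        if s.status in ("approved", "edited")
    ]


def write_links(rows: Iterable[dict], handle: TextIO) -> None:
    """Write link rows as csv to an already-open text handle."""
    writer = csv.DictWriter(handle, fieldnames=LINK_FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

the point of keeping `status` and `final_canonical_id` separate from `suggested_canonical_id` is that the original suggestion survives an edit or a rejection, so the review history can be reused later without re-asking the llm.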