added pm folder and tasks

2026-03-14 17:59:40 -04:00
parent 1df2dcec4b
commit 585d8c1e49
3 changed files with 307 additions and 1 deletions
--- a/pm/tasks.org
+++ b/pm/tasks.org
@@ -0,0 +1,200 @@
+* [ ] t1.1: harden giant receipt fetch cli (2-4 commits)
+** acceptance criteria
+- giant scraper runs from the cli, prompting for `user_id` and `loyalty` or falling back to env-backed defaults
+- script reuses current browser session via firefox cookies + `curl_cffi`
+- script only fetches unseen orders
+- script appends to `orders.csv` and `items.csv` without duplicating prior visits
+- script prints a note that giant only exposes the most recent 50 visits
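a minimal sketch of the "only unseen orders, no duplicate rows" rules. it covers just the csv bookkeeping, not the `curl_cffi` fetch itself, and assumes each row carries an `order_id` column (a hypothetical name; the real giant field may differ). it uses only the python standard library.

```python
import csv
import os
from typing import Iterable


def load_seen_ids(path: str, key: str = "order_id") -> set[str]:
    """Return ids already recorded in a csv, or an empty set if it doesn't exist yet."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="") as f:
        return {row[key] for row in csv.DictReader(f)}


def append_unseen(path: str, rows: Iterable[dict], fieldnames: list[str],
                  key: str = "order_id") -> int:
    """Append only rows whose key isn't already in the file; return how many were written.

    `order_id` is a hypothetical column name used for illustration.
    """
    seen = load_seen_ids(path, key)
    new_rows = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])  # also guards against duplicates within this batch
            new_rows.append(row)
    # write a header only when starting a fresh (missing or empty) file
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(new_rows)
    return len(new_rows)
```

because every run filters against what is already on disk, rerunning over the same 50 visits writes nothing new.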
+
+** notes
+- keep this giant-specific
+- no canonical product logic here
+- raw json archive remains source of truth
+
+** evidence
+- commit:
+- tests:
+- date:
+
+* [ ] t1.2: define grocery data model and file layout (1-2 commits)
+** acceptance criteria
+- decide and document the files/directories for:
+  - retailer raw exports
+  - enriched line items
+  - observed products
+  - canonical products
+  - product links
+- define stable column schemas for each file
+- explicitly separate retailer-specific parsing from cross-retailer canonicalization
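one way to pin the schemas down is a single registry that every writer and reader checks against. the file names come from later tasks; the `raw/<retailer>/` directory and every column list below are hypothetical placeholders, not decided schemas.

```python
import csv

# hypothetical layout: retailer-specific raw exports live under raw/<retailer>/,
# while the files registered here are shared across retailers.
SCHEMAS: dict[str, list[str]] = {
    "items_enriched.csv": ["retailer", "order_id", "line_no", "raw_name",
                           "normalized_name", "upc", "image_url",
                           "size_value", "size_unit", "pack_count", "price"],
    "products_observed.csv": ["observed_product_id", "retailer", "upc",
                              "normalized_name", "image_url", "first_seen",
                              "last_seen", "times_seen"],
    "products_canonical.csv": ["canonical_product_id", "product_type",
                               "variant", "size", "measure_type",
                               "quantity_basis"],
    "product_links.csv": ["observed_product_id", "canonical_product_id",
                          "link_method"],
}


def check_header(filename: str, text_stream) -> None:
    """Raise ValueError if a csv's header row drifts from its registered schema."""
    expected = SCHEMAS[filename]
    header = next(csv.reader(text_stream), [])
    if header != expected:
        raise ValueError(f"{filename}: expected {expected}, got {header}")
```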
+
+** notes
+- this is the guardrail task, so we don’t make giant-specific hacks the system of record
+- keep schema minimal but extensible
+
+** evidence
+- commit:
+- tests:
+- date:
+
+* [ ] t1.3: build giant parser/enricher from raw json (2-4 commits)
+** acceptance criteria
+- parser reads giant raw order json files
+- outputs `items_enriched.csv`
+- preserves core raw values plus parsed fields such as:
+  - normalized item name
+  - image url
+  - size value/unit guesses
+  - pack/count guesses
+  - fee/store-brand flags
+  - per-unit/per-weight derived price where possible
+- parser is deterministic and rerunnable
+
+** notes
+- do not attempt canonical cross-store matching yet
+- parser should preserve ambiguity rather than hallucinating precision
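a sketch of the "preserve ambiguity" rule for size guesses. it assumes sizes appear in item names as tokens like `5 LB` or `16.9 FL OZ`; that pattern set is hypothetical, not a description of giant's actual naming. when a name holds zero sizes or several conflicting ones, the guess is `None` rather than a pick. pack/count guesses could follow the same shape with a separate pattern.

```python
import re
from typing import Optional, Tuple

# hypothetical weight/volume spellings; extend as real item names show up.
_UNIT_ALIASES = {"lb": "lb", "lbs": "lb", "oz": "oz", "fl oz": "fl_oz"}
_SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(fl oz|lbs|lb|oz)\b", re.IGNORECASE)


def guess_size(name: str) -> Optional[Tuple[float, str]]:
    """Return (value, unit) only when the name contains exactly one distinct size.

    Zero or several conflicting matches return None, so the ambiguity stays
    visible downstream instead of being resolved by guesswork.
    """
    found = {
        (float(value), _UNIT_ALIASES[unit.lower()])
        for value, unit in _SIZE_RE.findall(name)
    }
    if len(found) != 1:
        return None
    return found.pop()
```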
+
+** evidence
+- commit:
+- tests:
+- date:
+
+* [ ] t1.4: generate observed-product layer from enriched items (2-3 commits)
+
+** acceptance criteria
+- distinct observed products are generated from enriched giant items
+- each observed product has a stable `observed_product_id`
+- observed products aggregate:
+  - first seen / last seen
+  - times seen
+  - representative upc
+  - representative image url
+  - representative normalized name
+- outputs `products_observed.csv`
+
+** notes
+- observed product is retailer-facing, not yet canonical
+- likely key is some combo of retailer + upc + normalized name
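a sketch of a stable `observed_product_id` plus the first/last/times-seen aggregation. it assumes the retailer + upc + normalized name key from the note above and an `order_date` field on each enriched item (hypothetical). hashing the normalized key keeps ids identical across reruns; representative upc and name here are just the key values, and image url selection is left out.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Iterable


def observed_product_id(retailer: str, upc: str, normalized_name: str) -> str:
    """Deterministic id from retailer + upc + normalized name (assumed key)."""
    key = "|".join(part.strip().lower() for part in (retailer, upc, normalized_name))
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]


@dataclass
class ObservedProduct:
    observed_product_id: str
    retailer: str
    upc: str
    normalized_name: str
    first_seen: date
    last_seen: date
    times_seen: int = 0


def aggregate(items: Iterable[dict]) -> dict[str, ObservedProduct]:
    """Fold enriched line items into one ObservedProduct per stable id."""
    products: dict[str, ObservedProduct] = {}
    for item in items:
        pid = observed_product_id(item["retailer"], item["upc"], item["normalized_name"])
        seen_on = item["order_date"]  # hypothetical field name
        p = products.get(pid)
        if p is None:
            p = products[pid] = ObservedProduct(
                pid, item["retailer"], item["upc"], item["normalized_name"],
                first_seen=seen_on, last_seen=seen_on)
        p.first_seen = min(p.first_seen, seen_on)
        p.last_seen = max(p.last_seen, seen_on)
        p.times_seen += 1
    return products
```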
+
+** evidence
+- commit:
+- tests:
+- date:
+
+* [ ] t1.5: build review queue for unresolved or low-confidence products (1-3 commits)
+
+** acceptance criteria
+- produce a review file containing observed products needing manual review
+- include enough context to review quickly:
+  - raw names
+  - parsed names
+  - upc
+  - image url
+  - example prices
+  - seen count
+- review decisions are persisted and reused, so an item reviewed once is not queued again
+
+** notes
+- this is where human-in-the-loop starts
+- optimize for “approve once, remember forever”
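a sketch of "approve once, remember forever", assuming decisions live in a separate append-only csv keyed by `observed_product_id` (the file name and columns are hypothetical). the queue is whatever has no stored decision; sorting by `times_seen` so frequent items get reviewed first is an assumed design choice, not a requirement above.

```python
import csv
import os

# hypothetical columns for a review_decisions.csv file
DECISION_FIELDS = ["observed_product_id", "status", "canonical_product_id", "note"]


def load_decisions(path: str) -> dict[str, dict]:
    """Read prior review decisions keyed by observed_product_id (empty if none yet)."""
    if not os.path.exists(path):
        return {}
    with open(path, newline="") as f:
        return {row["observed_product_id"]: row for row in csv.DictReader(f)}


def build_review_queue(observed: list[dict], decisions: dict[str, dict]) -> list[dict]:
    """Queue only products with no stored decision, most frequently seen first."""
    pending = [p for p in observed if p["observed_product_id"] not in decisions]
    return sorted(pending, key=lambda p: int(p["times_seen"]), reverse=True)


def record_decision(path: str, decision: dict) -> None:
    """Append one reviewer decision so the product is not asked about again."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=DECISION_FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow(decision)
```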
+
+** evidence
+- commit:
+- tests:
+- date:
+
+* [ ] t1.6: create canonical product layer and observed→canonical links (2-4 commits)
+
+** acceptance criteria
+- define and create `products_canonical.csv`
+- define and create `product_links.csv`
+- support linking one or more observed products to one canonical product
+- canonical product schema supports food-cost comparison fields such as:
+  - product type
+  - variant
+  - size
+  - measure type
+  - normalized quantity basis
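a sketch of the two record types and the many-to-one rule between them. field names mirror the list above where possible (`quantity_basis` stands in for "normalized quantity basis"); the example values and the `measure_type` vocabulary are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CanonicalProduct:
    canonical_product_id: str
    product_type: str               # e.g. "apples"
    variant: Optional[str]          # e.g. "gala"
    size: Optional[float]           # e.g. 5.0
    measure_type: Optional[str]     # hypothetical values: "weight", "volume", "count"
    quantity_basis: Optional[str]   # unit the size is expressed in, e.g. "lb"


@dataclass(frozen=True)
class ProductLink:
    observed_product_id: str
    canonical_product_id: str


def validate_links(links: list[ProductLink], canonical_ids: set[str]) -> None:
    """Many observed -> one canonical: an observed id may point at only one
    canonical product, and every link must target an existing canonical id."""
    seen: dict[str, str] = {}
    for link in links:
        if link.canonical_product_id not in canonical_ids:
            raise ValueError(f"unknown canonical id {link.canonical_product_id!r}")
        prior = seen.get(link.observed_product_id)
        if prior is not None and prior != link.canonical_product_id:
            raise ValueError(
                f"{link.observed_product_id!r} linked to both {prior!r} "
                f"and {link.canonical_product_id!r}")
        seen[link.observed_product_id] = link.canonical_product_id
```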
+
+** notes
+- this is the first cross-retailer abstraction layer
+- do not require llm assistance for v1
+
+** evidence
+- commit:
+- tests:
+- date:
+
+* [ ] t1.7: implement auto-link rules for easy matches (2-3 commits)
+
+** acceptance criteria
+- auto-link can match observed products to canonical products using deterministic rules
+- rules include at least:
+  - exact upc
+  - exact normalized name
+  - exact size/unit match where available
+- low-confidence cases remain unlinked for review
+
+** notes
+- keep the rules conservative
+- false positives are worse than unresolved items
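a conservative matcher along the lines of the rules above. it assumes each canonical candidate carries a `upc`, `normalized_name`, and `size` tuple, for example copied from observed products already linked to it (canonical products themselves may not store these). any tie or miss returns `None`, so the item stays in the review queue.

```python
from typing import Optional


def auto_link(observed: dict, canonical: list[dict]) -> Optional[str]:
    """Return a canonical_product_id only when exactly one candidate matches.

    Rule order (a hypothetical reading of the rules above):
      1. exact upc
      2. exact normalized name, plus exact (value, unit) size when both sides have one
    Zero or multiple candidates -> None, leaving the item unlinked.
    """
    upc = observed.get("upc")
    if upc:
        by_upc = [c for c in canonical if c.get("upc") == upc]
        if len(by_upc) == 1:
            return by_upc[0]["canonical_product_id"]
        if len(by_upc) > 1:
            return None  # conflicting upc matches: too risky to pick one

    name = observed.get("normalized_name")
    if not name:
        return None

    def size_compatible(c: dict) -> bool:
        ours, theirs = observed.get("size"), c.get("size")
        return ours is None or theirs is None or ours == theirs

    by_name = [c for c in canonical
               if c.get("normalized_name") == name and size_compatible(c)]
    return by_name[0]["canonical_product_id"] if len(by_name) == 1 else None
```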
+
+** evidence
+- commit:
+- tests:
+- date:
+
+* [ ] t1.8: support costco raw ingest path (2-5 commits)
+
+** acceptance criteria
+- add a costco-specific raw ingest/export path
+- output costco line items into the same shared raw/enriched schema family
+- confirm at least one product class can exist as:
+  - giant observed product
+  - costco observed product
+  - one shared canonical product
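the generalization check in the last criterion can be expressed as a small query over the observed and link files. this sketch assumes observed rows carry a `retailer` column (hypothetical) and link rows use the `observed_product_id`/`canonical_product_id` pair.

```python
from collections import defaultdict


def shared_canonicals(observed: list[dict], links: list[dict]) -> set[str]:
    """Canonical ids that have linked observed products from both giant and costco."""
    retailer_of = {o["observed_product_id"]: o["retailer"] for o in observed}
    retailers_by_canonical: dict[str, set[str]] = defaultdict(set)
    for link in links:
        retailer = retailer_of.get(link["observed_product_id"])
        if retailer is not None:
            retailers_by_canonical[link["canonical_product_id"]].add(retailer)
    return {
        cid for cid, retailers in retailers_by_canonical.items()
        if {"giant", "costco"} <= retailers
    }
```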
+
+** notes
+- this is the proof that the architecture generalizes
+- don’t chase perfection before the second retailer lands
+
+** evidence
+- commit:
+- tests:
+- date:
+
+* [ ] t1.9: compute normalized comparison metrics (2-3 commits)
+
+** acceptance criteria
+- derive normalized comparison fields where possible:
+  - price per lb
+  - price per oz
+  - price per each
+  - price per count
+- metrics are attached at the canonical or linked-observed level, as appropriate
+- emit explicit nulls when the quantity basis is unknown rather than inventing values
+
+** notes
+- this is where “gala apples 5 lb bag vs other gala apples” becomes possible
+- unit discipline matters a lot here
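a sketch of the null-when-unknown rule with an explicit conversion table, assuming quantities arrive as (value, unit) pairs from the enricher. for the gala apples example, a 5 lb bag at a hypothetical $5.99 works out to 5.99 / 5 = $1.198/lb, or 5.99 / 80 ≈ $0.075/oz. `oz` here means weight ounces; fluid ounces (`fl_oz`) are deliberately absent from the weight tables, so they return `None` instead of being silently converted.

```python
from typing import Optional

# conversion factors into each weight basis. only units listed here are
# trusted; anything else yields None rather than a guessed value.
_TO_LB = {"lb": 1.0, "oz": 1 / 16, "kg": 2.20462, "g": 0.00220462}
_TO_OZ = {unit: factor * 16 for unit, factor in _TO_LB.items()}


def price_per(price: float, quantity: Optional[float], unit: Optional[str],
              basis: str) -> Optional[float]:
    """Price per `basis` ("lb", "oz", "each", "count"), or None if the basis is unknown."""
    if quantity is None or unit is None or quantity <= 0:
        return None
    if basis == "lb":
        factor = _TO_LB.get(unit)
    elif basis == "oz":
        factor = _TO_OZ.get(unit)
    elif basis in ("each", "count"):
        # counts never convert to or from weights
        factor = 1.0 if unit == basis else None
    else:
        raise ValueError(f"unsupported basis {basis!r}")
    if factor is None:
        return None
    return price / (quantity * factor)
```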
+
+** evidence
+- commit:
+- tests:
+- date:
+
+* [ ] t1.10: add optional llm-assisted suggestion workflow for unresolved products (2-4 commits)
+
+** acceptance criteria
+- llm suggestions are generated only for unresolved observed products
+- llm outputs are stored as suggestions, not auto-applied truth
+- reviewer can approve/edit/reject suggestions
+- approved decisions are persisted into canonical/link files
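a sketch of the review side of the workflow, assuming suggestions are held as simple in-memory records; the llm call itself is not shown, and every name here is hypothetical. only products without an existing link are sent for suggestions, and only `approved` suggestions (plain approvals or reviewer edits) turn into link rows.

```python
from dataclasses import dataclass
from typing import Optional

VALID_ACTIONS = {"approve", "edit", "reject"}


def needs_suggestion(observed_ids: list[str], linked_ids: set[str]) -> list[str]:
    """The llm is only asked about observed products with no existing link."""
    return [oid for oid in observed_ids if oid not in linked_ids]


@dataclass
class Suggestion:
    observed_product_id: str
    suggested_canonical_id: str
    status: str = "pending"  # pending -> approved | rejected
    final_canonical_id: Optional[str] = None


def review(s: Suggestion, action: str, edited_id: Optional[str] = None) -> Suggestion:
    """Apply a reviewer action; nothing is final until a human acts."""
    if s.status != "pending":
        raise ValueError("suggestion already reviewed")
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action {action!r}")
    if action == "reject":
        s.status = "rejected"
    else:
        if action == "edit" and not edited_id:
            raise ValueError("edit requires a replacement canonical id")
        s.status = "approved"
        s.final_canonical_id = edited_id if action == "edit" else s.suggested_canonical_id
    return s


def approved_links(suggestions: list[Suggestion]) -> list[dict]:
    """Only approved suggestions become rows for product_links.csv."""
    return [
        {"observed_product_id": s.observed_product_id,
         "canonical_product_id": s.final_canonical_id}
        for s in suggestions if s.status == "approved"
    ]
```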
+
+** notes
+- bounded assistant, not autonomous goblin
+- image urls may become useful here
+
+** evidence
+- commit:
+- tests:
+- date: