added pm folder and tasks
200
pm/tasks.org
Normal file
@@ -0,0 +1,200 @@
* [ ] t1.1: harden giant receipt fetch cli (2-4 commits)
** acceptance criteria
- giant scraper runs from cli with prompts or env-backed defaults for `user_id` and `loyalty`
- script reuses current browser session via firefox cookies + `curl_cffi`
- script only fetches unseen orders
- script appends to `orders.csv` and `items.csv` without duplicating prior visits
- script prints a note that giant only exposes the most recent 50 visits

** notes
- keep this giant-specific
- no canonical product logic here
- raw json archive remains source of truth
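
a minimal sketch of the "append without duplicating prior visits" criterion. the `order_id` key column and the `append_unseen` helper are assumptions for illustration; the real column schema is not defined yet.

```python
import csv
from pathlib import Path


def append_unseen(csv_path, rows, key="order_id"):
    """Append only rows whose key is not already in the csv; return the count added.

    `key` is a hypothetical unique column, not a confirmed part of the schema.
    """
    path = Path(csv_path)
    seen = set()
    fieldnames = None
    if path.exists() and path.stat().st_size > 0:
        with path.open(newline="") as f:
            reader = csv.DictReader(f)
            fieldnames = reader.fieldnames  # reuse the existing header order
            seen = {r[key] for r in reader}

    new_rows = []
    for row in rows:
        k = str(row[key])
        if k not in seen:
            seen.add(k)  # also dedupes within the incoming batch
            new_rows.append(row)
    if not new_rows:
        return 0

    write_header = fieldnames is None
    if fieldnames is None:
        fieldnames = list(new_rows[0].keys())
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(new_rows)
    return len(new_rows)
```

the same helper could serve `items.csv` with a different key. rerunning it with the same rows adds nothing, which keeps the fetch rerunnable.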

** evidence
- commit:
- tests:
- date:

* [ ] t1.2: define grocery data model and file layout (1-2 commits)
** acceptance criteria
- decide and document the files/directories for:
  - retailer raw exports
  - enriched line items
  - observed products
  - canonical products
  - product links
- define stable column schemas for each file
- explicitly separate retailer-specific parsing from cross-retailer canonicalization

** notes
- this is the guardrail task so we don’t make giant-specific hacks the system of record
- keep schema minimal but extensible

** evidence
- commit:
- tests:
- date:

* [ ] t1.3: build giant parser/enricher from raw json (2-4 commits)
** acceptance criteria
- parser reads giant raw order json files
- outputs `items_enriched.csv`
- preserves core raw values plus parsed fields such as:
  - normalized item name
  - image url
  - size value/unit guesses
  - pack/count guesses
  - fee/store-brand flags
  - per-unit/per-weight derived price where possible
- parser is deterministic and rerunnable

** notes
- do not attempt canonical cross-store matching yet
- parser should preserve ambiguity rather than hallucinating precision
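
a rough sketch of what "preserve ambiguity" could look like for size value/unit guesses. the unit list, the regex, and the `guess_size` name are assumptions, not the parser's actual rules: it returns a guess only when exactly one size appears in the raw name.

```python
import re

# Hypothetical unit list; "fl oz" must come before "oz" so it is matched whole.
_SIZE_RE = re.compile(
    r"(?<![\d.])(\d+(?:\.\d+)?)\s*(fl oz|oz|lb|gal|qt|pt|kg|ml|l|g)\b",
    re.IGNORECASE,
)


def guess_size(raw_name):
    """Return (value, unit) when exactly one size is found, else (None, None)."""
    matches = _SIZE_RE.findall(raw_name)
    if len(matches) != 1:
        # Zero or several candidates: leave it unresolved instead of picking one.
        return None, None
    value, unit = matches[0]
    return float(value), unit.lower()
```

because the function is a pure function of the raw name, rerunning the parser on the same json gives the same guesses.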

** evidence
- commit:
- tests:
- date:

* [ ] t1.4: generate observed-product layer from enriched items (2-3 commits)

** acceptance criteria
- distinct observed products are generated from enriched giant items
- each observed product has a stable `observed_product_id`
- observed products aggregate:
  - first seen / last seen
  - times seen
  - representative upc
  - representative image url
  - representative normalized name
- outputs `products_observed.csv`

** notes
- observed product is retailer-facing, not yet canonical
- likely key is some combo of retailer + upc + normalized name
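
one possible way to get a stable `observed_product_id` from the retailer + upc + normalized name combo. the hashing scheme, the `obs_` prefix, and the 12-character length are assumptions for illustration; nothing here is decided.

```python
import hashlib


def observed_product_id(retailer, upc, normalized_name):
    """Derive a deterministic id from retailer + upc + normalized name.

    Hypothetical scheme: the same inputs always hash to the same id,
    so rerunning the pipeline does not renumber products.
    """
    parts = [
        retailer.strip().lower(),
        (upc or "").strip(),  # upc may be missing on some items
        " ".join(normalized_name.lower().split()),  # collapse whitespace
    ]
    digest = hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()
    return f"obs_{digest[:12]}"
```

a content-derived id avoids a counter that depends on row order, at the cost of changing whenever the name normalization changes.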

** evidence
- commit:
- tests:
- date:

* [ ] t1.5: build review queue for unresolved or low-confidence products (1-3 commits)

** acceptance criteria
- produce a review file containing observed products needing manual review
- include enough context to review quickly:
  - raw names
  - parsed names
  - upc
  - image url
  - example prices
  - seen count
- reviewed status can be stored and reused

** notes
- this is where human-in-the-loop starts
- optimize for “approve once, remember forever”

** evidence
- commit:
- tests:
- date:

* [ ] t1.6: create canonical product layer and observed→canonical links (2-4 commits)

** acceptance criteria
- define and create `products_canonical.csv`
- define and create `product_links.csv`
- support linking one or more observed products to one canonical product
- canonical product schema supports food-cost comparison fields such as:
  - product type
  - variant
  - size
  - measure type
  - normalized quantity basis

** notes
- this is the first cross-retailer abstraction layer
- do not require llm assistance for v1

** evidence
- commit:
- tests:
- date:

* [ ] t1.7: implement auto-link rules for easy matches (2-3 commits)

** acceptance criteria
- auto-link can match observed products to canonical products using deterministic rules
- rules include at least:
  - exact upc
  - exact normalized name
  - exact size/unit match where available
- low-confidence cases remain unlinked for review

** notes
- keep the rules conservative
- false positives are worse than unresolved items
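
a sketch of how conservative deterministic rules might be ordered. the dict keys (`upc`, `normalized_name`, `size_value`, `size_unit`, `canonical_product_id`) and the `auto_link` name are hypothetical. this version is stricter than the criteria above: it only links by name when size and unit are also known and match, and it refuses to link when a rule finds more than one candidate.

```python
def auto_link(observed, canonicals):
    """Return (canonical_product_id, rule_name), or (None, None) to leave for review."""
    # Rule 1: exact upc, accepted only when it points at a single canonical product.
    upc = observed.get("upc")
    if upc:
        hits = [c for c in canonicals if c.get("upc") == upc]
        if len(hits) == 1:
            return hits[0]["canonical_product_id"], "exact_upc"

    # Rule 2: exact normalized name plus exact size/unit.
    name = observed.get("normalized_name")
    size = (observed.get("size_value"), observed.get("size_unit"))
    if name and None not in size:
        hits = [
            c for c in canonicals
            if c.get("normalized_name") == name
            and (c.get("size_value"), c.get("size_unit")) == size
        ]
        if len(hits) == 1:
            return hits[0]["canonical_product_id"], "exact_name_and_size"

    # Anything else stays unlinked so it lands in the review queue.
    return None, None
```

returning the rule name alongside the id makes it possible to audit which rule produced each link.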

** evidence
- commit:
- tests:
- date:

* [ ] t1.8: support costco raw ingest path (2-5 commits)

** acceptance criteria
- add a costco-specific raw ingest/export path
- output costco line items into the same shared raw/enriched schema family
- confirm at least one product class can exist as:
  - giant observed product
  - costco observed product
  - one shared canonical product

** notes
- this is the proof that the architecture generalizes
- don’t chase perfection before the second retailer lands

** evidence
- commit:
- tests:
- date:

* [ ] t1.9: compute normalized comparison metrics (2-3 commits)

** acceptance criteria
- derive normalized comparison fields where possible:
  - price per lb
  - price per oz
  - price per each
  - price per count
- metrics are attached at canonical or linked-observed level as appropriate
- emit obvious nulls when basis is unknown rather than inventing values

** notes
- this is where “gala apples 5 lb bag vs other gala apples” becomes possible
- units discipline matters a lot here
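
a small sketch of the weight-based metrics with explicit nulls. the `comparison_metrics` name and its parameters are assumptions; it covers only `oz` and `lb` by weight (16 oz per lb) and deliberately returns `None` for any other unit, including fluid ounces and counts.

```python
OZ_PER_LB = 16.0


def comparison_metrics(price, size_value, size_unit):
    """Return price_per_oz and price_per_lb, or None for both when the basis is unknown."""
    metrics = {"price_per_oz": None, "price_per_lb": None}
    if price is None or not size_value or size_unit not in ("oz", "lb"):
        # Unknown or unsupported basis: emit nulls instead of inventing a value.
        return metrics
    ounces = size_value * OZ_PER_LB if size_unit == "lb" else size_value
    metrics["price_per_oz"] = round(price / ounces, 4)
    metrics["price_per_lb"] = round(price / ounces * OZ_PER_LB, 4)
    return metrics
```

price per each and price per count would need the pack/count guesses from the parser and are left out here.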

** evidence
- commit:
- tests:
- date:

* [ ] t1.10: add optional llm-assisted suggestion workflow for unresolved products (2-4 commits)

** acceptance criteria
- llm suggestions are generated only for unresolved observed products
- llm outputs are stored as suggestions, not auto-applied truth
- reviewer can approve/edit/reject suggestions
- approved decisions are persisted into canonical/link files

** notes
- bounded assistant, not autonomous goblin
- image urls may become useful here

** evidence
- commit:
- tests:
- date: