diff --git a/pm/tasks.org b/pm/tasks.org
index 8f1d10d..1910e62 100644
--- a/pm/tasks.org
+++ b/pm/tasks.org
@@ -276,7 +276,7 @@
 - commit: `7789c2e` on branch `cx`
 - tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python scrape_giant.py --help`; `./venv/bin/python scrape_costco.py --help`; verified Firefox storage token extraction and locked-db copy behavior in unit tests
 - date: 2026-03-16
-* [ ] t1.8.7: simplify costco session bootstrap and remove over-abstraction (2-4 commits)
+* [X] t1.8.7: simplify costco session bootstrap and remove over-abstraction (2-4 commits)
 
 ** acceptance criteria
 - make `scrape_costco.py` readable end-to-end without tracing through multiple partial bootstrap layers
@@ -302,12 +302,23 @@
 - no new heuristics in this task
 
 ** evidence
-- commit:
-- tests:
-- date:
-* [ ] t1.9: compute normalized comparison metrics (2-4 commits)
+- commit: `d7a0329` on branch `cx`
+- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python scrape_costco.py --help`; verified explicit Costco session bootstrap flow in `scrape_costco.py` and low-level-only browser access in `browser_session.py`
+- date: 2026-03-16
+* [X] t1.9: build pivot-ready normalized purchase log and comparison metrics (2-4 commits)
 
 ** acceptance criteria
+- produce a flat `purchases.csv` suitable for excel pivot tables and pivot charts
+- each purchase row preserves:
+  - purchase date
+  - retailer
+  - order id
+  - raw item name
+  - normalized item name
+  - canonical item id when resolved
+  - quantity / unit
+  - line total
+  - store/location info where available
 - derive normalized comparison fields where possible on enriched or observed product rows:
   - `price_per_lb`
   - `price_per_oz`
@@ -318,17 +329,19 @@
   - receipt weight
   - explicit count/pack
 - emit nulls when basis is unknown, conflicting, or ambiguous
+- support pivot-friendly analysis of purchase frequency and item cost over time
 - document at least one Giant vs Costco comparison example using the normalized metrics
 
 ** notes
 - compute metrics as close to the raw observation as possible
 - canonical layer can aggregate later, but should not invent missing unit economics
 - unit discipline matters more than coverage
+- raw item name must be retained for audit/debugging
 
 ** evidence
-- commit:
-- tests:
-- date:
+- commit: `be1bf63` on branch `cx`
+- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python build_purchases.py`; verified `combined_output/purchases.csv` and `combined_output/comparison_examples.csv` on the current Giant + Costco dataset
+- date: 2026-03-16
 
 * [ ] t1.10: add optional llm-assisted suggestion workflow for unresolved products (2-4 commits)
 
@@ -346,3 +359,27 @@
 - commit:
 - tests:
 - date:
+* [ ] t1.11: define review and item-resolution workflow for unresolved products (2-3 commits)
+
+** acceptance criteria
+- define the persistent files used to resolve unknown items, including:
+  - review queue
+  - canonical item catalog
+  - alias / mapping layer if separate
+- specify how unresolved items move from `review_queue.csv` into the final normalized purchase log
+- define the manual resolution workflow, including:
+  - what the human edits
+  - what script is rerun afterward
+  - how resolved mappings are persisted for future runs
+- ensure resolved items are positively identified into stable canonical item ids rather than one-off text substitutions
+- document how raw item name, normalized item name, and canonical item id are all retained
+
+** notes
+- goal is “approve once, reuse forever”
+- keep the workflow simple and auditable
+- manual review is fine; the important part is making it durable and rerunnable
+
+** evidence
+- commit:
+- tests:
+- date:
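Task t1.9 asks for `price_per_lb` and `price_per_oz` only when the basis is known, and for nulls when the basis is unknown, conflicting, or ambiguous. The diff does not show how `build_purchases.py` implements that rule, so the following is only a minimal sketch of it, assuming weight observations arrive as (source, amount, unit) triples. `WeightBasis`, `total_ounces`, `comparison_metrics`, the unit table, and the 1% agreement tolerance are all hypothetical, and count/pack bases are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional

OZ_PER_LB = 16.0
GRAMS_PER_OZ = 28.349523125
# Hypothetical conversion table: ounces per supported weight unit.
OZ_PER_UNIT = {"oz": 1.0, "lb": OZ_PER_LB, "g": 1 / GRAMS_PER_OZ, "kg": 1000 / GRAMS_PER_OZ}


@dataclass(frozen=True)
class WeightBasis:
    source: str  # where the weight came from, e.g. "receipt_weight" (hypothetical label)
    amount: float
    unit: str


def total_ounces(bases: list[WeightBasis], rel_tol: float = 0.01) -> Optional[float]:
    """Return the weight in ounces, or None if it is unknown, unconvertible, or conflicting."""
    ounces = []
    for basis in bases:
        factor = OZ_PER_UNIT.get(basis.unit.lower())
        if factor is None or basis.amount <= 0:
            return None  # unknown unit or non-positive amount: do not guess
        ounces.append(basis.amount * factor)
    if not ounces:
        return None  # no basis observed at all
    first = ounces[0]
    if any(abs(o - first) > rel_tol * first for o in ounces[1:]):
        return None  # sources disagree by more than rel_tol: ambiguous
    return first


def comparison_metrics(line_total: float, bases: list[WeightBasis]) -> dict[str, Optional[float]]:
    """Compute per-weight prices for one purchase row, or nulls when the basis is unsafe."""
    oz = total_ounces(bases)
    if oz is None:
        return {"price_per_lb": None, "price_per_oz": None}
    price_per_oz = line_total / oz
    return {
        "price_per_lb": round(price_per_oz * OZ_PER_LB, 4),
        "price_per_oz": round(price_per_oz, 4),
    }
```

For example, a $7.98 line with a 2 lb receipt weight yields `price_per_lb` 3.99, while a second source claiming 24 oz makes both fields `None`. Returning `None` instead of picking one source follows the "unit discipline matters more than coverage" note: the row lands in `purchases.csv` with an empty cell rather than a guessed price.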
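Task t1.11 leaves the resolution files and workflow to be defined. One possible shape of "approve once, reuse forever" is sketched below, assuming a hypothetical alias file keyed by retailer and normalized item name (only `review_queue.csv` is named in the task). Every purchase row keeps its raw and normalized names, rows with an approved alias gain a stable canonical item id, and each unresolved name is queued once for review. `load_aliases`, `resolve`, the column names, and the sample rows are illustrative only.

```python
import csv
import io


def load_aliases(fh) -> dict[tuple[str, str], str]:
    """Read a hypothetical alias CSV: (retailer, normalized_item_name) -> canonical_item_id."""
    return {
        (row["retailer"], row["normalized_item_name"]): row["canonical_item_id"]
        for row in csv.DictReader(fh)
    }


def resolve(purchases: list[dict], aliases: dict[tuple[str, str], str]) -> tuple[list[dict], list[dict]]:
    """Attach canonical ids; return (purchase_log_rows, review_queue_rows)."""
    log, queue = [], {}
    for row in purchases:
        key = (row["retailer"], row["normalized_item_name"])
        canonical_id = aliases.get(key, "")
        # Raw and normalized names are kept verbatim for audit; the id stays blank until approved.
        log.append({**row, "canonical_item_id": canonical_id})
        if not canonical_id:
            # One review entry per unresolved (retailer, name) pair, not one per purchase.
            queue.setdefault(key, {
                "retailer": key[0],
                "normalized_item_name": key[1],
                "example_raw_item_name": row["raw_item_name"],
            })
    return log, list(queue.values())


if __name__ == "__main__":
    purchases = [  # made-up sample rows
        {"retailer": "costco", "raw_item_name": "KS BNLS CHKN BRST",
         "normalized_item_name": "boneless skinless chicken breast"},
        {"retailer": "giant", "raw_item_name": "GNT LRG EGGS 12CT",
         "normalized_item_name": "large eggs 12 ct"},
    ]
    aliases_csv = io.StringIO(
        "retailer,normalized_item_name,canonical_item_id\n"
        "costco,boneless skinless chicken breast,chicken_breast\n"
    )
    log, queue = resolve(purchases, load_aliases(aliases_csv))
    for row in log:
        print(row["raw_item_name"], "->", row["canonical_item_id"] or "(unresolved)")
    print("review queue:", queue)
```

In this sketch the human's only edit is appending a row to the alias file; rerunning the build then resolves every past and future purchase with that name, which is what distinguishes a durable mapping from a one-off text substitution.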