Record t1.8.7 and t1.9 task evidence

2026-03-16 18:01:16 -04:00
parent be1bf6328e
commit 34eedff9c5
1 changed file with 45 additions and 8 deletions
--- a/pm/tasks.org
+++ b/pm/tasks.org
@@ -276,7 +276,7 @@
 - commit: `7789c2e` on branch `cx`
 - tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python scrape_giant.py --help`; `./venv/bin/python scrape_costco.py --help`; verified Firefox storage token extraction and locked-db copy behavior in unit tests
 - date: 2026-03-16
-* [ ] t1.8.7: simplify costco session bootstrap and remove over-abstraction (2-4 commits)
+* [X] t1.8.7: simplify costco session bootstrap and remove over-abstraction (2-4 commits)

 ** acceptance criteria
 - make `scrape_costco.py` readable end-to-end without tracing through multiple partial bootstrap layers
@@ -302,12 +302,23 @@
 - no new heuristics in this task

 ** evidence
- commit:
- tests:
- date:  
-* [ ] t1.9: compute normalized comparison metrics (2-4 commits)
+- commit: `d7a0329` on branch `cx`
+- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python scrape_costco.py --help`; verified explicit Costco session bootstrap flow in `scrape_costco.py` and low-level-only browser access in `browser_session.py`
+- date: 2026-03-16
+* [X] t1.9: build pivot-ready normalized purchase log and comparison metrics (2-4 commits)

 ** acceptance criteria
+- produce a flat `purchases.csv` suitable for excel pivot tables and pivot charts
+- each purchase row preserves:
+  - purchase date
+  - retailer
+  - order id
+  - raw item name
+  - normalized item name
+  - canonical item id when resolved
+  - quantity / unit
+  - line total
+  - store/location info where available
 - derive normalized comparison fields where possible on enriched or observed product rows:
  - `price_per_lb`
  - `price_per_oz`
@@ -318,17 +329,19 @@
  - receipt weight
  - explicit count/pack
 - emit nulls when basis is unknown, conflicting, or ambiguous
+- support pivot-friendly analysis of purchase frequency and item cost over time
 - document at least one Giant vs Costco comparison example using the normalized metrics

 ** notes
 - compute metrics as close to the raw observation as possible
 - canonical layer can aggregate later, but should not invent missing unit economics
 - unit discipline matters more than coverage
+- raw item name must be retained for audit/debugging

 ** evidence
- commit:
- tests:
- date:
+- commit: `be1bf63` on branch `cx`
+- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python build_purchases.py`; verified `combined_output/purchases.csv` and `combined_output/comparison_examples.csv` on the current Giant + Costco dataset
+- date: 2026-03-16

 * [ ] t1.10: add optional llm-assisted suggestion workflow for unresolved products (2-4 commits)

@@ -346,3 +359,27 @@
 - commit:
 - tests:
 - date:
+* [ ] t1.11: define review and item-resolution workflow for unresolved products (2-3 commits)
+
+** acceptance criteria
+- define the persistent files used to resolve unknown items, including:
+  - review queue
+  - canonical item catalog
+  - alias / mapping layer if separate
+- specify how unresolved items move from `review_queue.csv` into the final normalized purchase log
+- define the manual resolution workflow, including:
+  - what the human edits
+  - what script is rerun afterward
+  - how resolved mappings are persisted for future runs
+- ensure resolved items are positively identified into stable canonical item ids rather than one-off text substitutions
+- document how raw item name, normalized item name, and canonical item id are all retained
+
+** notes
+- goal is “approve once, reuse forever”
+- keep the workflow simple and auditable
+- manual review is fine; the important part is making it durable and rerunnable
+
+** evidence
+- commit:
+- tests:
+- date: