Record t1.8.7 and t1.9 task evidence

2026-03-16 18:01:16 -04:00
parent be1bf6328e
commit 34eedff9c5
1 changed file with 45 additions and 8 deletions
--- a/pm/tasks.org
+++ b/pm/tasks.org
@@ -276,7 +276,7 @@
 - commit: `7789c2e` on branch `cx`
 - tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python scrape_giant.py --help`; `./venv/bin/python scrape_costco.py --help`; verified Firefox storage token extraction and locked-db copy behavior in unit tests
 - date: 2026-03-16
-* [ ] t1.8.7: simplify costco session bootstrap and remove over-abstraction (2-4 commits)
+* [X] t1.8.7: simplify costco session bootstrap and remove over-abstraction (2-4 commits)
 ** acceptance criteria
 - make `scrape_costco.py` readable end-to-end without tracing through multiple partial bootstrap layers
@@ -302,12 +302,23 @@
 - no new heuristics in this task
 ** evidence
- commit:
+- commit: `d7a0329` on branch `cx`
- tests:
+- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python scrape_costco.py --help`; verified explicit Costco session bootstrap flow in `scrape_costco.py` and low-level-only browser access in `browser_session.py`
- date:  
+- date: 2026-03-16
-* [ ] t1.9: compute normalized comparison metrics (2-4 commits)
+* [X] t1.9: build pivot-ready normalized purchase log and comparison metrics (2-4 commits)
 ** acceptance criteria
 - produce a flat `purchases.csv` suitable for excel pivot tables and pivot charts
 - each purchase row preserves:
  - purchase date
  - retailer
  - order id
  - raw item name
  - normalized item name
  - canonical item id when resolved
  - quantity / unit
  - line total
  - store/location info where available
 - derive normalized comparison fields where possible on enriched or observed product rows:
  - `price_per_lb`
  - `price_per_oz`
@@ -318,17 +329,19 @@
  - receipt weight
  - explicit count/pack
 - emit nulls when basis is unknown, conflicting, or ambiguous
 - support pivot-friendly analysis of purchase frequency and item cost over time
 - document at least one Giant vs Costco comparison example using the normalized metrics
 ** notes
 - compute metrics as close to the raw observation as possible
 - canonical layer can aggregate later, but should not invent missing unit economics
 - unit discipline matters more than coverage
 - raw item name must be retained for audit/debugging
 ** evidence
- commit:
+- commit: `be1bf63` on branch `cx`
- tests:
+- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python build_purchases.py`; verified `combined_output/purchases.csv` and `combined_output/comparison_examples.csv` on the current Giant + Costco dataset
- date:
+- date: 2026-03-16
 * [ ] t1.10: add optional llm-assisted suggestion workflow for unresolved products (2-4 commits)
@@ -346,3 +359,27 @@
 - commit:
 - tests:
 - date:
 * [ ] t1.11: define review and item-resolution workflow for unresolved products (2-3 commits)
 ** acceptance criteria
 - define the persistent files used to resolve unknown items, including:
  - review queue
  - canonical item catalog
  - alias / mapping layer if separate
 - specify how unresolved items move from `review_queue.csv` into the final normalized purchase log
 - define the manual resolution workflow, including:
  - what the human edits
  - what script is rerun afterward
  - how resolved mappings are persisted for future runs
 - ensure resolved items are positively identified into stable canonical item ids rather than one-off text substitutions
 - document how raw item name, normalized item name, and canonical item id are all retained
 ** notes
 - goal is “approve once, reuse forever”
 - keep the workflow simple and auditable
 - manual review is fine; the important part is making it durable and rerunnable
 ** evidence
 - commit:
 - tests:
 - date: