Record t1.8.7 and t1.9 task evidence

author ben
date 2026-03-16 18:01:16 -04:00
parent be1bf6328e
commit 34eedff9c5


@@ -276,7 +276,7 @@
 - commit: `7789c2e` on branch `cx`
 - tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python scrape_giant.py --help`; `./venv/bin/python scrape_costco.py --help`; verified Firefox storage token extraction and locked-db copy behavior in unit tests
 - date: 2026-03-16
-* [ ] t1.8.7: simplify costco session bootstrap and remove over-abstraction (2-4 commits)
+* [X] t1.8.7: simplify costco session bootstrap and remove over-abstraction (2-4 commits)
 ** acceptance criteria
 - make `scrape_costco.py` readable end-to-end without tracing through multiple partial bootstrap layers
@@ -302,12 +302,23 @@
 - no new heuristics in this task
 ** evidence
-- commit:
-- tests:
-- date:
-* [ ] t1.9: compute normalized comparison metrics (2-4 commits)
+- commit: `d7a0329` on branch `cx`
+- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python scrape_costco.py --help`; verified explicit Costco session bootstrap flow in `scrape_costco.py` and low-level-only browser access in `browser_session.py`
+- date: 2026-03-16
+* [X] t1.9: build pivot-ready normalized purchase log and comparison metrics (2-4 commits)
 ** acceptance criteria
+- produce a flat `purchases.csv` suitable for excel pivot tables and pivot charts
+- each purchase row preserves:
+  - purchase date
+  - retailer
+  - order id
+  - raw item name
+  - normalized item name
+  - canonical item id when resolved
+  - quantity / unit
+  - line total
+  - store/location info where available
 - derive normalized comparison fields where possible on enriched or observed product rows:
   - `price_per_lb`
   - `price_per_oz`
@@ -318,17 +329,19 @@
   - receipt weight
   - explicit count/pack
 - emit nulls when basis is unknown, conflicting, or ambiguous
+- support pivot-friendly analysis of purchase frequency and item cost over time
 - document at least one Giant vs Costco comparison example using the normalized metrics
 ** notes
 - compute metrics as close to the raw observation as possible
 - canonical layer can aggregate later, but should not invent missing unit economics
 - unit discipline matters more than coverage
+- raw item name must be retained for audit/debugging
 ** evidence
-- commit:
-- tests:
-- date:
+- commit: `be1bf63` on branch `cx`
+- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python build_purchases.py`; verified `combined_output/purchases.csv` and `combined_output/comparison_examples.csv` on the current Giant + Costco dataset
+- date: 2026-03-16
 * [ ] t1.10: add optional llm-assisted suggestion workflow for unresolved products (2-4 commits)
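The t1.9 rule "emit nulls when basis is unknown, conflicting, or ambiguous" can be sketched in Python as follows. This is a hypothetical illustration, not the code in `build_purchases.py`. It assumes each row carries a line total, an optional receipt weight, and an optional second weight reading such as a package size. The function names, the unit table, and the 2% agreement tolerance are all assumptions. Only weight bases are handled; count/pack pricing is left out.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional, Tuple

# Hypothetical conversion table: pounds per unit, weight units only.
# Count-style units ("ct", "ea", "pk") are deliberately absent, so they yield no weight basis.
LB_PER_UNIT = {
    "lb": Decimal("1"),
    "oz": Decimal("1") / Decimal("16"),
    "kg": Decimal("2.20462"),
    "g": Decimal("0.00220462"),
}
PRECISION = Decimal("0.0001")
AGREEMENT_TOLERANCE = Decimal("0.02")  # assumed: readings within 2% count as the same basis

PriceTuple = Tuple[Optional[Decimal], Optional[Decimal]]


def weight_in_lb(amount: Optional[Decimal], unit: Optional[str]) -> Optional[Decimal]:
    """Convert a weight to pounds; None when the amount is missing or the unit is not a known weight."""
    if amount is None or unit is None or amount <= 0:
        return None
    factor = LB_PER_UNIT.get(unit.strip().lower())
    return amount * factor if factor is not None else None


def unit_prices(
    line_total: Optional[Decimal],
    receipt_weight: Optional[Decimal],
    receipt_unit: Optional[str],
    size_weight: Optional[Decimal] = None,
    size_unit: Optional[str] = None,
) -> PriceTuple:
    """Return (price_per_lb, price_per_oz), or (None, None) when the basis is unknown or conflicting."""
    bases = [
        w
        for w in (weight_in_lb(receipt_weight, receipt_unit), weight_in_lb(size_weight, size_unit))
        if w is not None
    ]
    if line_total is None or not bases:
        return None, None  # unknown basis: emit nulls instead of guessing
    lb = bases[0]  # receipt weight wins when present (closest to the raw observation)
    if any(abs(other - lb) > lb * AGREEMENT_TOLERANCE for other in bases[1:]):
        return None, None  # conflicting bases: emit nulls instead of picking one
    price_per_lb = (line_total / lb).quantize(PRECISION, rounding=ROUND_HALF_UP)
    price_per_oz = (line_total / (lb * 16)).quantize(PRECISION, rounding=ROUND_HALF_UP)
    return price_per_lb, price_per_oz
```

For example, `unit_prices(Decimal("6.99"), Decimal("2.33"), "lb")` yields a price of 3.0000 per lb and 0.1875 per oz. A 1 lb receipt weight paired with a 20 oz package size returns `(None, None)`, because the two bases disagree. Preferring the receipt weight follows the note about computing metrics as close to the raw observation as possible.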
@@ -346,3 +359,27 @@
 - commit:
 - tests:
 - date:
+* [ ] t1.11: define review and item-resolution workflow for unresolved products (2-3 commits)
+** acceptance criteria
+- define the persistent files used to resolve unknown items, including:
+  - review queue
+  - canonical item catalog
+  - alias / mapping layer if separate
+- specify how unresolved items move from `review_queue.csv` into the final normalized purchase log
+- define the manual resolution workflow, including:
+  - what the human edits
+  - what script is rerun afterward
+  - how resolved mappings are persisted for future runs
+- ensure resolved items are positively identified into stable canonical item ids rather than one-off text substitutions
+- document how raw item name, normalized item name, and canonical item id are all retained
+** notes
+- goal is “approve once, reuse forever”
+- keep the workflow simple and auditable
+- manual review is fine; the important part is making it durable and rerunnable
+** evidence
+- commit:
+- tests:
+- date:
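One possible shape for the t1.11 "approve once, reuse forever" loop is sketched below in Python. It assumes that `review_queue.csv` rows are modeled as dicts, that the alias layer is a plain dict from normalized item name to canonical item id, and that the canonical catalog is a set of known ids. The functions `resolve_rows` and `promote_approved` are hypothetical, and so are the snake_case column names. None of this is taken from the repository.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]


def resolve_rows(rows: List[Row], aliases: Dict[str, str]) -> Tuple[List[Row], List[Row]]:
    """Attach canonical ids from the alias map and collect unresolved names for review.

    Raw and normalized names are copied through untouched, so every output row
    keeps raw_item_name, normalized_item_name, and canonical_item_id side by side.
    """
    resolved: List[Row] = []
    queue: List[Row] = []
    queued: Set[str] = set()
    for row in rows:
        name = row["normalized_item_name"]
        canonical_id = row.get("canonical_item_id") or aliases.get(name, "")
        resolved.append(dict(row, canonical_item_id=canonical_id))
        if not canonical_id and name not in queued:
            queued.add(name)
            # One review entry per normalized name; the human fills canonical_item_id.
            queue.append({
                "normalized_item_name": name,
                "example_raw_item_name": row["raw_item_name"],
                "canonical_item_id": "",
            })
    return resolved, queue


def promote_approved(
    queue: List[Row], aliases: Dict[str, str], catalog_ids: Set[str]
) -> Tuple[Dict[str, str], List[Row]]:
    """Persist reviewed mappings so later runs reuse them.

    Only ids that already exist in the canonical catalog are accepted; blank or
    unknown ids stay in the queue instead of becoming one-off text substitutions.
    """
    updated = dict(aliases)
    remaining: List[Row] = []
    for entry in queue:
        canonical_id = entry["canonical_item_id"].strip()
        if canonical_id and canonical_id in catalog_ids:
            updated[entry["normalized_item_name"]] = canonical_id
        else:
            remaining.append(entry)
    return updated, remaining
```

In this sketch, the human edits only the `canonical_item_id` column of the queue. `promote_approved` then persists those choices into the alias map. The rerun step is another call to `resolve_rows` with the updated map. Entries left blank reappear in the next queue because they are still unresolved.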