added 14.2 and 14.3 for refactor prep
73
pm/tasks.org
@@ -546,6 +546,78 @@ make Giant and Costco emit the shared normalized line-item schema without introd
- `normalized_item_id` is always present, but it only collapses repeated rows when the evidence is strong; otherwise it falls back to row-level identity via `normalized_row_id`.
- Added `normalize_*` entry points for the new data-model layout while leaving the legacy `enrich_*` commands available during the transition.
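
The fallback described in the first bullet above could look roughly like the sketch below. It is not the project's implementation: treating a non-empty `retailer_item_id` as the only "strong evidence" signal, the `retailer` column, and the id string formats are all assumptions made for illustration.

```python
from typing import Mapping


def assign_normalized_item_id(row: Mapping[str, str]) -> str:
    """Return a grouping id for one normalized row.

    Assumption: a non-empty retailer_item_id counts as strong evidence.
    The real rules may accept other signals.
    """
    retailer = row.get("retailer", "").strip()  # hypothetical column
    retailer_item_id = row.get("retailer_item_id", "").strip()
    if retailer and retailer_item_id:
        # Strong evidence: repeated rows for the same item share one id.
        return f"{retailer}:item:{retailer_item_id}"  # hypothetical id format
    # Weak evidence: keep row-level identity so nothing is merged by accident.
    return row["normalized_row_id"]


# Made-up sample rows, for illustration only.
rows = [
    {"retailer": "costco", "retailer_item_id": "12345", "normalized_row_id": "costco:row:1"},
    {"retailer": "costco", "retailer_item_id": "12345", "normalized_row_id": "costco:row:2"},
    {"retailer": "costco", "retailer_item_id": "", "normalized_row_id": "costco:row:3"},
]
ids = [assign_normalized_item_id(r) for r in rows]
```

In this sketch the first two rows collapse to one id, while the third keeps its own row id because it has no `retailer_item_id`.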

* [ ] t1.14.2: finalize filesystem and schema alignment for the refactor (2-4 commits)
bring on-disk outputs fully into the target `data/` structure without changing retailer behavior

** Acceptance Criteria
1. retailer data directories conform to pm/data-model.org:
   - `data/giant-web/raw/...`
   - `data/giant-web/collected_orders.csv`
   - `data/giant-web/collected_items.csv`
   - `data/giant-web/normalized_items.csv`
   - `data/costco-web/raw/...`
   - `data/costco-web/collected_orders.csv`
   - `data/costco-web/collected_items.csv`
   - `data/costco-web/normalized_items.csv`
2. review/combine outputs are moved or rewritten into the target review paths:
   - `data/review/review_queue.csv`
   - `data/review/product_links.csv`
   - `data/review/review_resolutions.csv`
   - `data/review/purchases.csv`
   - `data/review/pipeline_status.csv`
   - `data/review/pipeline_status.json`
3. old transitional output paths are either:
   - removed from active script defaults, or
   - left as explicit compatibility shims with clear deprecation notes
4. no recollection is required if existing raw files and collected csvs can be moved/copied losslessly into the new structure
5. no schema information is lost during the move:
   - raw paths still resolve
   - collected/normalized csvs still open with the expected headers
6. README and task/docs reflect the final active paths
- pm note: prefer moving/adapting existing files over recollecting from retailers unless a real data loss or schema mismatch forces recollection
- pm note: this is a structure-alignment task, not a retailer parsing task
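
Criteria 4 and 5 above (lossless copy, headers still as expected) could be checked with something like the following sketch. The helper names, the `legacy_output/items.csv` source path, the expected column list, and the sample row are hypothetical; only the `data/giant-web/collected_items.csv` target path comes from the criteria.

```python
import csv
import shutil
import tempfile
from pathlib import Path


def copy_into_layout(legacy_path: Path, target_path: Path) -> None:
    """Copy one file into the target layout, leaving the original in place."""
    target_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(legacy_path, target_path)


def read_header(csv_path: Path) -> list[str]:
    """Return the header row of a csv file (empty list for an empty file)."""
    with csv_path.open(newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])


def missing_columns(csv_path: Path, expected: list[str]) -> list[str]:
    """List expected columns that are absent from the csv header."""
    header = read_header(csv_path)
    return [name for name in expected if name not in header]


# Demo in a throwaway directory so no real data is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    legacy = root / "legacy_output" / "items.csv"  # hypothetical old location
    legacy.parent.mkdir(parents=True)
    legacy.write_text(
        "item_name,size_value,size_unit,pack_qty\nMILK,1,gal,1\n",  # made-up row
        encoding="utf-8",
    )
    target = root / "data" / "giant-web" / "collected_items.csv"
    copy_into_layout(legacy, target)

    # Lossless: the copy is byte-identical to the source.
    same_bytes = legacy.read_bytes() == target.read_bytes()
    # Schema intact: no expected column went missing (placeholder column list).
    missing = missing_columns(target, ["item_name", "size_value", "size_unit", "pack_qty"])
```

Copying rather than moving keeps the legacy file available until the new path has been verified, which fits the "prefer moving/adapting over recollecting" note.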

** evidence
- commit:
- tests:
- datetime:

** notes

* [ ] t1.14.3: retailer-specific Costco normalization cleanup (2-4 commits)
tighten Costco-specific normalization so normalized item names are cleaner and deterministic retailer grouping is less noisy

** Acceptance Criteria
1. improve Costco item-name cleanup for obvious non-identity noise, such as:
   - trailing slash fragments
   - code tokens and receipt-format artifacts
   - duplicated measurement fragments already captured in structured fields
2. preserve deterministic normalization rules only:
   - exact retailer_item_id
   - exact cleaned name + same size/pack when needed
   - approved retailer alias
   - no fuzzy or semantic matching
3. normalized Costco names improve on known bad examples, e.g.:
   - `MANDARIN /` -> cleaner normalized item name
   - `LIFE 6'TABLE ... /` -> cleaner normalized item name
4. cleanup does not overwrite retailer truth:
   - raw `item_name` is unchanged
   - parsed `size_value`, `size_unit`, `pack_qty`, and pricing fields remain intact
5. discount-row behavior remains correct:
   - matched discount rows still populate `matched_discount_amount`
   - `net_line_total` remains correct
   - discount rows remain auditable
6. add regression tests for the cleaned Costco examples and any new parsing rules
- pm note: keep this explicitly Costco-specific; do not introduce a generic cleanup framework
- pm note: prefer a short allowlist/blocklist of known receipt artifacts over broad heuristics
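
A minimal sketch of criteria 1, 2, and 4 above, assuming the only artifact handled is a bare trailing slash. The function names, the regex, and the key layout are illustrative guesses, `MANDARIN` is just one plausible cleaned result, and the approved-alias rule is left out because its data source is not described here.

```python
import re

# Short explicit pattern list (assumed, not the project's actual blocklist).
TRAILING_SLASHES = re.compile(r"(?:\s*/)+\s*$")
MULTISPACE = re.compile(r"\s+")


def clean_costco_name(item_name: str) -> str:
    """Return a cleaned copy of a Costco item name; the input is not modified."""
    cleaned = TRAILING_SLASHES.sub("", item_name)
    return MULTISPACE.sub(" ", cleaned).strip()


def deterministic_key(row: dict) -> tuple:
    """Build a grouping key using exact matches only (no fuzzy matching)."""
    if row.get("retailer_item_id"):
        # Rule: exact retailer_item_id wins when present.
        return ("retailer_item_id", row["retailer_item_id"])
    # Rule: exact cleaned name plus the same size/pack fields.
    return (
        "name_size_pack",
        clean_costco_name(row["item_name"]),
        row.get("size_value", ""),
        row.get("size_unit", ""),
        row.get("pack_qty", ""),
    )


# Sample row: the item name is the known bad example; other values are empty.
row = {"item_name": "MANDARIN /", "retailer_item_id": "", "size_value": "",
       "size_unit": "", "pack_qty": ""}
cleaned_name = clean_costco_name(row["item_name"])
key = deterministic_key(row)
```

Because `clean_costco_name` returns a new string, the raw `item_name` in the row stays untouched, and a name with an internal slash is left alone since the pattern only matches at the end of the string.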

** evidence
- commit:
- tests:
- datetime:

** notes

* [ ] t1.15: refactor review/combine pipeline around normalized_item_id and catalog links (4-8 commits)
replace the old observed/canonical workflow with a review-first pipeline that uses normalized_item_id as the retailer-level review unit and links it to catalog items

@@ -595,7 +667,6 @@ replace the old observed/canonical workflow with a review-first pipeline that us

** notes


* [ ] 1t.10: add optional llm-assisted suggestion workflow for unresolved normalized retailer items (2-4 commits)

** acceptance criteria