* review and item-resolution workflow

This document defines the durable review workflow for unresolved observed
products, that is, observed products not yet linked to a canonical product id.

** persistent files

- `combined_output/purchases.csv`
  Flat normalized purchase log. This is the review input because it retains:
  - raw item name
  - normalized item name
  - observed product id
  - canonical product id when resolved
  - retailer/order/date/price context
- `combined_output/review_queue.csv`
  Current unresolved observed products grouped for review.
- `combined_output/review_resolutions.csv`
  Durable mapping decisions from observed products to canonical products.
- `combined_output/canonical_catalog.csv`
  Durable canonical item catalog used by manual review and later purchase-log
  rebuilds.

There is no separate alias file in v1. `review_resolutions.csv` is the mapping
layer from observed products to canonical product ids.

** workflow

1. Run `build_purchases.py`
   This refreshes the purchase log and seeds/updates the canonical catalog from
   current auto-linked canonical rows.
2. Run `review_products.py`
   This rebuilds `review_queue.csv` from unresolved purchase rows and prompts in
   the terminal for one observed product at a time.
3. Choose one of:
   - link to existing canonical
   - create new canonical
   - exclude
   - skip
4. `review_products.py` writes decisions immediately to:
   - `review_resolutions.csv`
   - `canonical_catalog.csv` when a new canonical item is created
5. Rerun `build_purchases.py`
   This reapplies approved resolutions so the final normalized purchase log now
   carries the reviewed `canonical_product_id`.
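
Step 2 can be pictured with a minimal sketch. It is not the actual
`review_products.py` logic: it assumes an unresolved row is one whose
`canonical_product_id` is empty, and the sample rows, the `build_review_queue`
helper, and the `purchase_count` column are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical purchase-log sample. The column names come from this document;
# the ids and item names are invented for illustration.
PURCHASES_CSV = """raw_item_name,normalized_item_name,observed_product_id,canonical_product_id
ORG BANANAS 2LB,organic bananas,obs_001,can_001
WHL MLK GAL,whole milk gallon,obs_002,
WHOLE MILK 1GAL,whole milk gallon,obs_002,
SPRKL WTR 12PK,sparkling water 12 pack,obs_003,
"""


def build_review_queue(purchase_rows):
    """Group unresolved purchase rows into one queue entry per observed product."""
    counts = Counter()
    names = {}
    for row in purchase_rows:
        if row["canonical_product_id"]:
            continue  # already resolved, so it does not belong in the queue
        obs_id = row["observed_product_id"]
        counts[obs_id] += 1
        names.setdefault(obs_id, row["normalized_item_name"])
    return [
        {
            "observed_product_id": obs_id,
            "normalized_item_name": names[obs_id],
            "purchase_count": counts[obs_id],  # hypothetical column
        }
        for obs_id in sorted(counts)
    ]


rows = list(csv.DictReader(io.StringIO(PURCHASES_CSV)))
queue = build_review_queue(rows)
for entry in queue:
    print(entry)
```

Grouping by `observed_product_id` means the reviewer is prompted once per
observed product, however many purchase rows refer to it.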
** what the human edits

The primary interface is terminal prompts in `review_products.py`.

The human provides:
- existing canonical id when linking
- canonical name/category/product type when creating a new canonical item
- optional resolution notes

The generated CSVs remain editable by hand if needed, but the intended workflow
is terminal-first.
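
The inputs listed above suggest a simple shape for one review decision. The
sketch below is an assumption about how such a record could be validated before
it is written. `ReviewDecision`, `validate`, and the field names other than
`observed_product_id` and `canonical_product_id` are hypothetical and do not
describe the real prompt code.

```python
from dataclasses import dataclass
from typing import Optional

# The four choices offered during review.
ACTIONS = {"link", "create", "exclude", "skip"}


@dataclass
class ReviewDecision:
    observed_product_id: str
    action: str
    canonical_product_id: Optional[str] = None  # needed when linking
    canonical_name: Optional[str] = None        # needed when creating
    category: Optional[str] = None              # needed when creating
    product_type: Optional[str] = None          # needed when creating
    notes: str = ""                             # optional resolution notes


def validate(decision):
    """Return a list of problems; an empty list means the decision is usable."""
    problems = []
    if decision.action not in ACTIONS:
        problems.append(f"unknown action: {decision.action}")
    if decision.action == "link" and not decision.canonical_product_id:
        problems.append("link requires an existing canonical id")
    if decision.action == "create":
        for name in ("canonical_name", "category", "product_type"):
            if not getattr(decision, name):
                problems.append(f"create requires {name}")
    return problems


incomplete_link = ReviewDecision("obs_002", "link")
complete_link = ReviewDecision("obs_002", "link", canonical_product_id="can_010")
print(validate(incomplete_link))
print(validate(complete_link))
```

Checking a decision before writing it matters here because decisions are
written immediately, so a malformed row would otherwise persist into later runs.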
** durability

- Resolutions are keyed by `observed_product_id`, not by one-off text
  substitution.
- Canonical products are keyed by stable `canonical_product_id`.
- Future runs reuse approved mappings through `review_resolutions.csv`.
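
To illustrate why id-keyed resolutions are durable, here is a minimal sketch of
reapplying them to purchase rows. It assumes `review_resolutions.csv` has at
least the columns `observed_product_id` and `canonical_product_id`. The helper
names and sample ids are hypothetical, and excluded products are not modeled.

```python
import csv
import io

# Hypothetical resolutions content; the real file may carry more columns.
RESOLUTIONS_CSV = """observed_product_id,canonical_product_id
obs_002,can_010
"""


def load_resolutions(handle):
    """Read approved mappings into a dict keyed by observed_product_id."""
    return {
        row["observed_product_id"]: row["canonical_product_id"]
        for row in csv.DictReader(handle)
    }


def apply_resolutions(purchase_rows, resolutions):
    """Fill canonical_product_id on rows whose observed id has a mapping."""
    resolved = []
    for row in purchase_rows:
        updated = dict(row)  # copy, so the input rows are left unchanged
        if not updated.get("canonical_product_id"):
            updated["canonical_product_id"] = resolutions.get(
                updated["observed_product_id"], ""
            )
        resolved.append(updated)
    return resolved


# Two different raw descriptions that share one observed product id.
purchases = [
    {"raw_item_name": "WHL MLK GAL", "normalized_item_name": "whole milk gallon",
     "observed_product_id": "obs_002", "canonical_product_id": ""},
    {"raw_item_name": "WHOLE MILK 1GAL", "normalized_item_name": "whole milk gallon",
     "observed_product_id": "obs_002", "canonical_product_id": ""},
    {"raw_item_name": "SPRKL WTR 12PK", "normalized_item_name": "sparkling water 12 pack",
     "observed_product_id": "obs_003", "canonical_product_id": ""},
]

resolutions = load_resolutions(io.StringIO(RESOLUTIONS_CSV))
result = apply_resolutions(purchases, resolutions)
for row in result:
    print(row["raw_item_name"], "->", row["canonical_product_id"] or "(unresolved)")
```

Because the lookup uses the id and not the receipt text, one approved decision
covers every raw spelling that maps to the same observed product.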
** retention of audit fields

The final `purchases.csv` retains:
- `raw_item_name`
- `normalized_item_name`
- `canonical_product_id`

This preserves the raw receipt description, the deterministic parser output, and
the human-approved canonical identity in one flat purchase log.
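
A small check along these lines could guard the retention guarantee after a
rebuild. It is a sketch only: `missing_audit_fields` is a hypothetical helper,
and the in-memory headers stand in for a real `purchases.csv`.

```python
import csv
import io

# The audit columns this document says the final purchase log retains.
AUDIT_FIELDS = ("raw_item_name", "normalized_item_name", "canonical_product_id")


def missing_audit_fields(handle):
    """Return the audit columns absent from a CSV header, in declared order."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []  # fieldnames is None for an empty file
    return [name for name in AUDIT_FIELDS if name not in header]


complete = io.StringIO(
    "raw_item_name,normalized_item_name,observed_product_id,canonical_product_id\n"
)
incomplete = io.StringIO("normalized_item_name,canonical_product_id\n")

complete_missing = missing_audit_fields(complete)
incomplete_missing = missing_audit_fields(incomplete)
print(complete_missing)
print(incomplete_missing)
```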