Finalize post-refactor layout and remove old pipeline files
@@ -110,8 +110,15 @@ data/
 review/
 review_queue.csv # Human review queue for unresolved matching/parsing cases.
 product_links.csv # Links from normalized retailer items to catalog items.
-catalog.csv # Cross-retailer product catalog entities used for comparison.
-purchases.csv
+catalog.csv # Cross-retailer product catalog entities used for comparison.
+analysis/
+purchases.csv
+comparison_examples.csv
+item_price_over_time.csv
+spend_by_visit.csv
+items_per_visit.csv
+category_spend_over_time.csv
+retailer_store_breakdown.csv
 #+end_example
 
 Notes:
@@ -223,7 +230,7 @@ Notes:
 - Valid `normalization_basis` values should be explicit, e.g. `exact_upc`, `exact_retailer_item_id`, `exact_name_size_pack`, or `approved_retailer_alias`.
 - Do not use fuzzy or semantic matching to assign `normalized_item_id`.
 - Discount/coupon rows may remain as standalone normalized rows for auditability even when their amounts are attached to a purchased row via `matched_discount_amount`.
-- Cross-retailer identity is handled later in review/combine via `catalog.csv` and `product_links.csv`.
+- Cross-retailer identity is handled later in review/combine via `data/review/catalog.csv` and `product_links.csv`.
 
 ** `data/review/product_links.csv`
 One row per review-approved link from a normalized retailer item to a catalog item.
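The notes in this hunk require explicit `normalization_basis` values and rule out fuzzy matching. A minimal sketch of how such a rule could be checked is shown here. It assumes the four values named in the notes form the complete allow-list (the notes only give them as examples), and `invalid_basis_rows` is a hypothetical helper, not a function from this repository.

```python
# Assumption: the four example values from the notes are treated as the
# full allow-list. The real pipeline may accept additional values.
ALLOWED_BASES = {
    "exact_upc",
    "exact_retailer_item_id",
    "exact_name_size_pack",
    "approved_retailer_alias",
}


def invalid_basis_rows(rows):
    """Return (row_index, basis) for rows that carry a normalized_item_id
    but whose normalization_basis is not in the allow-list."""
    bad = []
    for index, row in enumerate(rows):
        has_id = bool(row.get("normalized_item_id"))
        basis = row.get("normalization_basis")
        if has_id and basis not in ALLOWED_BASES:
            bad.append((index, basis))
    return bad
```

Rows without a `normalized_item_id` are skipped in this sketch, on the assumption that unassigned rows go to the review queue instead.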
@@ -263,7 +270,7 @@ One row per issue needing human review.
 | `resolution_notes` | reviewer notes |
 | `created_at` | creation timestamp or date |
 | `updated_at` | last update timestamp or date |
-** `data/catalog.csv`
+** `data/review/catalog.csv`
 One row per cross-retailer catalog product.
 | key | definition |
 |----------------------------+----------------------------------------|
@@ -288,7 +295,7 @@ Notes:
 - Do not encode packaging/count into `catalog_name` unless it is essential to product identity.
 - `catalog_name` should come from review-approved naming, not raw retailer strings.
 
-** `data/purchases.csv`
+** `data/analysis/purchases.csv`
 One row per purchased item (i.e., `is_item`==true from normalized layer), with
 catalog attributes denormalized in and discounts already applied.
 
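The description of `data/analysis/purchases.csv` (one row per purchased item, catalog attributes denormalized in, discounts already applied) can be illustrated with a short sketch. Only `is_item`, `normalized_item_id`, `matched_discount_amount`, and `catalog_name` come from the diff. The columns `catalog_id`, `line_total`, and `net_amount`, the function `build_purchases`, and the convention that `matched_discount_amount` is a positive amount to subtract are all assumptions made for illustration.

```python
def build_purchases(normalized_rows, product_links, catalog):
    """Join item rows to catalog entries and apply matched discounts.

    Hypothetical columns: catalog_id, line_total, net_amount.
    Assumption: matched_discount_amount is a positive amount to subtract.
    """
    link_by_item = {
        link["normalized_item_id"]: link["catalog_id"] for link in product_links
    }
    catalog_by_id = {entry["catalog_id"]: entry for entry in catalog}

    purchases = []
    for row in normalized_rows:
        # Keep purchased items only; standalone discount rows are skipped.
        if str(row.get("is_item", "")).lower() != "true":
            continue
        catalog_id = link_by_item.get(row["normalized_item_id"])
        entry = catalog_by_id.get(catalog_id, {})
        gross = float(row["line_total"])
        discount = float(row.get("matched_discount_amount") or 0)
        purchases.append(
            {
                "normalized_item_id": row["normalized_item_id"],
                "catalog_id": catalog_id,
                "catalog_name": entry.get("catalog_name"),
                "net_amount": round(gross - discount, 2),
            }
        )
    return purchases
```

Items without an approved link keep `catalog_id` and `catalog_name` as `None` in this sketch, so they stay visible instead of being dropped silently.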