Finalize post-refactor layout and remove old pipeline files

2026-03-24 17:09:57 -04:00
parent cdb7a15739
commit 09829b2b9d
17 changed files with 59 additions and 1154 deletions
--- a/pm/data-model.org
+++ b/pm/data-model.org
@@ -110,8 +110,15 @@ data/
  review/
    review_queue.csv # Human review queue for unresolved matching/parsing cases.
    product_links.csv # Links from normalized retailer items to catalog items.
-  catalog.csv  # Cross-retailer product catalog entities used for comparison.
-  purchases.csv
+    catalog.csv # Cross-retailer product catalog entities used for comparison.
+  analysis/
+    purchases.csv
+    comparison_examples.csv
+    item_price_over_time.csv
+    spend_by_visit.csv
+    items_per_visit.csv
+    category_spend_over_time.csv
+    retailer_store_breakdown.csv
 #+end_example

 Notes:
@@ -223,7 +230,7 @@ Notes:
 - Valid `normalization_basis` values should be explicit, e.g. `exact_upc`, `exact_retailer_item_id`, `exact_name_size_pack`, or `approved_retailer_alias`.
 - Do not use fuzzy or semantic matching to assign `normalized_item_id`.
 - Discount/coupon rows may remain as standalone normalized rows for auditability even when their amounts are attached to a purchased row via `matched_discount_amount`.
-- Cross-retailer identity is handled later in review/combine via `catalog.csv` and `product_links.csv`.
+- Cross-retailer identity is handled later in review/combine via `data/review/catalog.csv` and `product_links.csv`.

 ** `data/review/product_links.csv`
 One row per review-approved link from a normalized retailer item to a catalog item.
@@ -263,7 +270,7 @@ One row per issue needing human review.
 | `resolution_notes`   | reviewer notes                                      |
 | `created_at`         | creation timestamp or date                          |
 | `updated_at`         | last update timestamp or date                       |
-** `data/catalog.csv`
+** `data/review/catalog.csv`
 One row per cross-retailer catalog product.
 | key                        | definition                             |
 |----------------------------+----------------------------------------|
@@ -288,7 +295,7 @@ Notes:
 - Do not encode packaging/count into `catalog_name` unless it is essential to product identity.
 - `catalog_name` should come from review-approved naming, not raw retailer strings.

-** `data/purchases.csv`
+** `data/analysis/purchases.csv`
 One row per purchased item (i.e., `is_item`==true from normalized layer), with
 catalog attributes denormalized in and discounts already applied.