data-model refactor and prep scope

2026-03-18 13:08:28 -04:00
parent 9122821db1
commit 10aad05808
3 changed files with 538 additions and 267 deletions
--- a/pm/data-model.org
+++ b/pm/data-model.org
@@ -1,12 +1,13 @@
-* grocery data model and file layout
+* Grocery data model and file layout

 This document defines the shared file layout and stable CSV schemas for the
-grocery pipeline. The goal is to keep retailer-specific ingest separate from
-cross-retailer product modeling so Giant-specific quirks do not become the
-system of record.
-
-** design rules
+grocery pipeline.
+
+Goals:
+- Keep data gathering separate from analysis.
+- Support multiple data-gathering methods per retailer.
+- Provide one shared layer for review and analysis.

+** Design Rules
 - Raw retailer exports remain the source of truth.
 - Retailer parsing is isolated to retailer-specific files and ids.
 - Cross-retailer product layers begin only after retailer-specific enrichment.
@@ -14,296 +15,313 @@ system of record.
  existing columns should not be repurposed.
 - Unknown values should be left blank rather than guessed.

-** directory layout
-
-Use one top-level data root:
-
-#+begin_example
-data/
-  giant/
-    raw/
-      history.json
-      orders/
-        <order_id>.json
-    orders.csv
-    items_raw.csv
-    items_enriched.csv
-    products_observed.csv
-  costco/
-    raw/
-      ...
-    orders.csv
-    items_raw.csv
-    items_enriched.csv
-    products_observed.csv
-  shared/
-    products_canonical.csv
-    product_links.csv
-    review_queue.csv
-#+end_example
-
-** layer responsibilities
-
- `data/<retailer>/raw/`
-  Stores unmodified retailer payloads exactly as fetched.
- `data/<retailer>/orders.csv`
-  One row per retailer order or visit, flattened from raw order data.
- `data/<retailer>/items_raw.csv`
-  One row per retailer line item, preserving retailer-native values needed for
-  reruns and debugging.
- `data/<retailer>/items_enriched.csv`
-  Parsed retailer line items with normalized fields and derived guesses, still
-  retailer-specific.
- `data/<retailer>/products_observed.csv`
-  Distinct retailer-facing observed products aggregated from enriched items.
- `data/shared/products_canonical.csv`
-  Cross-retailer canonical product entities used for comparison.
- `data/shared/product_links.csv`
-  Links from retailer observed products to canonical products.
- `data/shared/review_queue.csv`
-  Human review queue for unresolved or low-confidence matching/parsing cases.
-
-** retailer-specific versus shared
-
-Retailer-specific:
-
+*** Retailer-specific data:
 - raw json payloads
 - retailer order ids
 - retailer line numbers
 - retailer category ids and names
 - retailer item names
 - retailer image urls
- parsed guesses derived from one retailer feed
 - observed products scoped to one retailer

-Shared:
-
+*** Review/Combined data:
 - canonical products
 - observed-to-canonical links
 - human review state for unresolved cases
 - comparison-ready normalized quantity basis fields

+# I don't like this terminology: what is "observed" doing for us?
+# The output should be normalized_items, not observed products,
+# unless "observed" is how we match multiple UPCs to one item.
 Observed products are the boundary between retailer-specific parsing and
 cross-retailer canonicalization. Nothing upstream of `products_observed.csv`
 should require knowledge of another retailer.

-** schema: `data/<retailer>/orders.csv`
+* Pipeline
+Key:
+- (n) input
+- [n] output

-One row per order or visit.
+Each step can be run on its own as long as its inputs already exist.

-| column | meaning |
-|-
-| `retailer` | retailer slug such as `giant` |
-| `order_id` | retailer order or visit id |
-| `order_date` | order date in `YYYY-MM-DD` when available |
-| `delivery_date` | fulfillment date in `YYYY-MM-DD` when available |
-| `service_type` | retailer service type such as `INSTORE` |
-| `order_total` | order total as provided by retailer |
-| `payment_method` | retailer payment label |
-| `total_item_count` | total line count or item count from retailer |
-| `total_savings` | total savings as provided by retailer |
-| `your_savings_total` | savings field from retailer when present |
-| `coupons_discounts_total` | coupon/discount total from retailer |
-| `store_name` | retailer store name |
-| `store_number` | retailer store number |
-| `store_address1` | street address |
-| `store_city` | city |
-| `store_state` | state or province |
-| `store_zipcode` | postal code |
-| `refund_order` | retailer refund flag |
-| `ebt_order` | retailer EBT flag |
-| `raw_history_path` | relative path to source history payload |
-| `raw_order_path` | relative path to source order payload |
+** 1. Collect
+Get raw receipt/visit and item data from a retailer. Scraping is specific to a retailer and method (e.g., Giant-Web and Giant-Scan). Preserve the complete raw data at full fidelity, and avoid interpretation beyond basic data flattening.
+ - (1) source access (varies, e.g., header data or auth for API access)
+ - [1] collected visits from each retailer
+ - [2] collected items from each retailer
+ - [3] any other raw data that supports [1] and [2], with an explicit source (eventually a receipt scan?)
+
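A minimal sketch of "basic data flattening" for one raw order. It assumes a Giant-style payload with a top-level `orderId` and an `items` list. The raw keys (`itemName`, `primUpcCd`, `lbEachCd`, `unitPrice`, `groceryAmount`, `totalPickedWeight`, `image`) come from the earlier Giant mapping notes. The names `orderId` and `items`, the key-to-column pairing, and `flatten_order_items` are all hypothetical.

```python
import csv
import io
import json

# Assumed pairing of raw Giant-style keys to collected_items.csv columns.
FIELD_MAP = {
    "itemName": "item_name",
    "primUpcCd": "upc",
    "lbEachCd": "unit",
    "unitPrice": "unit_price",
    "groceryAmount": "line_total",
    "totalPickedWeight": "picked_weight",
    "image": "image_url",
}

COLUMNS = ["retailer", "order_id", "line_no", *FIELD_MAP.values(), "raw_order_path"]


def flatten_order_items(raw: dict, retailer: str, raw_order_path: str) -> list[dict]:
    """Flatten one raw order payload into collected_items rows, one per line item.

    Values are copied as strings; no parsing or guessing happens at this step.
    """
    rows = []
    # line_no is the 1-based position in the export, so reruns stay stable.
    for line_no, item in enumerate(raw.get("items", []), start=1):
        row = {"retailer": retailer, "order_id": str(raw["orderId"]), "line_no": line_no}
        for raw_key, column in FIELD_MAP.items():
            value = item.get(raw_key)
            # Unknown values stay blank rather than guessed.
            row[column] = "" if value is None else str(value)
        row["raw_order_path"] = raw_order_path
        rows.append(row)
    return rows


def write_rows(rows: list[dict], out) -> None:
    """Write rows as collected_items.csv to any text file-like object."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Usage with a tiny hypothetical payload.
raw = json.loads(
    '{"orderId": 123, "items": [{"itemName": "LIME", "primUpcCd": "4048",'
    ' "unitPrice": 0.5, "groceryAmount": 1.0}]}'
)
rows = flatten_order_items(raw, "giant", "raw/123.json")
buf = io.StringIO()
write_rows(rows, buf)
```

Fields missing from the payload (here `lbEachCd`) come out blank. This is consistent with the rule that unknown values are not guessed.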
+** 2. Normalize
+Parse structured facts out of retailer-specific raw data to produce a standardized item format for that retailer. This step is strictly dependent on the Collect method and its output.
+ - Extract quantity, size, pack, pricing, and variant
+ - Match discount line items to their product line items using `upc`/`retailer_item_id` and co-occurrence in the same order
+ - Clean up naming to facilitate later matching
+ - (1) collected items from each retailer
+ - (2) collected visits from each retailer
+ - [1] normalized items from each retailer
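A minimal sketch of the discount-matching bullet. It rests on three assumptions:
- Discount lines carry a negative `line_total`.
- Flags are already booleans.
- A discount applies to the first product line in the same order that shares its `upc` or `retailer_item_id`.

The sign convention, the first-match rule, and `apply_discounts` are hypothetical. Unmatched discounts are left alone.

```python
from decimal import Decimal


def apply_discounts(rows: list[dict]) -> list[dict]:
    """Carry discount-line amounts onto the product line they apply to.

    Each row needs order_id, upc, retailer_item_id, line_total (string),
    and is_discount_line (bool). Discounts are assumed negative, so
    net_line_total = line_total + matched_discount_amount.
    """
    # Index product lines by (order_id, key type, key value).
    products = {}
    for row in rows:
        row["matched_discount_amount"] = Decimal("0")
        if row["is_discount_line"]:
            continue
        for key_name in ("upc", "retailer_item_id"):
            key = row.get(key_name)
            if key:
                # setdefault keeps the first product line seen for this key.
                products.setdefault((row["order_id"], key_name, key), row)

    # Attach each discount to a product line in the same order.
    for row in rows:
        if not row["is_discount_line"]:
            continue
        for key_name in ("upc", "retailer_item_id"):
            key = row.get(key_name)
            target = products.get((row["order_id"], key_name, key)) if key else None
            if target is not None:
                target["matched_discount_amount"] += Decimal(row["line_total"])
                break

    for row in rows:
        if not row["is_discount_line"]:
            row["net_line_total"] = Decimal(row["line_total"]) + row["matched_discount_amount"]
    return rows


rows = [
    {"order_id": "1", "upc": "111", "retailer_item_id": "", "line_total": "4.99", "is_discount_line": False},
    {"order_id": "1", "upc": "111", "retailer_item_id": "", "line_total": "-1.00", "is_discount_line": True},
]
apply_discounts(rows)
```

Keying by `order_id` is one way to express "concurrence": a discount only matches a product bought in the same order.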

-Primary key:
+** 3. Review/Combine (Canonicalization)
+Decide whether two normalized retailer items are "the same product", and match items across retailers using algorithmic rules plus human review. Create a catalog linked to the normalized items.
+ - Group the same item within a retailer
+ - Ask a human to create a canonical/catalog item with:
+   - friendly/canonical_name: "bell pepper"; "milk"
+   - category: "produce"; "dairy"
+   - product_type: "pepper"; "milk"
+   - variant (open question): "whole", "skim", "2pct"
+ - (1) normalized items from each retailer
+ - [1] review queue of items to be reviewed
+ - [2] catalog (lookup table) of confirmed retailer items and their canonical names
+ - [3] canonical purchase list, pivot-ready
+
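One possible "algo/logic" first pass is sketched below. It assumes that an exact UPC shared across retailers is a strong enough signal to propose a group, but not to approve it. Everything else falls through to human review. `propose_upc_groups` and its return shape are hypothetical.

```python
from collections import defaultdict


def propose_upc_groups(items: list[dict]) -> tuple[list[list[str]], list[str]]:
    """Group normalized items by UPC; return (candidate groups, ids needing review).

    Each item needs normalized_item_id, retailer, and upc (blank when unknown).
    A group is proposed only when one UPC spans more than one retailer, and it
    still needs human approval before it reaches product_links.csv.
    """
    by_upc = defaultdict(list)
    needs_review = []
    for item in items:
        if item["upc"]:
            by_upc[item["upc"]].append(item)
        else:
            needs_review.append(item["normalized_item_id"])

    groups = []
    for members in by_upc.values():
        if len({m["retailer"] for m in members}) > 1:
            groups.append(sorted(m["normalized_item_id"] for m in members))
        else:
            needs_review.extend(m["normalized_item_id"] for m in members)
    return groups, needs_review


items = [
    {"normalized_item_id": "giant:1", "retailer": "giant", "upc": "0001"},
    {"normalized_item_id": "costco:9", "retailer": "costco", "upc": "0001"},
    {"normalized_item_id": "giant:2", "retailer": "giant", "upc": ""},
]
groups, needs_review = propose_upc_groups(items)
```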
+** Unresolved Issues
+1. Need a central script to orchestrate the steps; pipeline metadata belongs there and nowhere else.
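A minimal sketch of the central orchestrator. It assumes each step declares its input and output paths in one list, and that a step runs only when all of its inputs exist, which matches "each step can be run on its own". `Step`, `run_pipeline`, and the example paths are hypothetical. The actual step functions would be the collect/normalize/review scripts.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable


@dataclass
class Step:
    name: str
    inputs: list[Path]
    outputs: list[Path]
    run: Callable[[], object] = field(repr=False)


def runnable(step: Step) -> bool:
    """A step can run on its own once every input it depends on exists."""
    return all(p.exists() for p in step.inputs)


def run_pipeline(steps: list[Step]) -> list[str]:
    """Run steps in order, skipping any whose inputs are missing.

    Returns the names of the steps that ran. The step list is the single
    place where pipeline metadata (paths, ordering) lives.
    """
    ran = []
    for step in steps:
        if runnable(step):
            step.run()
            ran.append(step.name)
    return ran


# Usage in a throwaway directory.
root = Path(tempfile.mkdtemp())
collected = root / "collected_items.csv"
normalized = root / "normalized_items.csv"
steps = [
    Step("collect", [], [collected], lambda: collected.write_text("retailer\n")),
    Step("normalize", [collected], [normalized], lambda: normalized.write_text("retailer\n")),
    Step("review", [root / "catalog.csv"], [], lambda: None),
]
ran = run_pipeline(steps)
```

Here `review` is skipped because its (hypothetical) input does not exist yet.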

- (`retailer`, `order_id`)
+** Symptoms
+- `LIME` and `LIME . / .` both appear in canonical_catalog:
+  - catalog names must come from review-approved names, not raw retailer strings
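One way to enforce the fix above is sketched below. It assumes each approved review row records the reviewer's chosen name in a hypothetical `approved_name` field, which is not yet in the review_queue schema. Catalog ids without an approved name stay blank instead of falling back to raw strings. `catalog_names` is also hypothetical.

```python
def catalog_names(review_rows: list[dict]) -> dict[str, str]:
    """Map catalog_id -> catalog_name using only approved review decisions.

    Raw retailer strings are never used as a fallback, so strings like
    "LIME . / ." cannot leak into the catalog; unnamed ids stay blank.
    """
    names: dict[str, str] = {}
    for row in review_rows:
        catalog_id = row.get("catalog_id", "")
        if not catalog_id:
            continue
        names.setdefault(catalog_id, "")
        # approved_name is a hypothetical column holding the reviewer's name.
        if row.get("status") == "approved" and row.get("approved_name"):
            names[catalog_id] = row["approved_name"].strip()
    return names


rows = [
    {"catalog_id": "c1", "status": "approved", "approved_name": "lime", "raw_item_names": "LIME . / ."},
    {"catalog_id": "c2", "status": "pending", "approved_name": "", "raw_item_names": "LIME"},
]
names = catalog_names(rows)
```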

-** schema: `data/<retailer>/items_raw.csv`

+* Directory Layout
+Scripts live at the top level; all data lives under one top-level data root:
+#+begin_example
+main.py
+collect_<retailer>_<method>.py
+normalize_<retailer>_<method>.py
+review.py
+data/
+  <retailer-method>/
+    raw/  # unmodified retailer payloads exactly as fetched
+      <order_id>.json
+    collected_items.csv # one row per retailer line item w/ retailer-native values
+    collected_orders.csv # one row per receipt/visit, flattened from raw order data
+    normalized_items.csv # parsed retailer-specific line items with normalized fields
+  costco-web/ # sample
+    raw/
+      orders/
+        history.json
+        <order_id>.json
+    collected_items.csv
+    collected_orders.csv
+    normalized_items.csv
+  review/
+    review_queue.csv #  Human review queue for unresolved matching/parsing cases.
+    product_links.csv # Links from retailer-observed products to canonical products.
+  catalog.csv  # Cross-retailer canonical product entities used for comparison.
+  purchases.csv  # one row per purchased item, catalog fields denormalized in
+#+end_example
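A minimal sketch of path helpers so that scripts do not hardcode the layout above. `DATA_ROOT`, `source_paths`, and `shared_paths` are hypothetical names. The `<retailer-method>` directory name (e.g. `costco-web`) is passed in.

```python
from pathlib import Path

DATA_ROOT = Path("data")  # assumed location of the top-level data root


def source_paths(source: str, root: Path = DATA_ROOT) -> dict[str, Path]:
    """Paths for one <retailer-method> directory, e.g. source="costco-web"."""
    base = root / source
    return {
        "raw": base / "raw",
        "collected_items": base / "collected_items.csv",
        "collected_orders": base / "collected_orders.csv",
        "normalized_items": base / "normalized_items.csv",
    }


def shared_paths(root: Path = DATA_ROOT) -> dict[str, Path]:
    """Paths for the cross-retailer review and analysis layer."""
    return {
        "review_queue": root / "review" / "review_queue.csv",
        "product_links": root / "review" / "product_links.csv",
        "catalog": root / "catalog.csv",
        "purchases": root / "purchases.csv",
    }
```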
+
+* Schemas
+** `data/<retailer-method>/collected_items.csv`
 One row per retailer line item.
+| key                | definition                                 |
+|--------------------+--------------------------------------------|
+| `retailer` PK      | retailer slug                              |
+| `order_id` PK      | retailer order id                          |
+| `line_no`  PK      | stable line number within order export     |
+| `order_date`       | copied from order when available           |
+| `retailer_item_id` | retailer-native item id when available     |
+| `pod_id`           | retailer pod/item id                       |
+| `item_name`        | raw retailer item name                     |
+| `upc`              | retailer UPC or PLU value                  |
+| `category_id`      | retailer category id                       |
+| `category`         | retailer category description              |
+| `qty`              | retailer quantity field                    |
+| `unit`             | retailer unit code such as `EA` or `LB`    |
+| `unit_price`       | retailer unit price field                  |
+| `line_total`       | retailer extended price field              |
+| `picked_weight`    | retailer picked weight field               |
+| `mvp_savings`      | retailer savings field                     |
+| `reward_savings`   | retailer rewards savings field             |
+| `coupon_savings`   | retailer coupon savings field              |
+| `coupon_price`     | retailer coupon price field                |
+| `image_url`        | raw retailer image url when present        |
+| `raw_order_path`   | relative path to source order payload      |
+| `is_discount_line` | retailer adjustment or discount-line flag  |
+| `is_coupon_line`   | coupon-like line flag when distinguishable |
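The PK markers above imply that (`retailer`, `order_id`, `line_no`) is unique. A small check for that is sketched below, assuming the CSV is read as text. `duplicate_keys` is hypothetical.

```python
import csv
import io
from collections import Counter

PK = ("retailer", "order_id", "line_no")


def duplicate_keys(csv_text: str, key: tuple[str, ...] = PK) -> list[tuple[str, ...]]:
    """Return primary-key tuples that appear more than once in a CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(tuple(row[col] for col in key) for row in reader)
    return sorted(k for k, n in counts.items() if n > 1)


text = (
    "retailer,order_id,line_no,item_name\n"
    "giant,1,1,LIME\n"
    "giant,1,2,MILK\n"
    "giant,1,2,MILK\n"
)
dupes = duplicate_keys(text)
```

Passing `key=("retailer", "order_id")` gives the same check for `collected_orders.csv`.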

-| column           | meaning                                 |
-|------------------+-----------------------------------------|
-| `retailer`       | retailer slug                           |
-| `order_id`       | retailer order id                       |
-| `line_no`        | stable line number within order export  |
-| `order_date`     | copied from order when available        |
-| `retailer_item_id` | retailer-native item id when available |
-| `pod_id`         | retailer pod/item id                    |
-| `item_name`      | raw retailer item name                  |
-| `upc`            | retailer UPC or PLU value               |
-| `category_id`    | retailer category id                    |
-| `category`       | retailer category description           |
-| `qty`            | retailer quantity field                 |
-| `unit`           | retailer unit code such as `EA` or `LB` |
-| `unit_price`     | retailer unit price field               |
-| `line_total`     | retailer extended price field           |
-| `picked_weight`  | retailer picked weight field            |
-| `mvp_savings`    | retailer savings field                  |
-| `reward_savings` | retailer rewards savings field          |
-| `coupon_savings` | retailer coupon savings field           |
-| `coupon_price`   | retailer coupon price field             |
-| `image_url`      | raw retailer image url when present     |
-| `raw_order_path` | relative path to source order payload   |
-| `is_discount_line` | retailer adjustment or discount-line flag |
-| `is_coupon_line` | coupon-like line flag when distinguishable |
+** `data/<retailer-method>/collected_orders.csv`
+One row per order or visit.
+| key                       | definition                                      |
+|---------------------------+-------------------------------------------------|
+| `retailer` PK             | retailer slug such as `giant`                   |
+| `order_id` PK             | retailer order or visit id                      |
+| `order_date`              | order date in `YYYY-MM-DD` when available       |
+| `delivery_date`           | fulfillment date in `YYYY-MM-DD` when available |
+| `service_type`            | retailer service type such as `INSTORE`         |
+| `order_total`             | order total as provided by retailer             |
+| `payment_method`          | retailer payment label                          |
+| `total_item_count`        | total line count or item count from retailer    |
+| `total_savings`           | total savings as provided by retailer           |
+| `your_savings_total`      | savings field from retailer when present        |
+| `coupons_discounts_total` | coupon/discount total from retailer             |
+| `store_name`              | retailer store name                             |
+| `store_number`            | retailer store number                           |
+| `store_address1`          | street address                                  |
+| `store_city`              | city                                            |
+| `store_state`             | state or province                               |
+| `store_zipcode`           | postal code                                     |
+| `refund_order`            | retailer refund flag                            |
+| `ebt_order`               | retailer EBT flag                               |
+| `raw_history_path`        | relative path to source history payload         |
+| `raw_order_path`          | relative path to source order payload           |
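A minimal sketch of producing `order_date` and `delivery_date` in `YYYY-MM-DD`. The raw formats listed are assumptions, not confirmed retailer formats. Anything unrecognized comes out blank. `to_iso_date` is hypothetical.

```python
from datetime import datetime
from typing import Optional

# Assumed raw date formats; replace with whatever each retailer actually sends.
RAW_DATE_FORMATS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%S", "%m/%d/%Y")


def to_iso_date(raw: Optional[str]) -> str:
    """Return raw as YYYY-MM-DD, or blank when it is missing or unrecognized."""
    if not raw:
        return ""
    for fmt in RAW_DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return ""  # unknown values are left blank rather than guessed
```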

-Primary key:
+** `data/<retailer-method>/normalized_items.csv`
+One row per retailer line item after deterministic parsing. Preserve raw
+fields from `collected_items.csv` and add parsed fields plus retailer-level
+identity needed before cross-retailer review.

- (`retailer`, `order_id`, `line_no`)
+| key                        | definition                                                       |
+|----------------------------+------------------------------------------------------------------|
+| `retailer` PK              | retailer slug                                                    |
+| `order_id` PK              | retailer order id                                                |
+| `line_no` PK               | line number within order                                         |
+| `normalized_row_id`        | stable row key, typically `<retailer>:<order_id>:<line_no>`      |
+| `normalized_item_id`       | stable retailer-level item identity after deterministic grouping |
+| `normalization_basis`      | basis used to assign `normalized_item_id`                        |
+| `retailer_item_id`         | retailer-native item id                                          |
+| `item_name`                | raw retailer item name                                           |
+| `item_name_norm`           | normalized retailer item name                                    |
+| `brand_guess`              | parsed brand guess                                               |
+| `variant`                  | parsed variant text                                              |
+| `size_value`               | parsed numeric size value                                        |
+| `size_unit`                | parsed size unit such as `oz`, `lb`, `fl_oz`                     |
+| `pack_qty`                 | parsed pack or count guess                                       |
+| `measure_type`             | `each`, `weight`, `volume`, `count`, or blank                    |
+| `normalized_quantity`      | numeric comparison basis derived during normalization            |
+| `normalized_quantity_unit` | basis unit such as `oz`, `lb`, `count`, or blank                 |
+| `is_store_brand`           | store-brand guess                                                |
+| `is_fee`                   | fee or non-product flag                                          |
+| `is_discount_line`         | discount or adjustment-line flag                                 |
+| `is_coupon_line`           | coupon-like line flag                                            |
+| `matched_discount_amount`  | matched discount value carried onto purchased row when supported |
+| `net_line_total`           | line total after matched discount when supported                 |
+| `price_per_each`           | derived per-each price when supported                            |
+| `price_per_each_basis`     | source basis for `price_per_each`                                |
+| `price_per_count`          | derived per-count price when supported                           |
+| `price_per_count_basis`    | source basis for `price_per_count`                               |
+| `price_per_lb`             | derived per-pound price when supported                           |
+| `price_per_lb_basis`       | source basis for `price_per_lb`                                  |
+| `price_per_oz`             | derived per-ounce price when supported                           |
+| `price_per_oz_basis`       | source basis for `price_per_oz`                                  |
+| `image_url`                | best available retailer image url                                |
+| `raw_order_path`           | relative path to source order payload                            |
+| `parse_version`            | parser version string for reruns                                 |
+| `parse_notes`              | optional non-fatal parser notes                                  |
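A minimal sketch of filling `size_value`/`size_unit` from a raw name and deriving `price_per_oz` with a basis. It assumes sizes appear as a number followed by `OZ`, `LB`, or `FL OZ`. The regex, the unit table, the basis label `parsed_size`, and both function names are hypothetical.

```python
import re
from decimal import Decimal
from typing import Optional

# Assumed pattern: a number followed by a unit token, e.g. "12 OZ", "1.5LB", "64 FL OZ".
SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(FL\s*OZ|OZ|LB)\b", re.IGNORECASE)
# Weight units only; fl_oz is volume and is deliberately not converted.
OZ_PER_UNIT = {"oz": Decimal("1"), "lb": Decimal("16")}


def parse_size(item_name: str) -> tuple[Optional[Decimal], str]:
    """Return (size_value, size_unit), or (None, "") when no size is found."""
    match = SIZE_RE.search(item_name)
    if not match:
        return None, ""
    unit = re.sub(r"\s+", "_", match.group(2).lower())  # "FL OZ" -> "fl_oz"
    return Decimal(match.group(1)), unit


def price_per_oz(line_total: Decimal, qty: Decimal, item_name: str) -> tuple[Optional[Decimal], str]:
    """Derive (price_per_oz, price_per_oz_basis), or (None, "") when unsupported."""
    size_value, size_unit = parse_size(item_name)
    if size_value is None or size_unit not in OZ_PER_UNIT or qty <= 0:
        return None, ""
    total_oz = size_value * OZ_PER_UNIT[size_unit] * qty
    return (line_total / total_oz).quantize(Decimal("0.0001")), "parsed_size"
```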

-** schema: `data/<retailer>/items_enriched.csv`
-
-One row per retailer line item after deterministic parsing. Preserve the raw
-fields from `items_raw.csv` and add parsed fields.
-
-| column              | meaning                                                     |
-|---------------------+-------------------------------------------------------------|
-| `retailer`          | retailer slug                                               |
-| `order_id`          | retailer order id                                           |
-| `line_no`           | line number within order                                    |
-| `observed_item_key` | stable row key, typically `<retailer>:<order_id>:<line_no>` |
-| `retailer_item_id`  | retailer-native item id                                     |
-| `item_name`         | raw retailer item name                                      |
-| `item_name_norm`    | normalized item name                                        |
-| `brand_guess`       | parsed brand guess                                          |
-| `variant`           | parsed variant text                                         |
-| `size_value`        | parsed numeric size value                                   |
-| `size_unit`         | parsed size unit such as `oz`, `lb`, `fl_oz`                |
-| `pack_qty`          | parsed pack or count guess                                  |
-| `measure_type`      | `each`, `weight`, `volume`, `count`, or blank               |
-| `is_store_brand`    | store-brand guess                                           |
-| `is_fee`            | fee or non-product flag                                     |
-| `is_discount_line`  | discount or adjustment-line flag                            |
-| `is_coupon_line`    | coupon-like line flag                                       |
-| `price_per_each`    | derived per-each price when supported                       |
-| `price_per_lb`      | derived per-pound price when supported                      |
-| `price_per_oz`      | derived per-ounce price when supported                      |
-| `image_url`         | best available retailer image url                           |
-| `parse_version`     | parser version string for reruns                            |
-| `parse_notes`       | optional non-fatal parser notes                             |
-
-Primary key:
-
- (`retailer`, `order_id`, `line_no`)
-
-** schema: `data/<retailer>/products_observed.csv`
-
-One row per distinct retailer-facing observed product.
-
-| column                        | meaning                                                        |
-|-------------------------------+----------------------------------------------------------------|
-| `observed_product_id`         | stable observed product id                                     |
-| `retailer`                    | retailer slug                                                  |
-| `observed_key`                | deterministic grouping key used to create the observed product |
-| `representative_retailer_item_id` | best representative retailer-native item id               |
-| `representative_upc`          | best representative UPC/PLU                                    |
-| `representative_item_name`    | representative raw retailer name                               |
-| `representative_name_norm`    | representative normalized name                                 |
-| `representative_brand`        | representative brand guess                                     |
-| `representative_variant`      | representative variant                                         |
-| `representative_size_value`   | representative size value                                      |
-| `representative_size_unit`    | representative size unit                                       |
-| `representative_pack_qty`     | representative pack/count                                      |
-| `representative_measure_type` | representative measure type                                    |
-| `representative_image_url`    | representative image url                                       |
-| `is_store_brand`              | representative store-brand flag                                |
-| `is_fee`                      | representative fee flag                                        |
-| `is_discount_line`            | representative discount-line flag                              |
-| `is_coupon_line`              | representative coupon-line flag                                |
-| `first_seen_date`             | first order date seen                                          |
-| `last_seen_date`              | last order date seen                                           |
-| `times_seen`                  | number of enriched item rows grouped here                      |
-| `example_order_id`            | one example retailer order id                                  |
-| `example_item_name`           | one example raw item name                                      |
-| `distinct_retailer_item_ids_count` | count of distinct retailer-native item ids               |
-
-Primary key:
-
- (`observed_product_id`)
-
-** schema: `data/shared/products_canonical.csv`
-
-One row per cross-retailer canonical product.
-
-| column                     | meaning                                          |
-|----------------------------+--------------------------------------------------|
-| `canonical_product_id`     | stable canonical product id                      |
-| `canonical_name`           | canonical human-readable name                    |
-| `product_type`             | broad class such as `apple`, `milk`, `trash_bag` |
-| `brand`                    | canonical brand when applicable                  |
-| `variant`                  | canonical variant                                |
-| `size_value`               | normalized size value                            |
-| `size_unit`                | normalized size unit                             |
-| `pack_qty`                 | normalized pack/count                            |
-| `measure_type`             | normalized measure type                          |
-| `normalized_quantity`      | numeric comparison basis value                   |
-| `normalized_quantity_unit` | basis unit such as `oz`, `lb`, `count`           |
-| `notes`                    | optional human notes                             |
-| `created_at`               | creation timestamp or date                       |
-| `updated_at`               | last update timestamp or date                    |
-
-Primary key:
-
- (`canonical_product_id`)
-
-** schema: `data/shared/product_links.csv`
+Notes:
+- `normalized_item_id` replaces the need for a core `products_observed.csv` layer.
+- `normalization_basis` should take explicit values such as `exact_upc`, `retailer_item_id`, `name_size_pack`, or `manual_retailer_alias`.
+- Cross-retailer identity is still handled later in review/combine via `catalog.csv` and `product_links.csv`.
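A minimal sketch of assigning `normalized_item_id` together with an explicit `normalization_basis`. It tries the basis values listed above in an assumed priority order:
1. a reviewer alias (`manual_retailer_alias`), held in a hypothetical dict;
2. `exact_upc`;
3. `retailer_item_id`;
4. `name_size_pack`.

The id format `<retailer>:<kind>:<value>` and `assign_item_identity` are also assumptions.

```python
from typing import Optional


def assign_item_identity(row: dict, aliases: Optional[dict[str, str]] = None) -> tuple[str, str]:
    """Return (normalized_item_id, normalization_basis) for one normalized row.

    aliases maps a normalized_row_id to a reviewer-chosen id; otherwise the
    strongest available retailer-level key wins.
    """
    retailer = row["retailer"]
    if aliases and row.get("normalized_row_id") in aliases:
        return aliases[row["normalized_row_id"]], "manual_retailer_alias"
    if row.get("upc"):
        return f"{retailer}:upc:{row['upc']}", "exact_upc"
    if row.get("retailer_item_id"):
        return f"{retailer}:item:{row['retailer_item_id']}", "retailer_item_id"
    parts = [row.get(c, "") for c in ("item_name_norm", "size_value", "size_unit", "pack_qty")]
    if parts[0]:
        return f"{retailer}:name:" + "|".join(str(p) for p in parts), "name_size_pack"
    return "", ""  # no usable key: leave blank rather than guess
```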

+** `data/review/product_links.csv`
 One row per observed-to-canonical relationship.
+Cardinality: one catalog item to many normalized items.

-| column | meaning |
-|-
-| `observed_product_id` | retailer observed product id |
-| `canonical_product_id` | linked canonical product id |
-| `link_method` | `manual`, `exact_upc`, `exact_name`, etc. |
-| `link_confidence` | optional confidence label |
-| `review_status` | `pending`, `approved`, `rejected`, or blank |
-| `reviewed_by` | reviewer id or initials |
-| `reviewed_at` | review timestamp or date |
-| `link_notes` | optional notes |
-
-Primary key:
-
- (`observed_product_id`, `canonical_product_id`)
-
-** schema: `data/shared/review_queue.csv`
+| key               | definition                                  |
+|-------------------+---------------------------------------------|
+| `observed_id` PK  | retailer observed product id                |
+| `catalog_id` PK   | linked canonical product id                 |
+| `link_method`     | `manual`, `exact_upc`, `exact_name`, etc.   |
+| `link_confidence` | optional confidence label                   |
+| `review_status`   | `pending`, `approved`, `rejected`, or blank |
+| `reviewed_by`     | reviewer id or initials                     |
+| `reviewed_at`     | review timestamp or date                    |
+| `link_notes`      | optional notes                              |
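The one-to-many cardinality can be checked after review. Each observed/normalized id should resolve to at most one approved `catalog_id`. A minimal sketch using the columns in the table above follows; `conflicting_links` is hypothetical.

```python
from collections import defaultdict


def conflicting_links(links: list[dict]) -> dict[str, set[str]]:
    """Return observed ids linked to more than one approved catalog_id.

    One catalog item may own many normalized items, but each normalized item
    should resolve to a single catalog item once review is done.
    """
    approved = defaultdict(set)
    for link in links:
        if link.get("review_status") == "approved":
            approved[link["observed_id"]].add(link["catalog_id"])
    return {oid: ids for oid, ids in approved.items() if len(ids) > 1}


links = [
    {"observed_id": "giant:1", "catalog_id": "c1", "review_status": "approved"},
    {"observed_id": "costco:9", "catalog_id": "c1", "review_status": "approved"},
    {"observed_id": "giant:2", "catalog_id": "c1", "review_status": "approved"},
    {"observed_id": "giant:2", "catalog_id": "c2", "review_status": "approved"},
    {"observed_id": "giant:3", "catalog_id": "c3", "review_status": "pending"},
]
conflicts = conflicting_links(links)
```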

+** `data/review/review_queue.csv`
 One row per issue needing human review.

-| column | meaning |
-|-
-| `review_id` | stable review row id |
-| `queue_type` | `observed_product`, `link_candidate`, `parse_issue` |
-| `retailer` | retailer slug when applicable |
-| `observed_product_id` | observed product id when applicable |
-| `canonical_product_id` | candidate canonical id when applicable |
-| `reason_code` | machine-readable review reason |
-| `priority` | optional priority label |
-| `raw_item_names` | compact list of example raw names |
-| `normalized_names` | compact list of example normalized names |
-| `upc` | example UPC/PLU |
-| `image_url` | example image url |
-| `example_prices` | compact list of example prices |
-| `seen_count` | count of related rows |
-| `status` | `pending`, `approved`, `rejected`, `deferred` |
-| `resolution_notes` | reviewer notes |
-| `created_at` | creation timestamp or date |
-| `updated_at` | last update timestamp or date |
+| key                   | definition                                          |
+|-----------------------+-----------------------------------------------------|
+| `review_id` PK        | stable review row id                                |
+| `queue_type`          | `observed_product`, `link_candidate`, `parse_issue` |
+| `retailer`            | retailer slug when applicable                       |
+| `observed_product_id` | observed product id when applicable                 |
+| `catalog_id`          | candidate canonical id when applicable              |
+| `reason_code`         | machine-readable review reason                      |
+| `priority`            | optional priority label                             |
+| `raw_item_names`      | compact list of example raw names                   |
+| `normalized_names`    | compact list of example normalized names            |
+| `upc`                 | example UPC/PLU                                     |
+| `image_url`           | example image url                                   |
+| `example_prices`      | compact list of example prices                      |
+| `seen_count`          | count of related rows                               |
+| `status`              | `pending`, `approved`, `rejected`, `deferred`       |
+| `resolution_notes`    | reviewer notes                                      |
+| `created_at`          | creation timestamp or date                          |
+| `updated_at`          | last update timestamp or date                       |
+
+** `data/catalog.csv`
+One row per cross-retailer canonical product.
+| key                        | definition                             |
+|----------------------------+----------------------------------------|
+| `catalog_id` PK            | stable canonical product id            |
+| `catalog_name`             | canonical human-readable name          |
+| `product_type`             | generic product, e.g. `apple`, `milk`  |
+| `category`                 | broad section, e.g. `produce`, `dairy` |
+| `brand`                    | canonical brand when applicable        |
+| `variant`                  | canonical variant                      |
+| `size_value`               | normalized size value                  |
+| `size_unit`                | normalized size unit                   |
+| `pack_qty`                 | normalized pack/count                  |
+| `measure_type`             | normalized measure type                |
+| `normalized_quantity`      | numeric comparison basis value         |
+| `normalized_quantity_unit` | basis unit such as `oz`, `lb`, `count` |
+| `notes`                    | optional human notes                   |
+| `created_at`               | creation timestamp or date             |
+| `updated_at`               | last update timestamp or date          |

-Primary key:
+** `data/purchases.csv`
+One row per purchased item (i.e., `row_type=item` from the normalized layer), with
+catalog attributes denormalized in and discounts already applied.

- (`review_id`)
+| key                        | definition                                                     |
+|----------------------------+----------------------------------------------------------------|
+| `purchase_date`            | date of purchase (from order)                                  |
+| `retailer`                 | retailer slug                                                  |
+| `order_id`                 | retailer order id                                              |
+| `line_no`                  | line number within order                                       |
+| `normalized_row_id`        | `<retailer>:<order_id>:<line_no>`                              |
+| `normalized_item_id`       | retailer-level normalized item identity                        |
+| `catalog_id`               | linked canonical product id                                    |
+| `catalog_name`             | canonical product name for analysis                            |
+| `catalog_product_type`     | broader product family (e.g., `egg`, `milk`)                   |
+| `catalog_category`         | category such as `produce`, `dairy`                            |
+| `catalog_brand`            | canonical brand when applicable                                |
+| `catalog_variant`          | canonical variant when applicable                              |
+| `raw_item_name`            | original retailer item name                                    |
+| `normalized_item_name`     | cleaned/normalized retailer item name                          |
+| `retailer_item_id`         | retailer-native item id                                        |
+| `upc`                      | UPC/PLU when available                                         |
+| `qty`                      | retailer quantity field                                        |
+| `unit`                     | retailer unit (e.g., `EA`, `LB`)                               |
+| `pack_qty`                 | parsed pack/count                                              |
+| `size_value`               | parsed size value                                              |
+| `size_unit`                | parsed size unit                                               |
+| `measure_type`             | `each`, `weight`, `volume`, `count`                            |
+| `normalized_quantity`      | normalized comparison quantity                                 |
+| `normalized_quantity_unit` | unit for normalized quantity                                   |
+| `unit_price`               | retailer unit price                                            |
+| `line_total`               | original retailer extended price (pre-discount)                |
+| `matched_discount_amount`  | discount amount matched from discount lines                    |
+| `net_line_total`           | effective price after discount (`line_total` + discounts)      |
+| `store_name`               | retailer store name                                            |
+| `store_city`               | store city                                                     |
+| `store_state`              | store state                                                    |
+| `price_per_each`           | derived per-each price                                         |
+| `price_per_each_basis`     | source basis for per-each calc                                 |
+| `price_per_count`          | derived per-count price                                        |
+| `price_per_count_basis`    | source basis for per-count calc                                |
+| `price_per_lb`             | derived per-pound price                                        |
+| `price_per_lb_basis`       | source basis for per-pound calc                                |
+| `price_per_oz`             | derived per-ounce price                                        |
+| `price_per_oz_basis`       | source basis for per-ounce calc                                |
+| `is_fee`                   | true if row represents non-product fee                         |
+| `raw_order_path`           | relative path to original order payload                        |

-** current giant mapping
+Notes:
+- Only rows with `row_type=item` from normalization should appear here.
+- `line_total` preserves retailer truth; `net_line_total` is what you actually paid.
+- Catalog fields are denormalized in to make pivoting trivial.
+- No discount/coupon rows exist here; their effects are carried via `matched_discount_amount`.
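A minimal sketch of assembling a subset of the purchases columns. It makes two assumptions:
- `observed_id` in product_links holds the `normalized_item_id`, per the normalized_items notes.
- The discount/coupon flags are already booleans.

Discount and coupon rows are dropped, and catalog fields are denormalized in. `build_purchases` is hypothetical, and only a few columns are shown.

```python
def build_purchases(items: list[dict], links: list[dict], catalog: list[dict]) -> list[dict]:
    """Join normalized items to approved catalog entries, one row per purchased item.

    Discount and coupon rows are skipped; their effect already lives in
    matched_discount_amount / net_line_total on the product rows.
    """
    approved = {
        link["observed_id"]: link["catalog_id"]
        for link in links
        if link.get("review_status") == "approved"
    }
    by_id = {entry["catalog_id"]: entry for entry in catalog}

    purchases = []
    for item in items:
        if item.get("is_discount_line") or item.get("is_coupon_line"):
            continue
        catalog_id = approved.get(item["normalized_item_id"], "")
        entry = by_id.get(catalog_id, {})
        purchases.append({
            "retailer": item["retailer"],
            "order_id": item["order_id"],
            "line_no": item["line_no"],
            "normalized_item_id": item["normalized_item_id"],
            "catalog_id": catalog_id,
            # Denormalized catalog fields; blank when the item is not yet linked.
            "catalog_name": entry.get("catalog_name", ""),
            "catalog_category": entry.get("category", ""),
            "line_total": item["line_total"],
            "net_line_total": item.get("net_line_total", item["line_total"]),
            "is_fee": item.get("is_fee", False),
        })
    return purchases


items = [
    {"retailer": "giant", "order_id": "1", "line_no": 1, "normalized_item_id": "giant:upc:4048",
     "is_discount_line": False, "line_total": "1.00", "net_line_total": "0.80"},
    {"retailer": "giant", "order_id": "1", "line_no": 2, "normalized_item_id": "giant:upc:4048",
     "is_discount_line": True, "line_total": "-0.20"},
]
links = [{"observed_id": "giant:upc:4048", "catalog_id": "c1", "review_status": "approved"}]
catalog = [{"catalog_id": "c1", "catalog_name": "lime", "category": "produce"}]
purchases = build_purchases(items, links, catalog)
```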

-Current scraper outputs map to the new layout as follows:
-
- `giant_output/raw/history.json` -> `data/giant/raw/history.json`
- `giant_output/raw/<order_id>.json` -> `data/giant/raw/orders/<order_id>.json`
- `giant_output/orders.csv` -> `data/giant/orders.csv`
- `giant_output/items.csv` -> `data/giant/items_raw.csv`
-
-Current Giant raw order payloads already expose fields needed for future
-enrichment, including `image`, `itemName`, `primUpcCd`, `lbEachCd`,
-`unitPrice`, `groceryAmount`, and `totalPickedWeight`.