diff --git a/pm/data-model.org b/pm/data-model.org
index 5b1966c..6a25468 100644
--- a/pm/data-model.org
+++ b/pm/data-model.org
@@ -129,6 +129,7 @@ One row per retailer line item.
 | `order_id` | retailer order id |
 | `line_no` | stable line number within order export |
 | `order_date` | copied from order when available |
+| `retailer_item_id` | retailer-native item id when available |
 | `pod_id` | retailer pod/item id |
 | `item_name` | raw retailer item name |
 | `upc` | retailer UPC or PLU value |
@@ -145,6 +146,8 @@ One row per retailer line item.
 | `coupon_price` | retailer coupon price field |
 | `image_url` | raw retailer image url when present |
 | `raw_order_path` | relative path to source order payload |
+| `is_discount_line` | retailer adjustment or discount-line flag |
+| `is_coupon_line` | coupon-like line flag when distinguishable |
 
 Primary key:
 
@@ -161,6 +164,7 @@ fields from `items_raw.csv` and add parsed fields.
 | `order_id` | retailer order id |
 | `line_no` | line number within order |
 | `observed_item_key` | stable row key, typically `::` |
+| `retailer_item_id` | retailer-native item id |
 | `item_name` | raw retailer item name |
 | `item_name_norm` | normalized item name |
 | `brand_guess` | parsed brand guess |
@@ -171,6 +175,8 @@ fields from `items_raw.csv` and add parsed fields.
 | `measure_type` | `each`, `weight`, `volume`, `count`, or blank |
 | `is_store_brand` | store-brand guess |
 | `is_fee` | fee or non-product flag |
+| `is_discount_line` | discount or adjustment-line flag |
+| `is_coupon_line` | coupon-like line flag |
 | `price_per_each` | derived per-each price when supported |
 | `price_per_lb` | derived per-pound price when supported |
 | `price_per_oz` | derived per-ounce price when supported |
@@ -191,6 +197,7 @@ One row per distinct retailer-facing observed product.
 | `observed_product_id` | stable observed product id |
 | `retailer` | retailer slug |
 | `observed_key` | deterministic grouping key used to create the observed product |
+| `representative_retailer_item_id` | best representative retailer-native item id |
 | `representative_upc` | best representative UPC/PLU |
 | `representative_item_name` | representative raw retailer name |
 | `representative_name_norm` | representative normalized name |
@@ -203,11 +210,14 @@ One row per distinct retailer-facing observed product.
 | `representative_image_url` | representative image url |
 | `is_store_brand` | representative store-brand flag |
 | `is_fee` | representative fee flag |
+| `is_discount_line` | representative discount-line flag |
+| `is_coupon_line` | representative coupon-line flag |
 | `first_seen_date` | first order date seen |
 | `last_seen_date` | last order date seen |
 | `times_seen` | number of enriched item rows grouped here |
 | `example_order_id` | one example retailer order id |
 | `example_item_name` | one example raw item name |
+| `distinct_retailer_item_ids_count` | count of distinct retailer-native item ids |
 
 Primary key:
 
@@ -297,4 +307,3 @@ Current scraper outputs map to the new layout as follows:
 Current Giant raw order payloads already expose fields needed for future
 enrichment, including `image`, `itemName`, `primUpcCd`, `lbEachCd`,
 `unitPrice`, `groceryAmount`, and `totalPickedWeight`.
-
diff --git a/pm/tasks.org b/pm/tasks.org
index 7e90d32..13cfce7 100644
--- a/pm/tasks.org
+++ b/pm/tasks.org
@@ -143,7 +143,7 @@
 - tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python build_canonical_layer.py`; verified auto-linked `giant_output/products_canonical.csv` and `giant_output/product_links.csv`
 - date: 2026-03-16
 
-* [ ] t1.8: support costco raw ingest path (2-5 commits)
+* [X] t1.8: support costco raw ingest path (2-5 commits)
 
 ** acceptance criteria
 - add a costco-specific raw ingest/export path
@@ -158,11 +158,11 @@
 - bearer/auth values should come from local env, not source
 
 ** evidence
-- commit:
-- tests:
-- date:
+- commit: `da00288` on branch `cx`
+- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python scrape_costco.py --help`; verified `costco_output/raw/*.json`, `costco_output/orders.csv`, and `costco_output/items.csv` from the local sample payload
+- date: 2026-03-16
 
-* [ ] t1.8.1: support costco parser/enricher path (2-4 commits)
+* [X] t1.8.1: support costco parser/enricher path (2-4 commits)
 
 ** acceptance criteria
 - add a costco-specific enrich step producing `costco_output/items_enriched.csv`
@@ -179,10 +179,10 @@
 - expect weaker identifiers than Giant
 
 ** evidence
-- commit:
-- tests:
-- date:
-* [ ] t1.8.2: validate cross-retailer observed/canonical flow (1-3 commits)
+- commit: `da00288` on branch `cx`
+- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python enrich_costco.py`; verified `costco_output/items_enriched.csv`
+- date: 2026-03-16
+* [X] t1.8.2: validate cross-retailer observed/canonical flow (1-3 commits)
 
 ** acceptance criteria
 - feed Giant and Costco enriched rows through the same observed/canonical pipeline
@@ -197,10 +197,10 @@
 - apples, eggs, bananas, or flour are better than weird prepared foods
 
 ** evidence
-- commit:
-- tests:
-- date:
-* [ ] t1.8.3: extend shared schema for retailer-native ids and adjustment lines (1-2 commits)
+- commit: `da00288` on branch `cx`
+- tests: `./venv/bin/python -m unittest discover -s tests`; `./venv/bin/python validate_cross_retailer_flow.py`; proof example: Giant `FRESH BANANA` and Costco `BANANAS 3 LB / 1.36 KG` share one canonical in `combined_output/proof_examples.csv`
+- date: 2026-03-16
+* [X] t1.8.3: extend shared schema for retailer-native ids and adjustment lines (1-2 commits)
 
 ** acceptance criteria
 - add shared fields needed for non-upc retailers, including:
@@ -215,9 +215,9 @@
 - do this once instead of sprinkling exceptions everywhere
 
 ** evidence
-- commit:
-- tests:
-- date:
+- commit: `9497565` on branch `cx`
+- tests: `./venv/bin/python -m unittest discover -s tests`; verified shared enriched fields in `giant_output/items_enriched.csv` and `costco_output/items_enriched.csv`
+- date: 2026-03-16
 * [ ] t1.9: compute normalized comparison metrics (2-4 commits)
 
 ** acceptance criteria