updated scope to prep for costco scraper
95
pm/tasks.org
@@ -147,35 +147,96 @@
** acceptance criteria
- add a costco-specific raw ingest/export path
- output costco line items into the same shared raw/enriched schema family
- confirm at least one product class can exist as:
  - giant observed product
  - costco observed product
  - one shared canonical product
- fetch costco receipt summary and receipt detail payloads from graphql endpoint
- persist raw json under `costco_output/raw/orders.csv` and `./items.csv`, same format as giant
- use costco-native identifiers such as `transactionBarcode` as order id and `itemNumber` as retailer item id
- preserve discount/coupon rows rather than dropping

** notes
- this is the proof that the architecture generalizes
- don’t chase perfection before the second retailer lands
- focus on raw costco acquisition and flattening
- do not force costco identifiers into `upc`
- bearer/auth values should come from local env, not source
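
A minimal sketch of the flattening step described above, under stated assumptions. Only `transactionBarcode`, `itemNumber`, `itemDescription01`, and `itemDescription02` come from these notes; the `itemArray` and `amount` keys, the output column names, and the `COSTCO_BEARER_TOKEN` variable are hypothetical placeholders, not the real payload or schema.

```python
import os


def auth_headers(env=os.environ):
    """Build request headers from the local environment, never from source.

    COSTCO_BEARER_TOKEN is a hypothetical variable name.
    """
    token = env.get("COSTCO_BEARER_TOKEN")
    if not token:
        raise RuntimeError("COSTCO_BEARER_TOKEN is not set")
    return {"Authorization": f"Bearer {token}"}


def flatten_receipt(receipt):
    """Return (order_row, item_rows) for one receipt detail payload.

    `itemArray` and `amount` are assumed key names for illustration.
    """
    order_id = receipt["transactionBarcode"]
    order_row = {"retailer": "costco", "order_id": order_id}
    item_rows = []
    for line_no, item in enumerate(receipt.get("itemArray", []), start=1):
        amount = item.get("amount")
        parts = (item.get("itemDescription01"), item.get("itemDescription02"))
        item_rows.append({
            "retailer": "costco",
            "order_id": order_id,
            "line_no": line_no,
            "retailer_item_id": item.get("itemNumber"),
            "description": " ".join(p for p in parts if p),
            "amount": amount,
            # Discount/coupon rows are kept and flagged, not dropped.
            "is_discount_line": amount is not None and amount < 0,
            # Costco identifiers are not forced into upc; it stays blank.
            "upc": "",
        })
    return order_row, item_rows


sample = {
    "transactionBarcode": "0001",
    "itemArray": [
        {"itemNumber": "111", "itemDescription01": "GALA APPLES",
         "itemDescription02": "5 LB", "amount": 7.99},
        {"itemNumber": "222", "itemDescription01": "/111",
         "itemDescription02": None, "amount": -2.00},
    ],
}
order, items = flatten_receipt(sample)
```

Keeping the discount row as its own flagged line means later steps can decide how to net it against the item, rather than losing the information at ingest.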

** evidence
- commit:
- tests:
- date:

* [ ] t1.9: compute normalized comparison metrics (2-3 commits)
* [ ] t1.8.1: support costco parser/enricher path (2-4 commits)

** acceptance criteria
- derive normalized comparison fields where possible:
  - price per lb
  - price per oz
  - price per each
  - price per count
- metrics are attached at canonical or linked-observed level as appropriate
- emit obvious nulls when basis is unknown rather than inventing values
- add a costco-specific enrich step producing `costco_output/items_enriched.csv`
- output rows into the same shared enriched schema family as Giant
- support costco-specific parsing for:
  - `itemDescription01` + `itemDescription02`
  - `itemNumber` as `retailer_item_id`
  - discount lines / negative rows
  - common size patterns such as `25#`, `48 OZ`, `2/24 OZ`, `6-PACK`
- preserve obvious unknowns as blank rather than guessed values

** notes
- this is where “gala apples 5 lb bag vs other gala apples” becomes possible
- units discipline matters a lot here
- this is the real schema compatibility proof, not raw ingest alone
- expect weaker identifiers than Giant
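
The size patterns listed in the acceptance criteria could be parsed roughly as follows. This is a sketch, not the enricher itself: `ParsedSize`, `parse_size`, and the regexes are hypothetical, and reading `#` as pounds is an assumption based on common receipt shorthand. Anything that does not match returns all-`None` rather than a guess.

```python
import re
from typing import NamedTuple, Optional


class ParsedSize(NamedTuple):
    pack_count: Optional[int]
    size_value: Optional[float]
    size_unit: Optional[str]


# "2/24 OZ": pack count, then per-unit size.
_MULTI = re.compile(r"\b(\d+)\s*/\s*(\d+(?:\.\d+)?)\s*(OZ|LB)\b")
# "25#": assumed to mean pounds.
_POUND_SIGN = re.compile(r"\b(\d+(?:\.\d+)?)\s*#")
# "48 OZ": a single size.
_SINGLE = re.compile(r"\b(\d+(?:\.\d+)?)\s*(OZ|LB)\b")
# "6-PACK": a count with no size.
_PACK = re.compile(r"\b(\d+)\s*-?\s*PACK\b")


def parse_size(description: str) -> ParsedSize:
    """Parse the first recognizable size pattern; unknowns stay None."""
    text = description.upper()
    m = _MULTI.search(text)
    if m:
        return ParsedSize(int(m.group(1)), float(m.group(2)), m.group(3).lower())
    m = _POUND_SIGN.search(text)
    if m:
        return ParsedSize(None, float(m.group(1)), "lb")
    m = _SINGLE.search(text)
    if m:
        return ParsedSize(None, float(m.group(1)), m.group(2).lower())
    m = _PACK.search(text)
    if m:
        return ParsedSize(int(m.group(1)), None, None)
    return ParsedSize(None, None, None)
```

The multi-pack pattern is checked first so that `2/24 OZ` is not misread as a single 24 oz item.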

** evidence
- commit:
- tests:
- date:
* [ ] t1.8.2: validate cross-retailer observed/canonical flow (1-3 commits)

** acceptance criteria
- feed Giant and Costco enriched rows through the same observed/canonical pipeline
- confirm at least one product class can exist as:
  - Giant observed product
  - Costco observed product
  - one shared canonical product
- document the exact example used for proof

** notes
- keep this to one or two well-behaved product classes first
- apples, eggs, bananas, or flour are better than weird prepared foods

** evidence
- commit:
- tests:
- date:
* [ ] t1.8.3: extend shared schema for retailer-native ids and adjustment lines (1-2 commits)

** acceptance criteria
- add shared fields needed for non-upc retailers, including:
  - `retailer_item_id`
  - `is_discount_line`
  - `is_coupon_line` or equivalent if needed
- keep `upc` nullable across the pipeline
- update downstream builders/tests to accept retailers with blank `upc`

** notes
- this prevents costco from becoming a schema hack
- do this once instead of sprinkling exceptions everywhere
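
One way to make the schema change in a single place is a shared row type with `upc` optional. The sketch below is illustrative only: `EnrichedRow`, `to_csv_dict`, and every field other than `retailer_item_id`, `is_discount_line`, `is_coupon_line`, and `upc` are hypothetical names, not the project's actual schema.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class EnrichedRow:
    retailer: str
    order_id: str
    description: str
    line_total: float
    # Retailer-native id; populated for retailers without a UPC.
    retailer_item_id: Optional[str] = None
    # Nullable across the pipeline; never backfilled from other ids.
    upc: Optional[str] = None
    is_discount_line: bool = False
    is_coupon_line: bool = False


def to_csv_dict(row: EnrichedRow) -> dict:
    """Render None as blank so unknowns stay visibly empty in CSV output."""
    return {k: ("" if v is None else v) for k, v in asdict(row).items()}


giant_row = EnrichedRow("giant", "g-1", "GALA APPLES", 4.99, upc="000000000000")
costco_row = EnrichedRow("costco", "0001", "GALA APPLES 5 LB", 7.99,
                         retailer_item_id="111")
```

Downstream builders that accept this type have to handle a blank `upc` by construction, which is the point of doing the change once.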

** evidence
- commit:
- tests:
- date:
* [ ] t1.9: compute normalized comparison metrics (2-4 commits)

** acceptance criteria
- derive normalized comparison fields where possible on enriched or observed product rows:
  - `price_per_lb`
  - `price_per_oz`
  - `price_per_each`
  - `price_per_count`
- preserve the source basis used to derive each metric, e.g.:
  - parsed size/unit
  - receipt weight
  - explicit count/pack
- emit nulls when basis is unknown, conflicting, or ambiguous
- document at least one Giant vs Costco comparison example using the normalized metrics

** notes
- compute metrics as close to the raw observation as possible
- canonical layer can aggregate later, but should not invent missing unit economics
- unit discipline matters more than coverage
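
A rough sketch of the metric derivation, assuming size and pack count have already been parsed. The function name, its parameters, and the single `basis` field are hypothetical. It covers only the weight and count cases, leaves `price_per_each` unset, and does not attempt conflict detection. Missing inputs yield `None` rather than an invented value.

```python
OZ_PER_LB = 16.0


def normalized_metrics(price, size_value=None, size_unit=None, pack_count=None):
    """Derive per-unit prices from one observed row; unknowns stay None."""
    metrics = {
        "price_per_lb": None,
        "price_per_oz": None,
        "price_per_each": None,  # not derived in this sketch
        "price_per_count": None,
        "basis": None,
    }
    # Discount lines and rows without a price carry no unit economics.
    if price is None or price <= 0:
        return metrics
    if size_value and size_unit in ("lb", "oz"):
        units = size_value * (pack_count or 1)
        total_oz = units * OZ_PER_LB if size_unit == "lb" else units
        metrics["price_per_oz"] = round(price / total_oz, 4)
        metrics["price_per_lb"] = round(price / (total_oz / OZ_PER_LB), 4)
        metrics["basis"] = "parsed_size"
    if pack_count:
        metrics["price_per_count"] = round(price / pack_count, 4)
        metrics["basis"] = metrics["basis"] or "explicit_count"
    return metrics


single = normalized_metrics(6.00, size_value=48, size_unit="oz")
twin_pack = normalized_metrics(12.00, size_value=24, size_unit="oz", pack_count=2)
unknown = normalized_metrics(3.49)
```

Here `single` works out to 0.125 per oz and 2.0 per lb, `twin_pack` to 0.25 per oz and 6.0 per count, and `unknown` keeps every metric as `None`.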

** evidence
- commit: