1.18 cleanup and review
pm/notes.org
@@ -587,4 +587,68 @@ instead of
[5] yellow onion, onion, produce (0 items, 0 rows)
selection:

* data cleanup [2026-03-23 Mon]
ok we're getting closer. still see some issues
1. reorder purchases columns for display: catalog_name, product_type, category (makes data/troubleshooting way easier)
2. shouldn't net_line_total never be empty? it's needed for cumulative cost comparison/analysis. we can see normalized price per X via effective_price, but shouldn't this be weighted by how much we bought? eg a 5lb flour purchase at $0.970/lb is weighted 1-to-1 with a 25lb purchase at $0.670/lb
3. some items are missing their categorization entirely? probably a result of me trying to do data cleanup. i found the orphaned values in the product_links table and removed them, but re-running review_products.py did not catch this...
   shouldn't review_products compare each vendor's normalized_items against the existing review_queue?
   RSET POTATO US 1
   GREEK YOGURT DOM55
   FDLY CHY VAN IC CRM
   DUNKIN DONUT CANISTER ORIG BLND P=260
   ICE CUBES
   BLACK BEANS
   KETCHUP SQUEEZE BTL
   YELLOW_GOLD POTATO US 1
   YELLOW_GOLD POTATO US 1
   PINTO BEANS
4. clean up deprecated .py files
5. Goals:
   1. When have I purchased this item, what did I pay, and how has the price changed over time?
      - we're close, but missing units - eg AP flour shows a value that looks like price/lb but you just see $0.765
      - doesn't seem like we've captured everything but that's just a gut feeling
   2. Visit breakdown as well as catalog/product/category? this certainly belongs in purchases.csv.
   3. Consider dash/plotly for better-than-excel tracking, since we're really only looking at a couple of graphs and filtering within certain values? (obv keep purchases as a user-friendly output)
** 1. Cleanup purchases column order
purchase_date
retailer
catalog_name
product_type
category
net_line_total
normalized_quantity
effective_price
effective_price_unit (new)
order_id
line_no
raw_item_name
normalized_item_name
catalog_id
normalized_item_id
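A minimal sketch of the reorder, assuming purchases is a plain csv read with the stdlib `csv` module. The helper names (`reorder_columns`, `rewrite_csv`) are made up for illustration and are not taken from the repo; any columns not in the display list are kept and pushed to the end rather than dropped.

```python
import csv
import io

# Display order from the notes; effective_price_unit is the new column.
DISPLAY_COLUMNS = [
    "purchase_date", "retailer", "catalog_name", "product_type", "category",
    "net_line_total", "normalized_quantity", "effective_price",
    "effective_price_unit", "order_id", "line_no", "raw_item_name",
    "normalized_item_name", "catalog_id", "normalized_item_id",
]


def reorder_columns(rows, existing_fields):
    """Return (fieldnames, rows) with display columns first, extras after."""
    extras = [f for f in existing_fields if f not in DISPLAY_COLUMNS]
    fieldnames = DISPLAY_COLUMNS + extras
    # Columns missing from the input (e.g. effective_price_unit) become "".
    ordered = [{f: row.get(f, "") for f in fieldnames} for row in rows]
    return fieldnames, ordered


def rewrite_csv(src_text):
    """Hypothetical helper: take csv text, return csv text in display order."""
    reader = csv.DictReader(io.StringIO(src_text))
    rows = list(reader)
    fieldnames, ordered = reorder_columns(rows, reader.fieldnames or [])
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(ordered)
    return out.getvalue()
```

Keeping the extras at the end means the reorder is display-only and should not lose data if the pipeline emits columns that are not in this list.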
** 2. Populate and use purchases.net_line_total
net_line_total = line_total + matched_discount_amount
effective_price = net_line_total / normalized_quantity
weighted cost analysis uses net_line_total, not just avg effective_price
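To make the weighting point concrete, here is a small sketch using the flour numbers from the issue list. The line totals are derived from those numbers (5 lb x $0.970 = $4.85, 25 lb x $0.670 = $16.75); the function names are illustrative only.

```python
def weighted_unit_price(purchases):
    """Total spend divided by total quantity; None if nothing was bought."""
    total_cost = sum(p["net_line_total"] for p in purchases)
    total_qty = sum(p["normalized_quantity"] for p in purchases)
    if total_qty == 0:
        return None
    return total_cost / total_qty


def simple_average_price(purchases):
    """Unweighted mean of per-row effective_price (each row counts 1-to-1)."""
    prices = [
        p["net_line_total"] / p["normalized_quantity"]
        for p in purchases
        if p["normalized_quantity"]
    ]
    return sum(prices) / len(prices) if prices else None


flour = [
    {"net_line_total": 4.85, "normalized_quantity": 5.0},    # 5 lb at $0.970/lb
    {"net_line_total": 16.75, "normalized_quantity": 25.0},  # 25 lb at $0.670/lb
]
```

For these two rows the simple average comes out to about $0.820/lb while the weighted price is about $0.720/lb, because the 25 lb purchase carries five times the quantity.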
** 3. Improve review robustness, enable norm_item re-review
1. should regenerate candidates from:
   - normalized items with no valid catalog_id
   - normalized items whose linked catalog_id no longer exists
   - normalized items whose linked catalog row exists but is missing required fields (if you want completeness review)
2. review_products.py should compare:
   - current normalized universe
   - current product_links
   - current catalog
   - current review_queue
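The comparison above could be sketched roughly like this. The in-memory shapes (a set of normalized ids, a dict for product_links, a dict of catalog rows, a set for the queue), the reason strings, and the choice of required fields are all assumptions for illustration; the real review_products.py may store these differently.

```python
# Assumed "required" catalog fields; taken from the display columns in the notes.
REQUIRED_CATALOG_FIELDS = ("catalog_name", "product_type", "category")


def find_review_candidates(normalized_ids, product_links, catalog,
                           review_queue, check_completeness=True):
    """Return {normalized_item_id: reason} for items that should (re)enter review.

    normalized_ids: iterable of ids in the current normalized universe
    product_links:  dict normalized_item_id -> catalog_id
    catalog:        dict catalog_id -> catalog row (dict)
    review_queue:   set of normalized_item_ids already queued
    """
    candidates = {}
    for item_id in normalized_ids:
        catalog_id = product_links.get(item_id)
        if not catalog_id:
            reason = "no_link"              # no valid catalog_id
        elif catalog_id not in catalog:
            reason = "orphaned_link"        # linked catalog_id no longer exists
        elif check_completeness and any(
            not catalog[catalog_id].get(field)
            for field in REQUIRED_CATALOG_FIELDS
        ):
            reason = "incomplete_catalog"   # row exists but fields are empty
        else:
            continue                        # healthy link, nothing to review
        if item_id not in review_queue:
            candidates[item_id] = reason
    return candidates
```

Because the candidates are recomputed from the current universe on every run, a link that was deleted by hand shows up as `no_link` on the next run instead of silently disappearing.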
** 4. Remove deprecated .py files
** 5. Improve Charts
1. Histogram: add effective_price_unit to purchases.py
2. Visits: plot by order_id to enable display of:
   1. spend by visit
   2. items per visit
   3. category spend by visit
   4. retailer/store breakdown
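A rough sketch of the per-visit rollup that those charts would sit on top of, grouping purchases rows by order_id. It assumes each row is a dict with order_id, retailer, category, and net_line_total, that one order_id belongs to a single retailer, and that "items per visit" means a row count; `summarize_visits` is a made-up name.

```python
from collections import defaultdict


def summarize_visits(rows):
    """Group purchase rows by order_id into per-visit totals."""
    visits = {}
    for row in rows:
        visit = visits.setdefault(
            row["order_id"],
            {
                "retailer": row["retailer"],           # assumed constant per order
                "spend": 0.0,                          # spend by visit
                "items": 0,                            # items (rows) per visit
                "category_spend": defaultdict(float),  # category spend by visit
            },
        )
        amount = float(row["net_line_total"])
        visit["spend"] += amount
        visit["items"] += 1
        visit["category_spend"][row["category"]] += amount
    return visits
```

The retailer/store breakdown then falls out of grouping the per-visit dicts by their retailer value; plotting is left out on purpose since the charting tool is still undecided.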
* /
@@ -962,7 +962,7 @@ Costco 25# FLOUR not parsed into normalized weight - meaure_type says each
** notes
- Costco `25#` weight text was falling through to `each` because the hash-size parser missed sizes followed by whitespace.
- This fix is intentionally narrow: explicit `#`-weight parsing now feeds the existing quantity and effective-price flow without changing `normalized_item_id` behavior.
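For illustration, a narrow `#`-weight matcher along these lines could look like the sketch below. This is not the parser from the repo: the regex, the function name, and the choice to report `#` as `lb` are assumptions. It accepts a size followed by whitespace or end of string and ignores things like `#25`.

```python
import re

# Number immediately (or after spaces) followed by '#', not glued to a
# preceding word/number, and not followed by another word character.
HASH_WEIGHT = re.compile(r"(?<![\w.])(\d+(?:\.\d+)?)\s*#(?!\w)")


def parse_hash_weight(text):
    """Return (quantity, "lb") for text like '25# FLOUR', else None."""
    match = HASH_WEIGHT.search(text)
    if match is None:
        return None
    # Assumption: '#' is shorthand for pounds in these item names.
    return float(match.group(1)), "lb"
```

Returning None for everything else keeps the change narrow: items without an explicit `#` size continue through whatever the existing quantity logic does.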
* [X] t1.18.4: clean purchases output and finalize effective price fields (2-4 commits)
make `purchases.csv` easier to inspect and ensure price fields support weighted cost analysis
@@ -995,7 +995,7 @@ make `purchases.csv` easier to inspect and ensure price fields support weighted
** notes
- `purchases.csv` now carries a filled `net_line_total` for every row, preserving existing values from normalization and deriving the rest from `line_total` plus matched discounts.
- `effective_price_unit` now mirrors the normalized quantity basis, so downstream analysis can tell whether an `effective_price` is per `lb`, `oz`, `count`, or `each`.
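A hedged sketch of what that fill step might look like for one row. The column names `matched_discount_amount` and `normalized_quantity_unit` are assumptions (the second one is purely hypothetical), discounts are assumed to be stored as negative numbers since the notes add them to `line_total`, and `finalize_price_fields` is not a function from the repo.

```python
def finalize_price_fields(row):
    """Return a copy of row with net_line_total, effective_price, and unit filled."""
    out = dict(row)

    if out.get("net_line_total") in (None, ""):
        # Derive from line_total plus the (assumed negative) matched discount.
        line_total = float(out.get("line_total") or 0)
        discount = float(out.get("matched_discount_amount") or 0)
        out["net_line_total"] = line_total + discount
    else:
        # Preserve the value that came out of normalization.
        out["net_line_total"] = float(out["net_line_total"])

    quantity = float(out.get("normalized_quantity") or 0)
    if quantity > 0:
        out["effective_price"] = out["net_line_total"] / quantity
        # Unit mirrors the normalized quantity basis: lb, oz, count, or each.
        out["effective_price_unit"] = out.get("normalized_quantity_unit", "")
    else:
        # No usable quantity: leave the price fields blank rather than guess.
        out["effective_price"] = ""
        out["effective_price_unit"] = ""
    return out
```

Pairing the unit with the price on the same row is what lets a value like 0.765 be read as $0.765 per lb instead of a bare number.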
* [X] t1.19: make review_products.py robust to orphaned and incomplete catalog links (2-4 commits)
refresh review state from the current normalized universe so missing or broken links re-enter review instead of silently disappearing
@@ -1048,7 +1048,6 @@ ensure purchases retains enough visit/order context to support spend-by-visit an
- datetime:

** notes

* [ ] t1.21: add lightweight charting/analysis surface on top of purchases.csv (2-4 commits)
build a minimal analysis layer for common price and visit charts without changing the csv pipeline