Files
scrape-giant/pm/notes.org
2026-03-24 08:27:41 -04:00

655 lines
53 KiB
Org Mode
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
* python setup
venv install playwright, pandas
playwright install
1. scrape - raw giant json
2. enrich -
cols:
item_name_norm
brand_guess
size_value
size_unit
pack_qty
variant
is_store_brand
is_fee
measure_type
price_per_lb
price_per_oz
price_per_each
image_url
normalize abbreviationsta
extract size like 12z, 10ct, 5lb
detect fees like bag charges
infer whether something is sold by each vs weight
carry forward image url
3. build observed-product atble from enriched items
* git issues
- dont try to git push from win emacs viewing wsl, it will be screwy (windows identity vs wsl)
** ssh / access to gitea
ssh://git@192.168.1.207:2020/ben/scrape-giant.git
https://git.hgsky.me/ben/scrape-giant.git
add Port to config:
Host gitea
HostName 192.168.1.207
User git
Port 2020
IdentityFile ~/.ssh/gitea
IdentitiesOnly yes
then
git remote set-url gitea git@gitea:ben/scrape-giant.git
on local network: use ssh to 192.168.1.207:2020
from elsewhere/public: use https to git.hgsky.me/... unless you later expose ssh properly
** stash
z z to stash local work only
take care not to add ignored files which will add the venv and `__pycache__`
z p to pop the stash back
** creating remote branches
P p, magit will suggest upstream (gitea), select and Enter and it will be created
** cherry-picking
b b : switch to desired branch (review)
l B : open reflog for local branches
(my changes were committed to local cx but not pushed to gitea/cx)
put point on the commit you want; did this in sequence
A A : cherry pick commit to current branch
minibuffer will show the commit and all branches, leave it on that commit
the final commit was not shown by hash, just the branch cx
since (local) cx was caught up with that branch
** reverting a branch
b l : switch to local branch (cx)
l l : open local reflog
put point on the commit; highlighted remote gitea/cx
X : reset branch; prompts you, selected cx
** merge branch
b b : switch to branch to be merged into (cx)
m m : pick branch to merge into current branch
* giant requests
** item:
get:
/api/v6.0/user/369513017/order/history/detail/69a2e44a16be1142e74ad3cc
headers:
request:
GET /api/v6.0/user/369513017/order/history/detail/69a2e44a16be1142e74ad3cc?isInStore=true HTTP/2
Host: giantfood.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:148.0) Gecko/20100101 Firefox/148.0
Accept: application/json, text/plain, */*
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate, br, zstd
DNT: 1
Sec-GPC: 1
Connection: keep-alive
Referer: https://giantfood.com/account/history/invoice/in-store
Cookie: datadome=rDtvd3J2hO5AeghJMSFRRxGc6ifKCQYgMLcqPNr9rWiz2rdcXb032AY6GIZn8tUmYB96BKKbzh3_jSjEzYWLj8hDjl3oGYYAiu4jwdaxpf3vh2v4f7KH7kbqgsMWpkjt; cf_clearance=WEPyQokx9f0qoyS4Svsw4EkZ1TYOxjOwcUHspT3.rXw-1773348940-1.2.1.1-fPvERGxBlFUaBW83sUppbUWpwvFG7mZivag5vBvZb3kxUQv2WSVIV1tON0HV2n8bkVY0U8_BBl62a00Np.oJylYQcGME540gZlYEoL.gMs4WynLqApFe5BOXAEwOm01_6h6b62H90bl4ypRehVb_TXEi4qHaPLVSZhjZK_h.fv6RBqjgYch2j_8XnHe5HXvLziVjl1k2aJskozqy04KOyeHyc3OyIPTZd5On_KAzFIM; dvrctk=MnjKJVShVraEtbrBkkxWxLaZrXnIGNQlwB7QtZVPFeA=; __cflb=0H28vXMLFyydRmDMNgcPHijM6auXkCspCkuh58tVuJ3; __cf_bm=C6QbqiEvbbwdrYBpoJOkcWcedf60vcOfPfTPPbZzKbM-1773348202-1.0.1.1-cSHoYwi8ZjIHTdBItXQP_iXJdRJS6FYjFsGdl1eGHvS5pgfbcT4Lg19P6UStX.bZz1u0OXiS5ykdipPBtwP6OvZr68k4XSmjYpir05jNLhw; _dd_s=rum=0&expire=1773349846445; ppdtk=Uog72CR22mD85C7U4iZHlgOQeRmvHEYp0OdQc+0lEes1c5/LeqGT+ZUlXpSC6FpW; cartId=3820547
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
Priority: u=0
TE: trailers
response:
HTTP/2 200
date: Thu, 12 Mar 2026 20:55:47 GMT
content-type: application/json
server: cloudflare
cf-ray: 9db5b3a5d84aff28-IAD
cf-cache-status: DYNAMIC
content-encoding: gzip
set-cookie: datadome=MXMri0hss6PlQ0_oS7gG2iMdOKnNkbDmGvOxelgN~nCcupgkJQOqjcjcgdprIaI7hSlt_w8E9Ri_RAzPFrGqtUfqAJ_szB_aNZ2FdC26qmI3870Nn4~T0vtx8Gj3dEZR; Max-Age=31536000; Domain=.giantfood.com; Path=/; Secure; SameSite=Lax
strict-transport-security: max-age=31536000; includeSubDomains
vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers, accept-encoding
accept-ch: Sec-CH-UA,Sec-CH-UA-Mobile,Sec-CH-UA-Platform,Sec-CH-UA-Arch,Sec-CH-UA-Full-Version-List,Sec-CH-UA-Model,Sec-CH-Device-Memory
x-datadome: protected
request-context: appId=cid-v1:75750625-0c81-4f08-9f5d-ce4f73198e54
X-Firefox-Spdy: h2
** history:
GET
https://giantfood.com/api/v6.0/user/369513017/order/history?filter=instore&loyaltyNumber=440155630880
headers:
request:
GET /api/v6.0/user/369513017/order/history?filter=instore&loyaltyNumber=440155630880 HTTP/2
Host: giantfood.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:148.0) Gecko/20100101 Firefox/148.0
Accept: application/json, text/plain, */*
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate, br, zstd
DNT: 1
Sec-GPC: 1
Connection: keep-alive
Referer: https://giantfood.com/account/history/invoice/in-store
Cookie: datadome=OH2XjtCoI6XjE3Qsz_b0F1YULKLatAC0Ea~VMeDGBP0N9Z~CeI3RqEbvkGmNW_VCOU~vRb6p0kqibvF2tLbWnzyAGIdO7jsC41KiYbp7USpJDnefZhIg0e1ypAugvDSw; cf_clearance=WEPyQokx9f0qoyS4Svsw4EkZ1TYOxjOwcUHspT3.rXw-1773348940-1.2.1.1-fPvERGxBlFUaBW83sUppbUWpwvFG7mZivag5vBvZb3kxUQv2WSVIV1tON0HV2n8bkVY0U8_BBl62a00Np.oJylYQcGME540gZlYEoL.gMs4WynLqApFe5BOXAEwOm01_6h6b62H90bl4ypRehVb_TXEi4qHaPLVSZhjZK_h.fv6RBqjgYch2j_8XnHe5HXvLziVjl1k2aJskozqy04KOyeHyc3OyIPTZd5On_KAzFIM; dvrctk=MnjKJVShVraEtbrBkkxWxLaZrXnIGNQlwB7QtZVPFeA=; __cflb=0H28vXMLFyydRmDMNgcPHijM6auXkCspCkuh58tVuJ3; __cf_bm=C6QbqiEvbbwdrYBpoJOkcWcedf60vcOfPfTPPbZzKbM-1773348202-1.0.1.1-cSHoYwi8ZjIHTdBItXQP_iXJdRJS6FYjFsGdl1eGHvS5pgfbcT4Lg19P6UStX.bZz1u0OXiS5ykdipPBtwP6OvZr68k4XSmjYpir05jNLhw; _dd_s=rum=0&expire=1773349842848; ppdtk=Uog72CR22mD85C7U4iZHlgOQeRmvHEYp0OdQc+0lEes1c5/LeqGT+ZUlXpSC6FpW; cartId=3820547
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
Priority: u=0
TE: trailers
response:
HTTP/2 200
date: Thu, 12 Mar 2026 20:55:43 GMT
content-type: application/json
server: cloudflare
cf-ray: 9db5b38f7eebff28-IAD
cf-cache-status: DYNAMIC
content-encoding: gzip
set-cookie: datadome=rDtvd3J2hO5AeghJMSFRRxGc6ifKCQYgMLcqPNr9rWiz2rdcXb032AY6GIZn8tUmYB96BKKbzh3_jSjEzYWLj8hDjl3oGYYAiu4jwdaxpf3vh2v4f7KH7kbqgsMWpkjt; Max-Age=31536000; Domain=.giantfood.com; Path=/; Secure; SameSite=Lax
strict-transport-security: max-age=31536000; includeSubDomains
vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers, accept-encoding
accept-ch: Sec-CH-UA,Sec-CH-UA-Mobile,Sec-CH-UA-Platform,Sec-CH-UA-Arch,Sec-CH-UA-Full-Version-List,Sec-CH-UA-Model,Sec-CH-Device-Memory
x-datadome: protected
request-context: appId=cid-v1:75750625-0c81-4f08-9f5d-ce4f73198e54
X-Firefox-Spdy: h2
* costco requests
- localstorage idToken has the auth token, but needs "Bearer " prepended
- localstorage clientID has the COSTCO_X_WCS_CLIENTID
- I don't see the client_identifier uuid anywhere.
we will pull from .env first (may have to hardcode)
then overwrite with session data (token)
hopefully this doesnt change.
** warehouse
*** POST
https://ecom-api.costco.com/ebusiness/order/v1/orders/graphql
*** Headers
POST /ebusiness/order/v1/orders/graphql HTTP/1.1
Host: ecom-api.costco.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:148.0) Gecko/20100101 Firefox/148.0
Accept: */*
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate, br, zstd
costco.service: restOrders
costco.env: ecom
costco-x-authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6IlhrZTFoNXg5TV9ZMk5ER0YxU1hDX2xNNnVSTU5tZTJ3STBLRDlHNzl1QmciLCJ0eXAiOiJKV1QifQ.eyJleHAiOjE3NzM2NjU2NjgsIm5iZiI6MTc3MzY2NDc2OCwidmVyIjoiMS4wIiwiaXNzIjoiaHR0cHM6Ly9zaWduaW4uY29zdGNvLmNvbS9lMDcxNGRkNC03ODRkLTQ2ZDYtYTI3OC0zZTI5NTUzNDgzZWIvdjIuMC8iLCJzdWIiOiIzMTIzZWQ2Yy1jNzM4LTRiOTktOTAwZC0xNDE1ZTUzNjA2Y2UiLCJhdWQiOiJhM2E1MTg2Yi03Yzg5LTRiNGMtOTNhOC1kZDYwNGU5MzA3NTciLCJhY3IiOiJCMkNfMUFfU1NPX1dDU19zaWdudXBfc2lnbmluXzIwMSIsIm5vbmNlIjoiNDA4NjU3YmItODg5MC00MTk0LTg2OTctZDYzOGU2MzdhMGRhIiwiaWF0IjoxNzczNjY0NzY4LCJhdXRoX3RpbWUiOjE3NzM2NjQ3NjgsImF1dGhlbnRpY2F0aW9uU291cmNlIjoibG9jYWxBY2NvdW50QXV0aGVudGljYXRpb24iLCJlbWFpbCI6ImpvaG5tb3Nlc2NhcnRlckBnbWFpbC5jb20iLCJuYW1lIjoiRW1wdHkgRGlzcGxheW5hbWUiLCJ1c2VySWRlbnRpdGllcyI6W3siaXNzdWVyIjoiYTNhNTE4NmItN2M4OS00YjRjLTkzYTgtZGQ2MDRlOTMwNzU3IiwiaXNzdWVyVXNlcklkIjoiQUFEOjMxMjNlZDZjLWM3MzgtNGI5OS05MDBkLTE0MTVlNTM2MDZjZSJ9LHsiaXNzdWVyIjoiNDkwMGViMWYtMGMxMC00YmQ5LTk5YzMtYzU5ZTZjMWVjZWJmIiwiaXNzdWVyVXNlcklkIjoiYTZmZmRkOTktNDM2OC00NTgwLTgxOWYtZTZjZjYxM2U1M2M1In0seyJpc3N1ZXIiOiIyZGQ0YjE0NS0zYmRhLTQ2NjktYWU2YS0zN2I4Y2I2ZGFmN2YiLCJpc3N1ZXJVc2VySWQiOiJhNmZmZGQ5OS00MzY4LTQ1ODAtODE5Zi1lNmNmNjEzZTUzYzUifV0sImlzc3VlclVzZXJJZCI6IkFBRDozMTIzZWQ2Yy1jNzM4LTRiOTktOTAwZC0xNDE1ZTUzNjA2Y2UiLCJjbGllbnRJZCI6ImEzYTUxODZiLTdjODktNGI0Yy05M2E4LWRkNjA0ZTkzMDc1NyIsInJlbWVtYmVyTWUiOiJGYWxzZSIsInNlbmRNZUVtYWlsIjoib2ZmIiwiaXBBZGRyZXNzIjoiOTYuMjQxLjIxMi4xMjUiLCJDb3JyZWxhdGlvbklkIjoiYWUyYTMxYjktMjBkNC00MTBkLWE1ZjAtNDJhMWIzM2VmZmQ1In0.gmhhNsgFUbd0QAR1Z_isFjglQxZrM0Kj8yv5-w-FrsWM3d9PB6kWsldBndy6cEhwZh588T1u4vgG9A-XR3HZ4t-JnPZhpr8_7-lI4W4Tp4IAA0tIgMt7cHZUN14qstx_K72QLOrKbO34PQJKBymw2qKvwvhUo372MNFtc2D8_wS_VbG8QdOPumgsBJPqyF7HExt-gpkAu_5kL-54pqLSIZIJZ_viymti9ajla_B8PlvHMO7ZDWSgoV177ArcQAeOhv9MT1e5k0a4V7R-cCI77NIhoBUjV8C4lMAd27nntWzJJ9N00hEEGQb3zPoWUgRFAOdGzjg4xZu1D87C3MJtdA
Content-Type: application/json-patch+json
costco-x-wcs-clientId: 4900eb1f-0c10-4bd9-99c3-c59e6c1ecebf
client-identifier: 481b1aec-aa3b-454b-b81b-48187e28f205
Content-Length: 808
Origin: https://www.costco.com
DNT: 1
Sec-GPC: 1
Connection: keep-alive
Referer: https://www.costco.com/
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-site
*** Request
Request
{"query":"query receiptsWithCounts($startDate: String!, $endDate: String!,$documentType:String!,$documentSubType:String!) {\n receiptsWithCounts(startDate: $startDate, endDate: $endDate,documentType:$documentType,documentSubType:$documentSubType) {\n inWarehouse\n gasStation\n carWash\n gasAndCarWash\n receipts{\n warehouseName receiptType documentType transactionDateTime transactionBarcode warehouseName transactionType total \n totalItemCount\n itemArray { \n itemNumber\n }\n tenderArray { \n tenderTypeCode\n tenderDescription\n amountTender\n }\n couponArray { \n upcnumberCoupon\n } \n }\n}\n }","variables":{"startDate":"1/01/2026","endDate":"3/31/2026","text":"Last 3 Months","documentType":"all","documentSubType":"all"}}
*** Response
{"data":{"receiptsWithCounts":{"inWarehouse":2,"gasStation":0,"carWash":0,"gasAndCarWash":0,"receipts":[{"warehouseName":"MT VERNON","receiptType":"In-Warehouse","documentType":"WarehouseReceiptDetail","transactionDateTime":"2026-03-12T16:16:00","transactionBarcode":"21111500804012603121616","transactionType":"Sales","total":208.58,"totalItemCount":24,"itemArray":[{"itemNumber":"34779"},{"itemNumber":"7950"},{"itemNumber":"2005"},{"itemNumber":"1941976"},{"itemNumber":"4873222"},{"itemNumber":"374664"},{"itemNumber":"60357"},{"itemNumber":"30669"},{"itemNumber":"1025795"},{"itemNumber":"787876"},{"itemNumber":"22093"},{"itemNumber":"1956177"},{"itemNumber":"1136340"},{"itemNumber":"7609681"},{"itemNumber":"18001"},{"itemNumber":"27003"},{"itemNumber":"1886266"},{"itemNumber":"4102"},{"itemNumber":"87745"},{"itemNumber":"110784"},{"itemNumber":"47492"},{"itemNumber":"2287780"},{"itemNumber":"917546"},{"itemNumber":"1768123"},{"itemNumber":"374558"}],"tenderArray":[{"tenderTypeCode":"061","tenderDescription":"VISA","amountTender":208.58}],"couponArray":[{"upcnumberCoupon":"2100003746641"},{"upcnumberCoupon":"2100003745583"}]},{"warehouseName":"MT VERNON","receiptType":"In-Warehouse","documentType":"WarehouseReceiptDetail","transactionDateTime":"2026-02-14T16:25:00","transactionBarcode":"21111500503322602141625","transactionType":"Sales","total":188.12,"totalItemCount":23,"itemArray":[{"itemNumber":"7812"},{"itemNumber":"7950"},{"itemNumber":"3923"},{"itemNumber":"19813"},{"itemNumber":"87745"},{"itemNumber":"1116038"},{"itemNumber":"5938"},{"itemNumber":"1136340"},{"itemNumber":"30669"},{"itemNumber":"384962"},{"itemNumber":"1331732"},{"itemNumber":"787876"},{"itemNumber":"61576"},{"itemNumber":"110784"},{"itemNumber":"180973"},{"itemNumber":"3"},{"itemNumber":"744361"},{"itemNumber":"1886266"},{"itemNumber":"1025795"},{"itemNumber":"11545"},{"itemNumber":"47492"},{"itemNumber":"260509"}],"tenderArray":[{"tenderTypeCode":"061","tenderDescription":"VISA","amountTender":188.12}],"couponArray":[]}]}}}
** item
*** POST
https://ecom-api.costco.com/ebusiness/order/v1/orders/graphql
*** headers
POST /ebusiness/order/v1/orders/graphql HTTP/2
Host: ecom-api.costco.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:148.0) Gecko/20100101 Firefox/148.0
Accept: */*
Accept-Language: en-US,en;q=0.9
Accept-Encoding: gzip, deflate, br, zstd
costco.service: restOrders
costco.env: ecom
costco-x-authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6IlhrZTFoNXg5TV9ZMk5ER0YxU1hDX2xNNnVSTU5tZTJ3STBLRDlHNzl1QmciLCJ0eXAiOiJKV1QifQ.eyJleHAiOjE3NzM2NjUzODUsIm5iZiI6MTc3MzY2NDQ4NSwidmVyIjoiMS4wIiwiaXNzIjoiaHR0cHM6Ly9zaWduaW4uY29zdGNvLmNvbS9lMDcxNGRkNC03ODRkLTQ2ZDYtYTI3OC0zZTI5NTUzNDgzZWIvdjIuMC8iLCJzdWIiOiIzMTIzZWQ2Yy1jNzM4LTRiOTktOTAwZC0xNDE1ZTUzNjA2Y2UiLCJhdWQiOiJhM2E1MTg2Yi03Yzg5LTRiNGMtOTNhOC1kZDYwNGU5MzA3NTciLCJhY3IiOiJCMkNfMUFfU1NPX1dDU19zaWdudXBfc2lnbmluXzIwMSIsIm5vbmNlIjoiNzg5MjIzOGUtOWU3NC00MzExLWI2NDItMzQ1NTY4ZDY3NTk4IiwiaWF0IjoxNzczNjY0NDg1LCJhdXRoX3RpbWUiOjE3NzM2NjQ0ODQsImF1dGhlbnRpY2F0aW9uU291cmNlIjoibG9jYWxBY2NvdW50QXV0aGVudGljYXRpb24iLCJlbWFpbCI6ImpvaG5tb3Nlc2NhcnRlckBnbWFpbC5jb20iLCJuYW1lIjoiRW1wdHkgRGlzcGxheW5hbWUiLCJ1c2VySWRlbnRpdGllcyI6W3siaXNzdWVyIjoiYTNhNTE4NmItN2M4OS00YjRjLTkzYTgtZGQ2MDRlOTMwNzU3IiwiaXNzdWVyVXNlcklkIjoiQUFEOjMxMjNlZDZjLWM3MzgtNGI5OS05MDBkLTE0MTVlNTM2MDZjZSJ9LHsiaXNzdWVyIjoiNDkwMGViMWYtMGMxMC00YmQ5LTk5YzMtYzU5ZTZjMWVjZWJmIiwiaXNzdWVyVXNlcklkIjoiYTZmZmRkOTktNDM2OC00NTgwLTgxOWYtZTZjZjYxM2U1M2M1In0seyJpc3N1ZXIiOiIyZGQ0YjE0NS0zYmRhLTQ2NjktYWU2YS0zN2I4Y2I2ZGFmN2YiLCJpc3N1ZXJVc2VySWQiOiJhNmZmZGQ5OS00MzY4LTQ1ODAtODE5Zi1lNmNmNjEzZTUzYzUifV0sImlzc3VlclVzZXJJZCI6IkFBRDozMTIzZWQ2Yy1jNzM4LTRiOTktOTAwZC0xNDE1ZTUzNjA2Y2UiLCJjbGllbnRJZCI6ImEzYTUxODZiLTdjODktNGI0Yy05M2E4LWRkNjA0ZTkzMDc1NyIsInJlbWVtYmVyTWUiOiJGYWxzZSIsInNlbmRNZUVtYWlsIjoib2ZmIiwiaXBBZGRyZXNzIjoiOTYuMjQxLjIxMi4xMjUiLCJDb3JyZWxhdGlvbklkIjoiMDk0YTE5NDYtZTMwNS00ZDkzLWEyMzQtM2ZiNGMwMjMyNDhhIn0.FdsVFHsewvpQABvkEz4uA0NUlYwvlBEg-frJbUDIJRTsP59Be0bOt8Zqv6cZhUqBn_lTQEyi9tnvpkpycmNy7Rg5zLfYroH6mNALRqkBm8VbcmrEVDM1HmdNTHgO9vQD4TdKm1ZYkA7Pj_6QY3sDxI4ioOzIz1_XOnoJVAXjEwGfr8hgvqtlaC51M5DsfIGQj3zCaJrQnD6GBJlFmLNUpCulpT16WAaB1lT_pcycfBs-e1xnEd33dX0kHBOZ8pFS-IKjV_44ZK9R8jI9WHx5ThX3-DtyqjkJ0JypmhT9uEa0MeT55U7aeKPbMvQ0exiw3culKgiWDhvdp8e2EkExsg
Content-Type: application/json-patch+json
costco-x-wcs-clientId: 4900eb1f-0c10-4bd9-99c3-c59e6c1ecebf
client-identifier: 481b1aec-aa3b-454b-b81b-48187e28f205
Content-Length: 2916
Origin: https://www.costco.com
DNT: 1
Sec-GPC: 1
Connection: keep-alive
Referer: https://www.costco.com/
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-site
Priority: u=0
TE: trailers
*** request
{"query":"query receiptsWithCounts($barcode: String!,$documentType:String!) {\n receiptsWithCounts(barcode: $barcode,documentType:$documentType) {\nreceipts{\n warehouseName\n receiptType \n documentType \n transactionDateTime \n transactionDate \n companyNumber \n warehouseNumber \n operatorNumber \n warehouseName \n warehouseShortName \n registerNumber \n transactionNumber \n transactionType\n transactionBarcode \n total \n warehouseAddress1 \n warehouseAddress2 \n warehouseCity \n warehouseState \n warehouseCountry \n warehousePostalCode\n totalItemCount \n subTotal \n taxes\n total \n invoiceNumber\n sequenceNumber\n itemArray { \n itemNumber \n itemDescription01 \n frenchItemDescription1 \n itemDescription02 \n frenchItemDescription2 \n itemIdentifier \n itemDepartmentNumber\n unit \n amount \n taxFlag \n merchantID \n entryMethod\n transDepartmentNumber\n fuelUnitQuantity\n fuelGradeCode\n fuelUnitQuantity\n itemUnitPriceAmount\n fuelUomCode\n fuelUomDescription\n fuelUomDescriptionFr\n fuelGradeDescription\n fuelGradeDescriptionFr\n\n } \n tenderArray { \n tenderTypeCode\n tenderSubTypeCode\n tenderDescription \n amountTender \n displayAccountNumber \n sequenceNumber \n approvalNumber \n responseCode \n tenderTypeName \n transactionID \n merchantID \n entryMethod\n tenderAcctTxnNumber \n tenderAuthorizationCode \n tenderTypeName\n tenderTypeNameFr\n tenderEntryMethodDescription\n walletType\n walletId\n storedValueBucket\n } \n subTaxes { \n tax1 \n tax2 \n tax3 \n tax4 \n aTaxPercent \n aTaxLegend \n aTaxAmount\n aTaxPrintCode\n aTaxPrintCodeFR \n aTaxIdentifierCode \n bTaxPercent \n bTaxLegend \n bTaxAmount\n bTaxPrintCode\n bTaxPrintCodeFR \n bTaxIdentifierCode \n cTaxPercent \n cTaxLegend \n cTaxAmount\n cTaxIdentifierCode \n dTaxPercent \n dTaxLegend \n dTaxAmount\n dTaxPrintCode\n dTaxPrintCodeFR \n dTaxIdentifierCode\n uTaxLegend\n uTaxAmount\n uTaxableAmount\n } \n instantSavings \n membershipNumber \n }\n }\n }","variables":{"barcode":"21111500804012603121616","documentType":"warehouse"}}
*** response
{"data":{"receiptsWithCounts":{"receipts":[{"warehouseName":"MT VERNON","receiptType":"In-Warehouse","documentType":"WarehouseReceiptDetail","transactionDateTime":"2026-03-12T16:16:00","transactionDate":"2026-03-12","companyNumber":1,"warehouseNumber":1115,"operatorNumber":43,"warehouseShortName":"MT VERNON","registerNumber":8,"transactionNumber":401,"transactionType":"Sales","transactionBarcode":"21111500804012603121616","total":208.58,"warehouseAddress1":"7940 RICHMOND HWY","warehouseAddress2":null,"warehouseCity":"ALEXANDRIA","warehouseState":"VA","warehouseCountry":"US","warehousePostalCode":"22306","totalItemCount":24,"subTotal":202.01,"taxes":6.57,"invoiceNumber":null,"sequenceNumber":null,"itemArray":[{"itemNumber":"34779","itemDescription01":"ROMANO","frenchItemDescription1":null,"itemDescription02":"CS=15 SL120 T9H6","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":19,"unit":1,"amount":20.93,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":19,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":11.69,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"7950","itemDescription01":"4LB COSMIC","frenchItemDescription1":null,"itemDescription02":null,"frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":65,"unit":1,"amount":5.99,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":65,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":5.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"2005","itemDescription01":"25# FLOUR","frenchItemDescription1":null,"itemDescription02":"ALL-PURPOSE HARV P98/100","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":9.49,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":9.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"1941976","itemDescription01":"BREAD FLOUR","frenchItemDescription1":null,"itemDescription02":"12 LBS 180P 20X9","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":9.99,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":9.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"4873222","itemDescription01":"ALL F&C","frenchItemDescription1":null,"itemDescription02":"200OZ 160LOADS P104","frenchItemDescription2":null,"itemIdentifier":null,"itemDepartmentNumber":14,"unit":1,"amount":19.99,"taxFlag":"Y","merchantID":null,"entryMethod":null,"transDepartmentNumber":14,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":19.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"374664","itemDescription01":"/ 4873222","frenchItemDescription1":"/4873222","itemDescription02":null,"frenchItemDescription2":null,"itemIdentifier":null,"itemDepartmentNumber":14,"unit":-1,"amount":-5,"taxFlag":null,"merchantID":null,"entryMethod":null,"transDepartmentNumber":14,"fuelUnitQuantity":null,"fuelGradeCode":null,"itemUnitPriceAmount":0,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"60357","itemDescription01":"MIXED PEPPER","frenchItemDescription1":null,"itemDescription02":"6-PACK","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":65,"unit":1,"amount":7.49,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":65,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":7.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"30669","itemDescription01":"BANANAS","frenchItemDescription1":null,"itemDescription02":"3 LB / 1.36 KG","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":65,"unit":2,"amount":2.98,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":65,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":1.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"1025795","itemDescription01":"KS 5DZ EGGS","frenchItemDescription1":null,"itemDescription02":"SL21 P120 / P132 / P144","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":17,"unit":1,"amount":9.39,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":17,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":9.39,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"787876","itemDescription01":"KS TWNY PORT","frenchItemDescription1":null,"itemDescription02":"PORTUGAL CSPC# 773506","frenchItemDescription2":null,"itemIdentifier":null,"itemDepartmentNumber":16,"unit":1,"amount":17.99,"taxFlag":"Y","merchantID":null,"entryMethod":null,"transDepartmentNumber":16,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":17.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"22093","itemDescription01":"KS SHRP CHDR","frenchItemDescription1":null,"itemDescription02":"EC20T9H5 W12T13H5 SL130","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":17,"unit":1,"amount":5.49,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":17,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":5.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"1956177","itemDescription01":"BRWNBTTRGRV","frenchItemDescription1":null,"itemDescription02":"MCCORMICK C12T19H7 L228","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":2.97,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":2.97,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"1136340","itemDescription01":"3LB ORG GALA","frenchItemDescription1":null,"itemDescription02":null,"frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":65,"unit":1,"amount":4.49,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":65,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":4.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"7609681","itemDescription01":"CASCADE GEL","frenchItemDescription1":null,"itemDescription02":"125OZ T60H3P180","frenchItemDescription2":null,"itemIdentifier":null,"itemDepartmentNumber":14,"unit":1,"amount":12.49,"taxFlag":"Y","merchantID":null,"entryMethod":null,"transDepartmentNumber":14,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":12.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"18001","itemDescription01":"TBLE SALT 4#","frenchItemDescription1":null,"itemDescription02":"DIAMOND CRYSTAL P=600","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":1.49,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":1.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"27003","itemDescription01":"STRAWBERRIES","frenchItemDescription1":null,"itemDescription02":"908 G / 2 LB","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":65,"unit":1,"amount":5.29,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":65,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":5.29,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"1886266","itemDescription01":"SKO 5%","frenchItemDescription1":null,"itemDescription02":"48 OZ T10H8 SL30","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":17,"unit":1,"amount":5.79,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":17,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":5.79,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"4102","itemDescription01":"8\" TORTILLAS","frenchItemDescription1":null,"itemDescription02":"SL10 70OZ","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":5.99,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":5.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"87745","itemDescription01":"ROTISSERIE","frenchItemDescription1":null,"itemDescription02":"USDA GRADE A","frenchItemDescription2":null,"itemIdentifier":null,"itemDepartmentNumber":63,"unit":1,"amount":4.99,"taxFlag":"D","merchantID":null,"entryMethod":null,"transDepartmentNumber":63,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":4.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"110784","itemDescription01":"15 GRAIN BRD","frenchItemDescription1":null,"itemDescription02":"PEPPERIDGE FARM 2/24 OZ","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":5.69,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":5.69,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"47492","itemDescription01":"CELERY SALAD","frenchItemDescription1":null,"itemDescription02":"APPLE CIDER VINAIGRETTE","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":63,"unit":1,"amount":12.62,"taxFlag":"D","merchantID":null,"entryMethod":null,"transDepartmentNumber":63,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":4.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"2287780","itemDescription01":"BTB CHICKEN","frenchItemDescription1":null,"itemDescription02":"C12T10H9 P1080 SL630","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":9.49,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":9.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"917546","itemDescription01":"JIF CREAMY","frenchItemDescription1":null,"itemDescription02":"PEANUT BUTTER SL540 P300","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":11.99,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":11.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"1768123","itemDescription01":"BBEE KIDS4PC","frenchItemDescription1":null,"itemDescription02":"FY26 P1600 T200 H8","frenchItemDescription2":null,"itemIdentifier":null,"itemDepartmentNumber":39,"unit":1,"amount":17.99,"taxFlag":"Y","merchantID":null,"entryMethod":null,"transDepartmentNumber":39,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":17.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"374558","itemDescription01":"/ 1768123","frenchItemDescription1":"/1768123","itemDescription02":null,"frenchItemDescription2":null,"itemIdentifier":null,"itemDepartmentNumber":39,"unit":-1,"amount":-4,"taxFlag":null,"merchantID":null,"entryMethod":null,"transDepartmentNumber":39,"fuelUnitQuantity":null,"fuelGradeCode":null,"itemUnitPriceAmount":0,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null}],"tenderArray":[{"tenderTypeCode":"061","tenderSubTypeCode":null,"tenderDescription":"VISA","amountTender":208.58,"displayAccountNumber":"9070","sequenceNumber":null,"approvalNumber":null,"responseCode":null,"tenderTypeName":"VISA","transactionID":null,"merchantID":null,"entryMethod":null,"tenderAcctTxnNumber":null,"tenderAuthorizationCode":null,"tenderTypeNameFr":null,"tenderEntryMethodDescription":null,"walletType":null,"walletId":null,"storedValueBucket":null}],"subTaxes":{"tax1":null,"tax2":null,"tax3":null,"tax4":null,"aTaxPercent":null,"aTaxLegend":"A","aTaxAmount":4.62,"aTaxPrintCode":null,"aTaxPrintCodeFR":null,"aTaxIdentifierCode":null,"bTaxPercent":null,"bTaxLegend":null,"bTaxAmount":null,"bTaxPrintCode":null,"bTaxPrintCodeFR":null,"bTaxIdentifierCode":null,"cTaxPercent":null,"cTaxLegend":"C","cTaxAmount":1.25,"cTaxIdentifierCode":null,"dTaxPercent":null,"dTaxLegend":"D","dTaxAmount":0.7,"dTaxPrintCode":null,"dTaxPrintCodeFR":null,"dTaxIdentifierCode":null,"uTaxLegend":null,"uTaxAmount":null,"uTaxableAmount":null},"instantSavings":9,"membershipNumber":"111894291684"}]}}}
** online order
appears to be same endpoint.
request:
{"query":"query getOnlineOrders($startDate:String!, $endDate:String!, $pageNumber:Int , $pageSize:Int, $warehouseNumber:String! ){\n getOnlineOrders(startDate:$startDate, endDate:$endDate, pageNumber : $pageNumber, pageSize : $pageSize, warehouseNumber : $warehouseNumber) {\n pageNumber\n pageSize\n totalNumberOfRecords\n bcOrders {\n orderHeaderId\n orderPlacedDate : orderedDate\n orderNumber : sourceOrderNumber \n orderTotal\n warehouseNumber\n status\n emailAddress\n orderCancelAllowed\n orderPaymentFailed : orderPaymentEditAllowed\n orderReturnAllowed\n orderLineItems {\n orderLineItemCancelAllowed\n \torderLineItemId\n orderReturnAllowed\n itemId\n itemNumber\n itemTypeId\n lineNumber\n itemDescription\n deliveryDate\n warehouseNumber\n status\n orderStatus\n parentOrderLineItemId\n isFSAEligible\n shippingType\n shippingTimeFrame\n isShipToWarehouse\n carrierItemCategory\n carrierContactPhone\n programTypeId\n isBuyAgainEligible\n scheduledDeliveryDate\n scheduledDeliveryDateEnd\n configuredItemData\n shipment {\n shipmentId \n orderHeaderId\n orderShipToId \n lineNumber \n orderNumber\n shippingType \n shippingTimeFrame \n shippedDate \n packageNumber \n trackingNumber \n trackingSiteUrl \n carrierName \n estimatedArrivalDate \n deliveredDate \n isDeliveryDelayed \n isEstimatedArrivalDateEligible \n statusTypeId \n status \n pickUpReadyDate\n pickUpCompletedDate\n reasonCode\n trackingEvent {\n event\n carrierName\n eventDate\n estimatedDeliveryDate\n scheduledDeliveryDate\n trackingNumber\n }\n }\n }\n }\n }\n }","variables":{"pageNumber":1,"pageSize":10,"startDate":"2026-1-01","endDate":"2026-3-31","warehouseNumber":"847"}}
* need new scope changes:
- pull all orders by default
- add online orders
- copy header data from browser using selenium
* how to run
python scrape_giant.py
python enrich_giant.py
python scrape_costco.py
python enrich_costco.py
python build_observed_products.py
python build_review_queue.py
python build_canonical_layer.py
python validate_cross_retailer_flow.py
* t1.13 tasks [2026-03-17 Tue 13:49]
ok i ran a few. time to run some cleanups here - i'm wondering if we shouldn't be less aggressive with canonical names and encourage a better manual process to start.
** TODO fill in auto-created canonical category, product-type
auto-created canonical_names lack category, product_type - ok with filling these in manually in the catalog once the queue is empty
** TODO consolidation cleanup
1. canonical_names feel too specific, e.g., "5DZ egg" - probably a problem with the enrich_* steps not adding appropraite normalizing data /and/ removing from observed product title?
2. some canonical_names need consolidation, eg "LIME" and "LIME . / ." ; poss cleanup issue. there are 5 entries for ergg but but they are all regular large grade A white eggs, just different amounts in dozens.
Eggs are actually a great candidate for the kind of analysis we want to do - the pipeline should have caught and properly sorted these into size/qty:
#+begin_example
```canonical_product_id canonical_name category product_type brand variant size_value size_unit pack_qty measure_type notes created_at updated_at
gcan_0e350505fd22 5DZ EGG / / KS each auto-linked via exact_name
gcan_47279a80f5f3 EGG 5 DOZ. BBS each auto-linked via exact_name
gcan_7d099130c1bf LRG WHITE EGG SB 30 count auto-linked via exact_upc
gcan_849c2817e667 GDA LRG WHITE EGG SB 18 count auto-linked via exact_upc
gcan_cb0c6c8cf480 LG EGG CONVENTIONAL 18 count count auto-linked via exact_name_size ```
#+end_example
** TODO costco discount matching
Build costco mechanism for matching discount to line item.
1. Discounts appear as their own line items with a number like /123456, this matches the UPC of the discounted item
2. must be date-matched to the UPC
Data model might be missing shape:
1. match discount rows like `item_name:/2303476` to `retailer_item_id:2303476`
2. display this value on the item somehow? maybe update line_total? otherwise we lose fidelity. should be stored in items_enriched somehow
#+begin_example
```retailer order_id line_no observed_item_key order_date retailer_item_id pod_id item_name upc category_id category qty unit unit_price line_total picked_weight mvp_savings reward_savings coupon_savings coupon_price image_url raw_order_path item_name_norm brand_guess variant size_value size_unit pack_qty measure_type is_store_brand is_fee is_discount_line is_coupon_line price_per_each price_per_lb price_per_oz parse_version parse_notes
costco 2.11115E+22 3 costco:21111520101942404241753:3 4/24/2024 2303476 KA 6QT MIXER P16 KSM60SECXER/CU FY23 33 33 1 None 399.99 399.99 costco_output/raw/21111520101942404241753-2024-04-24T17-53-00.json KA 6QT MIXER KSM60SECXER/CU each FALSE FALSE FALSE FALSE 399.99 costco-enrich-v1
costco 2.11115E+22 4 costco:21111520101942404241753:4 4/24/2024 325173 /2303476 33 33 -1 None 0 -100 -100 costco_output/raw/21111520101942404241753-2024-04-24T17-53-00.json /2303476 each FALSE FALSE TRUE TRUE 100 costco-enrich-v1 ```
#+end_example
** TODO giant discount matching
* prompt
do not add new abstractions unless they remove real duplication. prefer explicit retailer-specific logic over generic heuristics. do not auto-create new canonical products from weak normalized names.
and propose the smallest set of edits needed.
* 1.13 fixes
** 15x Costco discounts not caught
- 15x, some with slash-space: `/ 1768123`and some without: `/2303476`
** canonical names suck - tempted to force manual config from scratch?
- maybe first-pass should be naming groups, starting with largest groups and going on down.
- unfortunately not seeing many cross-retailer items? looks like costco-only; just taking Giant as gospel
- could be as simple as changing canonical name in canonical_catalog.csv
- tough to figure out where the data is, leading to below:
** need to refactor whole flow and where data is stored
group by browser or by site, or both? currently mixed.
1. Scrape
- Script:
- Output: /output/raw/orderN.json, history.json, orders.csv, history.csv
2. Enrich
- Scripts:
- Output: /output/enrich/items.json
3. Combined - /output/?
- Review step?
** propsed fixes
* 1.14 prep - OBE
** [ ] t1.14.1 define and document the filesystem/data-layer layout (2-3 commits)
make stage ownership and retailer ownership explicit so every artifact has one obvious home
** AC
1. define and document the canonical directory layout for the pipeline, separating retailer-specific artifacts from shared combined artifacts
2. adopt an explicit layout of the form:
- `data/<retailer>/raw/`
- `data/<retailer>/orders.csv`
- `data/<retailer>/items.csv`
- `data/<retailer>/items_enriched.csv`
- `data/combined/products_observed.csv`
- `data/combined/review_queue.csv`
- `data/combined/item_aliases.csv`
- `data/combined/canonical_catalog.csv`
- `data/combined/product_links.csv`
- `data/combined/purchases.csv`
- `data/combined/pipeline_status.csv`
- `data/combined/pipeline_status.json`
3. update docs/readme and pipeline docs so each scripts inputs and outputs point to the new layout
4. remove or deprecate ambiguous stage outputs living under a retailer-specific output directory when they are actually shared artifacts
- pm note: goal is “where does this file live?” should have one answer, not three
** evidence
- commit:
- tests:
- date:
** notes
** [ ] t1.14.2 define the row-level data model for raw, enriched, observed, canonical, and purchases layers (2-4 commits)
lock the item model before further refactors so each stage has a clear grain and purpose
** AC
1. document the row grain for each layer:
- raw item row = one receipt line from one retailer order
- enriched item row = one retailer line with retailer-specific parsed fields
- observed product row = one grouped retailer-facing product concept
- canonical catalog row = one review-controlled product identity
- purchase row = one final pivot-ready purchased item line
2. define the required fields for each layer, including stable ids and provenance fields
3. explicitly document which fields are allowed to be blank at each layer (e.g. `upc`, `canonical_item_id`, category)
4. document the relationship between:
- `raw_item_name`
- `normalized_item_name`
- `observed_product_id`
- `canonical_item_id`
5. document how retailer-native ids (e.g. Costco `retailer_item_id`) fit into the shared model without being forced into `upc`
- pm note: this is the schema contract task; code should follow it, not invent it ad hoc
** evidence
- commit:
- tests:
- date:
** notes
** [ ] t1.14.3 refactor pipeline outputs to the new layout without changing semantics (2-4 commits)
move files and script defaults to the new structure while preserving current behavior
** AC
1. update scraper and enrich scripts to write retailer-specific outputs under `data/<retailer>/...`
2. update combined/shared scripts to read from retailer-specific enriched outputs and write to `data/combined/...`
3. preserve current content/meaning of outputs during the move; this is a location/structure refactor, not a behavior rewrite
4. update tests, docs, and script defaults to use the new paths
- pm note: do not mix data-layout cleanup with canonical/review logic changes in this task
** evidence
- commit:
- tests:
- date:
** notes
** [ ] t1.14.4 make the review and catalog layer explicit and authoritative (2-4 commits)
treat review and canonical resolution as first-class data, not incidental byproducts
** AC
1. define `review_queue.csv`, `item_aliases.csv`, and `canonical_catalog.csv` as the authoritative review/catalog files in `data/combined/`
2. document the intended purpose of each:
- `review_queue.csv` = unresolved observed items needing action
- `item_aliases.csv` = approved mapping from observed/normalized names to canonical ids
- `canonical_catalog.csv` = review-controlled canonical product definitions and display names
3. ensure final purchase generation reads from these files as the source of truth for resolution
4. stop relying on weak implicit canonical creation as a substitute for the explicit review/catalog layer
- pm note: this is the control-plane task; observed products may be automatic, canonical products are review-controlled
** evidence
- commit:
- tests:
- date:
** notes
** [ ] t1.14.5 define and document the final pivot-ready purchases output (2-3 commits)
make the final analysis artifact explicit so excel/pivot/chart use is a first-class target
** AC
1. define `data/combined/purchases.csv` as the final normalized purchase log
2. ensure each purchase row retains:
- purchase date
- retailer
- order id
- raw item name
- normalized item name
- canonical item id when resolved
- quantity and unit
- original line total
- discount-adjusted fields when applicable
- store/location fields where available
3. document that `purchases.csv` is the primary excel/pivot input and that earlier files are staging layers
4. document expected pivot uses such as purchase frequency and cost over time by canonical item
- pm note: this task is about making the final artifact explicit and stable, not about adding new metrics
** evidence
- commit:
- tests:
- date:
** notes
* pipeline prep [2026-03-17 Tue]
data saved to /data
1. "scrape_<retailer>" gathers data from a retailer and outputs:
1. raw list of items per visit ./<retailer>/scraped/raw/order-<uid>.json
2. raw list of visits ./<retailer>/scraped_visits.csv
3. raw list of items from all visits ./<retailer>/scraped_items.csv
2. "enrich <retailer>" takes /scraped/ data and outputs:
1. normalized list of items ./<retailer>/enriched_items.csv
3. "combine" takes retailer
input:
1. all enriched items ./<retailer>/enriched_items.csv
2. all retailer visits ./<retailer>/scraped_visits.csv
outputs:
1. observed product groups ./combined/observed/products_observed.csv
2. unresolved products for review ./combined/review/review_queue.csv
3. pipeline accounting/status ./combined/status/pipeline_status.csv
4. pipeline accounting/status ./combined/status/pipeline_status.json
4. review resolves unknown or weakly identified products and maintains:
1. canonical product catalog ./combined/review/canonical_catalog.csv
2. approved alias mappings ./combined/review/item_aliases.csv
3. optional observed→canonical links ./combined/review/product_links.csv
5. build purchases takes combined observed data plus review/catalog data and outputs:
[1]. final normalized purchase log ./combined/purchases/purchases.csv
lets get this pipeline right before more refactoring.
* Pipeline - moved to data-model.org [2026-03-18 Wed]
Key:
- (1) input
- [2] output
Each step can be run alone if its dependents exist.
** 1. Collect
Get raw receipt/visit and item data from a retailer. Scraping is unique to a Retailer and method (e.g., Giant-Web and Giant-Scan). Preserve complete raw data and preserve fidelity. Avoid interpretation beyond basic data flattening.
- (1) Source access (Varies, eg header data, auth for API access)
- [1] collected visits from each retailer
- [2] collected items from each retailer
- [3] any other raw data that supports [1] and [2]; explicit source (eventual receipt scan?)
** 2. Normalize
Parse and extract structured facts from retailer-specific raw data to create a standardized item format. Strictly dependent on Collect method and output.
- Extract quantity, size, pack, pricing, variant
- Consolidate discount with item using upc/retail_item_id and concurrence
- Cleanup naming to facilitate later matching
- (1) collected items from each retailer
- (2) collected visits from each retailer
- [1] normalized items from each retailer
** 3. Review/Combine (Canonicalization)
Decide whether two normalized retailer items are "the same product"; match items across retailers using algo/logic and human review. Create catalog linked to normalized items.
- Grouping the same item from retailer
- Asking human to create a canonical/catalog item with:
- friendly/canonical_name: "bell pepper"; "milk"
- category: "produce"; "dairy"
- product_type: "pepper"; "milk"
- ? variant? "whole, "skim", "2pct"
- (1) normalized items from each retailer
- [1] review queue of items to be reviewed
- [2] catalog (lookup table) of confirmed retailer_item and canonical_name
- [3] canonical purchase list, pivot-ready
** Unresolved Issues
2. Create tags: canonical_name (need better label), category, product_type is missing data like Variant, shouldn't this be part of the normalization step?
3. need central script to orchestrate; metadata belongs here and nowhere else
** Symptoms
- `LIME` and `LIME . / .` appearing in canonical_catalog:
- names must come from review-approved names, not raw strings
* notes
** to fix
- options not reading/sticking?
- ice cream - add flavor, call it frozen (not dairy)
- seltzer/soda from "seltzer,soda,bev" to "cherry san pellegrino, seltzer, bev"?
- [1] chicken bouillon, soup, (0 items, 0 rows) -> chicken bouillon, broth?, ,
- peanut butter,, -> creamy peanut butter, peanut butter, condiment
- add gummy bear to candy
- add "fresh" to fresh strawberry
- fix "onion,veg,produce"
manage product_type and category directly?
future: fix match
*** Done
fuji apple, apple, produce (not apple, fruit, produce)
spinach, , produce -> frozen vs fresh?
frozen chicken thighs ->
rotisserie chicken, chicken, poultry -> rotisserie chicken, chicken, meat
beef patty, hamburger, meat -> hamburger patty, beef, meat
oats > cereal
cheerios > cereal
- 3 kinds of greek yogurt!!
** takeaways
- variants not caught, how to fix?
catalog_name = what you actually bought
product_type = reasonable substitute
category = store aisle
Using different categories maintains a direct comparison (product_type==spinach) and a distinction.
fresh spinach, spinach, produce
frozen spinach, spinach, frozen
include in catalog_name:
- form: frozen, fresh, ground, shredded
- fat level: whole, skim, 2%
- flavor when primary: vanilla yogurt vs plain yogurt
- cut: diced tomatoes vs crushed tomatoes
- species when relevant: gala apple vs fuji apple
exclude from catalog_name:
- package size / multipack count
- promo wording; adjectives like "premium"; retailer marketing fluff
** AC
1. fix internal search flow, add same menu
#+begin_src diff
Review 4/345: SHRP CHDR
5 matched items:
[1] KS SHRP CHDR EC20T9H5 W12T13H5 SL130 | costco | 2026-03-12 | 5.49 |
[2] KS SHRP CHDR EC20T9H5 W12T13H5 SL130 | costco | 2025-01-24 | 12.58 |
[3] KS SHRP CHDR EC20T9H5 W12T13H5 SL130 | costco | 2025-01-10 | 6.29 |
[4] KS SHRP CHDR EC20T9H5 W12T13H5 SL130 | costco | 2024-12-14 | 6.29 |
[5] KS SHRP CHDR EC20T9H5 W12T13H5 SL130 | costco | 2024-08-06 | 5.99 |
no catalog_name suggestions found
[f]ind [n]ew [s]kip e[x]clude [q]uit >
f
search: cheddar
1 search results found:
[1] cheddar cheese, cheese, dairy (0 items, 0 rows)
- selection: 1
+ [#] link to suggestion [f]ind [n]ew [s]kip e[x]clude [q]uit >
#+end_src
instead of
#+begin_src diff
search: banana
no matches found
- search again? [enter=yes, q=no]:
+ [f]ind [n]ew [s]kip e[x]clude [q]uit >
#+end_src
2. during a long review session, two pepper or onion types back-to-back cant see the one i just added
- suggest just-added catalog items
- script likely needs to re-read the csv, not just add
//3. suggest based on both catalog & product_name (this is already happening//
3. Search results do not properly list running totals:
5 search results found:
[1] red onion, onion, produce (0 items, 0 rows)
[2] mild roasted red bell pepper, bell pepper, produce (0 items, 0 rows)
[3] onion, vegetable, produce (0 items, 0 rows)
[4] sour cream and onion potato chip, chips, snack (0 items, 0 rows)
[5] yellow onion, onion, produce (0 items, 0 rows)
selection:
* data cleanup [2026-03-23 Mon]
ok we're getting closer. still see some issues
1. reorder purchases columns for display: catalog_name, product_type, category (makes data/troubleshooting way easier)
2. shouldn't net_line_price should never be empty? to allow cumulative cost comparison/analysis (we can see normalized price per X via effective_price but shouldnt this be weighted against how much we bought? eg if we bought 5lb flour at $0.970/lb this is weighted as 1-to-1 with a 25lb purchase as 0.670/lb
3. some items missing entire categorizations? probably a result of me trying to do data cleanup. i found the orphaned values in teh product_links table and removed them, but re-running review_products.py did not catch this...
shouldn't review_products run a comparison between each vendor's normalized_items and compare to the existing review_queu?
RSET POTATO US 1
GREEK YOGURT DOM55
FDLY CHY VAN IC CRM
DUNKIN DONUT CANISTER ORIG BLND P=260
ICE CUBES
BLACK BEANS
KETCHUP SQUEEZE BTL
YELLOW_GOLD POTATO US 1
YELLOW_GOLD POTATO US 1
PINTO BEANS
4. cleanup deprecated .py files
5. Goals:
1. When have I purchased this item, what did I pay, and how has the price changed over time?
- we're close, but missing units - eg AP flour shows a value that looks like price/lb but you just see $0.765
- doesnt seem like we've captured everything but that's just a gut feeling
2. Visit breakdown as well as catalog/product/category? this certainly belongs in purchases.csv.
3. Consider dash/plotly for better-than-excel tracking, since we're really only looking at a couple of graphs and filtering within certain values? (obv keep purchases as a user-friendly output)
** 1. Cleanup purchases column order
purchase_date
retailer
catalog_name
product_type
category
net_line_total
normalized_quantity
effective_price
effective_price_unit (new)
order_id
line_no
raw_item_name
normalized_item_name
catalog_id
normalized_item_id
** 2. Populate and use purchases.net_line_total
net_line_total = line_total+matched_discount_amoun
effective_price = net_line_total / normalized_quantity
weighted cost analysis uses net_line_total, not just avg effective_price
** 3. Improve review robustness, enable norm_item re review
1. should regenerate candidates from:
- normalized items with no valid catalog_id
- normalized items whose linked catalog_id no longer exists
- normalized items whose linked catalog row exists but missing required fields if you want completeness review
2. review_products.py should compare:
- current normalized universe
- current product_links
- current catalog
- current review_queue
** 4. Remove deprecated.py
** 5. Improve Charts
1. Histogram: add effective_price_unit to purchases.py
1. Visits: plot by order_id enable display of:
1. spend by visit
2. items per visit
3. category spend by visit
4. retailer/store breakdown
* /