53 KiB
- python setup
- git issues
- giant requests
- costco requests
- need new scope changes:
- how to run
- t1.13 tasks [2026-03-17 Tue 13:49]
- prompt
- 1.13 fixes
- 1.14 prep - OBE
- [ ] t1.14.1 define and document the filesystem/data-layer layout (2-3 commits)
- AC
- evidence
- notes
- [ ] t1.14.2 define the row-level data model for raw, enriched, observed, canonical, and purchases layers (2-4 commits)
- AC
- evidence
- notes
- [ ] t1.14.3 refactor pipeline outputs to the new layout without changing semantics (2-4 commits)
- AC
- evidence
- notes
- [ ] t1.14.4 make the review and catalog layer explicit and authoritative (2-4 commits)
- AC
- evidence
- notes
- [ ] t1.14.5 define and document the final pivot-ready purchases output (2-3 commits)
- AC
- evidence
- notes
- pipeline prep [2026-03-17 Tue]
- Pipeline - moved to data-model.org [2026-03-18 Wed]
- notes
- data cleanup [2026-03-23 Mon]
- /
python setup
venv install playwright, pandas playwright install
- scrape - raw giant json
- enrich - cols:
item_name_norm brand_guess size_value size_unit pack_qty variant is_store_brand is_fee measure_type price_per_lb price_per_oz price_per_each image_url
normalize abbreviationsta extract size like 12z, 10ct, 5lb detect fees like bag charges infer whether something is sold by each vs weight carry forward image url
- build observed-product atble from enriched items
git issues
- dont try to git push from win emacs viewing wsl, it will be screwy (windows identity vs wsl)
ssh / access to gitea
ssh://git@192.168.1.207:2020/ben/scrape-giant.git https://git.hgsky.me/ben/scrape-giant.git
add Port to config: Host gitea HostName 192.168.1.207 User git Port 2020 IdentityFile ~/.ssh/gitea IdentitiesOnly yes
then git remote set-url gitea git@gitea:ben/scrape-giant.git
on local network: use ssh to 192.168.1.207:2020 from elsewhere/public: use https to git.hgsky.me/… unless you later expose ssh properly
stash
z z to stash local work only take care not to add ignored files which will add the venv and `__pycache__`
z p to pop the stash back
creating remote branches
P p, magit will suggest upstream (gitea), select and Enter and it will be created
cherry-picking
b b : switch to desired branch (review) l B : open reflog for local branches (my changes were committed to local cx but not pushed to gitea/cx) put point on the commit you want; did this in sequence A A : cherry pick commit to current branch minibuffer will show the commit and all branches, leave it on that commit the final commit was not shown by hash, just the branch cx since (local) cx was caught up with that branch
reverting a branch
b l : switch to local branch (cx) l l : open local reflog put point on the commit; highlighted remote gitea/cx X : reset branch; prompts you, selected cx
merge branch
b b : switch to branch to be merged into (cx) m m : pick branch to merge into current branch
giant requests
item:
get: /api/v6.0/user/369513017/order/history/detail/69a2e44a16be1142e74ad3cc
headers: request: GET /api/v6.0/user/369513017/order/history/detail/69a2e44a16be1142e74ad3cc?isInStore=true HTTP/2 Host: giantfood.com User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:148.0) Gecko/20100101 Firefox/148.0 Accept: application/json, text/plain, / Accept-Language: en-US,en;q=0.9 Accept-Encoding: gzip, deflate, br, zstd DNT: 1 Sec-GPC: 1 Connection: keep-alive Referer: https://giantfood.com/account/history/invoice/in-store Cookie: datadome=rDtvd3J2hO5AeghJMSFRRxGc6ifKCQYgMLcqPNr9rWiz2rdcXb032AY6GIZn8tUmYB96BKKbzh3_jSjEzYWLj8hDjl3oGYYAiu4jwdaxpf3vh2v4f7KH7kbqgsMWpkjt; cf_clearance=WEPyQokx9f0qoyS4Svsw4EkZ1TYOxjOwcUHspT3.rXw-1773348940-1.2.1.1-fPvERGxBlFUaBW83sUppbUWpwvFG7mZivag5vBvZb3kxUQv2WSVIV1tON0HV2n8bkVY0U8_BBl62a00Np.oJylYQcGME540gZlYEoL.gMs4WynLqApFe5BOXAEwOm01_6h6b62H90bl4ypRehVb_TXEi4qHaPLVSZhjZK_h.fv6RBqjgYch2j_8XnHe5HXvLziVjl1k2aJskozqy04KOyeHyc3OyIPTZd5On_KAzFIM; dvrctk=MnjKJVShVraEtbrBkkxWxLaZrXnIGNQlwB7QtZVPFeA=; __cflb=0H28vXMLFyydRmDMNgcPHijM6auXkCspCkuh58tVuJ3; __cf_bm=C6QbqiEvbbwdrYBpoJOkcWcedf60vcOfPfTPPbZzKbM-1773348202-1.0.1.1-cSHoYwi8ZjIHTdBItXQP_iXJdRJS6FYjFsGdl1eGHvS5pgfbcT4Lg19P6UStX.bZz1u0OXiS5ykdipPBtwP6OvZr68k4XSmjYpir05jNLhw; _dd_s=rum=0&expire=1773349846445; ppdtk=Uog72CR22mD85C7U4iZHlgOQeRmvHEYp0OdQc+0lEes1c5/LeqGT+ZUlXpSC6FpW; cartId=3820547 Sec-Fetch-Dest: empty Sec-Fetch-Mode: cors Sec-Fetch-Site: same-origin Priority: u=0 TE: trailers
response: HTTP/2 200 date: Thu, 12 Mar 2026 20:55:47 GMT content-type: application/json server: cloudflare cf-ray: 9db5b3a5d84aff28-IAD cf-cache-status: DYNAMIC content-encoding: gzip set-cookie: datadome=MXMri0hss6PlQ0_oS7gG2iMdOKnNkbDmGvOxelgN~nCcupgkJQOqjcjcgdprIaI7hSlt_w8E9Ri_RAzPFrGqtUfqAJ_szB_aNZ2FdC26qmI3870Nn4~T0vtx8Gj3dEZR; Max-Age=31536000; Domain=.giantfood.com; Path=/; Secure; SameSite=Lax strict-transport-security: max-age=31536000; includeSubDomains vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers, accept-encoding accept-ch: Sec-CH-UA,Sec-CH-UA-Mobile,Sec-CH-UA-Platform,Sec-CH-UA-Arch,Sec-CH-UA-Full-Version-List,Sec-CH-UA-Model,Sec-CH-Device-Memory x-datadome: protected request-context: appId=cid-v1:75750625-0c81-4f08-9f5d-ce4f73198e54 X-Firefox-Spdy: h2
history:
headers: request: GET /api/v6.0/user/369513017/order/history?filter=instore&loyaltyNumber=440155630880 HTTP/2 Host: giantfood.com User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:148.0) Gecko/20100101 Firefox/148.0 Accept: application/json, text/plain, / Accept-Language: en-US,en;q=0.9 Accept-Encoding: gzip, deflate, br, zstd DNT: 1 Sec-GPC: 1 Connection: keep-alive Referer: https://giantfood.com/account/history/invoice/in-store Cookie: datadome=OH2XjtCoI6XjE3Qsz_b0F1YULKLatAC0Ea~VMeDGBP0N9Z~CeI3RqEbvkGmNW_VCOU~vRb6p0kqibvF2tLbWnzyAGIdO7jsC41KiYbp7USpJDnefZhIg0e1ypAugvDSw; cf_clearance=WEPyQokx9f0qoyS4Svsw4EkZ1TYOxjOwcUHspT3.rXw-1773348940-1.2.1.1-fPvERGxBlFUaBW83sUppbUWpwvFG7mZivag5vBvZb3kxUQv2WSVIV1tON0HV2n8bkVY0U8_BBl62a00Np.oJylYQcGME540gZlYEoL.gMs4WynLqApFe5BOXAEwOm01_6h6b62H90bl4ypRehVb_TXEi4qHaPLVSZhjZK_h.fv6RBqjgYch2j_8XnHe5HXvLziVjl1k2aJskozqy04KOyeHyc3OyIPTZd5On_KAzFIM; dvrctk=MnjKJVShVraEtbrBkkxWxLaZrXnIGNQlwB7QtZVPFeA=; __cflb=0H28vXMLFyydRmDMNgcPHijM6auXkCspCkuh58tVuJ3; __cf_bm=C6QbqiEvbbwdrYBpoJOkcWcedf60vcOfPfTPPbZzKbM-1773348202-1.0.1.1-cSHoYwi8ZjIHTdBItXQP_iXJdRJS6FYjFsGdl1eGHvS5pgfbcT4Lg19P6UStX.bZz1u0OXiS5ykdipPBtwP6OvZr68k4XSmjYpir05jNLhw; _dd_s=rum=0&expire=1773349842848; ppdtk=Uog72CR22mD85C7U4iZHlgOQeRmvHEYp0OdQc+0lEes1c5/LeqGT+ZUlXpSC6FpW; cartId=3820547 Sec-Fetch-Dest: empty Sec-Fetch-Mode: cors Sec-Fetch-Site: same-origin Priority: u=0 TE: trailers
response: HTTP/2 200 date: Thu, 12 Mar 2026 20:55:43 GMT content-type: application/json server: cloudflare cf-ray: 9db5b38f7eebff28-IAD cf-cache-status: DYNAMIC content-encoding: gzip set-cookie: datadome=rDtvd3J2hO5AeghJMSFRRxGc6ifKCQYgMLcqPNr9rWiz2rdcXb032AY6GIZn8tUmYB96BKKbzh3_jSjEzYWLj8hDjl3oGYYAiu4jwdaxpf3vh2v4f7KH7kbqgsMWpkjt; Max-Age=31536000; Domain=.giantfood.com; Path=/; Secure; SameSite=Lax strict-transport-security: max-age=31536000; includeSubDomains vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers, accept-encoding accept-ch: Sec-CH-UA,Sec-CH-UA-Mobile,Sec-CH-UA-Platform,Sec-CH-UA-Arch,Sec-CH-UA-Full-Version-List,Sec-CH-UA-Model,Sec-CH-Device-Memory x-datadome: protected request-context: appId=cid-v1:75750625-0c81-4f08-9f5d-ce4f73198e54 X-Firefox-Spdy: h2
costco requests
- localstorage idToken has the auth token, but needs "Bearer " prepended
- localstorage clientID has the COSTCO_X_WCS_CLIENTID
- I don't see the client_identifier uuid anywhere.
we will pull from .env first (may have to hardcode) then overwrite with session data (token) hopefully this doesnt change.
warehouse
Headers
POST /ebusiness/order/v1/orders/graphql HTTP/1.1 Host: ecom-api.costco.com User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:148.0) Gecko/20100101 Firefox/148.0 Accept: / Accept-Language: en-US,en;q=0.9 Accept-Encoding: gzip, deflate, br, zstd costco.service: restOrders costco.env: ecom costco-x-authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6IlhrZTFoNXg5TV9ZMk5ER0YxU1hDX2xNNnVSTU5tZTJ3STBLRDlHNzl1QmciLCJ0eXAiOiJKV1QifQ.eyJleHAiOjE3NzM2NjU2NjgsIm5iZiI6MTc3MzY2NDc2OCwidmVyIjoiMS4wIiwiaXNzIjoiaHR0cHM6Ly9zaWduaW4uY29zdGNvLmNvbS9lMDcxNGRkNC03ODRkLTQ2ZDYtYTI3OC0zZTI5NTUzNDgzZWIvdjIuMC8iLCJzdWIiOiIzMTIzZWQ2Yy1jNzM4LTRiOTktOTAwZC0xNDE1ZTUzNjA2Y2UiLCJhdWQiOiJhM2E1MTg2Yi03Yzg5LTRiNGMtOTNhOC1kZDYwNGU5MzA3NTciLCJhY3IiOiJCMkNfMUFfU1NPX1dDU19zaWdudXBfc2lnbmluXzIwMSIsIm5vbmNlIjoiNDA4NjU3YmItODg5MC00MTk0LTg2OTctZDYzOGU2MzdhMGRhIiwiaWF0IjoxNzczNjY0NzY4LCJhdXRoX3RpbWUiOjE3NzM2NjQ3NjgsImF1dGhlbnRpY2F0aW9uU291cmNlIjoibG9jYWxBY2NvdW50QXV0aGVudGljYXRpb24iLCJlbWFpbCI6ImpvaG5tb3Nlc2NhcnRlckBnbWFpbC5jb20iLCJuYW1lIjoiRW1wdHkgRGlzcGxheW5hbWUiLCJ1c2VySWRlbnRpdGllcyI6W3siaXNzdWVyIjoiYTNhNTE4NmItN2M4OS00YjRjLTkzYTgtZGQ2MDRlOTMwNzU3IiwiaXNzdWVyVXNlcklkIjoiQUFEOjMxMjNlZDZjLWM3MzgtNGI5OS05MDBkLTE0MTVlNTM2MDZjZSJ9LHsiaXNzdWVyIjoiNDkwMGViMWYtMGMxMC00YmQ5LTk5YzMtYzU5ZTZjMWVjZWJmIiwiaXNzdWVyVXNlcklkIjoiYTZmZmRkOTktNDM2OC00NTgwLTgxOWYtZTZjZjYxM2U1M2M1In0seyJpc3N1ZXIiOiIyZGQ0YjE0NS0zYmRhLTQ2NjktYWU2YS0zN2I4Y2I2ZGFmN2YiLCJpc3N1ZXJVc2VySWQiOiJhNmZmZGQ5OS00MzY4LTQ1ODAtODE5Zi1lNmNmNjEzZTUzYzUifV0sImlzc3VlclVzZXJJZCI6IkFBRDozMTIzZWQ2Yy1jNzM4LTRiOTktOTAwZC0xNDE1ZTUzNjA2Y2UiLCJjbGllbnRJZCI6ImEzYTUxODZiLTdjODktNGI0Yy05M2E4LWRkNjA0ZTkzMDc1NyIsInJlbWVtYmVyTWUiOiJGYWxzZSIsInNlbmRNZUVtYWlsIjoib2ZmIiwiaXBBZGRyZXNzIjoiOTYuMjQxLjIxMi4xMjUiLCJDb3JyZWxhdGlvbklkIjoiYWUyYTMxYjktMjBkNC00MTBkLWE1ZjAtNDJhMWIzM2VmZmQ1In0.gmhhNsgFUbd0QAR1Z_isFjglQxZrM0Kj8yv5-w-FrsWM3d9PB6kWsldBndy6cEhwZh588T1u4vgG9A-XR3HZ4t-JnPZhpr8_7-lI4W4Tp4IAA0tIgMt7cHZUN14qstx_K72QLOrKbO34PQJKBymw2qKvwvhUo372MNFtc2D8_wS_VbG8QdOPumgsBJPqyF7HExt-gpkAu_5kL-54pqLSIZIJZ_viymti9ajla_B8PlvHMO7ZDWSgoV177ArcQAeOhv9MT1e5k0a4V7R-cCI77NIhoBUjV8C4lMAd27nntWzJJ9N00hEEGQb3zPoWUgRFAOdGzjg4xZu1D87C3MJtdA Content-Type: application/json-patch+json costco-x-wcs-clientId: 4900eb1f-0c10-4bd9-99c3-c59e6c1ecebf client-identifier: 481b1aec-aa3b-454b-b81b-48187e28f205 Content-Length: 808 Origin: https://www.costco.com DNT: 1 Sec-GPC: 1 Connection: keep-alive Referer: https://www.costco.com/ Sec-Fetch-Dest: empty Sec-Fetch-Mode: cors Sec-Fetch-Site: same-site
Request
Request {"query":"query receiptsWithCounts($startDate: String!, $endDate: String!,$documentType:String!,$documentSubType:String!) {\n receiptsWithCounts(startDate: $startDate, endDate: $endDate,documentType:$documentType,documentSubType:$documentSubType) {\n inWarehouse\n gasStation\n carWash\n gasAndCarWash\n receipts{\n warehouseName receiptType documentType transactionDateTime transactionBarcode warehouseName transactionType total \n totalItemCount\n itemArray { \n itemNumber\n }\n tenderArray { \n tenderTypeCode\n tenderDescription\n amountTender\n }\n couponArray { \n upcnumberCoupon\n } \n }\n}\n }","variables":{"startDate":"1/01/2026","endDate":"3/31/2026","text":"Last 3 Months","documentType":"all","documentSubType":"all"}}
Response
{"data":{"receiptsWithCounts":{"inWarehouse":2,"gasStation":0,"carWash":0,"gasAndCarWash":0,"receipts":[{"warehouseName":"MT VERNON","receiptType":"In-Warehouse","documentType":"WarehouseReceiptDetail","transactionDateTime":"2026-03-12T16:16:00","transactionBarcode":"21111500804012603121616","transactionType":"Sales","total":208.58,"totalItemCount":24,"itemArray":[{"itemNumber":"34779"},{"itemNumber":"7950"},{"itemNumber":"2005"},{"itemNumber":"1941976"},{"itemNumber":"4873222"},{"itemNumber":"374664"},{"itemNumber":"60357"},{"itemNumber":"30669"},{"itemNumber":"1025795"},{"itemNumber":"787876"},{"itemNumber":"22093"},{"itemNumber":"1956177"},{"itemNumber":"1136340"},{"itemNumber":"7609681"},{"itemNumber":"18001"},{"itemNumber":"27003"},{"itemNumber":"1886266"},{"itemNumber":"4102"},{"itemNumber":"87745"},{"itemNumber":"110784"},{"itemNumber":"47492"},{"itemNumber":"2287780"},{"itemNumber":"917546"},{"itemNumber":"1768123"},{"itemNumber":"374558"}],"tenderArray":[{"tenderTypeCode":"061","tenderDescription":"VISA","amountTender":208.58}],"couponArray":[{"upcnumberCoupon":"2100003746641"},{"upcnumberCoupon":"2100003745583"}]},{"warehouseName":"MT VERNON","receiptType":"In-Warehouse","documentType":"WarehouseReceiptDetail","transactionDateTime":"2026-02-14T16:25:00","transactionBarcode":"21111500503322602141625","transactionType":"Sales","total":188.12,"totalItemCount":23,"itemArray":[{"itemNumber":"7812"},{"itemNumber":"7950"},{"itemNumber":"3923"},{"itemNumber":"19813"},{"itemNumber":"87745"},{"itemNumber":"1116038"},{"itemNumber":"5938"},{"itemNumber":"1136340"},{"itemNumber":"30669"},{"itemNumber":"384962"},{"itemNumber":"1331732"},{"itemNumber":"787876"},{"itemNumber":"61576"},{"itemNumber":"110784"},{"itemNumber":"180973"},{"itemNumber":"3"},{"itemNumber":"744361"},{"itemNumber":"1886266"},{"itemNumber":"1025795"},{"itemNumber":"11545"},{"itemNumber":"47492"},{"itemNumber":"260509"}],"tenderArray":[{"tenderTypeCode":"061","tenderDescription":"VISA","amountTender":188.12}],"couponArray":[]}]}}}
item
headers
POST /ebusiness/order/v1/orders/graphql HTTP/2 Host: ecom-api.costco.com User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:148.0) Gecko/20100101 Firefox/148.0 Accept: / Accept-Language: en-US,en;q=0.9 Accept-Encoding: gzip, deflate, br, zstd costco.service: restOrders costco.env: ecom costco-x-authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6IlhrZTFoNXg5TV9ZMk5ER0YxU1hDX2xNNnVSTU5tZTJ3STBLRDlHNzl1QmciLCJ0eXAiOiJKV1QifQ.eyJleHAiOjE3NzM2NjUzODUsIm5iZiI6MTc3MzY2NDQ4NSwidmVyIjoiMS4wIiwiaXNzIjoiaHR0cHM6Ly9zaWduaW4uY29zdGNvLmNvbS9lMDcxNGRkNC03ODRkLTQ2ZDYtYTI3OC0zZTI5NTUzNDgzZWIvdjIuMC8iLCJzdWIiOiIzMTIzZWQ2Yy1jNzM4LTRiOTktOTAwZC0xNDE1ZTUzNjA2Y2UiLCJhdWQiOiJhM2E1MTg2Yi03Yzg5LTRiNGMtOTNhOC1kZDYwNGU5MzA3NTciLCJhY3IiOiJCMkNfMUFfU1NPX1dDU19zaWdudXBfc2lnbmluXzIwMSIsIm5vbmNlIjoiNzg5MjIzOGUtOWU3NC00MzExLWI2NDItMzQ1NTY4ZDY3NTk4IiwiaWF0IjoxNzczNjY0NDg1LCJhdXRoX3RpbWUiOjE3NzM2NjQ0ODQsImF1dGhlbnRpY2F0aW9uU291cmNlIjoibG9jYWxBY2NvdW50QXV0aGVudGljYXRpb24iLCJlbWFpbCI6ImpvaG5tb3Nlc2NhcnRlckBnbWFpbC5jb20iLCJuYW1lIjoiRW1wdHkgRGlzcGxheW5hbWUiLCJ1c2VySWRlbnRpdGllcyI6W3siaXNzdWVyIjoiYTNhNTE4NmItN2M4OS00YjRjLTkzYTgtZGQ2MDRlOTMwNzU3IiwiaXNzdWVyVXNlcklkIjoiQUFEOjMxMjNlZDZjLWM3MzgtNGI5OS05MDBkLTE0MTVlNTM2MDZjZSJ9LHsiaXNzdWVyIjoiNDkwMGViMWYtMGMxMC00YmQ5LTk5YzMtYzU5ZTZjMWVjZWJmIiwiaXNzdWVyVXNlcklkIjoiYTZmZmRkOTktNDM2OC00NTgwLTgxOWYtZTZjZjYxM2U1M2M1In0seyJpc3N1ZXIiOiIyZGQ0YjE0NS0zYmRhLTQ2NjktYWU2YS0zN2I4Y2I2ZGFmN2YiLCJpc3N1ZXJVc2VySWQiOiJhNmZmZGQ5OS00MzY4LTQ1ODAtODE5Zi1lNmNmNjEzZTUzYzUifV0sImlzc3VlclVzZXJJZCI6IkFBRDozMTIzZWQ2Yy1jNzM4LTRiOTktOTAwZC0xNDE1ZTUzNjA2Y2UiLCJjbGllbnRJZCI6ImEzYTUxODZiLTdjODktNGI0Yy05M2E4LWRkNjA0ZTkzMDc1NyIsInJlbWVtYmVyTWUiOiJGYWxzZSIsInNlbmRNZUVtYWlsIjoib2ZmIiwiaXBBZGRyZXNzIjoiOTYuMjQxLjIxMi4xMjUiLCJDb3JyZWxhdGlvbklkIjoiMDk0YTE5NDYtZTMwNS00ZDkzLWEyMzQtM2ZiNGMwMjMyNDhhIn0.FdsVFHsewvpQABvkEz4uA0NUlYwvlBEg-frJbUDIJRTsP59Be0bOt8Zqv6cZhUqBn_lTQEyi9tnvpkpycmNy7Rg5zLfYroH6mNALRqkBm8VbcmrEVDM1HmdNTHgO9vQD4TdKm1ZYkA7Pj_6QY3sDxI4ioOzIz1_XOnoJVAXjEwGfr8hgvqtlaC51M5DsfIGQj3zCaJrQnD6GBJlFmLNUpCulpT16WAaB1lT_pcycfBs-e1xnEd33dX0kHBOZ8pFS-IKjV_44ZK9R8jI9WHx5ThX3-DtyqjkJ0JypmhT9uEa0MeT55U7aeKPbMvQ0exiw3culKgiWDhvdp8e2EkExsg Content-Type: application/json-patch+json costco-x-wcs-clientId: 4900eb1f-0c10-4bd9-99c3-c59e6c1ecebf client-identifier: 481b1aec-aa3b-454b-b81b-48187e28f205 Content-Length: 2916 Origin: https://www.costco.com DNT: 1 Sec-GPC: 1 Connection: keep-alive Referer: https://www.costco.com/ Sec-Fetch-Dest: empty Sec-Fetch-Mode: cors Sec-Fetch-Site: same-site Priority: u=0 TE: trailers
request
{"query":"query receiptsWithCounts($barcode: String!,$documentType:String!) {\n receiptsWithCounts(barcode: $barcode,documentType:$documentType) {\nreceipts{\n warehouseName\n receiptType \n documentType \n transactionDateTime \n transactionDate \n companyNumber \n warehouseNumber \n operatorNumber \n warehouseName \n warehouseShortName \n registerNumber \n transactionNumber \n transactionType\n transactionBarcode \n total \n warehouseAddress1 \n warehouseAddress2 \n warehouseCity \n warehouseState \n warehouseCountry \n warehousePostalCode\n totalItemCount \n subTotal \n taxes\n total \n invoiceNumber\n sequenceNumber\n itemArray { \n itemNumber \n itemDescription01 \n frenchItemDescription1 \n itemDescription02 \n frenchItemDescription2 \n itemIdentifier \n itemDepartmentNumber\n unit \n amount \n taxFlag \n merchantID \n entryMethod\n transDepartmentNumber\n fuelUnitQuantity\n fuelGradeCode\n fuelUnitQuantity\n itemUnitPriceAmount\n fuelUomCode\n fuelUomDescription\n fuelUomDescriptionFr\n fuelGradeDescription\n fuelGradeDescriptionFr\n\n } \n tenderArray { \n tenderTypeCode\n tenderSubTypeCode\n tenderDescription \n amountTender \n displayAccountNumber \n sequenceNumber \n approvalNumber \n responseCode \n tenderTypeName \n transactionID \n merchantID \n entryMethod\n tenderAcctTxnNumber \n tenderAuthorizationCode \n tenderTypeName\n tenderTypeNameFr\n tenderEntryMethodDescription\n walletType\n walletId\n storedValueBucket\n } \n subTaxes { \n tax1 \n tax2 \n tax3 \n tax4 \n aTaxPercent \n aTaxLegend \n aTaxAmount\n aTaxPrintCode\n aTaxPrintCodeFR \n aTaxIdentifierCode \n bTaxPercent \n bTaxLegend \n bTaxAmount\n bTaxPrintCode\n bTaxPrintCodeFR \n bTaxIdentifierCode \n cTaxPercent \n cTaxLegend \n cTaxAmount\n cTaxIdentifierCode \n dTaxPercent \n dTaxLegend \n dTaxAmount\n dTaxPrintCode\n dTaxPrintCodeFR \n dTaxIdentifierCode\n uTaxLegend\n uTaxAmount\n uTaxableAmount\n } \n instantSavings \n membershipNumber \n }\n }\n }","variables":{"barcode":"21111500804012603121616","documentType":"warehouse"}}
response
{"data":{"receiptsWithCounts":{"receipts":[{"warehouseName":"MT VERNON","receiptType":"In-Warehouse","documentType":"WarehouseReceiptDetail","transactionDateTime":"2026-03-12T16:16:00","transactionDate":"2026-03-12","companyNumber":1,"warehouseNumber":1115,"operatorNumber":43,"warehouseShortName":"MT VERNON","registerNumber":8,"transactionNumber":401,"transactionType":"Sales","transactionBarcode":"21111500804012603121616","total":208.58,"warehouseAddress1":"7940 RICHMOND HWY","warehouseAddress2":null,"warehouseCity":"ALEXANDRIA","warehouseState":"VA","warehouseCountry":"US","warehousePostalCode":"22306","totalItemCount":24,"subTotal":202.01,"taxes":6.57,"invoiceNumber":null,"sequenceNumber":null,"itemArray":[{"itemNumber":"34779","itemDescription01":"ROMANO","frenchItemDescription1":null,"itemDescription02":"CS=15 SL120 T9H6","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":19,"unit":1,"amount":20.93,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":19,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":11.69,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"7950","itemDescription01":"4LB COSMIC","frenchItemDescription1":null,"itemDescription02":null,"frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":65,"unit":1,"amount":5.99,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":65,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":5.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"2005","itemDescription01":"25# FLOUR","frenchItemDescription1":null,"itemDescription02":"ALL-PURPOSE HARV P98/100","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":9.49,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":9.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"1941976","itemDescription01":"BREAD FLOUR","frenchItemDescription1":null,"itemDescription02":"12 LBS 180P 20X9","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":9.99,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":9.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"4873222","itemDescription01":"ALL F&C","frenchItemDescription1":null,"itemDescription02":"200OZ 160LOADS P104","frenchItemDescription2":null,"itemIdentifier":null,"itemDepartmentNumber":14,"unit":1,"amount":19.99,"taxFlag":"Y","merchantID":null,"entryMethod":null,"transDepartmentNumber":14,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":19.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"374664","itemDescription01":"/ 4873222","frenchItemDescription1":"4873222","itemDescription02":null,"frenchItemDescription2":null,"itemIdentifier":null,"itemDepartmentNumber":14,"unit":-1,"amount":-5,"taxFlag":null,"merchantID":null,"entryMethod":null,"transDepartmentNumber":14,"fuelUnitQuantity":null,"fuelGradeCode":null,"itemUnitPriceAmount":0,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"60357","itemDescription01":"MIXED PEPPER","frenchItemDescription1":null,"itemDescription02":"6-PACK","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":65,"unit":1,"amount":7.49,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":65,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":7.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"30669","itemDescription01":"BANANAS","frenchItemDescription1":null,"itemDescription02":"3 LB / 1.36 KG","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":65,"unit":2,"amount":2.98,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":65,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":1.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"1025795","itemDescription01":"KS 5DZ EGGS","frenchItemDescription1":null,"itemDescription02":"SL21 P120 / P132 / P144","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":17,"unit":1,"amount":9.39,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":17,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":9.39,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"787876","itemDescription01":"KS TWNY PORT","frenchItemDescription1":null,"itemDescription02":"PORTUGAL CSPC# 773506","frenchItemDescription2":null,"itemIdentifier":null,"itemDepartmentNumber":16,"unit":1,"amount":17.99,"taxFlag":"Y","merchantID":null,"entryMethod":null,"transDepartmentNumber":16,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":17.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"22093","itemDescription01":"KS SHRP CHDR","frenchItemDescription1":null,"itemDescription02":"EC20T9H5 W12T13H5 SL130","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":17,"unit":1,"amount":5.49,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":17,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":5.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"1956177","itemDescription01":"BRWNBTTRGRV","frenchItemDescription1":null,"itemDescription02":"MCCORMICK C12T19H7 L228","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":2.97,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":2.97,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"1136340","itemDescription01":"3LB ORG GALA","frenchItemDescription1":null,"itemDescription02":null,"frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":65,"unit":1,"amount":4.49,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":65,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":4.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"7609681","itemDescription01":"CASCADE GEL","frenchItemDescription1":null,"itemDescription02":"125OZ T60H3P180","frenchItemDescription2":null,"itemIdentifier":null,"itemDepartmentNumber":14,"unit":1,"amount":12.49,"taxFlag":"Y","merchantID":null,"entryMethod":null,"transDepartmentNumber":14,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":12.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"18001","itemDescription01":"TBLE SALT 4#","frenchItemDescription1":null,"itemDescription02":"DIAMOND CRYSTAL P=600","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":1.49,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":1.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"27003","itemDescription01":"STRAWBERRIES","frenchItemDescription1":null,"itemDescription02":"908 G / 2 LB","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":65,"unit":1,"amount":5.29,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":65,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":5.29,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"1886266","itemDescription01":"SKO 5%","frenchItemDescription1":null,"itemDescription02":"48 OZ T10H8 SL30","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":17,"unit":1,"amount":5.79,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":17,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":5.79,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"4102","itemDescription01":"8\" TORTILLAS","frenchItemDescription1":null,"itemDescription02":"SL10 70OZ","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":5.99,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":5.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"87745","itemDescription01":"ROTISSERIE","frenchItemDescription1":null,"itemDescription02":"USDA GRADE A","frenchItemDescription2":null,"itemIdentifier":null,"itemDepartmentNumber":63,"unit":1,"amount":4.99,"taxFlag":"D","merchantID":null,"entryMethod":null,"transDepartmentNumber":63,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":4.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"110784","itemDescription01":"15 GRAIN BRD","frenchItemDescription1":null,"itemDescription02":"PEPPERIDGE FARM 2/24 OZ","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":5.69,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":5.69,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"47492","itemDescription01":"CELERY SALAD","frenchItemDescription1":null,"itemDescription02":"APPLE CIDER VINAIGRETTE","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":63,"unit":1,"amount":12.62,"taxFlag":"D","merchantID":null,"entryMethod":null,"transDepartmentNumber":63,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":4.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"2287780","itemDescription01":"BTB CHICKEN","frenchItemDescription1":null,"itemDescription02":"C12T10H9 P1080 SL630","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":9.49,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":9.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"917546","itemDescription01":"JIF CREAMY","frenchItemDescription1":null,"itemDescription02":"PEANUT BUTTER SL540 P300","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":11.99,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":11.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"1768123","itemDescription01":"BBEE KIDS4PC","frenchItemDescription1":null,"itemDescription02":"FY26 P1600 T200 H8","frenchItemDescription2":null,"itemIdentifier":null,"itemDepartmentNumber":39,"unit":1,"amount":17.99,"taxFlag":"Y","merchantID":null,"entryMethod":null,"transDepartmentNumber":39,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":17.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"374558","itemDescription01":" 1768123","frenchItemDescription1":"/1768123","itemDescription02":null,"frenchItemDescription2":null,"itemIdentifier":null,"itemDepartmentNumber":39,"unit":-1,"amount":-4,"taxFlag":null,"merchantID":null,"entryMethod":null,"transDepartmentNumber":39,"fuelUnitQuantity":null,"fuelGradeCode":null,"itemUnitPriceAmount":0,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null}],"tenderArray":[{"tenderTypeCode":"061","tenderSubTypeCode":null,"tenderDescription":"VISA","amountTender":208.58,"displayAccountNumber":"9070","sequenceNumber":null,"approvalNumber":null,"responseCode":null,"tenderTypeName":"VISA","transactionID":null,"merchantID":null,"entryMethod":null,"tenderAcctTxnNumber":null,"tenderAuthorizationCode":null,"tenderTypeNameFr":null,"tenderEntryMethodDescription":null,"walletType":null,"walletId":null,"storedValueBucket":null}],"subTaxes":{"tax1":null,"tax2":null,"tax3":null,"tax4":null,"aTaxPercent":null,"aTaxLegend":"A","aTaxAmount":4.62,"aTaxPrintCode":null,"aTaxPrintCodeFR":null,"aTaxIdentifierCode":null,"bTaxPercent":null,"bTaxLegend":null,"bTaxAmount":null,"bTaxPrintCode":null,"bTaxPrintCodeFR":null,"bTaxIdentifierCode":null,"cTaxPercent":null,"cTaxLegend":"C","cTaxAmount":1.25,"cTaxIdentifierCode":null,"dTaxPercent":null,"dTaxLegend":"D","dTaxAmount":0.7,"dTaxPrintCode":null,"dTaxPrintCodeFR":null,"dTaxIdentifierCode":null,"uTaxLegend":null,"uTaxAmount":null,"uTaxableAmount":null},"instantSavings":9,"membershipNumber":"111894291684"}]}}}
online order
appears to be same endpoint. request: {"query":"query getOnlineOrders($startDate:String!, $endDate:String!, $pageNumber:Int , $pageSize:Int, $warehouseNumber:String! ){\n getOnlineOrders(startDate:$startDate, endDate:$endDate, pageNumber : $pageNumber, pageSize : $pageSize, warehouseNumber : $warehouseNumber) {\n pageNumber\n pageSize\n totalNumberOfRecords\n bcOrders {\n orderHeaderId\n orderPlacedDate : orderedDate\n orderNumber : sourceOrderNumber \n orderTotal\n warehouseNumber\n status\n emailAddress\n orderCancelAllowed\n orderPaymentFailed : orderPaymentEditAllowed\n orderReturnAllowed\n orderLineItems {\n orderLineItemCancelAllowed\n →rderLineItemId\n orderReturnAllowed\n itemId\n itemNumber\n itemTypeId\n lineNumber\n itemDescription\n deliveryDate\n warehouseNumber\n status\n orderStatus\n parentOrderLineItemId\n isFSAEligible\n shippingType\n shippingTimeFrame\n isShipToWarehouse\n carrierItemCategory\n carrierContactPhone\n programTypeId\n isBuyAgainEligible\n scheduledDeliveryDate\n scheduledDeliveryDateEnd\n configuredItemData\n shipment {\n shipmentId \n orderHeaderId\n orderShipToId \n lineNumber \n orderNumber\n shippingType \n shippingTimeFrame \n shippedDate \n packageNumber \n trackingNumber \n trackingSiteUrl \n carrierName \n estimatedArrivalDate \n deliveredDate \n isDeliveryDelayed \n isEstimatedArrivalDateEligible \n statusTypeId \n status \n pickUpReadyDate\n pickUpCompletedDate\n reasonCode\n trackingEvent {\n event\n carrierName\n eventDate\n estimatedDeliveryDate\n scheduledDeliveryDate\n trackingNumber\n }\n }\n }\n }\n }\n }","variables":{"pageNumber":1,"pageSize":10,"startDate":"2026-1-01","endDate":"2026-3-31","warehouseNumber":"847"}}
need new scope changes:
- pull all orders by default
- add online orders
- copy header data from browser using selenium
how to run
python scrape_giant.py python enrich_giant.py python scrape_costco.py python enrich_costco.py python build_observed_products.py python build_review_queue.py python build_canonical_layer.py python validate_cross_retailer_flow.py
t1.13 tasks [2026-03-17 Tue 13:49]
ok i ran a few. time to run some cleanups here - i'm wondering if we shouldn't be less aggressive with canonical names and encourage a better manual process to start.
TODO fill in auto-created canonical category, product-type
auto-created canonical_names lack category, product_type - ok with filling these in manually in the catalog once the queue is empty
TODO consolidation cleanup
- canonical_names feel too specific, e.g., "5DZ egg" - probably a problem with the enrich_* steps not adding appropraite normalizing data and removing from observed product title?
-
some canonical_names need consolidation, eg "LIME" and "LIME . / ." ; poss cleanup issue. there are 5 entries for ergg but but they are all regular large grade A white eggs, just different amounts in dozens. Eggs are actually a great candidate for the kind of analysis we want to do - the pipeline should have caught and properly sorted these into size/qty:
```canonical_product_id canonical_name category product_type brand variant size_value size_unit pack_qty measure_type notes created_at updated_at gcan_0e350505fd22 5DZ EGG / / KS each auto-linked via exact_name gcan_47279a80f5f3 EGG 5 DOZ. BBS each auto-linked via exact_name gcan_7d099130c1bf LRG WHITE EGG SB 30 count auto-linked via exact_upc gcan_849c2817e667 GDA LRG WHITE EGG SB 18 count auto-linked via exact_upc gcan_cb0c6c8cf480 LG EGG CONVENTIONAL 18 count count auto-linked via exact_name_size ```
TODO costco discount matching
Build costco mechanism for matching discount to line item.
- Discounts appear as their own line items with a number like /123456, this matches the UPC of the discounted item
- must be date-matched to the UPC
Data model might be missing shape:
- match discount rows like `item_name:/2303476` to `retailer_item_id:2303476`
- display this value on the item somehow? maybe update line_total? otherwise we lose fidelity. should be stored in items_enriched somehow
```retailer order_id line_no observed_item_key order_date retailer_item_id pod_id item_name upc category_id category qty unit unit_price line_total picked_weight mvp_savings reward_savings coupon_savings coupon_price image_url raw_order_path item_name_norm brand_guess variant size_value size_unit pack_qty measure_type is_store_brand is_fee is_discount_line is_coupon_line price_per_each price_per_lb price_per_oz parse_version parse_notes costco 2.11115E+22 3 costco:21111520101942404241753:3 4/24/2024 2303476 KA 6QT MIXER P16 KSM60SECXER/CU FY23 33 33 1 None 399.99 399.99 costco_output/raw/21111520101942404241753-2024-04-24T17-53-00.json KA 6QT MIXER KSM60SECXER/CU each FALSE FALSE FALSE FALSE 399.99 costco-enrich-v1 costco 2.11115E+22 4 costco:21111520101942404241753:4 4/24/2024 325173 /2303476 33 33 -1 None 0 -100 -100 costco_output/raw/21111520101942404241753-2024-04-24T17-53-00.json /2303476 each FALSE FALSE TRUE TRUE 100 costco-enrich-v1 ```
TODO giant discount matching
prompt
do not add new abstractions unless they remove real duplication. prefer explicit retailer-specific logic over generic heuristics. do not auto-create new canonical products from weak normalized names. and propose the smallest set of edits needed.
1.13 fixes
15x Costco discounts not caught
- 15x, some with slash-space: `/ 1768123`and some without: `/2303476`
canonical names suck - tempted to force manual config from scratch?
- maybe first-pass should be naming groups, starting with largest groups and going on down.
- unfortunately not seeing many cross-retailer items? looks like costco-only; just taking Giant as gospel
- could be as simple as changing canonical name in canonical_catalog.csv
- tough to figure out where the data is, leading to below:
need to refactor whole flow and where data is stored
group by browser or by site, or both? currently mixed.
-
Scrape
- Script:
- Output: /output/raw/orderN.json, history.json, orders.csv, history.csv
-
Enrich
- Scripts:
- Output: /output/enrich/items.json
-
Combined - output?
- Review step?
propsed fixes
1.14 prep - OBE
[ ] t1.14.1 define and document the filesystem/data-layer layout (2-3 commits)
make stage ownership and retailer ownership explicit so every artifact has one obvious home
AC
- define and document the canonical directory layout for the pipeline, separating retailer-specific artifacts from shared combined artifacts
-
adopt an explicit layout of the form:
- `data/<retailer>/raw/`
- `data/<retailer>/orders.csv`
- `data/<retailer>/items.csv`
- `data/<retailer>/items_enriched.csv`
- `data/combined/products_observed.csv`
- `data/combined/review_queue.csv`
- `data/combined/item_aliases.csv`
- `data/combined/canonical_catalog.csv`
- `data/combined/product_links.csv`
- `data/combined/purchases.csv`
- `data/combined/pipeline_status.csv`
- `data/combined/pipeline_status.json`
- update docs/readme and pipeline docs so each script’s inputs and outputs point to the new layout
- remove or deprecate ambiguous stage outputs living under a retailer-specific output directory when they are actually shared artifacts
- pm note: goal is “where does this file live?” should have one answer, not three
evidence
- commit:
- tests:
- date:
notes
[ ] t1.14.2 define the row-level data model for raw, enriched, observed, canonical, and purchases layers (2-4 commits)
lock the item model before further refactors so each stage has a clear grain and purpose
AC
-
document the row grain for each layer:
- raw item row = one receipt line from one retailer order
- enriched item row = one retailer line with retailer-specific parsed fields
- observed product row = one grouped retailer-facing product concept
- canonical catalog row = one review-controlled product identity
- purchase row = one final pivot-ready purchased item line
- define the required fields for each layer, including stable ids and provenance fields
- explicitly document which fields are allowed to be blank at each layer (e.g. `upc`, `canonical_item_id`, category)
-
document the relationship between:
- `raw_item_name`
- `normalized_item_name`
- `observed_product_id`
- `canonical_item_id`
- document how retailer-native ids (e.g. Costco `retailer_item_id`) fit into the shared model without being forced into `upc`
- pm note: this is the schema contract task; code should follow it, not invent it ad hoc
evidence
- commit:
- tests:
- date:
notes
[ ] t1.14.3 refactor pipeline outputs to the new layout without changing semantics (2-4 commits)
move files and script defaults to the new structure while preserving current behavior
AC
- update scraper and enrich scripts to write retailer-specific outputs under `data/<retailer>/…`
- update combined/shared scripts to read from retailer-specific enriched outputs and write to `data/combined/…`
- preserve current content/meaning of outputs during the move; this is a location/structure refactor, not a behavior rewrite
- update tests, docs, and script defaults to use the new paths
- pm note: do not mix data-layout cleanup with canonical/review logic changes in this task
evidence
- commit:
- tests:
- date:
notes
[ ] t1.14.4 make the review and catalog layer explicit and authoritative (2-4 commits)
treat review and canonical resolution as first-class data, not incidental byproducts
AC
- define `review_queue.csv`, `item_aliases.csv`, and `canonical_catalog.csv` as the authoritative review/catalog files in `data/combined/`
-
document the intended purpose of each:
- `review_queue.csv` = unresolved observed items needing action
- `item_aliases.csv` = approved mapping from observed/normalized names to canonical ids
- `canonical_catalog.csv` = review-controlled canonical product definitions and display names
- ensure final purchase generation reads from these files as the source of truth for resolution
- stop relying on weak implicit canonical creation as a substitute for the explicit review/catalog layer
- pm note: this is the control-plane task; observed products may be automatic, canonical products are review-controlled
evidence
- commit:
- tests:
- date:
notes
[ ] t1.14.5 define and document the final pivot-ready purchases output (2-3 commits)
make the final analysis artifact explicit so excel/pivot/chart use is a first-class target
AC
- define `data/combined/purchases.csv` as the final normalized purchase log
-
ensure each purchase row retains:
- purchase date
- retailer
- order id
- raw item name
- normalized item name
- canonical item id when resolved
- quantity and unit
- original line total
- discount-adjusted fields when applicable
- store/location fields where available
- document that `purchases.csv` is the primary excel/pivot input and that earlier files are staging layers
- document expected pivot uses such as purchase frequency and cost over time by canonical item
- pm note: this task is about making the final artifact explicit and stable, not about adding new metrics
evidence
- commit:
- tests:
- date:
notes
pipeline prep [2026-03-17 Tue]
data saved to /data
-
"scrape_<retailer>" gathers data from a retailer and outputs:
- raw list of items per visit ./<retailer>/scraped/raw/order-<uid>.json
- raw list of visits ./<retailer>/scraped_visits.csv
- raw list of items from all visits ./<retailer>/scraped_items.csv
-
"enrich <retailer>" takes scraped data and outputs:
- normalized list of items ./<retailer>/enriched_items.csv
- "combine" takes retailer
input:
- all enriched items ./<retailer>/enriched_items.csv
- all retailer visits ./<retailer>/scraped_visits.csv
outputs:
- observed product groups ./combined/observed/products_observed.csv
- unresolved products for review ./combined/review/review_queue.csv
- pipeline accounting/status ./combined/status/pipeline_status.csv
- pipeline accounting/status ./combined/status/pipeline_status.json
-
review resolves unknown or weakly identified products and maintains:
- canonical product catalog ./combined/review/canonical_catalog.csv
- approved alias mappings ./combined/review/item_aliases.csv
- optional observed→canonical links ./combined/review/product_links.csv
- build purchases takes combined observed data plus review/catalog data and outputs: [1]. final normalized purchase log ./combined/purchases/purchases.csv
lets get this pipeline right before more refactoring.
Pipeline - moved to data-model.org [2026-03-18 Wed]
Key:
- (1) input
- [2] output
Each step can be run alone if its dependents exist.
1. Collect
Get raw receipt/visit and item data from a retailer. Scraping is unique to a Retailer and method (e.g., Giant-Web and Giant-Scan). Preserve complete raw data and preserve fidelity. Avoid interpretation beyond basic data flattening.
- (1) Source access (Varies, eg header data, auth for API access)
- [1] collected visits from each retailer
- [2] collected items from each retailer
- [3] any other raw data that supports [1] and [2]; explicit source (eventual receipt scan?)
2. Normalize
Parse and extract structured facts from retailer-specific raw data to create a standardized item format. Strictly dependent on Collect method and output.
- Extract quantity, size, pack, pricing, variant
- Consolidate discount with item using upc/retail_item_id and concurrence
- Cleanup naming to facilitate later matching
- (1) collected items from each retailer
- (2) collected visits from each retailer
- [1] normalized items from each retailer
3. Review/Combine (Canonicalization)
Decide whether two normalized retailer items are "the same product"; match items across retailers using algo/logic and human review. Create catalog linked to normalized items.
- Grouping the same item from retailer
-
Asking human to create a canonical/catalog item with:
- friendly/canonical_name: "bell pepper"; "milk"
- category: "produce"; "dairy"
- product_type: "pepper"; "milk"
- ? variant? "whole, "skim", "2pct"
- (1) normalized items from each retailer
- [1] review queue of items to be reviewed
- [2] catalog (lookup table) of confirmed retailer_item and canonical_name
- [3] canonical purchase list, pivot-ready
Unresolved Issues
- Create tags: canonical_name (need better label), category, product_type is missing data like Variant, shouldn't this be part of the normalization step?
- need central script to orchestrate; metadata belongs here and nowhere else
Symptoms
-
`LIME` and `LIME . / .` appearing in canonical_catalog:
- names must come from review-approved names, not raw strings
notes
to fix
- options not reading/sticking?
- ice cream - add flavor, call it frozen (not dairy)
- seltzer/soda from "seltzer,soda,bev" to "cherry san pellegrino, seltzer, bev"?
- [1] chicken bouillon, soup, (0 items, 0 rows) -> chicken bouillon, broth?, ,
- peanut butter,, -> creamy peanut butter, peanut butter, condiment
- add gummy bear to candy
- add "fresh" to fresh strawberry
- fix "onion,veg,produce"
manage product_type and category directly? future: fix match
Done
fuji apple, apple, produce (not apple, fruit, produce) spinach, , produce -> frozen vs fresh? frozen chicken thighs -> rotisserie chicken, chicken, poultry -> rotisserie chicken, chicken, meat beef patty, hamburger, meat -> hamburger patty, beef, meat oats > cereal cheerios > cereal
- 3 kinds of greek yogurt!!
takeaways
- variants not caught, how to fix?
catalog_name = what you actually bought product_type = reasonable substitute category = store aisle
Using different categories maintains a direct comparison (product_type==spinach) and a distinction. fresh spinach, spinach, produce frozen spinach, spinach, frozen
include in catalog_name:
- form: frozen, fresh, ground, shredded
- fat level: whole, skim, 2%
- flavor when primary: vanilla yogurt vs plain yogurt
- cut: diced tomatoes vs crushed tomatoes
- species when relevant: gala apple vs fuji apple
exclude from catalog_name:
- package size / multipack count
- promo wording; adjectives like "premium"; retailer marketing fluff
AC
-
fix internal search flow, add same menu
Review 4/345: SHRP CHDR 5 matched items: [1] KS SHRP CHDR EC20T9H5 W12T13H5 SL130 | costco | 2026-03-12 | 5.49 | [2] KS SHRP CHDR EC20T9H5 W12T13H5 SL130 | costco | 2025-01-24 | 12.58 | [3] KS SHRP CHDR EC20T9H5 W12T13H5 SL130 | costco | 2025-01-10 | 6.29 | [4] KS SHRP CHDR EC20T9H5 W12T13H5 SL130 | costco | 2024-12-14 | 6.29 | [5] KS SHRP CHDR EC20T9H5 W12T13H5 SL130 | costco | 2024-08-06 | 5.99 | no catalog_name suggestions found [f]ind [n]ew [s]kip e[x]clude [q]uit > f search: cheddar 1 search results found: [1] cheddar cheese, cheese, dairy (0 items, 0 rows) - selection: 1 + [#] link to suggestion [f]ind [n]ew [s]kip e[x]clude [q]uit >
instead of
search: banana
no matches found
- search again? [enter=yes, q=no]:
+ [f]ind [n]ew [s]kip e[x]clude [q]uit >
-
during a long review session, two pepper or onion types back-to-back cant see the one i just added
- suggest just-added catalog items
- script likely needs to re-read the csv, not just add
3. suggest based on both catalog & product_name (this is already happening
- Search results do not properly list running totals: 5 search results found:
[1] red onion, onion, produce (0 items, 0 rows) [2] mild roasted red bell pepper, bell pepper, produce (0 items, 0 rows) [3] onion, vegetable, produce (0 items, 0 rows) [4] sour cream and onion potato chip, chips, snack (0 items, 0 rows) [5] yellow onion, onion, produce (0 items, 0 rows) selection:
data cleanup [2026-03-23 Mon]
ok we're getting closer. still see some issues
- reorder purchases columns for display: catalog_name, product_type, category (makes data/troubleshooting way easier)
- shouldn't net_line_price should never be empty? to allow cumulative cost comparison/analysis (we can see normalized price per X via effective_price but shouldnt this be weighted against how much we bought? eg if we bought 5lb flour at $0.970/lb this is weighted as 1-to-1 with a 25lb purchase as 0.670/lb
- some items missing entire categorizations? probably a result of me trying to do data cleanup. i found the orphaned values in teh product_links table and removed them, but re-running review_products.py did not catch this… shouldn't review_products run a comparison between each vendor's normalized_items and compare to the existing review_queu? RSET POTATO US 1 GREEK YOGURT DOM55 FDLY CHY VAN IC CRM DUNKIN DONUT CANISTER ORIG BLND P=260 ICE CUBES BLACK BEANS KETCHUP SQUEEZE BTL YELLOW_GOLD POTATO US 1 YELLOW_GOLD POTATO US 1 PINTO BEANS
- cleanup deprecated .py files
-
Goals:
-
When have I purchased this item, what did I pay, and how has the price changed over time?
- we're close, but missing units - eg AP flour shows a value that looks like price/lb but you just see $0.765
- doesnt seem like we've captured everything but that's just a gut feeling
- Visit breakdown as well as catalog/product/category? this certainly belongs in purchases.csv.
- Consider dash/plotly for better-than-excel tracking, since we're really only looking at a couple of graphs and filtering within certain values? (obv keep purchases as a user-friendly output)
-
1. Cleanup purchases column order
purchase_date retailer catalog_name product_type category net_line_total normalized_quantity effective_price effective_price_unit (new) order_id line_no raw_item_name normalized_item_name catalog_id normalized_item_id
2. Populate and use purchases.net_line_total
net_line_total = line_total+matched_discount_amoun effective_price = net_line_total / normalized_quantity weighted cost analysis uses net_line_total, not just avg effective_price
3. Improve review robustness, enable norm_item re review
- should regenerate candidates from:
- normalized items with no valid catalog_id
- normalized items whose linked catalog_id no longer exists
- normalized items whose linked catalog row exists but missing required fields if you want completeness review
- review_products.py should compare:
- current normalized universe
- current product_links
- current catalog
- current review_queue
4. Remove deprecated.py
5. Improve Charts
- Histogram: add effective_price_unit to purchases.py
-
Visits: plot by order_id enable display of:
- spend by visit
- items per visit
- category spend by visit
- retailer/store breakdown