
python setup

create a venv, install playwright and pandas, then run playwright install

  1. scrape - raw giant json
  2. enrich - cols:

item_name_norm brand_guess size_value size_unit pack_qty variant is_store_brand is_fee measure_type price_per_lb price_per_oz price_per_each image_url

  • normalize abbreviations
  • extract size like 12z, 10ct, 5lb
  • detect fees like bag charges
  • infer whether something is sold by each vs weight
  • carry forward image url

  3. build observed-product table from enriched items
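A minimal sketch of the size-extraction part of the enrich step, assuming sizes appear as a number followed by a short unit token (12z, 10ct, 5lb). The names `extract_size`, `price_per_oz`, and the `UNIT_ALIASES` table are hypothetical and not taken from the actual enrich scripts.

```python
import re

# Hypothetical alias table; extend as new abbreviations show up in receipts.
UNIT_ALIASES = {"z": "oz", "oz": "oz", "lb": "lb", "lbs": "lb", "ct": "count"}

# A number followed by one of the known unit tokens, e.g. "12z", "10ct", "5lb".
SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(oz|z|lbs|lb|ct)\b", re.IGNORECASE)


def extract_size(item_name):
    """Return (size_value, size_unit), or (None, None) when no size token is found."""
    match = SIZE_RE.search(item_name)
    if not match:
        return None, None
    return float(match.group(1)), UNIT_ALIASES[match.group(2).lower()]


def price_per_oz(price, size_value, size_unit):
    """Return price per ounce for weight units, else None."""
    if not size_value:
        return None
    if size_unit == "oz":
        return price / size_value
    if size_unit == "lb":
        return price / (size_value * 16)  # 16 oz per lb
    return None
```

Items sold by count return None from `price_per_oz`, which leaves room for a separate price_per_each calculation.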

git issues

  • don't try to git push from win emacs viewing wsl, it will be screwy (windows identity vs wsl)

ssh / access to gitea

ssh://git@192.168.1.207:2020/ben/scrape-giant.git
https://git.hgsky.me/ben/scrape-giant.git

add Port to config:

Host gitea
  HostName 192.168.1.207
  User git
  Port 2020
  IdentityFile ~/.ssh/gitea
  IdentitiesOnly yes

then git remote set-url gitea git@gitea:ben/scrape-giant.git

on local network: use ssh to 192.168.1.207:2020
from elsewhere/public: use https to git.hgsky.me/… unless you later expose ssh properly

stash

z z to stash local work only; take care not to include ignored files, which would add the venv and `__pycache__`

z p to pop the stash back

creating remote branches

P p, magit will suggest upstream (gitea), select and Enter and it will be created

cherry-picking

b b : switch to desired branch (review)
l B : open reflog for local branches (my changes were committed to local cx but not pushed to gitea/cx)
put point on the commit you want; did this in sequence
A A : cherry pick commit to current branch
minibuffer will show the commit and all branches, leave it on that commit
the final commit was not shown by hash, just the branch cx, since (local) cx was caught up with that branch

reverting a branch

b l : switch to local branch (cx)
l l : open local reflog
put point on the commit; highlighted remote gitea/cx
X : reset branch; prompts you, selected cx

merge branch

b b : switch to branch to be merged into (cx)
m m : pick branch to merge into current branch

giant requests

item:

get: /api/v6.0/user/369513017/order/history/detail/69a2e44a16be1142e74ad3cc

headers: request: GET /api/v6.0/user/369513017/order/history/detail/69a2e44a16be1142e74ad3cc?isInStore=true HTTP/2 Host: giantfood.com User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:148.0) Gecko/20100101 Firefox/148.0 Accept: application/json, text/plain, / Accept-Language: en-US,en;q=0.9 Accept-Encoding: gzip, deflate, br, zstd DNT: 1 Sec-GPC: 1 Connection: keep-alive Referer: https://giantfood.com/account/history/invoice/in-store Cookie: datadome=rDtvd3J2hO5AeghJMSFRRxGc6ifKCQYgMLcqPNr9rWiz2rdcXb032AY6GIZn8tUmYB96BKKbzh3_jSjEzYWLj8hDjl3oGYYAiu4jwdaxpf3vh2v4f7KH7kbqgsMWpkjt; cf_clearance=WEPyQokx9f0qoyS4Svsw4EkZ1TYOxjOwcUHspT3.rXw-1773348940-1.2.1.1-fPvERGxBlFUaBW83sUppbUWpwvFG7mZivag5vBvZb3kxUQv2WSVIV1tON0HV2n8bkVY0U8_BBl62a00Np.oJylYQcGME540gZlYEoL.gMs4WynLqApFe5BOXAEwOm01_6h6b62H90bl4ypRehVb_TXEi4qHaPLVSZhjZK_h.fv6RBqjgYch2j_8XnHe5HXvLziVjl1k2aJskozqy04KOyeHyc3OyIPTZd5On_KAzFIM; dvrctk=MnjKJVShVraEtbrBkkxWxLaZrXnIGNQlwB7QtZVPFeA=; __cflb=0H28vXMLFyydRmDMNgcPHijM6auXkCspCkuh58tVuJ3; __cf_bm=C6QbqiEvbbwdrYBpoJOkcWcedf60vcOfPfTPPbZzKbM-1773348202-1.0.1.1-cSHoYwi8ZjIHTdBItXQP_iXJdRJS6FYjFsGdl1eGHvS5pgfbcT4Lg19P6UStX.bZz1u0OXiS5ykdipPBtwP6OvZr68k4XSmjYpir05jNLhw; _dd_s=rum=0&expire=1773349846445; ppdtk=Uog72CR22mD85C7U4iZHlgOQeRmvHEYp0OdQc+0lEes1c5/LeqGT+ZUlXpSC6FpW; cartId=3820547 Sec-Fetch-Dest: empty Sec-Fetch-Mode: cors Sec-Fetch-Site: same-origin Priority: u=0 TE: trailers

response: HTTP/2 200 date: Thu, 12 Mar 2026 20:55:47 GMT content-type: application/json server: cloudflare cf-ray: 9db5b3a5d84aff28-IAD cf-cache-status: DYNAMIC content-encoding: gzip set-cookie: datadome=MXMri0hss6PlQ0_oS7gG2iMdOKnNkbDmGvOxelgN~nCcupgkJQOqjcjcgdprIaI7hSlt_w8E9Ri_RAzPFrGqtUfqAJ_szB_aNZ2FdC26qmI3870Nn4~T0vtx8Gj3dEZR; Max-Age=31536000; Domain=.giantfood.com; Path=/; Secure; SameSite=Lax strict-transport-security: max-age=31536000; includeSubDomains vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers, accept-encoding accept-ch: Sec-CH-UA,Sec-CH-UA-Mobile,Sec-CH-UA-Platform,Sec-CH-UA-Arch,Sec-CH-UA-Full-Version-List,Sec-CH-UA-Model,Sec-CH-Device-Memory x-datadome: protected request-context: appId=cid-v1:75750625-0c81-4f08-9f5d-ce4f73198e54 X-Firefox-Spdy: h2

history:

GET https://giantfood.com/api/v6.0/user/369513017/order/history?filter=instore&loyaltyNumber=440155630880

headers: request: GET /api/v6.0/user/369513017/order/history?filter=instore&loyaltyNumber=440155630880 HTTP/2 Host: giantfood.com User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:148.0) Gecko/20100101 Firefox/148.0 Accept: application/json, text/plain, / Accept-Language: en-US,en;q=0.9 Accept-Encoding: gzip, deflate, br, zstd DNT: 1 Sec-GPC: 1 Connection: keep-alive Referer: https://giantfood.com/account/history/invoice/in-store Cookie: datadome=OH2XjtCoI6XjE3Qsz_b0F1YULKLatAC0Ea~VMeDGBP0N9Z~CeI3RqEbvkGmNW_VCOU~vRb6p0kqibvF2tLbWnzyAGIdO7jsC41KiYbp7USpJDnefZhIg0e1ypAugvDSw; cf_clearance=WEPyQokx9f0qoyS4Svsw4EkZ1TYOxjOwcUHspT3.rXw-1773348940-1.2.1.1-fPvERGxBlFUaBW83sUppbUWpwvFG7mZivag5vBvZb3kxUQv2WSVIV1tON0HV2n8bkVY0U8_BBl62a00Np.oJylYQcGME540gZlYEoL.gMs4WynLqApFe5BOXAEwOm01_6h6b62H90bl4ypRehVb_TXEi4qHaPLVSZhjZK_h.fv6RBqjgYch2j_8XnHe5HXvLziVjl1k2aJskozqy04KOyeHyc3OyIPTZd5On_KAzFIM; dvrctk=MnjKJVShVraEtbrBkkxWxLaZrXnIGNQlwB7QtZVPFeA=; __cflb=0H28vXMLFyydRmDMNgcPHijM6auXkCspCkuh58tVuJ3; __cf_bm=C6QbqiEvbbwdrYBpoJOkcWcedf60vcOfPfTPPbZzKbM-1773348202-1.0.1.1-cSHoYwi8ZjIHTdBItXQP_iXJdRJS6FYjFsGdl1eGHvS5pgfbcT4Lg19P6UStX.bZz1u0OXiS5ykdipPBtwP6OvZr68k4XSmjYpir05jNLhw; _dd_s=rum=0&expire=1773349842848; ppdtk=Uog72CR22mD85C7U4iZHlgOQeRmvHEYp0OdQc+0lEes1c5/LeqGT+ZUlXpSC6FpW; cartId=3820547 Sec-Fetch-Dest: empty Sec-Fetch-Mode: cors Sec-Fetch-Site: same-origin Priority: u=0 TE: trailers

response: HTTP/2 200 date: Thu, 12 Mar 2026 20:55:43 GMT content-type: application/json server: cloudflare cf-ray: 9db5b38f7eebff28-IAD cf-cache-status: DYNAMIC content-encoding: gzip set-cookie: datadome=rDtvd3J2hO5AeghJMSFRRxGc6ifKCQYgMLcqPNr9rWiz2rdcXb032AY6GIZn8tUmYB96BKKbzh3_jSjEzYWLj8hDjl3oGYYAiu4jwdaxpf3vh2v4f7KH7kbqgsMWpkjt; Max-Age=31536000; Domain=.giantfood.com; Path=/; Secure; SameSite=Lax strict-transport-security: max-age=31536000; includeSubDomains vary: Origin, Access-Control-Request-Method, Access-Control-Request-Headers, accept-encoding accept-ch: Sec-CH-UA,Sec-CH-UA-Mobile,Sec-CH-UA-Platform,Sec-CH-UA-Arch,Sec-CH-UA-Full-Version-List,Sec-CH-UA-Model,Sec-CH-Device-Memory x-datadome: protected request-context: appId=cid-v1:75750625-0c81-4f08-9f5d-ce4f73198e54 X-Firefox-Spdy: h2
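The two giant endpoints above differ only in path and query string, so a small sketch of URL builders might look like this. The function names are hypothetical; the paths and query parameters (`filter`, `loyaltyNumber`, `isInStore`) are copied from the captured requests, and the user id, loyalty number, and order id are passed in rather than hardcoded.

```python
from urllib.parse import urlencode

BASE = "https://giantfood.com/api/v6.0"


def history_url(user_id, loyalty_number):
    """URL for the in-store order history list."""
    query = urlencode({"filter": "instore", "loyaltyNumber": loyalty_number})
    return f"{BASE}/user/{user_id}/order/history?{query}"


def detail_url(user_id, order_id):
    """URL for a single in-store order's detail."""
    query = urlencode({"isInStore": "true"})
    return f"{BASE}/user/{user_id}/order/history/detail/{order_id}?{query}"
```

The cookies still have to come from a live browser session; this only covers building the URLs.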

costco requests

  • localstorage idToken has the auth token, but needs "Bearer " prepended
  • localstorage clientID has the COSTCO_X_WCS_CLIENTID
  • I don't see the client_identifier uuid anywhere.

we will pull from .env first (may have to hardcode), then overwrite with session data (token). hopefully this doesn't change.
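A rough sketch of the ".env first, then session" idea, assuming the .env values are already loaded into the environment. The header names come from the captured requests in these notes; `COSTCO_X_WCS_CLIENTID` is the variable named above, while `COSTCO_CLIENT_IDENTIFIER`, `COSTCO_X_AUTHORIZATION`, and the function name are hypothetical.

```python
import os


def build_costco_headers(id_token=None, client_id=None, env=None):
    """Start from env values, then overwrite with session data when present."""
    env = os.environ if env is None else env
    headers = {
        "costco-x-wcs-clientId": env.get("COSTCO_X_WCS_CLIENTID", ""),
        # Hypothetical variable names for the two values below.
        "client-identifier": env.get("COSTCO_CLIENT_IDENTIFIER", ""),
        "costco-x-authorization": env.get("COSTCO_X_AUTHORIZATION", ""),
    }
    if client_id:
        # localStorage clientID overrides the .env value.
        headers["costco-x-wcs-clientId"] = client_id
    if id_token:
        # localStorage idToken is the bare token; the header needs the prefix.
        if not id_token.startswith("Bearer "):
            id_token = f"Bearer {id_token}"
        headers["costco-x-authorization"] = id_token
    return headers
```

Since the client_identifier uuid was not found in localStorage, it only ever comes from the environment here.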

warehouse

Headers

POST /ebusiness/order/v1/orders/graphql HTTP/1.1 Host: ecom-api.costco.com User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:148.0) Gecko/20100101 Firefox/148.0 Accept: / Accept-Language: en-US,en;q=0.9 Accept-Encoding: gzip, deflate, br, zstd costco.service: restOrders costco.env: ecom costco-x-authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6IlhrZTFoNXg5TV9ZMk5ER0YxU1hDX2xNNnVSTU5tZTJ3STBLRDlHNzl1QmciLCJ0eXAiOiJKV1QifQ.eyJleHAiOjE3NzM2NjU2NjgsIm5iZiI6MTc3MzY2NDc2OCwidmVyIjoiMS4wIiwiaXNzIjoiaHR0cHM6Ly9zaWduaW4uY29zdGNvLmNvbS9lMDcxNGRkNC03ODRkLTQ2ZDYtYTI3OC0zZTI5NTUzNDgzZWIvdjIuMC8iLCJzdWIiOiIzMTIzZWQ2Yy1jNzM4LTRiOTktOTAwZC0xNDE1ZTUzNjA2Y2UiLCJhdWQiOiJhM2E1MTg2Yi03Yzg5LTRiNGMtOTNhOC1kZDYwNGU5MzA3NTciLCJhY3IiOiJCMkNfMUFfU1NPX1dDU19zaWdudXBfc2lnbmluXzIwMSIsIm5vbmNlIjoiNDA4NjU3YmItODg5MC00MTk0LTg2OTctZDYzOGU2MzdhMGRhIiwiaWF0IjoxNzczNjY0NzY4LCJhdXRoX3RpbWUiOjE3NzM2NjQ3NjgsImF1dGhlbnRpY2F0aW9uU291cmNlIjoibG9jYWxBY2NvdW50QXV0aGVudGljYXRpb24iLCJlbWFpbCI6ImpvaG5tb3Nlc2NhcnRlckBnbWFpbC5jb20iLCJuYW1lIjoiRW1wdHkgRGlzcGxheW5hbWUiLCJ1c2VySWRlbnRpdGllcyI6W3siaXNzdWVyIjoiYTNhNTE4NmItN2M4OS00YjRjLTkzYTgtZGQ2MDRlOTMwNzU3IiwiaXNzdWVyVXNlcklkIjoiQUFEOjMxMjNlZDZjLWM3MzgtNGI5OS05MDBkLTE0MTVlNTM2MDZjZSJ9LHsiaXNzdWVyIjoiNDkwMGViMWYtMGMxMC00YmQ5LTk5YzMtYzU5ZTZjMWVjZWJmIiwiaXNzdWVyVXNlcklkIjoiYTZmZmRkOTktNDM2OC00NTgwLTgxOWYtZTZjZjYxM2U1M2M1In0seyJpc3N1ZXIiOiIyZGQ0YjE0NS0zYmRhLTQ2NjktYWU2YS0zN2I4Y2I2ZGFmN2YiLCJpc3N1ZXJVc2VySWQiOiJhNmZmZGQ5OS00MzY4LTQ1ODAtODE5Zi1lNmNmNjEzZTUzYzUifV0sImlzc3VlclVzZXJJZCI6IkFBRDozMTIzZWQ2Yy1jNzM4LTRiOTktOTAwZC0xNDE1ZTUzNjA2Y2UiLCJjbGllbnRJZCI6ImEzYTUxODZiLTdjODktNGI0Yy05M2E4LWRkNjA0ZTkzMDc1NyIsInJlbWVtYmVyTWUiOiJGYWxzZSIsInNlbmRNZUVtYWlsIjoib2ZmIiwiaXBBZGRyZXNzIjoiOTYuMjQxLjIxMi4xMjUiLCJDb3JyZWxhdGlvbklkIjoiYWUyYTMxYjktMjBkNC00MTBkLWE1ZjAtNDJhMWIzM2VmZmQ1In0.gmhhNsgFUbd0QAR1Z_isFjglQxZrM0Kj8yv5-w-FrsWM3d9PB6kWsldBndy6cEhwZh588T1u4vgG9A-XR3HZ4t-JnPZhpr8_7-lI4W4Tp4IAA0tIgMt7cHZUN14qstx_K72QLOrKbO34PQJKBymw2qKvwvhUo372MNFtc2D8_wS_VbG8QdOPumgsBJPqyF7HExt-gpk
Au_5kL-54pqLSIZIJZ_viymti9ajla_B8PlvHMO7ZDWSgoV177ArcQAeOhv9MT1e5k0a4V7R-cCI77NIhoBUjV8C4lMAd27nntWzJJ9N00hEEGQb3zPoWUgRFAOdGzjg4xZu1D87C3MJtdA Content-Type: application/json-patch+json costco-x-wcs-clientId: 4900eb1f-0c10-4bd9-99c3-c59e6c1ecebf client-identifier: 481b1aec-aa3b-454b-b81b-48187e28f205 Content-Length: 808 Origin: https://www.costco.com DNT: 1 Sec-GPC: 1 Connection: keep-alive Referer: https://www.costco.com/ Sec-Fetch-Dest: empty Sec-Fetch-Mode: cors Sec-Fetch-Site: same-site

Request

Request {"query":"query receiptsWithCounts($startDate: String!, $endDate: String!,$documentType:String!,$documentSubType:String!) {\n receiptsWithCounts(startDate: $startDate, endDate: $endDate,documentType:$documentType,documentSubType:$documentSubType) {\n inWarehouse\n gasStation\n carWash\n gasAndCarWash\n receipts{\n warehouseName receiptType documentType transactionDateTime transactionBarcode warehouseName transactionType total \n totalItemCount\n itemArray { \n itemNumber\n }\n tenderArray { \n tenderTypeCode\n tenderDescription\n amountTender\n }\n couponArray { \n upcnumberCoupon\n } \n }\n}\n }","variables":{"startDate":"1/01/2026","endDate":"3/31/2026","text":"Last 3 Months","documentType":"all","documentSubType":"all"}}

Response

{"data":{"receiptsWithCounts":{"inWarehouse":2,"gasStation":0,"carWash":0,"gasAndCarWash":0,"receipts":[{"warehouseName":"MT VERNON","receiptType":"In-Warehouse","documentType":"WarehouseReceiptDetail","transactionDateTime":"2026-03-12T16:16:00","transactionBarcode":"21111500804012603121616","transactionType":"Sales","total":208.58,"totalItemCount":24,"itemArray":[{"itemNumber":"34779"},{"itemNumber":"7950"},{"itemNumber":"2005"},{"itemNumber":"1941976"},{"itemNumber":"4873222"},{"itemNumber":"374664"},{"itemNumber":"60357"},{"itemNumber":"30669"},{"itemNumber":"1025795"},{"itemNumber":"787876"},{"itemNumber":"22093"},{"itemNumber":"1956177"},{"itemNumber":"1136340"},{"itemNumber":"7609681"},{"itemNumber":"18001"},{"itemNumber":"27003"},{"itemNumber":"1886266"},{"itemNumber":"4102"},{"itemNumber":"87745"},{"itemNumber":"110784"},{"itemNumber":"47492"},{"itemNumber":"2287780"},{"itemNumber":"917546"},{"itemNumber":"1768123"},{"itemNumber":"374558"}],"tenderArray":[{"tenderTypeCode":"061","tenderDescription":"VISA","amountTender":208.58}],"couponArray":[{"upcnumberCoupon":"2100003746641"},{"upcnumberCoupon":"2100003745583"}]},{"warehouseName":"MT 
VERNON","receiptType":"In-Warehouse","documentType":"WarehouseReceiptDetail","transactionDateTime":"2026-02-14T16:25:00","transactionBarcode":"21111500503322602141625","transactionType":"Sales","total":188.12,"totalItemCount":23,"itemArray":[{"itemNumber":"7812"},{"itemNumber":"7950"},{"itemNumber":"3923"},{"itemNumber":"19813"},{"itemNumber":"87745"},{"itemNumber":"1116038"},{"itemNumber":"5938"},{"itemNumber":"1136340"},{"itemNumber":"30669"},{"itemNumber":"384962"},{"itemNumber":"1331732"},{"itemNumber":"787876"},{"itemNumber":"61576"},{"itemNumber":"110784"},{"itemNumber":"180973"},{"itemNumber":"3"},{"itemNumber":"744361"},{"itemNumber":"1886266"},{"itemNumber":"1025795"},{"itemNumber":"11545"},{"itemNumber":"47492"},{"itemNumber":"260509"}],"tenderArray":[{"tenderTypeCode":"061","tenderDescription":"VISA","amountTender":188.12}],"couponArray":[]}]}}}
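The request above sends startDate/endDate as M/DD/YYYY strings for a "Last 3 Months" window. A sketch for generating a series of such windows, assuming the API is queried in three-month chunks (whether it accepts longer ranges is not established here). Both function names are hypothetical.

```python
from datetime import date, timedelta


def format_costco_date(d):
    """Format like the captured request: month unpadded, day zero-padded."""
    return f"{d.month}/{d.day:02d}/{d.year}"


def three_month_windows(start, end):
    """Return (startDate, endDate) string pairs covering start..end inclusive."""
    windows = []
    cursor = start
    while cursor <= end:
        # First day of the month three months after the cursor's month.
        month_index = cursor.year * 12 + (cursor.month - 1) + 3
        next_start = date(month_index // 12, month_index % 12 + 1, 1)
        window_end = min(next_start - timedelta(days=1), end)
        windows.append((format_costco_date(cursor), format_costco_date(window_end)))
        cursor = next_start
    return windows
```

For 2026-01-01 through 2026-03-31 this yields a single window matching the variables in the captured request.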

item

headers

POST /ebusiness/order/v1/orders/graphql HTTP/2 Host: ecom-api.costco.com User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:148.0) Gecko/20100101 Firefox/148.0 Accept: / Accept-Language: en-US,en;q=0.9 Accept-Encoding: gzip, deflate, br, zstd costco.service: restOrders costco.env: ecom costco-x-authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6IlhrZTFoNXg5TV9ZMk5ER0YxU1hDX2xNNnVSTU5tZTJ3STBLRDlHNzl1QmciLCJ0eXAiOiJKV1QifQ.eyJleHAiOjE3NzM2NjUzODUsIm5iZiI6MTc3MzY2NDQ4NSwidmVyIjoiMS4wIiwiaXNzIjoiaHR0cHM6Ly9zaWduaW4uY29zdGNvLmNvbS9lMDcxNGRkNC03ODRkLTQ2ZDYtYTI3OC0zZTI5NTUzNDgzZWIvdjIuMC8iLCJzdWIiOiIzMTIzZWQ2Yy1jNzM4LTRiOTktOTAwZC0xNDE1ZTUzNjA2Y2UiLCJhdWQiOiJhM2E1MTg2Yi03Yzg5LTRiNGMtOTNhOC1kZDYwNGU5MzA3NTciLCJhY3IiOiJCMkNfMUFfU1NPX1dDU19zaWdudXBfc2lnbmluXzIwMSIsIm5vbmNlIjoiNzg5MjIzOGUtOWU3NC00MzExLWI2NDItMzQ1NTY4ZDY3NTk4IiwiaWF0IjoxNzczNjY0NDg1LCJhdXRoX3RpbWUiOjE3NzM2NjQ0ODQsImF1dGhlbnRpY2F0aW9uU291cmNlIjoibG9jYWxBY2NvdW50QXV0aGVudGljYXRpb24iLCJlbWFpbCI6ImpvaG5tb3Nlc2NhcnRlckBnbWFpbC5jb20iLCJuYW1lIjoiRW1wdHkgRGlzcGxheW5hbWUiLCJ1c2VySWRlbnRpdGllcyI6W3siaXNzdWVyIjoiYTNhNTE4NmItN2M4OS00YjRjLTkzYTgtZGQ2MDRlOTMwNzU3IiwiaXNzdWVyVXNlcklkIjoiQUFEOjMxMjNlZDZjLWM3MzgtNGI5OS05MDBkLTE0MTVlNTM2MDZjZSJ9LHsiaXNzdWVyIjoiNDkwMGViMWYtMGMxMC00YmQ5LTk5YzMtYzU5ZTZjMWVjZWJmIiwiaXNzdWVyVXNlcklkIjoiYTZmZmRkOTktNDM2OC00NTgwLTgxOWYtZTZjZjYxM2U1M2M1In0seyJpc3N1ZXIiOiIyZGQ0YjE0NS0zYmRhLTQ2NjktYWU2YS0zN2I4Y2I2ZGFmN2YiLCJpc3N1ZXJVc2VySWQiOiJhNmZmZGQ5OS00MzY4LTQ1ODAtODE5Zi1lNmNmNjEzZTUzYzUifV0sImlzc3VlclVzZXJJZCI6IkFBRDozMTIzZWQ2Yy1jNzM4LTRiOTktOTAwZC0xNDE1ZTUzNjA2Y2UiLCJjbGllbnRJZCI6ImEzYTUxODZiLTdjODktNGI0Yy05M2E4LWRkNjA0ZTkzMDc1NyIsInJlbWVtYmVyTWUiOiJGYWxzZSIsInNlbmRNZUVtYWlsIjoib2ZmIiwiaXBBZGRyZXNzIjoiOTYuMjQxLjIxMi4xMjUiLCJDb3JyZWxhdGlvbklkIjoiMDk0YTE5NDYtZTMwNS00ZDkzLWEyMzQtM2ZiNGMwMjMyNDhhIn0.FdsVFHsewvpQABvkEz4uA0NUlYwvlBEg-frJbUDIJRTsP59Be0bOt8Zqv6cZhUqBn_lTQEyi9tnvpkpycmNy7Rg5zLfYroH6mNALRqkBm8VbcmrEVDM1HmdNTHgO9vQD4TdKm1ZYkA7Pj_6QY3sDxI4ioOzIz1_XOnoJVAXjEwGfr8hgvqtlaC51M5DsfIGQj3zCaJrQn
D6GBJlFmLNUpCulpT16WAaB1lT_pcycfBs-e1xnEd33dX0kHBOZ8pFS-IKjV_44ZK9R8jI9WHx5ThX3-DtyqjkJ0JypmhT9uEa0MeT55U7aeKPbMvQ0exiw3culKgiWDhvdp8e2EkExsg Content-Type: application/json-patch+json costco-x-wcs-clientId: 4900eb1f-0c10-4bd9-99c3-c59e6c1ecebf client-identifier: 481b1aec-aa3b-454b-b81b-48187e28f205 Content-Length: 2916 Origin: https://www.costco.com DNT: 1 Sec-GPC: 1 Connection: keep-alive Referer: https://www.costco.com/ Sec-Fetch-Dest: empty Sec-Fetch-Mode: cors Sec-Fetch-Site: same-site Priority: u=0 TE: trailers

request

{"query":"query receiptsWithCounts($barcode: String!,$documentType:String!) {\n receiptsWithCounts(barcode: $barcode,documentType:$documentType) {\nreceipts{\n warehouseName\n receiptType \n documentType \n transactionDateTime \n transactionDate \n companyNumber \n warehouseNumber \n operatorNumber \n warehouseName \n warehouseShortName \n registerNumber \n transactionNumber \n transactionType\n transactionBarcode \n total \n warehouseAddress1 \n warehouseAddress2 \n warehouseCity \n warehouseState \n warehouseCountry \n warehousePostalCode\n totalItemCount \n subTotal \n taxes\n total \n invoiceNumber\n sequenceNumber\n itemArray { \n itemNumber \n itemDescription01 \n frenchItemDescription1 \n itemDescription02 \n frenchItemDescription2 \n itemIdentifier \n itemDepartmentNumber\n unit \n amount \n taxFlag \n merchantID \n entryMethod\n transDepartmentNumber\n fuelUnitQuantity\n fuelGradeCode\n fuelUnitQuantity\n itemUnitPriceAmount\n fuelUomCode\n fuelUomDescription\n fuelUomDescriptionFr\n fuelGradeDescription\n fuelGradeDescriptionFr\n\n } \n tenderArray { \n tenderTypeCode\n tenderSubTypeCode\n tenderDescription \n amountTender \n displayAccountNumber \n sequenceNumber \n approvalNumber \n responseCode \n tenderTypeName \n transactionID \n merchantID \n entryMethod\n tenderAcctTxnNumber \n tenderAuthorizationCode \n tenderTypeName\n tenderTypeNameFr\n tenderEntryMethodDescription\n walletType\n walletId\n storedValueBucket\n } \n subTaxes { \n tax1 \n tax2 \n tax3 \n tax4 \n aTaxPercent \n aTaxLegend \n aTaxAmount\n aTaxPrintCode\n aTaxPrintCodeFR \n aTaxIdentifierCode \n bTaxPercent \n bTaxLegend \n bTaxAmount\n bTaxPrintCode\n bTaxPrintCodeFR \n bTaxIdentifierCode \n cTaxPercent \n cTaxLegend \n cTaxAmount\n cTaxIdentifierCode \n dTaxPercent \n dTaxLegend \n dTaxAmount\n dTaxPrintCode\n dTaxPrintCodeFR \n dTaxIdentifierCode\n uTaxLegend\n uTaxAmount\n uTaxableAmount\n } \n instantSavings \n membershipNumber \n }\n }\n 
}","variables":{"barcode":"21111500804012603121616","documentType":"warehouse"}}

response

{"data":{"receiptsWithCounts":{"receipts":[{"warehouseName":"MT VERNON","receiptType":"In-Warehouse","documentType":"WarehouseReceiptDetail","transactionDateTime":"2026-03-12T16:16:00","transactionDate":"2026-03-12","companyNumber":1,"warehouseNumber":1115,"operatorNumber":43,"warehouseShortName":"MT VERNON","registerNumber":8,"transactionNumber":401,"transactionType":"Sales","transactionBarcode":"21111500804012603121616","total":208.58,"warehouseAddress1":"7940 RICHMOND HWY","warehouseAddress2":null,"warehouseCity":"ALEXANDRIA","warehouseState":"VA","warehouseCountry":"US","warehousePostalCode":"22306","totalItemCount":24,"subTotal":202.01,"taxes":6.57,"invoiceNumber":null,"sequenceNumber":null,"itemArray":[{"itemNumber":"34779","itemDescription01":"ROMANO","frenchItemDescription1":null,"itemDescription02":"CS=15 SL120 T9H6","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":19,"unit":1,"amount":20.93,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":19,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":11.69,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"7950","itemDescription01":"4LB COSMIC","frenchItemDescription1":null,"itemDescription02":null,"frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":65,"unit":1,"amount":5.99,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":65,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":5.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"2005","itemDescription01":"25# FLOUR","frenchItemDescription1":null,"itemDescription02":"ALL-PURPOSE HARV 
P98/100","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":9.49,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":9.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"1941976","itemDescription01":"BREAD FLOUR","frenchItemDescription1":null,"itemDescription02":"12 LBS 180P 20X9","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":9.99,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":9.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"4873222","itemDescription01":"ALL F&C","frenchItemDescription1":null,"itemDescription02":"200OZ 160LOADS P104","frenchItemDescription2":null,"itemIdentifier":null,"itemDepartmentNumber":14,"unit":1,"amount":19.99,"taxFlag":"Y","merchantID":null,"entryMethod":null,"transDepartmentNumber":14,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":19.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"374664","itemDescription01":"/ 4873222","frenchItemDescription1":"4873222","itemDescription02":null,"frenchItemDescription2":null,"itemIdentifier":null,"itemDepartmentNumber":14,"unit":-1,"amount":-5,"taxFlag":null,"merchantID":null,"entryMethod":null,"transDepartmentNumber":14,"fuelUnitQuantity":null,"fuelGradeCode":null,"itemUnitPriceAmount":0,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"60357","itemDescription01":"MIXED 
PEPPER","frenchItemDescription1":null,"itemDescription02":"6-PACK","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":65,"unit":1,"amount":7.49,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":65,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":7.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"30669","itemDescription01":"BANANAS","frenchItemDescription1":null,"itemDescription02":"3 LB / 1.36 KG","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":65,"unit":2,"amount":2.98,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":65,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":1.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"1025795","itemDescription01":"KS 5DZ EGGS","frenchItemDescription1":null,"itemDescription02":"SL21 P120 / P132 / P144","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":17,"unit":1,"amount":9.39,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":17,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":9.39,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"787876","itemDescription01":"KS TWNY PORT","frenchItemDescription1":null,"itemDescription02":"PORTUGAL CSPC# 
773506","frenchItemDescription2":null,"itemIdentifier":null,"itemDepartmentNumber":16,"unit":1,"amount":17.99,"taxFlag":"Y","merchantID":null,"entryMethod":null,"transDepartmentNumber":16,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":17.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"22093","itemDescription01":"KS SHRP CHDR","frenchItemDescription1":null,"itemDescription02":"EC20T9H5 W12T13H5 SL130","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":17,"unit":1,"amount":5.49,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":17,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":5.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"1956177","itemDescription01":"BRWNBTTRGRV","frenchItemDescription1":null,"itemDescription02":"MCCORMICK C12T19H7 L228","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":2.97,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":2.97,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"1136340","itemDescription01":"3LB ORG GALA","frenchItemDescription1":null,"itemDescription02":null,"frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":65,"unit":1,"amount":4.49,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":65,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":4.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"7609681","itemDescription01":"CASCADE 
GEL","frenchItemDescription1":null,"itemDescription02":"125OZ T60H3P180","frenchItemDescription2":null,"itemIdentifier":null,"itemDepartmentNumber":14,"unit":1,"amount":12.49,"taxFlag":"Y","merchantID":null,"entryMethod":null,"transDepartmentNumber":14,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":12.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"18001","itemDescription01":"TBLE SALT 4#","frenchItemDescription1":null,"itemDescription02":"DIAMOND CRYSTAL P=600","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":1.49,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":1.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"27003","itemDescription01":"STRAWBERRIES","frenchItemDescription1":null,"itemDescription02":"908 G / 2 LB","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":65,"unit":1,"amount":5.29,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":65,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":5.29,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"1886266","itemDescription01":"SKO 5%","frenchItemDescription1":null,"itemDescription02":"48 OZ T10H8 
SL30","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":17,"unit":1,"amount":5.79,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":17,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":5.79,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"4102","itemDescription01":"8\" TORTILLAS","frenchItemDescription1":null,"itemDescription02":"SL10 70OZ","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":5.99,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":5.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"87745","itemDescription01":"ROTISSERIE","frenchItemDescription1":null,"itemDescription02":"USDA GRADE A","frenchItemDescription2":null,"itemIdentifier":null,"itemDepartmentNumber":63,"unit":1,"amount":4.99,"taxFlag":"D","merchantID":null,"entryMethod":null,"transDepartmentNumber":63,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":4.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"110784","itemDescription01":"15 GRAIN BRD","frenchItemDescription1":null,"itemDescription02":"PEPPERIDGE FARM 2/24 OZ","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":5.69,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":5.69,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"47492","itemDescription01":"CELERY 
SALAD","frenchItemDescription1":null,"itemDescription02":"APPLE CIDER VINAIGRETTE","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":63,"unit":1,"amount":12.62,"taxFlag":"D","merchantID":null,"entryMethod":null,"transDepartmentNumber":63,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":4.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"2287780","itemDescription01":"BTB CHICKEN","frenchItemDescription1":null,"itemDescription02":"C12T10H9 P1080 SL630","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":9.49,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":9.49,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"917546","itemDescription01":"JIF CREAMY","frenchItemDescription1":null,"itemDescription02":"PEANUT BUTTER SL540 P300","frenchItemDescription2":null,"itemIdentifier":"E","itemDepartmentNumber":13,"unit":1,"amount":11.99,"taxFlag":"3","merchantID":null,"entryMethod":null,"transDepartmentNumber":13,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":11.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"1768123","itemDescription01":"BBEE KIDS4PC","frenchItemDescription1":null,"itemDescription02":"FY26 P1600 T200 
H8","frenchItemDescription2":null,"itemIdentifier":null,"itemDepartmentNumber":39,"unit":1,"amount":17.99,"taxFlag":"Y","merchantID":null,"entryMethod":null,"transDepartmentNumber":39,"fuelUnitQuantity":10.0,"fuelGradeCode":null,"itemUnitPriceAmount":17.99,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null},{"itemNumber":"374558","itemDescription01":" 1768123","frenchItemDescription1":"/1768123","itemDescription02":null,"frenchItemDescription2":null,"itemIdentifier":null,"itemDepartmentNumber":39,"unit":-1,"amount":-4,"taxFlag":null,"merchantID":null,"entryMethod":null,"transDepartmentNumber":39,"fuelUnitQuantity":null,"fuelGradeCode":null,"itemUnitPriceAmount":0,"fuelUomCode":null,"fuelUomDescription":null,"fuelUomDescriptionFr":null,"fuelGradeDescription":null,"fuelGradeDescriptionFr":null}],"tenderArray":[{"tenderTypeCode":"061","tenderSubTypeCode":null,"tenderDescription":"VISA","amountTender":208.58,"displayAccountNumber":"9070","sequenceNumber":null,"approvalNumber":null,"responseCode":null,"tenderTypeName":"VISA","transactionID":null,"merchantID":null,"entryMethod":null,"tenderAcctTxnNumber":null,"tenderAuthorizationCode":null,"tenderTypeNameFr":null,"tenderEntryMethodDescription":null,"walletType":null,"walletId":null,"storedValueBucket":null}],"subTaxes":{"tax1":null,"tax2":null,"tax3":null,"tax4":null,"aTaxPercent":null,"aTaxLegend":"A","aTaxAmount":4.62,"aTaxPrintCode":null,"aTaxPrintCodeFR":null,"aTaxIdentifierCode":null,"bTaxPercent":null,"bTaxLegend":null,"bTaxAmount":null,"bTaxPrintCode":null,"bTaxPrintCodeFR":null,"bTaxIdentifierCode":null,"cTaxPercent":null,"cTaxLegend":"C","cTaxAmount":1.25,"cTaxIdentifierCode":null,"dTaxPercent":null,"dTaxLegend":"D","dTaxAmount":0.7,"dTaxPrintCode":null,"dTaxPrintCodeFR":null,"dTaxIdentifierCode":null,"uTaxLegend":null,"uTaxAmount":null,"uTaxableAmount":null},"instantSavings":9,"membershipNumber":"111894291684"}]}}}

online order

appears to be same endpoint.

request:

{"query":"query getOnlineOrders($startDate:String!, $endDate:String!, $pageNumber:Int , $pageSize:Int, $warehouseNumber:String! ){\n getOnlineOrders(startDate:$startDate, endDate:$endDate, pageNumber : $pageNumber, pageSize : $pageSize, warehouseNumber : $warehouseNumber) {\n pageNumber\n pageSize\n totalNumberOfRecords\n bcOrders {\n orderHeaderId\n orderPlacedDate : orderedDate\n orderNumber : sourceOrderNumber \n orderTotal\n warehouseNumber\n status\n emailAddress\n orderCancelAllowed\n orderPaymentFailed : orderPaymentEditAllowed\n orderReturnAllowed\n orderLineItems {\n orderLineItemCancelAllowed\n orderLineItemId\n orderReturnAllowed\n itemId\n itemNumber\n itemTypeId\n lineNumber\n itemDescription\n deliveryDate\n warehouseNumber\n status\n orderStatus\n parentOrderLineItemId\n isFSAEligible\n shippingType\n shippingTimeFrame\n isShipToWarehouse\n carrierItemCategory\n carrierContactPhone\n programTypeId\n isBuyAgainEligible\n scheduledDeliveryDate\n scheduledDeliveryDateEnd\n configuredItemData\n shipment {\n shipmentId \n orderHeaderId\n orderShipToId \n lineNumber \n orderNumber\n shippingType \n shippingTimeFrame \n shippedDate \n packageNumber \n trackingNumber \n trackingSiteUrl \n carrierName \n estimatedArrivalDate \n deliveredDate \n isDeliveryDelayed \n isEstimatedArrivalDateEligible \n statusTypeId \n status \n pickUpReadyDate\n pickUpCompletedDate\n reasonCode\n trackingEvent {\n event\n carrierName\n eventDate\n estimatedDeliveryDate\n scheduledDeliveryDate\n trackingNumber\n }\n }\n }\n }\n }\n }","variables":{"pageNumber":1,"pageSize":10,"startDate":"2026-1-01","endDate":"2026-3-31","warehouseNumber":"847"}}

need new scope changes:

  • pull all orders by default
  • add online orders
  • copy header data from browser using selenium
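For "pull all orders by default", a minimal sketch of the paging side only. It assumes `pageNumber` is 1-based (the captured request uses 1) and that `totalNumberOfRecords` counts orders across all pages; neither is confirmed. `build_variables` and `page_numbers` are hypothetical helper names, not existing code.

```python
from math import ceil


def build_variables(start_date, end_date, page_number, warehouse_number, page_size=10):
    """Build the `variables` object for one getOnlineOrders request."""
    return {
        "pageNumber": page_number,
        "pageSize": page_size,
        "startDate": start_date,
        "endDate": end_date,
        "warehouseNumber": warehouse_number,
    }


def page_numbers(total_records, page_size=10):
    """Return the 1-based page numbers needed to cover total_records."""
    return list(range(1, ceil(total_records / page_size) + 1))


# Example: a response reporting totalNumberOfRecords=23 needs three pages of 10.
requests_needed = [
    build_variables("2026-1-01", "2026-3-31", page, "847")
    for page in page_numbers(23)
]
```

The first request still has to be sent to learn `totalNumberOfRecords`; the remaining pages follow from it.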

how to run

    python scrape_giant.py
    python enrich_giant.py
    python scrape_costco.py
    python enrich_costco.py
    python build_observed_products.py
    python build_review_queue.py
    python build_canonical_layer.py
    python validate_cross_retailer_flow.py
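A possible runner for the sequence above, as a sketch. `run_pipeline` is a hypothetical helper; it assumes the scripts sit in the current directory and signal failure through a non-zero exit code.

```python
import subprocess
import sys

# Script order copied from the run list above.
PIPELINE = [
    "scrape_giant.py",
    "enrich_giant.py",
    "scrape_costco.py",
    "enrich_costco.py",
    "build_observed_products.py",
    "build_review_queue.py",
    "build_canonical_layer.py",
    "validate_cross_retailer_flow.py",
]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script in order; return the first failing script, or None."""
    for script in scripts:
        result = runner([sys.executable, script])
        if result.returncode != 0:
            return script
    return None
```

Stopping at the first failure keeps later stages from reading stale outputs of an earlier one. The `runner` parameter exists only so the loop can be exercised without launching the real scripts.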

t1.13 tasks [2026-03-17 Tue 13:49]

ok i ran a few. time to run some cleanups here - i'm wondering if we shouldn't be less aggressive with canonical names and encourage a better manual process to start.

TODO fill in auto-created canonical category, product-type

auto-created canonical_names lack category, product_type - ok with filling these in manually in the catalog once the queue is empty

TODO consolidation cleanup

  1. canonical_names feel too specific, e.g., "5DZ egg" - probably a problem with the enrich_* steps not adding appropriate normalizing data and removing it from the observed product title?
  2. some canonical_names need consolidation, e.g. "LIME" and "LIME . / ." ; possibly a cleanup issue. there are 5 entries for egg but they are all regular large grade A white eggs, just different amounts in dozens. Eggs are actually a great candidate for the kind of analysis we want to do - the pipeline should have caught and properly sorted these into size/qty:

    ```canonical_product_id	canonical_name	category	product_type	brand	variant	size_value	size_unit	pack_qty	measure_type	notes	created_at	updated_at
    gcan_0e350505fd22	5DZ EGG / /			KS					each	auto-linked via exact_name		
    gcan_47279a80f5f3	EGG 5 DOZ. BBS								each	auto-linked via exact_name		
    gcan_7d099130c1bf	LRG WHITE EGG			SB				30	count	auto-linked via exact_upc		
    gcan_849c2817e667	GDA LRG WHITE EGG			SB				18	count	auto-linked via exact_upc		
    gcan_cb0c6c8cf480	LG EGG CONVENTIONAL					18	count		count	auto-linked via exact_name_size		  ```
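One way the enrich step could pull the dozen count out of names like the first two rows above, as a sketch. It assumes `pack_qty` is a count of individual eggs (as in the 30 and 18 rows), so 5 dozen becomes 60; `extract_dozens` is a hypothetical helper, not part of the existing enrich scripts.

```python
import re

# Matches "5DZ", "5 DOZ.", "5 dozen"; the number is the dozen count.
DOZEN_RE = re.compile(r"\b(\d+)\s*(?:DZ|DOZ|DOZEN)\b\.?", re.IGNORECASE)


def extract_dozens(item_name):
    """Return (name_without_size, pack_qty); pack_qty is None if no dozen token."""
    match = DOZEN_RE.search(item_name)
    if not match:
        return item_name.strip(), None
    pack_qty = int(match.group(1)) * 12
    remainder = DOZEN_RE.sub(" ", item_name)
    # Collapse leftover separators such as "/ /" and repeated spaces.
    remainder = re.sub(r"[\s/.]+", " ", remainder).strip()
    return remainder, pack_qty
```

With the size token removed, "5DZ EGG / /" reduces to "EGG" with a pack quantity of 60, which is much easier to group with the other egg rows.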
    

TODO costco discount matching

Build costco mechanism for matching discount to line item.

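  1. Discounts appear as their own line items named like `/123456`; the number matches the `retailer_item_id` (Costco item number) of the discounted item, not a UPC
  2. the discount must also be date-matched to that item, not matched on the number alone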

Data model might be missing shape:

  1. match discount rows like `item_name:/2303476` to `retailer_item_id:2303476`
  2. display this value on the item somehow? maybe update line_total? otherwise we lose fidelity. should be stored in items_enriched somehow
```retailer	order_id	line_no	observed_item_key	order_date	retailer_item_id	pod_id	item_name	upc	category_id	category	qty	unit	unit_price	line_total	picked_weight	mvp_savings	reward_savings	coupon_savings	coupon_price	image_url	raw_order_path	item_name_norm	brand_guess	variant	size_value	size_unit	pack_qty	measure_type	is_store_brand	is_fee	is_discount_line	is_coupon_line	price_per_each	price_per_lb	price_per_oz	parse_version	parse_notes
costco	2.11115E+22	3	costco:21111520101942404241753:3	4/24/2024	2303476		KA 6QT MIXER P16 KSM60SECXER/CU FY23		33	33	1	None	399.99	399.99							costco_output/raw/21111520101942404241753-2024-04-24T17-53-00.json	KA 6QT MIXER KSM60SECXER/CU						each	FALSE	FALSE	FALSE	FALSE	399.99			costco-enrich-v1	
costco	2.11115E+22	4	costco:21111520101942404241753:4	4/24/2024	325173		/2303476		33	33	-1	None	0	-100				-100			costco_output/raw/21111520101942404241753-2024-04-24T17-53-00.json	/2303476						each	FALSE	FALSE	TRUE	TRUE	100			costco-enrich-v1	```
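A sketch of the matching described above, using the two sample rows as the model. It assumes a discount only applies to an item in the same `order_id` (a stricter stand-in for date matching) and accepts both `/2303476` and `/ 1768123` name forms. `apply_discounts`, `discount_total`, and `net_line_total` are hypothetical names; `line_total` on the item is left untouched so fidelity is kept.

```python
import re

# "/2303476" and "/ 1768123" both name the discounted item's retailer_item_id.
DISCOUNT_NAME_RE = re.compile(r"^/\s*(\d+)$")


def apply_discounts(rows):
    """Attach each discount row's amount to the matching item in the same order.

    rows: dicts with order_id, retailer_item_id, item_name, line_total.
    Returns (items, unmatched); unmatched holds discount rows with no target.
    """
    items, index, discounts = [], {}, []
    for row in rows:
        match = DISCOUNT_NAME_RE.match(row["item_name"].strip())
        if match:
            discounts.append((row, match.group(1)))
            continue
        item = dict(row, discount_total=0.0, net_line_total=row["line_total"])
        items.append(item)
        # If an item repeats within an order, the first line takes the discount.
        index.setdefault((row["order_id"], row["retailer_item_id"]), item)
    unmatched = []
    for row, target_id in discounts:
        item = index.get((row["order_id"], target_id))
        if item is None:
            unmatched.append(row)
            continue
        item["discount_total"] += row["line_total"]
        item["net_line_total"] = item["line_total"] + item["discount_total"]
    return items, unmatched
```

Returning unmatched discount rows separately makes misses visible instead of silently dropping them.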

TODO giant discount matching

prompt

do not add new abstractions unless they remove real duplication. prefer explicit retailer-specific logic over generic heuristics. do not auto-create new canonical products from weak normalized names. and propose the smallest set of edits needed.

1.13 fixes

15x Costco discounts not caught

  • 15x, some with slash-space: `/ 1768123` and some without: `/2303476`

canonical names suck - tempted to force manual config from scratch?

  • maybe first-pass should be naming groups, starting with largest groups and going on down.
  • unfortunately not seeing many cross-retailer items? looks like costco-only; just taking Giant as gospel
  • could be as simple as changing canonical name in canonical_catalog.csv
  • tough to figure out where the data is, leading to below:

need to refactor whole flow and where data is stored

group by browser or by site, or both? currently mixed.

  1. Scrape

    • Script:
    • Output: /output/raw/orderN.json, history.json, orders.csv, history.csv
  2. Enrich

    • Scripts:
    • Output: /output/enrich/items.json
  3. Combined - output?

    • Review step?

proposed fixes

1.14 prep - OBE

[ ] t1.14.1 define and document the filesystem/data-layer layout (2-3 commits)

make stage ownership and retailer ownership explicit so every artifact has one obvious home

AC

  1. define and document the canonical directory layout for the pipeline, separating retailer-specific artifacts from shared combined artifacts
  2. adopt an explicit layout of the form:

    • `data/<retailer>/raw/`
    • `data/<retailer>/orders.csv`
    • `data/<retailer>/items.csv`
    • `data/<retailer>/items_enriched.csv`
    • `data/combined/products_observed.csv`
    • `data/combined/review_queue.csv`
    • `data/combined/item_aliases.csv`
    • `data/combined/canonical_catalog.csv`
    • `data/combined/product_links.csv`
    • `data/combined/purchases.csv`
    • `data/combined/pipeline_status.csv`
    • `data/combined/pipeline_status.json`
  3. update docs/readme and pipeline docs so each script's inputs and outputs point to the new layout
  4. remove or deprecate ambiguous stage outputs living under a retailer-specific output directory when they are actually shared artifacts
  • pm note: goal is that “where does this file live?” has one answer, not three
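One way to give every artifact a single home in code is a small path module that every script imports, sketched below. The file names come from the layout above; `retailer_path`, `raw_dir`, and `combined_path` are hypothetical helper names.

```python
from pathlib import Path

DATA_ROOT = Path("data")

# Per-retailer files vs shared files, as listed in the layout above.
RETAILER_FILES = {"orders.csv", "items.csv", "items_enriched.csv"}
COMBINED_FILES = {
    "products_observed.csv",
    "review_queue.csv",
    "item_aliases.csv",
    "canonical_catalog.csv",
    "product_links.csv",
    "purchases.csv",
    "pipeline_status.csv",
    "pipeline_status.json",
}


def raw_dir(retailer, root=DATA_ROOT):
    """Directory holding one retailer's raw scraped files."""
    return root / retailer / "raw"


def retailer_path(retailer, name, root=DATA_ROOT):
    """Path of a retailer-owned artifact; rejects shared file names."""
    if name not in RETAILER_FILES:
        raise ValueError(f"{name} is not a retailer-specific artifact")
    return root / retailer / name


def combined_path(name, root=DATA_ROOT):
    """Path of a shared artifact; rejects retailer-owned file names."""
    if name not in COMBINED_FILES:
        raise ValueError(f"{name} is not a combined artifact")
    return root / "combined" / name
```

Raising on an unknown name is the design choice here: a script that tries to write a shared file under a retailer directory fails immediately.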

evidence

  • commit:
  • tests:
  • date:

notes

[ ] t1.14.2 define the row-level data model for raw, enriched, observed, canonical, and purchases layers (2-4 commits)

lock the item model before further refactors so each stage has a clear grain and purpose

AC

  1. document the row grain for each layer:

    • raw item row = one receipt line from one retailer order
    • enriched item row = one retailer line with retailer-specific parsed fields
    • observed product row = one grouped retailer-facing product concept
    • canonical catalog row = one review-controlled product identity
    • purchase row = one final pivot-ready purchased item line
  2. define the required fields for each layer, including stable ids and provenance fields
  3. explicitly document which fields are allowed to be blank at each layer (e.g. `upc`, `canonical_item_id`, category)
  4. document the relationship between:

    • `raw_item_name`
    • `normalized_item_name`
    • `observed_product_id`
    • `canonical_item_id`
  5. document how retailer-native ids (e.g. Costco `retailer_item_id`) fit into the shared model without being forced into `upc`
  • pm note: this is the schema contract task; code should follow it, not invent it ad hoc
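A sketch of what the enriched-row contract could look like if written down as a type. The field set is an assumption that covers only the ids named above, not the full column list; the key format mirrors the `costco:<order_id>:<line_no>` values seen in `observed_item_key`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EnrichedItem:
    # Required: identity and provenance of one retailer receipt line.
    retailer: str
    order_id: str
    line_no: int
    raw_item_name: str
    normalized_item_name: str
    # Allowed to be blank at this layer.
    retailer_item_id: Optional[str] = None  # retailer-native id, kept apart from upc
    upc: Optional[str] = None
    observed_product_id: Optional[str] = None
    canonical_item_id: Optional[str] = None

    @property
    def observed_item_key(self) -> str:
        """Stable row key in the form retailer:order_id:line_no."""
        return f"{self.retailer}:{self.order_id}:{self.line_no}"
```

Keeping `retailer_item_id` and `upc` as separate optional fields means a Costco item number never has to be stored as if it were a UPC.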

evidence

  • commit:
  • tests:
  • date:

notes

[ ] t1.14.3 refactor pipeline outputs to the new layout without changing semantics (2-4 commits)

move files and script defaults to the new structure while preserving current behavior

AC

  1. update scraper and enrich scripts to write retailer-specific outputs under `data/<retailer>/…`
  2. update combined/shared scripts to read from retailer-specific enriched outputs and write to `data/combined/…`
  3. preserve current content/meaning of outputs during the move; this is a location/structure refactor, not a behavior rewrite
  4. update tests, docs, and script defaults to use the new paths
  • pm note: do not mix data-layout cleanup with canonical/review logic changes in this task

evidence

  • commit:
  • tests:
  • date:

notes

[ ] t1.14.4 make the review and catalog layer explicit and authoritative (2-4 commits)

treat review and canonical resolution as first-class data, not incidental byproducts

AC

  1. define `review_queue.csv`, `item_aliases.csv`, and `canonical_catalog.csv` as the authoritative review/catalog files in `data/combined/`
  2. document the intended purpose of each:

    • `review_queue.csv` = unresolved observed items needing action
    • `item_aliases.csv` = approved mapping from observed/normalized names to canonical ids
    • `canonical_catalog.csv` = review-controlled canonical product definitions and display names
  3. ensure final purchase generation reads from these files as the source of truth for resolution
  4. stop relying on weak implicit canonical creation as a substitute for the explicit review/catalog layer
  • pm note: this is the control-plane task; observed products may be automatic, canonical products are review-controlled
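A sketch of resolution driven only by approved aliases. It assumes `item_aliases.csv` can be loaded as a mapping keyed by `(retailer, normalized_item_name)`; that key choice and the function name `resolve` are assumptions. Nothing canonical is created here: rows without an alias go to the review queue.

```python
def resolve(observed_rows, aliases):
    """Split observed rows into resolved rows and a review queue.

    aliases maps (retailer, normalized_item_name) -> canonical_item_id and
    stands in for the contents of item_aliases.csv.
    """
    resolved, review_queue = [], []
    for row in observed_rows:
        key = (row["retailer"], row["normalized_item_name"])
        canonical_id = aliases.get(key)
        if canonical_id is None:
            review_queue.append(row)
        else:
            resolved.append(dict(row, canonical_item_id=canonical_id))
    return resolved, review_queue
```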

evidence

  • commit:
  • tests:
  • date:

notes

[ ] t1.14.5 define and document the final pivot-ready purchases output (2-3 commits)

make the final analysis artifact explicit so excel/pivot/chart use is a first-class target

AC

  1. define `data/combined/purchases.csv` as the final normalized purchase log
  2. ensure each purchase row retains:

    • purchase date
    • retailer
    • order id
    • raw item name
    • normalized item name
    • canonical item id when resolved
    • quantity and unit
    • original line total
    • discount-adjusted fields when applicable
    • store/location fields where available
  3. document that `purchases.csv` is the primary excel/pivot input and that earlier files are staging layers
  4. document expected pivot uses such as purchase frequency and cost over time by canonical item
  • pm note: this task is about making the final artifact explicit and stable, not about adding new metrics
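The pivot uses in AC 4 could be checked outside Excel with a sketch like the one below. The column names `purchase_date`, `canonical_item_id`, and `line_total` are assumed, and dates are assumed to be ISO `YYYY-MM-DD` strings; `pivot_by_month` is a hypothetical helper.

```python
from collections import defaultdict


def pivot_by_month(purchases):
    """Count purchases and sum line totals per (canonical_item_id, YYYY-MM)."""
    summary = defaultdict(lambda: {"purchases": 0, "total": 0.0})
    for row in purchases:
        if not row.get("canonical_item_id"):
            continue  # unresolved rows stay out of canonical-level pivots
        month = row["purchase_date"][:7]
        cell = summary[(row["canonical_item_id"], month)]
        cell["purchases"] += 1
        cell["total"] += row["line_total"]
    return dict(summary)
```

Rows with a blank canonical id are skipped rather than grouped under an empty key, so unresolved items never show up as a product of their own.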

evidence

  • commit:
  • tests:
  • date:

notes

pipeline prep [2026-03-17 Tue]

data saved to /data

  1. "scrape_<retailer>" gathers data from a retailer and outputs:

    1. raw list of items per visit ./<retailer>/scraped/raw/order-<uid>.json
    2. raw list of visits ./<retailer>/scraped_visits.csv
    3. raw list of items from all visits ./<retailer>/scraped_items.csv
  2. "enrich <retailer>" takes scraped data and outputs:

    1. normalized list of items ./<retailer>/enriched_items.csv
  3. "combine" takes retailer

input:

  1. all enriched items ./<retailer>/enriched_items.csv
  2. all retailer visits ./<retailer>/scraped_visits.csv

outputs:

  1. observed product groups ./combined/observed/products_observed.csv
  2. unresolved products for review ./combined/review/review_queue.csv
  3. pipeline accounting/status ./combined/status/pipeline_status.csv
  4. pipeline accounting/status ./combined/status/pipeline_status.json
  4. review resolves unknown or weakly identified products and maintains:

    1. canonical product catalog ./combined/review/canonical_catalog.csv
    2. approved alias mappings ./combined/review/item_aliases.csv
    3. optional observed→canonical links ./combined/review/product_links.csv
  5. build purchases takes combined observed data plus review/catalog data and outputs:

    1. final normalized purchase log ./combined/purchases/purchases.csv

let's get this pipeline right before more refactoring.

Pipeline - moved to data-model.org [2026-03-18 Wed]

Key:

  • (1) input
  • [2] output

Each step can be run alone if its dependencies (inputs) exist.

1. Collect

Get raw receipt/visit and item data from a retailer. Scraping is unique to a Retailer and method (e.g., Giant-Web and Giant-Scan). Preserve complete raw data and preserve fidelity. Avoid interpretation beyond basic data flattening.

  • (1) Source access (Varies, eg header data, auth for API access)
  • [1] collected visits from each retailer
  • [2] collected items from each retailer
  • [3] any other raw data that supports [1] and [2]; explicit source (eventual receipt scan?)

2. Normalize

Parse and extract structured facts from retailer-specific raw data to create a standardized item format. Strictly dependent on Collect method and output.

  • Extract quantity, size, pack, pricing, variant
  • Consolidate discount with item using upc/retailer_item_id and concurrence (same order/date)
  • Cleanup naming to facilitate later matching
  • (1) collected items from each retailer
  • (2) collected visits from each retailer
  • [1] normalized items from each retailer

3. Review/Combine (Canonicalization)

Decide whether two normalized retailer items are "the same product"; match items across retailers using algo/logic and human review. Create catalog linked to normalized items.

  • Grouping the same item from retailer
  • Asking human to create a canonical/catalog item with:

    • friendly/canonical_name: "bell pepper"; "milk"
    • category: "produce"; "dairy"
    • product_type: "pepper"; "milk"
    • ? variant? "whole", "skim", "2pct"
  • (1) normalized items from each retailer
  • [1] review queue of items to be reviewed
  • [2] catalog (lookup table) of confirmed retailer_item and canonical_name
  • [3] canonical purchase list, pivot-ready

Unresolved Issues

  1. Create tags: canonical_name (need better label), category, product_type. These are missing data like Variant; shouldn't that be part of the normalization step?
  2. need central script to orchestrate; metadata belongs here and nowhere else

Symptoms

  • `LIME` and `LIME . / .` appearing in canonical_catalog:

    • names must come from review-approved names, not raw strings

notes

to fix

Done

fuji apple, apple, produce (not apple, fruit, produce)
spinach, , produce -> frozen vs fresh?
frozen chicken thighs ->
rotisserie chicken, chicken, poultry -> rotisserie chicken, chicken, meat
beef patty, hamburger, meat -> hamburger patty, beef, meat
oats > cereal
cheerios > cereal

takeaways

  • variants not caught, how to fix?

catalog_name = what you actually bought
product_type = reasonable substitute
category = store aisle

  1. Using different categories maintains a direct comparison (product_type==spinach) and a distinction.

fresh spinach, spinach, produce
frozen spinach, spinach, frozen

include in catalog_name:

  • form: frozen, fresh, ground, shredded
  • fat level: whole, skim, 2%
  • flavor when primary: vanilla yogurt vs plain yogurt
  • cut: diced tomatoes vs crushed tomatoes
  • species when relevant: gala apple vs fuji apple

exclude from catalog_name:

  • package size / multipack count
  • promo wording; adjectives like "premium"; retailer marketing fluff

AC

  1. fix internal search flow, add same menu

    Review 4/345: SHRP CHDR
    5 matched items:
    [1] KS SHRP CHDR EC20T9H5 W12T13H5 SL130 | costco | 2026-03-12 |  5.49 | 
    [2] KS SHRP CHDR EC20T9H5 W12T13H5 SL130 | costco | 2025-01-24 | 12.58 | 
    [3] KS SHRP CHDR EC20T9H5 W12T13H5 SL130 | costco | 2025-01-10 |  6.29 | 
    [4] KS SHRP CHDR EC20T9H5 W12T13H5 SL130 | costco | 2024-12-14 |  6.29 | 
    [5] KS SHRP CHDR EC20T9H5 W12T13H5 SL130 | costco | 2024-08-06 |  5.99 | 
    no catalog_name suggestions found
    [f]ind  [n]ew  [s]kip  e[x]clude  [q]uit >
    f
    search: cheddar
    1 search results found:
    [1] cheddar cheese, cheese, dairy (0 items, 0 rows)
    -  selection: 1
    + [#] link to suggestion  [f]ind  [n]ew  [s]kip  e[x]clude  [q]uit >
    

instead of

 search: banana
 no matches found
- search again? [enter=yes, q=no]:
+ [f]ind  [n]ew  [s]kip  e[x]clude  [q]uit >
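The prompt change in both transcripts comes down to one rule: after a search, show the same menu, with the link option added only when there are results. A minimal sketch, with `menu_prompt` as a hypothetical helper name:

```python
BASE_MENU = "[f]ind  [n]ew  [s]kip  e[x]clude  [q]uit >"


def menu_prompt(result_count):
    """Return the prompt to show after a search or on first display.

    With results, the link option is prepended; with none, the base menu is
    shown again instead of a separate "search again?" question.
    """
    if result_count > 0:
        return "[#] link to suggestion  " + BASE_MENU
    return BASE_MENU
```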