10 Commits

SHA1 Message Date
1ea696d818 added texts and fixes for mojibake 2026-05-07 17:22:16 -04:00
28d6d222bd added create_csv.py 2026-05-07 17:22:00 -04:00
f5d679808e completed openai batch work 2026-05-07 07:24:11 -04:00
64a7a18721 openai batch refactor 2026-05-06 13:53:50 -04:00
e1ad4432a7 refactor/batch-openai prep 2026-05-06 13:29:59 -04:00
f3abbefac7 add gpt4o batch analysis 2026-05-05 16:50:10 -04:00
683bfb324f remove hyphen for underscore in nomenclature, remove dependency 2026-05-05 16:47:11 -04:00
d834d18c81 added 4o initial manual analysis and test 2026-05-05 15:00:34 -04:00
e7df0b24a1 1.1 cleanup 2026-05-05 13:50:04 -04:00
beb5cf461b t1.1: scrape one forum via ViewComments.cfm POST pagination 2026-05-05 12:28:07 -04:00
    Spider fetches ViewComments.cfm?GdocForumID=N with vPerPage=500,
    generates all page requests from page-1 metadata, and parses
    each div.Cbox for comment_id, author, date, title, text, reg_title,
    reg_desc. Handles span-wrapped comment text. Fixes UTF-8/windows-1251
    meta-tag encoding mismatch. 9083 items, 15 empty-text (0.17%).

    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
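Commit beb5cf461b describes generating every page request from page-1 metadata at vPerPage=500. The following is a minimal sketch of that arithmetic, not the spider's actual code. It assumes that page 1 exposes the forum's total comment count and that vPerPage is sent as a POST field. The page-number field `vPage`, the helper names `remaining_pages` and `page_request`, and the forum id 42 are hypothetical. Only `ViewComments.cfm`, `GdocForumID`, and `vPerPage` come from the commit message.

```python
import math
from urllib.parse import urlencode

PER_PAGE = 500  # vPerPage value taken from the commit message


def remaining_pages(total_comments: int, per_page: int = PER_PAGE) -> list[int]:
    """Pages 2..N still to request once page 1 has reported the total count.

    Assumption: page 1 exposes the total comment count for the forum.
    """
    last_page = math.ceil(total_comments / per_page)
    return list(range(2, last_page + 1))


def page_request(forum_id: int, page: int, per_page: int = PER_PAGE) -> tuple[str, dict[str, str]]:
    """Return (relative URL, POST form fields) for one page of one forum."""
    url = "ViewComments.cfm?" + urlencode({"GdocForumID": forum_id})
    form = {
        "vPerPage": str(per_page),  # assumed to be a POST field
        "vPage": str(page),  # hypothetical field name for the page number
    }
    return url, form


if __name__ == "__main__":
    # 9083 comments at 500 per page: page 1 is already fetched, pages 2..19 remain.
    for page in remaining_pages(9083):
        print(page_request(42, page))
```

Computing the page count once from page 1 lets all remaining requests be scheduled up front. The alternative is following "next" links one page at a time, which serializes the crawl.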
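Commits beb5cf461b and 1ea696d818 mention a UTF-8/windows-1251 meta-tag mismatch and mojibake fixes, but they do not say which direction the mismatch ran. The following sketch assumes the server sends UTF-8 bytes while the page's meta tag declares windows-1251, so a decoder that trusts the tag produces garbled Cyrillic. The helper names `decode_body` and `repair_mojibake` are hypothetical and are not taken from the repository.

```python
def decode_body(raw: bytes) -> str:
    """Decode raw page bytes as UTF-8 first, ignoring the meta-tag charset.

    Falls back to windows-1251 (Python codec "cp1251") if the bytes are not
    valid UTF-8.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1251")


def repair_mojibake(text: str) -> str:
    """Reverse one round of UTF-8 bytes that were wrongly decoded as cp1251.

    Heuristic: if the string cannot be re-encoded as cp1251, or the re-encoded
    bytes are not valid UTF-8, the text is assumed clean and returned unchanged.
    """
    try:
        return text.encode("cp1251").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    original = "Привет"
    garbled = original.encode("utf-8").decode("cp1251")  # what a cp1251 decoder shows
    print(garbled, "->", repair_mojibake(garbled))
```

Decoding the raw bytes correctly is the more robust fix. Python's cp1251 codec leaves byte 0x98 undefined, and the UTF-8 encoding of "И" (D0 98) contains that byte, so text already garbled by a lenient decoder may not round-trip through `repair_mojibake`.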