# Table of Contents

1.  [Project Goals](#org863a759)
2.  [Architecture](#orgcd91fd0)
    1.  [Scraper](#org3256ad3)
    2.  [Storage](#org7a9a92c)
    3.  [Analysis](#org6ed72dc)
3.  [Roadmap](#org416f14d)

# Project Goals

1.  Document and analyze the sentiment of public comments on Virginia law, to determine:
    1.  the utility of this forum as a mechanism for public comment, and
    2.  the impact of this forum on Virginia regulation.
2.  Make data and insights broadly available.
3.  Generalize to other public comment tools.

# Architecture

1.  Scrape/Parse: **Scrapy** for downloading comments
2.  Storage: JSON (JSONL files)
3.  Sentiment analysis: Claude Haiku
4.  Display: TBD

## Scraper

Scrapy provides a simple mechanism for browsing and parsing the three page types we need:

1.  Forums listing page: `Forums.cfm`. Lists all open forums with agency, regulation title, action type, brief description, closing date, and comment count.
2.  Comment listing page: `comments.cfm?GDocForumID=X`, `comments.cfm?stageid=X`, or `comments.cfm?petitionid=X`. Lists comments with title, author, and date.
3.  Individual comment page: `viewcomments.cfm?commentid=X`. Shows the regulation title and brief description at the top, followed by the comment.

## Storage

One JSONL file per forum/bill.

## Analysis

Google and Amazon both return generic sentiment (the tone of the writing: positive/negative), not stance (for/against the regulation). For example, "I strongly believe the government should NOT interfere" is negative in tone, but what we need to capture is that its stance is "against" the regulation. We will pass the forum/bill title to the model as context, and cache the full text of the proposed change, perhaps as a fallback.
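
The tone-versus-stance distinction above can be made concrete in the prompt itself. The following is a minimal sketch, not the project's actual prompt: the function names `build_stance_prompt` and `parse_stance`, the prompt wording, and the `unknown` fallback label are all assumptions for illustration. No model API is called here; the sketch only builds the text to send and normalizes whatever comes back.

```python
# Hypothetical helpers: names, prompt wording, and the "unknown" fallback
# are illustrative assumptions, not part of the project.
STANCES = ("for", "against", "neutral")


def build_stance_prompt(title: str, comment: str) -> str:
    """Ask for stance toward the named regulation, not the tone of writing."""
    return (
        "You are labeling public comments on a proposed Virginia regulation.\n"
        f"Regulation title: {title}\n"
        f"Comment: {comment}\n"
        "Is the commenter for the regulation, against it, or neutral? "
        "Judge stance, not tone. "
        "Answer with one word: for, against, or neutral."
    )


def parse_stance(reply: str) -> str:
    """Map a free-text model reply to one label, or 'unknown' if unclear."""
    words = reply.strip().lower().replace(".", " ").split()
    first = words[0] if words else ""
    return first if first in STANCES else "unknown"
```

Keeping the reply to a single word makes the label easy to validate, and the `unknown` fallback leaves room to re-run a comment with the full text of the proposed change as extra context.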

| Tool | Output | Context | Sarcasm | Context window | Cost per 1k comments |
|---|---|---|---|---|---|
| Google NL API | Score -1 to +1, plus magnitude | No/generic | Poorly | No | ~$1–2 |
| Amazon Comprehend | Pos/Neg/Neutral/Mixed | No/generic | Poorly | No | ~$0.10 |
| Claude Haiku | Prompted → for/against/neutral | Yes | Yes, with prompt | Yes | ~$0.10–0.30 |
| GPT-4o-mini | Prompted → same | Yes | Yes | Yes | ~$0.05–0.15 |
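
Once a comment has a stance label, it can be written to the per-forum JSONL file described under Storage. The sketch below assumes one file named after a forum identifier and a record shape with fields such as `comment_id` and `stance`; the function names, the file naming scheme, and the field names are hypothetical and only illustrate the one-object-per-line layout.

```python
import json
import tempfile
from pathlib import Path


def append_comment(data_dir: Path, forum_id: str, record: dict) -> Path:
    """Append one comment record as a single JSON line to the forum's file."""
    data_dir.mkdir(parents=True, exist_ok=True)
    path = data_dir / f"{forum_id}.jsonl"  # assumed naming: one file per forum
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


def load_comments(path: Path) -> list[dict]:
    """Read every non-empty line of a JSONL file back into dicts."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    # Demo in a throwaway directory; the record fields are made up.
    with tempfile.TemporaryDirectory() as tmp:
        out = append_comment(
            Path(tmp),
            "stage-123",
            {"comment_id": "1", "title": "Example", "stance": "against"},
        )
        print(load_comments(out))
```

Appending line by line means a scrape that is interrupted partway leaves a valid file, and labels from different models can be stored as separate fields on the same record when comparing them.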

# Roadmap

1.  Scrape one forum
2.  Compare sentiment models
3.  Display
4.  Scrape all data
5.  Scale?
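
As a possible starting point for the first step, the three page types listed under Scraper can be told apart from the URL alone, before any Scrapy spider is written. This is a sketch using only the standard library; the function name `classify_url`, the returned tuple shape, and the case-insensitive matching of file and parameter names are assumptions, and the example host is a placeholder.

```python
from urllib.parse import parse_qs, urlparse

# Query parameters that identify a comment listing, per the Scraper section.
LISTING_KEYS = ("GDocForumID", "stageid", "petitionid")


def classify_url(url: str) -> tuple:
    """Return (page_kind, id_parameter, id_value) for a forum URL.

    Matching is case-insensitive, which is an assumption about the site.
    """
    parsed = urlparse(url)
    page = parsed.path.rsplit("/", 1)[-1].lower()
    # Keep the first value of each parameter, keyed by lowercase name.
    query = {k.lower(): v[0] for k, v in parse_qs(parsed.query).items()}

    if page == "forums.cfm":
        return ("forums", None, None)
    if page == "comments.cfm":
        for key in LISTING_KEYS:
            if key.lower() in query:
                return ("listing", key, query[key.lower()])
    if page == "viewcomments.cfm" and "commentid" in query:
        return ("comment", "commentid", query["commentid"])
    return ("unknown", None, None)


if __name__ == "__main__":
    for example in (
        "https://example.test/Forums.cfm",  # placeholder host
        "comments.cfm?stageid=9",
        "viewcomments.cfm?commentid=42",
    ):
        print(example, classify_url(example))
```

A spider could use the returned kind to choose a parse callback, and the identifier pair to name the output file for that forum.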