#+title: VA Townhall
#+date: [2026-05-05 Tue]
#+version: 1

* Project Goals
1. Document and analyze sentiment of public comments on Virginia law, to determine:
   1. the utility of this forum as a mechanism for public comment, and
   2. the impact of this forum on Virginia regulation.
2. Make data and insights broadly available.
3. Generalize to other public comment tools.

** Document and analyze sentiment
- Scrape the data, then parse, clean, and store it. Keep the scraper clearly separate from the sentiment analyzer for maximum auditability.
- Build tests for identifying abuse, such as spam and account fraud.
- Identify any patterns connecting measured sentiment to VA decisions.

** Make data available
- Pick a good visualization tool.

** Generalize
- Identify scalable ways to apply this toolset to similar problems.

* Architecture
1. Scrape/Parse: **Scrapy** for downloading comments
2. Storage: JSONL
3. Sentiment analysis: Claude Haiku
4. Display: TBD

** Scraper
Scrapy provides a simple mechanism for browsing and downloading the site's three page types:
1. Forums listing page: `Forums.cfm`
   - lists all open forums with agency, regulation title, action type, brief description, closing date, and comment count
2. Comment listing page: `comments.cfm?GDocForumID=X` or `comments.cfm?stageid=X` or `comments.cfm?petitionid=X`
   - lists comments with title, author, and date
3. Individual comment page: `viewcomments.cfm?commentid=X`
   - shows the regulation title and brief description at the top, plus the comment

** Storage
One JSONL file per forum/bill.

** Analysis
Google and Amazon both return generic sentiment (tone of writing: positive/negative), not stance (for/against the regulation). For example, "I strongly believe the government should NOT interfere" registers as negative in tone, but what we need to know is that the commenter is "against" the regulation.

We will supply the forum/bill title as context for each comment and cache the full text of the proposed change, perhaps as fallback context.
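
The tone-versus-stance distinction above can be pushed into the prompt itself. What follows is a minimal sketch, assuming only the forum/bill title is passed as context. The function names, the prompt wording, and the one-word reply format are all hypothetical, and the actual model call is deliberately left out.

```python
from typing import Optional

# Stance labels taken from the comparison of prompted models in this section.
STANCES = ("for", "against", "neutral")


def build_stance_prompt(forum_title: str, comment_text: str) -> str:
    """Ask for the stance toward the regulation, not the tone of the writing.

    Hypothetical prompt wording; tune it against real comments.
    """
    return (
        "You are labeling public comments on a proposed Virginia regulation.\n"
        f"Regulation: {forum_title}\n"
        f"Comment: {comment_text}\n"
        "Is the commenter for the regulation, against it, or neutral? "
        "Judge the position taken, not the tone of the writing. "
        "Answer with exactly one word: for, against, or neutral."
    )


def parse_stance(reply: str) -> Optional[str]:
    """Normalize a model reply to one of STANCES, or None if unrecognized."""
    word = reply.strip().strip(".!\"'").lower()
    return word if word in STANCES else None
```

Returning `None` for anything outside the three labels keeps unparseable replies visible, so they can be counted and reviewed rather than silently bucketed as neutral.
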
| Tool              | Output                         | Context    | Sarcasm          | Context window | Cost/1k comments |
|-------------------+--------------------------------+------------+------------------+----------------+------------------|
| Google NL API     | -1→+1, magnitude               | No/generic | Poorly           | No             | ~$1–2            |
| Amazon Comprehend | Pos/Neg/Neutral/Mixed          | No/generic | Poorly           | No             | ~$0.10           |
| Claude Haiku      | Prompted → for/against/neutral | Yes        | Yes, with prompt | Yes            | ~$0.10–0.30      |
| GPT-4o-mini       | Prompted → same                | Yes        | Yes              | Yes            | ~$0.05–0.15      |

* Roadmap
1. Scrape one forum
2. Compare sentiment models
3. Display
4. Scrape all data
5. Scale?
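
For roadmap step 1, the scraper has to tell apart the three page types listed in the Scraper section. One way to do that from the URL alone is sketched below. This is a sketch under assumptions, not the project's scraper: `classify_url` and its return shape are hypothetical, and the case-insensitive matching of file names and query keys is a convenience assumed here, not something confirmed about the site.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlparse

# Query keys that identify a comment listing page, per the Scraper section.
LISTING_KEYS = ("GDocForumID", "stageid", "petitionid")

PageInfo = Tuple[str, Optional[str], Optional[str]]


def classify_url(url: str) -> Optional[PageInfo]:
    """Return (page_type, id_key, id_value), or None if unrecognized.

    page_type is "forums", "listing", or "comment" (hypothetical names).
    """
    parsed = urlparse(url)
    # Keep only the file name so absolute and relative URLs both work.
    page = parsed.path.rsplit("/", 1)[-1].lower()
    # Assumption: query keys are matched case-insensitively.
    query = {k.lower(): v[0] for k, v in parse_qs(parsed.query).items()}

    if page == "forums.cfm":
        return ("forums", None, None)
    if page == "comments.cfm":
        for key in LISTING_KEYS:
            if key.lower() in query:
                return ("listing", key, query[key.lower()])
        return None
    if page == "viewcomments.cfm" and "commentid" in query:
        return ("comment", "commentid", query["commentid"])
    return None
```

Keeping the id key alongside the value means a forum, stage, and petition that share a number do not collide when choosing which per-forum JSONL file a comment belongs to.
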