fd9d656e13425ff45235ad667858ee500f60dd4d
Table of Contents
Project Goals
- Document and analyze sentiment of public comments on Virginia law, to determine:
- the utility of this forum as a mechanism for public comment, and
- the impact of this forum on Virginia regulation.
- Make data and insights broadly available.
- Generalize to other public comment tools.
Document and analyze sentiment
- Scrape the data, parse, clean, and store. Clearly separate scraper from sentiment analyzer for maximum auditability.
- Build tests for identifying abuse, such as spam and account fraud
- Identify any patterns connecting measured sentiment against VA decisions
Make data available
- Pick a good visualization tool
Generalize
- Identify scalable ways to apply this toolset to similar problems
Architecture
- Scrape/Parse: Scrapy for downloading comments
- Storage: json
- Sentiment analysis: Claude haiku
- Display: TBD
Scraper
Scrapy provides a simple mechanism for browsing and
- Forums listing page: `Forums.cfm` - lists all open forums with agency, reg title, action type, brief description, closing date, comment count
- Comment listing page: `comments.cfm?GDocForumID=X` or `comments.cfm?stageid=X` or `comments.cfm?petitionid=X` - lists comments with title, author, date
- Individual comment page: `viewcomments.cfm?commentid=X` - shows regulation title + brief description at the top, plus the comment
Storage
One JSONL file per forum/bill.
Analysis
Google and Amazon both return generic sentiment (tone of writing: positive/negative), not stance (for/against the regulation): "I strongly believe the government should NOT interfere" is negative tone but "against" the regulation. We will run the forum/bill title and cache the entirety of the proposed change, perhaps as a fallback.
| Tool | Output | Context | Sarcasm | Context window | Cost/1k comments |
|---|---|---|---|---|---|
| Google NL API | -1→+1, magnitude | No/generic | Poorly | No | ~$1–2 |
| Amazon Comprehend | Pos/Neg/Neutral/Mixed | No/generic | Poorly | No | ~$0.10 |
| Claude Haiku | Prompted → for/against/neutral | Yes | Yes, with prompt | Yes | ~$0.10–0.30 |
| GPT-4o-mini | Prompted → same | Yes | Yes | Yes | ~$0.05–0.15 |
Roadmap
- Scrape one forum
- Compare sentiment models
- Display
- Scrape all data
- Scale?
Description
Document and analyze sentiment of public comments on Virginia law, to determine the utility of this forum as a mechanism for public comment, and the impact of this forum on Virginia regulation.
Languages
Python
100%